Re: combining xml and nutch index in solr

2011-08-02 Thread abhayd
hi
thanks, that's exactly what I want.

As far as I know, we cannot update a Solr index with partial values; Solr does
not update the index record in place, the whole record gets recreated.

So I'm not sure how the solrindex command will work here.
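
A minimal SolrJ sketch of that recreate-on-add behaviour (the server URL and
field names are illustrative assumptions, not from this thread): a document
added with an existing uniqueKey replaces the old record, so every field has
to be re-sent, not just the changed ones.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReplaceDoc {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");                  // uniqueKey field
        doc.addField("details", "full new value"); // re-send all fields, not a delta
        server.add(doc);   // replaces any existing doc with id=42
        server.commit();
    }
}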

--
View this message in context: 
http://lucene.472066.n3.nabble.com/combining-xml-and-nutch-index-in-solr-tp3209911p3218125.html
Sent from the Solr - User mailing list archive at Nabble.com.


xpath expression not working

2011-08-02 Thread abhayd
hi 
I have an XML doc which I would like to index using the XPath entity processor.
<add>
<doc>
 <id>1</id>
 <details>xyz</details>
</doc>
<doc>
 <id>2</id>
 <details>xyz2</details>
</doc>
</add>

If I want to just load the document with id=2, how would that work?

I tried an xpath expression that works with XPath tools, but not in Solr.

<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="c:\temp" fileName="promotions.xml"
            recursive="false" rootEntity="false" dataSource="null">
      <entity name="x" processor="XPathEntityProcessor"
              forEach="/add/doc" url="${f.fileAbsolutePath}" pk="id">
        <field column="id" xpath="/add/doc/[id=2]/id"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Any help on how I can do this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/xpath-expression-not-working-tp3218133p3218133.html
Sent from the Solr - User mailing list archive at Nabble.com.


SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-02 Thread Satish Talim
I am using Solr 3.3 on a Windows box.

I want to use the solr.ICUTokenizerFactory in my schema.xml and added the
fieldType name="text_icu" as per the URL -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory

I also added the following files to my apache-solr-3.3.0\example\lib folder:
lucene-icu-3.3.0.jar
lucene-smartcn-3.3.0.jar
icu4j-4_8.jar
lucene-stempel-3.3.0.jar

When I start my Solr server from apache-solr-3.3.0\example folder:
java -jar start.jar

I get the following errors:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.ICUTokenizerFactory'

SEVERE: org.apache.solr.common.SolrException: analyzer without class or
tokenizer & filter list

SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'text_icu'
specified on field subject

I tried adding various other jar files to the lib folder but it does not
help.

What am I doing wrong?

Satish


Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread alexander sulz

Hello folks,

I'm using the latest stable Solr release - 3.3 and I encounter strange 
phenomena with it.
After about 19 hours it just crashes, but I can't find anything in the 
logs, no exceptions, no warnings,

no suspicious info entries..

I have an index-job running from 6am to 8pm every 10 minutes. After each 
job there is a commit.

An optimize-job is done twice a day at 12:15pm and 9:15pm.

Does anyone have an idea what could possibly be wrong or where to look 
for further debug info?


regards and thank you
 alex


Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread Bernd Fehling

Any JAVA_OPTS set?

Do not use -XX:+OptimizeStringConcat or -XX:+AggressiveOpts flags.


Am 02.08.2011 12:01, schrieb alexander sulz:

Hello folks,

I'm using the latest stable Solr release - 3.3 and I encounter strange 
phenomena with it.
After about 19 hours it just crashes, but I can't find anything in the logs, no 
exceptions, no warnings,
no suspicious info entries..

I have an index-job running from 6am to 8pm every 10 minutes. After each job 
there is a commit.
An optimize-job is done twice a day at 12:15pm and 9:15pm.

Does anyone have an idea what could possibly be wrong or where to look for 
further debug info?

regards and thank you
alex


Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread alexander sulz

Nope, none :/

Am 02.08.2011 12:33, schrieb Bernd Fehling:

Any JAVA_OPTS set?

Do not use -XX:+OptimizeStringConcat or -XX:+AggressiveOpts flags.


Am 02.08.2011 12:01, schrieb alexander sulz:

Hello folks,

I'm using the latest stable Solr release - 3.3 and I encounter 
strange phenomena with it.
After about 19 hours it just crashes, but I can't find anything in 
the logs, no exceptions, no warnings,

no suspicious info entries..

I have an index-job running from 6am to 8pm every 10 minutes. After 
each job there is a commit.

An optimize-job is done twice a day at 12:15pm and 9:15pm.

Does anyone have an idea what could possibly be wrong or where to 
look for further debug info?


regards and thank you
alex




performance crossover between single index and sharding

2011-08-02 Thread Bernd Fehling

Is there any knowledge on this list about the performance
crossover between a single index and sharding and
when to change from a single index to sharding?

E.g. if the index size is larger than 150GB and the number of docs is
more than 25 million, then it is better to change from a single index
to sharding and have two shards.
Or something like this...

Sure, Solr might even handle 50 million docs, but performance is going down,
and a sharded system with distributed search will be faster than
a single index, or not?

Is a single index always faster than sharding?

Regards
Bernd


Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread Markus Jelsma
Strange, anything out of the ordinary in the syslog?

On Tuesday 02 August 2011 12:01:35 alexander sulz wrote:
 Hello folks,
 
 I'm using the latest stable Solr release - 3.3 and I encounter strange
 phenomena with it.
 After about 19 hours it just crashes, but I can't find anything in the
 logs, no exceptions, no warnings,
 no suspicious info entries..
 
 I have an index-job running from 6am to 8pm every 10 minutes. After each
 job there is a commit.
 An optimize-job is done twice a day at 12:15pm and 9:15pm.
 
 Does anyone have an idea what could possibly be wrong or where to look
 for further debug info?
 
 regards and thank you
   alex

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread Pranav Prakash
What do you mean by it just crashes? Does the process stop executing? Does
it take too long to respond, which might result in lots of 503s in your
application? Does the system run out of resources?

Are you indexing and serving from the same server? It happened once with us
that Solr was performing a commit and then an optimize while the load from the
app server was at its peak. This caused slow responses from the search server,
which caused requests to stack up at the app server, causing 503s. Could you
check whether you have a similar syndrome?

*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


On Tue, Aug 2, 2011 at 15:31, alexander sulz a.s...@digiconcept.net wrote:

 Hello folks,

 I'm using the latest stable Solr release - 3.3 and I encounter strange
 phenomena with it.
 After about 19 hours it just crashes, but I can't find anything in the
 logs, no exceptions, no warnings,
 no suspicious info entries..

 I have an index-job running from 6am to 8pm every 10 minutes. After each
 job there is a commit.
 An optimize-job is done twice a day at 12:15pm and 9:15pm.

 Does anyone have an idea what could possibly be wrong or where to look for
 further debug info?

 regards and thank you
  alex



RE: changing the root directory where solrCloud stores info inside zookeeper File system

2011-08-02 Thread Yatir Ben Shlomo
Thanks a lot Mark,
Since my SolrCloud code was old I tried downloading and building the
newest code from here
https://svn.apache.org/repos/asf/lucene/dev/trunk/
I am using tomcat6
I manually created the sc sub-directory in my zooKeeper ensemble
file-system
I used this connection String to my ZK ensemble
zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
but I still get the same problem
here is the entire catalina.out log with the exception

Using CATALINA_BASE:   /opt/tomcat6
Using CATALINA_HOME:   /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME:/usr/java/default/
Using CLASSPATH:   /opt/tomcat6/bin/bootstrap.jar
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory
(errno = 12).
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/a
md64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/
usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8983
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 448 ms
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.HostConfig
deployDescriptor
INFO: Deploying configuration descriptor solr1.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /home/tomcat/solrCloud1/solr.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer init
INFO: New CoreContainer 853527367
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.cloud.SolrZkServerProps
getProperties
INFO: Reading configuration from: /home/tomcat/solrCloud1/zoo.cfg
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer initZooKeeper
INFO: Zookeeper client=zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:zookeeper.version=3.3.1-942149, built on
05/07/2010 17:14 GMT
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:host.name=ob1079.nydc1.outbrain.com
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.version=1.6.0_21
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.vendor=Sun Microsystems Inc.
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.class.path=/opt/tomcat6/bin/bootstrap.jar
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client
environment:java.library.path=/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/
usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.io.tmpdir=/opt/tomcat6/temp
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.compiler=NA
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.name=Linux
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.arch=amd64
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.version=2.6.18-194.8.1.el5
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.name=tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.home=/home/tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client 

Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-02 Thread Robert Muir
Did you add the analysis-extras jar itself? That's what contains this factory.

On Tue, Aug 2, 2011 at 5:03 AM, Satish Talim satish.ta...@gmail.com wrote:
 I am using Solr 3.3 on a Windows box.

 I want to use the solr.ICUTokenizerFactory in my schema.xml and added the
 fieldType name="text_icu" as per the URL -
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory

 I also added the following files to my apache-solr-3.3.0\example\lib folder:
 lucene-icu-3.3.0.jar
 lucene-smartcn-3.3.0.jar
 icu4j-4_8.jar
 lucene-stempel-3.3.0.jar

 When I start my Solr server from apache-solr-3.3.0\example folder:
 java -jar start.jar

 I get the following errors:

 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'solr.ICUTokenizerFactory'

 SEVERE: org.apache.solr.common.SolrException: analyzer without class or
 tokenizer & filter list

 SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'text_icu'
 specified on field subject

 I tried adding various other jar files to the lib folder but it does not
 help.

 What am I doing wrong?

 Satish




-- 
lucidimagination.com


indexing taking very long time

2011-08-02 Thread Naveen Gupta
Hi

We have a requirement where we are indexing all the messages of a thread;
a thread may have attachments too. We are adding them to Solr for indexing
and searching, in order to apply a few business rules.

For a user, we have many threads (almost 100k in number), and each thread may
have 10-20 messages.

Now what we are finding is that it is taking 30 mins to index the entire
set of threads.

When we run optimize, it goes faster.

The question here is: how frequently should this optimize be called, and
when?

Please note that we are following a commit strategy (that is, after every 10k
threads a commit is called); we are not calling commit after every doc.

Secondly, how can we use multithreading, from a Solr perspective, in order to
improve JVM and other resource utilization?
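
A hedged sketch of one client-side multithreading option, SolrJ's
StreamingUpdateSolrServer (the URL, queue size, and thread count below are
illustrative assumptions, not values from this thread):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ThreadedIndexer {
    public static void main(String[] args) throws Exception {
        // Buffers documents in a queue and sends them to Solr over several
        // concurrent connections; tune queue size and thread count per box.
        StreamingUpdateSolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8983/solr", 10000, 4);
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", i);
            doc.addField("body", "message text " + i);
            server.add(doc);          // returns quickly; pool threads do the I/O
        }
        server.blockUntilFinished(); // drain the queue
        server.commit();             // one commit at the end, not per document
    }
}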


Thanks
Naveen


DIH + signature

2011-08-02 Thread jodehaes
Hi,

I'm using solr 3.3 and want to add a signature field to solr to later be
able to deduplicate search results using field collapsing.  I'm using DIH to
fill solr.

Extract from solrconfig.xml

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signature</str>
    <str name="fields">ctcontent</str>
    <str name="signatureClass">solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

in the schema.xml there is:

<field name="signature" type="string" indexed="true" stored="true"
    multiValued="false" />
and
<field name="ctcontent" type="text_nl_splitting" indexed="true"
    stored="true" termVectors="on" termPositions="on" termOffsets="on"/>

When I run a full-import however the signature field remains empty.  Any
insight on what I'm doing wrong would be greatly appreciated!

Kind regards,

Jo

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-signature-tp3218813p3218813.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread wakemaster 39
Monitor your memory usage.  I used to encounter a problem like this before,
where nothing was in the logs and the process was just gone.

It turned out my system was out of memory and swap got used up because of
another process, which then forced the kernel to start killing off processes.
Google "OOM linux" and you will find plenty of other programs and people with
a similar problem.

Cameron
On Aug 2, 2011 6:02 AM, alexander sulz a.s...@digiconcept.net wrote:
 Hello folks,

 I'm using the latest stable Solr release - 3.3 and I encounter strange
 phenomena with it.
 After about 19 hours it just crashes, but I can't find anything in the
 logs, no exceptions, no warnings,
 no suspicious info entries..

 I have an index-job running from 6am to 8pm every 10 minutes. After each
 job there is a commit.
 An optimize-job is done twice a day at 12:15pm and 9:15pm.

 Does anyone have an idea what could possibly be wrong or where to look
 for further debug info?

 regards and thank you
 alex


Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread François Schiettecatte
Assuming you are running on Linux, you might want to check /var/log/messages
too (the location might vary); I think the kernel logs forced process
terminations there. I recall that the kernel usually picks the process
consuming the most memory, but there may be other factors involved too.

François

On Aug 2, 2011, at 9:04 AM, wakemaster 39 wrote:

 Monitor your memory usage.  I used to encounter a problem like this before,
 where nothing was in the logs and the process was just gone.
 
 It turned out my system was out of memory and swap got used up because of
 another process, which then forced the kernel to start killing off processes.
 Google "OOM linux" and you will find plenty of other programs and people with
 a similar problem.
 
 Cameron
 On Aug 2, 2011 6:02 AM, alexander sulz a.s...@digiconcept.net wrote:
 Hello folks,
 
 I'm using the latest stable Solr release - 3.3 and I encounter strange
 phenomena with it.
 After about 19 hours it just crashes, but I can't find anything in the
 logs, no exceptions, no warnings,
 no suspicious info entries..
 
 I have an index-job running from 6am to 8pm every 10 minutes. After each
 job there is a commit.
 An optimize-job is done twice a day at 12:15pm and 9:15pm.
 
 Does anyone have an idea what could possibly be wrong or where to look
 for further debug info?
 
 regards and thank you
 alex



Re: Different options for autocomplete/autosuggestion

2011-08-02 Thread Erick Erickson
You have to tell us more information about what "not right" means.
Please review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Wed, Jul 27, 2011 at 6:12 AM, scorpking lehoank1...@gmail.com wrote:
 Hi Bell,
 I used autocomplete in Solr 3.1, like this:

  <searchComponent name="autocomplete" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">autocomplete</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
      <str name="field">autocomplete</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

 and I followed this URL
 http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ to index my
 data, and had a problem. With one word, it has worked very well. But when I
 typed two or more words, the results returned are not right. I don't know
 why. Does anyone know this problem? Thanks for your help.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Different-options-for-autocomplete-autosuggestion-tp2678899p3203032.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Master-slave master failover without data loss

2011-08-02 Thread Erick Erickson
Not OOB. You say that the index updates, but if the data hasn't been
committed, it isn't really in the index. After the commit (which varies
time-wise depending on merges etc.) the next replication from the slave
should get the new index, regardless of whether the master has gone down
or not.

One way to handle this issue is to re-index data from some time before the
master went down, relying on the uniqueKey to replace any duplicate
documents.

Best
Erick

On Wed, Jul 27, 2011 at 10:43 AM, Nagendraprasad
nagu.nutalap...@gmail.com wrote:
 Suppose master goes down immediately after the index updates, while the
 updates haven't been replicated to the slaves, data loss seems to happen.
 Does Solr have any mechanism to deal with that?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Master-slave-master-failover-without-data-loss-tp3203644p3203644.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH + signature

2011-08-02 Thread jodehaes
Follow-up on this issue.

I eventually found the problem.

The naming scheme changed from solr 3.2 onwards.

The line as it states in the documentation:
<str name="update.processor">dedupe</str>

should now be:
<str name="update.chain">dedupe</str>

https://issues.apache.org/jira/browse/SOLR-2105


--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-signature-tp3218813p3218979.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: xpath expression not working

2011-08-02 Thread karsten-solr
Hi abhayd,

XPathEntityProcessor only supports a subset of XPath,
like div[@id=2] but not [id=2].
Take a look at
https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose

I solved this problem by using XSLT as a preprocessor (with full XPath support).

The drawback is that it wastes performance: see
http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
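
A minimal JDK-only sketch of that XSLT preprocessing step (the file names and
the filter.xsl stylesheet are hypothetical; the stylesheet would use full
XPath, e.g. copying only /add/doc[id='2'], before DIH ever reads the file):

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FilterPromotions {
    public static void main(String[] args) throws Exception {
        // filter.xsl (hypothetical) copies only the <doc> elements you want.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("filter.xsl")));
        t.transform(new StreamSource(new File("promotions.xml")),
                    new StreamResult(new File("promotions-filtered.xml")));
        // Then point FileListEntityProcessor at promotions-filtered.xml.
    }
}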

Best regards
  Karsten

 Original-Nachricht 
 Datum: Mon, 1 Aug 2011 23:21:45 -0700 (PDT)
 Von: abhayd ajdabhol...@hotmail.com
 An: solr-user@lucene.apache.org
 Betreff: xpath expression not working

 hi 
 I have an XML doc which I would like to index using the XPath entity processor.
 <add>
 <doc>
  <id>1</id>
  <details>xyz</details>
 </doc>
 <doc>
  <id>2</id>
  <details>xyz2</details>
 </doc>
 </add>
 
 If I want to just load the document with id=2, how would that work?
 
 I tried an xpath expression that works with XPath tools, but not in Solr.
 
 <dataConfig>
   <dataSource type="FileDataSource" />
   <document>
     <entity name="f" processor="FileListEntityProcessor"
             baseDir="c:\temp" fileName="promotions.xml"
             recursive="false" rootEntity="false" dataSource="null">
       <entity name="x" processor="XPathEntityProcessor"
               forEach="/add/doc" url="${f.fileAbsolutePath}" pk="id">
         <field column="id" xpath="/add/doc/[id=2]/id"/>
       </entity>
     </entity>
   </document>
 </dataConfig>
 
 Any help on how I can do this?
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/xpath-expression-not-working-tp3218133p3218133.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Store complete XML record (DIH XPathEntityProcessor)

2011-08-02 Thread karsten-solr
Hi g, Hi Chantal

I had the same problem.
You can use XPathEntityProcessor, but you have to insert an XSL stylesheet.
The drawback is that it wastes performance: see
http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html

Best regards
  Karsten

 Original-Nachricht 
 Datum: Mon, 1 Aug 2011 12:17:45 +0200
 Von: Chantal Ackermann chantal.ackerm...@btelligent.de
 An: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Betreff: Re: Store complete XML record  (DIH  XPathEntityProcessor)

 Hi g,
 
 ok, I understand your problem, now. (Sorry for answering that late.)
 
 I don't think PlainTextEntityProcessor can help you. It does not take a
 regex. LineEntityProcessor does, but your record elements probably do not
 each come on their own line, and you wouldn't want to depend on that,
 anyway.
 
 I guess you would be best off writing your own entity processor - maybe
 by extending XPath EP if that gives you some advantage. You can of
 course also implement your own importer using SolrJ and your favourite
 XML parser framework - or any other programming language.
 
 If you are looking for a config-only solution - I'm not sure that there
 is one. Someone else might be able to comment on that?
 
 Cheers,
 Chantal
 
 
 On Thu, 2011-07-28 at 19:17 +0200, solruser@9913 wrote:
  Thanks Chantal
  I am OK with the second call and I already tried using that.
  Unfortunately it reads the whole file into a field.  My file is like the
  example below:
  <xml>
  <record>
  ...
  </record>

  <record>
  ...
  </record>

  <record>
  ...
  </record>
  </xml>
  
  Now the XPath does the 'for each /record' part.  For each record I also
  need to store the raw log in there.  If I use the PlainTextEntityProcessor
  then it gives me the whole file (from <xml> to </xml>) and not each
  <record> ... </record>.

  Am I using the PlainTextEntityProcessor wrong?
  
  THanks
  g
  
  
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Store-complete-XML-record-DIH-XPathEntityProcessor-tp3205524p3207203.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread Mike Sokolov

You have a few choices:

1) flatten your field structure - like your undesirable example, but 
wouldn't you want to have the document identifier as a field value also?


2) use phrase queries to make sure the key/value pairs are adjacent (see the
sketch below)

3) use a join query

That's all I can think of
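
A rough Lucene sketch of option 2 (the key=value tokens follow the example in
the quoted message; the slop value is illustrative and must stay below the
field's positionIncrementGap so a match cannot bridge two values):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;

public class FriendQueries {
    // Both tokens must occur within 50 positions of each other, i.e.
    // inside the same myFriends value if the increment gap is larger.
    static Query coolMaleFriends() {
        PhraseQuery pq = new PhraseQuery();
        pq.add(new Term("myFriends", "isCool=true"));
        pq.add(new Term("myFriends", "gender=male"));
        pq.setSlop(50);
        return pq;
    }
}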

-Mike

On 08/01/2011 08:08 PM, Suk-Hyun Cho wrote:

I'm sure someone asked this before, but I couldn't find a previous post
regarding this.


The problem:


Let's say that I have a multivalued field called myFriends that tokenizes on
whitespaces. Basically, I'm treating it like a List of Lists (attributes of
friends):


Document A:

myFriends = [
  "isCool=true SOME_JUNK_HERE gender=male bloodType=A"
]

Document B:

myFriends = [
  "isCool=true SOME_JUNK_HERE gender=female bloodType=O",
  "isCool=false SOME_JUNK_HERE gender=male bloodType=AB"
]

Now, let's say that I want to search for all the cool male friends I have.
Naively, I can query q=myFriends:isCool=true+AND+myFriends:gender=male.
However, this returns documents A and B, because the two criteria are tested
against the entire collection, rather than against individual elements.


I could work around this by not tokenizing on whitespaces and using
wildcards:


q=myFriends:isCool=true\ *\ gender=male


but this becomes painful when the query becomes more complex. What if I
wanted to find cool friends who are either type A or type O? I could do
q=myFriends:(isCool=true\ *\ bloodType=A+OR+isCool=true\ *\ bloodType=O).
And you can see that the number of criteria will just explode as queries get
more complex.


There are other methods that I've considered, such as duplicating documents
for every friend, like so:


Document A1:

myFriend = [
 isCool=true,
 gender=male,
 bloodType=A
]

Document B1:

myFriend = [
 isCool=true,
 gender=female,
 bloodType=O
]

Document B2:

myFriend = [
 isCool=false,
 gender=male,
 bloodType=AB
]

but this would be less than desirable.

I would like to hear any other ideas around solving this problem, but going
back to the original question, is there a way to match multiple criteria on
a per-item basis rather than against the entire multifield?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3217432.html
Sent from the Solr - User mailing list archive at Nabble.com.
   


Re: performance crossover between single index and sharding

2011-08-02 Thread Shawn Heisey

On 8/2/2011 4:44 AM, Bernd Fehling wrote:

Is there any knowledge on this list about the performance
crossover between a single index and sharding and
when to change from a single index to sharding?

E.g. if the index size is larger than 150GB and the number of docs is
more than 25 million, then it is better to change from a single index
to sharding and have two shards.
Or something like this...

Sure, Solr might even handle 50 million docs, but performance is going down,
and a sharded system with distributed search will be faster than
a single index, or not?


The answer I've always seen here boils down to "it depends on a large 
number of variables unique to every situation."  The nature of your data 
will affect things, like the number of fields, number of unique terms 
per field, etc.  If you have really complicated queries, that will slow 
things down.


Probably the greatest limiting factor is memory.  Having enough free 
memory to fit the entire index into the operating system's disk cache is 
the best thing you can do for performance.  This is memory over and 
above whatever you give to your Java heap.  If you have a 150GB index 
and you can afford machines with at least 192GB of RAM, a single index 
would perform very well, once it is warmed up.  Performance on a cold 
index would not be very good.  In a sharded scenario, you want to try 
and size each machine so that its piece fits into RAM.


Next would be disk I/O.  Any data that won't fit in the disk cache must 
be retrieved from disk, which is typically the weakest link in the 
chain.  If you can put your index on solid state disks, that's almost as 
good as having the index entirely in memory.  Performance on a cold 
index with SSD would be incredible.


Having a lot of high speed CPU available will help, but not as much as 
memory and I/O.


Index rebuild time is another consideration that might lead you to go 
distributed, as long as your data source can keep up with multiple readers.


My own index is too big to fit in RAM, even sharded.  Each of the six 
large shards is getting close to 19GB.  Each machine has 14GB of RAM 
(it's a virtual environment with three large shards per physical host) 
and has 3GB allocated to Java.  I am in the process of upgrading the 
memory, at which point it will fit, but our growth will exceed the 
maximum server memory again in the next year or so.  I have plans to 
eliminate the virtualization and have three shards in cores on each server.


I know this isn't really what you were looking for, but there are no 
simple answers to your question.


Thanks,
Shawn



How to cut off hits with score below threshold?

2011-08-02 Thread Otis Gospodnetic
Hello,

If one wanted to cut off hits whose score is below some threshold (I know, I 
know, one doesn't typically want to do this), what are the most elegant options?
I can think of 2 options, but I wonder if there are better choices:

1) custom Collector (problem: one can't specify a custom Collector via an API, 
so one would have to modify Solr source code)

2) custom SearchComponent that filters hits with score < threshold (problem: if 
hits are removed from results then too few hits will be returned to the client, 
so one has to either request more rows from Solr or re-request more hits or do 
both to avoid this problem)

Is there something better one can do?

Thanks,
Otis

Sematext is hiring Search Engineers -- http://sematext.com/about/jobs.html


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread karsten-solr
Hi Suk-Hyun Cho,

if myFriend is the unit of retrieval, you should use it as the Lucene document,
with the fields isCool, gender, bloodType, ...

if you really want to insert all myFriends in one field, as in your
myFriends = [
  "isCool=true SOME_JUNK_HERE gender=female bloodType=O",
  "isCool=false SOME_JUNK_HERE gender=male bloodType=AB"
]
example, you can use SpanQueries:

http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/

With SpanNotQuery you can search for all isCool=true and gender=male matches
where no other isCool token lies between both terms.
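
A rough Lucene 3.x sketch of that span approach (field and token names follow
the example above; the slop and the single excluded token are illustrative
simplifications):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class FriendSpans {
    // isCool=true followed by gender=male within 10 positions, but reject
    // spans containing an isCool=false token in between, i.e. spans that
    // bridge two friend entries. A SpanOrQuery over all other isCool
    // tokens would generalize the exclusion.
    static SpanQuery coolMale() {
        SpanQuery isCool = new SpanTermQuery(new Term("myFriends", "isCool=true"));
        SpanQuery male   = new SpanTermQuery(new Term("myFriends", "gender=male"));
        SpanQuery near   = new SpanNearQuery(new SpanQuery[]{isCool, male}, 10, true);
        SpanQuery other  = new SpanTermQuery(new Term("myFriends", "isCool=false"));
        return new SpanNotQuery(near, other);
    }
}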

Best regards
  Karsten


P.S. see in context
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-td3217432.html


RE: Spell Check

2011-08-02 Thread Dyer, James
The most likely problem is forgetting to specify spellcheck.build=true on the 
first query since the last restart.  This builds the spell check dictionary 
used by the IndexBasedSpellChecker.  You should put this in a warming query or 
alternatively, specify build-on-commit or build-on-optimize.
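
A small SolrJ sketch of such a build request (the handler and dictionary names
follow the configuration quoted below; issuing the same parameters as a plain
URL works equally well):

import org.apache.solr.client.solrj.SolrQuery;

public class BuildSpellIndex {
    // One-off request that (re)builds the named dictionary; the same
    // parameters can also go into a firstSearcher warming query.
    static SolrQuery buildRequest() {
        SolrQuery q = new SolrQuery("*:*");
        q.setQueryType("/spellCheckCompRH");
        q.set("spellcheck", true);
        q.set("spellcheck.dictionary", "locSpell");
        q.set("spellcheck.build", true);
        return q;
    }
}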

It also looks like <str name="queryAnalyzerFieldType">textSpell</str> should 
probably be <str name="queryAnalyzerFieldType">textSpellPhrase</str>.

Finally, if you've done a build and changing the query Analyzer field type 
doesn't help, then you have to wonder if dizeagar exists somewhere in your 
data.  If the keyword exists in the spelling dictionary, Solr's spellchecker 
will not try to correct it.  See 
https://issues.apache.org/jira/browse/SOLR-2585 for a potential solution to 
this problem.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: tamanjit.bin...@yahoo.co.in [mailto:tamanjit.bin...@yahoo.co.in] 
Sent: Tuesday, August 02, 2011 12:30 AM
To: solr-user@lucene.apache.org
Subject: Spell Check

Hi All,
Facing some issue with Solr spellcheck. I got an index based dictionary
made.

My changes to *solrconfig.xml* are:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">textSpell</str>

  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="name">locSpell</str>
    <str name="field">locSpell</str>
    <str name="buildOnOptimize">true</str>
    <str name="spellcheckIndexDir">./spellchecker_loc_spell</str>
  </lst>
</searchComponent>

<requestHandler name="/spellCheckCompRH" class="solr.SearchHandler">
  <lst name="locSpell">
    <str name="echoParams">explicit</str>
    <str name="spellcheck.dictionary">locSpell</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I got my dictionary built in the folder spellchecker_loc_spell after an
optimize.

Now my changes to schema.xml are as follows:

New fieldtype:

<fieldType name="textSpellPhrase" class="solr.TextField"
    positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


my *fields*:

<field name="id" type="integer" indexed="true" stored="true"/>
<field name="locName" type="string" indexed="true" stored="true"/>
<field name="ct" type="integer" indexed="true" stored="true"/>
<field name="st" type="integer" indexed="true" stored="true"/>
<field name="ppd" type="string" indexed="true" stored="true"/>
<field name="ecd" type="string" indexed="true" stored="true"/>
<field name="city" type="text" indexed="true" stored="true"/>
<field name="state" type="text" indexed="true" stored="true"/>
<field name="locSpell" type="textSpellPhrase" indexed="true" stored="false"/>

<defaultSearchField>locName</defaultSearchField>

<copyField source="locName" dest="locSpell"/>





Now when I send the following command

http://SolrIP/MagicBricks/Locality/spellCheckCompRH/?q=Dizeagar&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&spellcheck.count=3&spellcheck.dictionary=locSpell


I get the following result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <bool name="correctlySpelled">true</bool>
    </lst>
  </lst>
</response>


This should not be the case, as the word is wrongly spelled. Could anyone
help me out as to why I am getting this strange result of
correctlySpelled=true when it is not?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spell-Check-tp3218037p3218037.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to cut off hits with score below threshold?

2011-08-02 Thread karsten-solr
Hi Otis,

is this the same question as
http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html
?

If yes, perhaps something like (http://search-lucene.com/m/4AHNF17wIJW1/)
q={!frange l=0.85}query($qq)
qq=the original relevancy query
will help?
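
As a hedged SolrJ sketch of that approach (the 0.85 threshold is illustrative;
raw scores are not normalized, so any cutoff has to be chosen empirically):

import org.apache.solr.client.solrj.SolrQuery;

public class ScoreCutoff {
    // Wrap the real relevancy query in a frange so only documents whose
    // score for $qq is at least the lower bound survive.
    static SolrQuery build(String userQuery) {
        SolrQuery q = new SolrQuery();
        q.setQuery("{!frange l=0.85}query($qq)");
        q.set("qq", userQuery);
        return q;
    }
}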

(BTW, I also would like to specify a custom Collector via an API in Solr;
possibly worth an issue?)

Best regards
  Karsten


in context:
http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html

 Original-Nachricht 
 If one wanted to cut off hits whose score is below some threshold (I know,
 I know, one doesn't typically want to do this), what are the most elegant
 options?


Re: How to cut off hits with score below threshold?

2011-08-02 Thread Markus Jelsma
Be careful with that approach as it will return score=1.0f for all documents 
(fl=*,score). This, however, doesn't affect the outcome of the frange.

Feels like a bug though

On Tuesday 02 August 2011 16:29:16 karsten-s...@gmx.de wrote:
 Hi Otis,
 
 is this the same question as
 http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html
 ?
 
 If yes, perhaps something like (http://search-lucene.com/m/4AHNF17wIJW1/)
 q={!frange l=0.85}query($qq)
 qq=the original relevancy query
 will help?
 
 (BTW, a also would like to specify a custom Collector via API in Solr,
 possible an issue?)
 
 Best regards
   Karsten
 
 
 in context:
 http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-thr
 eshold-td3219064.html
 
  Original-Nachricht 
 
  If one wanted to cut off hits whose score is below some threshold (I
  know, I know, one doesn't typically want to do this), what are the most
  elegant options?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


lucene/solr, raw indexing/searching

2011-08-02 Thread dhastings
Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method.  I use Lucene in my indexers that write to an index on disk, and
Solr to search those indexes that I create, as creating the indexes without
Solr is much, much faster than using the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what it thinks I'm trying to do?  My
content is extremely specific and needs no interpretation or adjustment,
indexing or searching; it is a text field.

for example:

"203.1" seems to be indexed as "2031".  Searching for "203.1" I can get to work
correctly, but then it won't find what's indexed using 3.1's standard
analyzer.

If I have content that is:
this is rev. 23.302

I need it indexed EXACTLY as it appears:
this is rev. 23.302

I do not want any of Solr's or Lucene's attempts to fix my content or my
queries.  "rev." needs to stay "rev." and not turn into "rev", "23.302"
needs to stay as such, and NOT turn into "23302".  This is for BOTH indexing
and searching.

any hints?

right now for indexing i have:

Set<String> nostopwords = new HashSet<String>();
nostopwords.add("buahahahahahaha"); // dummy entry so the stop-word set is non-empty

Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);

writer.setUseCompoundFile(false);


and for searching i have in my schema :


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Thanks.  Very much appreciated.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.


CoreContainer from CommonsHttpSolrServer

2011-08-02 Thread Matthias
Hi everybody,

I'm using Solr (with multiple cores) in a webapp and access the different
cores using CommonsHttpSolrServer. As I would like to know which cores are
configured and what their status is, I would like to get an instance of
CoreContainer.

The site http://wiki.apache.org/solr/CoreAdmin tells me how to interact with
the CoreAdminHandler via my browser. But I would like to get the information
provided by the STATUS action in my Java application. As CoreContainer
provides appropriate methods, I need to get access to such an object.

What's the best way to achieve that?
Thanks in advance.
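
The CoreContainer itself lives inside the Solr webapp, so a separate client
can only get at the same information over HTTP. A hedged SolrJ sketch of the
STATUS call (the URL is an illustrative assumption):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class ListCores {
    public static void main(String[] args) throws Exception {
        // Point at the container root, not at an individual core.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // A null core name asks for the status of every configured core.
        CoreAdminResponse status = CoreAdminRequest.getStatus(null, server);
        for (int i = 0; i < status.getCoreStatus().size(); i++) {
            System.out.println(status.getCoreStatus().getName(i));
        }
    }
}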

Matthias 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/CoreContainer-from-CommonsHttpSolrServer-tp3219299p3219299.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to cut off hits with score below threshold?

2011-08-02 Thread Markus Jelsma
I've created an issue to track this funny behaviour:
https://issues.apache.org/jira/browse/SOLR-2689

On Tuesday 02 August 2011 16:46:18 Markus Jelsma wrote:
 Be careful with that approach as it will return score=1.0f for all
 documents (fl=*,score). This, however, doesn't affect the outcome of the
 frange.
 
 Feels like a bug though
 
 On Tuesday 02 August 2011 16:29:16 karsten-s...@gmx.de wrote:
  Hi Otis,
  
  is this the same question as
  http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html
  ?
  
  If yes, perhaps something like (http://search-lucene.com/m/4AHNF17wIJW1/)
  q={!frange l=0.85}query($qq)
  qq=the original relevancy query
  will help?
  
  (BTW, a also would like to specify a custom Collector via API in Solr,
  possible an issue?)
  
  Best regards
  
Karsten
  
  in context:
  http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-t
  hr eshold-td3219064.html
  
   Original-Nachricht 
  
   If one wanted to cut off hits whose score is below some threshold (I
   know, I know, one doesn't typically want to do this), what are the most
   elegant options?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Jetty error message regarding EnvEntry in WebAppContext

2011-08-02 Thread Marian Steinbach
Hi!

I am trying to deploy Solr under Jetty 6.1.22-1ubuntu1 (installed the
jetty and libjetty-extra-java packages). However, it seems as if I can't
get the webapp configuration set right.

With this configuration...

 <Configure class="org.mortbay.jetty.webapp.WebAppContext">
 ...
 <Call name="addEnvEntry">
   <Arg>/solr/home</Arg>
   <Arg type="java.lang.String">/opt/exptbx-solr/solr</Arg>
   <Arg type="java.lang.Boolean">true</Arg>
 </Call>
 </Configure>

... I get the error:

426 [main] WARN org.mortbay.log - Config error at <Call
name="addEnvEntry"><Arg>/solr/home</Arg><Arg
type="java.lang.String">/opt/exptbx-solr/solr</Arg><Arg
type="java.lang.Boolean">true</Arg></Call>
426 [main] ERROR org.mortbay.log - EXCEPTION
java.lang.IllegalStateException: No Method: <Call
name="addEnvEntry"><Arg>/solr/home</Arg><Arg
type="java.lang.String">/opt/exptbx-solr/solr</Arg><Arg
type="java.lang.Boolean">true</Arg></Call> on class
org.mortbay.jetty.webapp.WebAppContext



With this configuration instead...

 <Configure class="org.mortbay.jetty.webapp.WebAppContext">
  ...
  <New class="org.mortbay.jetty.plus.naming.EnvEntry">
    <Arg>/solr/home</Arg>
    <Arg type="java.lang.String">/opt/exptbx-solr/solr</Arg>
    <Arg type="java.lang.Boolean">true</Arg>
  </New>
 </Configure>
 /Configure

I get the following error:

438 [main] WARN org.mortbay.log - Config error at <New
class="org.mortbay.jetty.plus.naming.EnvEntry"><Arg>/solr/home</Arg><Arg
type="java.lang.String">/opt/exptbx-solr/solr</Arg><Arg
type="java.lang.Boolean">true</Arg></New>
438 [main] WARN org.mortbay.log - EXCEPTION
java.lang.ClassNotFoundException: org.mortbay.jetty.plus.naming.EnvEntry


Both examples are derived from http://wiki.apache.org/solr/SolrJetty - the
second one being a user-contributed config. It seems that the second problem
occurs since I'm not using Jetty Plus. Or at least I don't have the library
in the path.

Can anyone tell me how a working configuration for Jetty 6.1.22 would have
to look like?

Thanks!

Marian


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread David Smiley (@MITRE.org)
Suk,
 
You're hitting on a well known limitation with Lucene, and the solutions
are work-arounds that may be unacceptable depending on the specifics of your
case.

Solr 4.0 (trunk)'s support for Joins is definitely an up and coming option,
as Mike pointed out.

Karsten's suggestion of using an index just for friends is very good,
although depending on the specifics of your actual needs it may not work or
be unscalable.

Mike also pointed out phrase queries, which will work, but remember to add a
proximity, e.g. "isCool=true gender=male"~50.  You'll want to consider the
position increment gap setting in your schema.  A limitation here is that
your text analysis options are limited since all the data is in the same
field.  You're also limited to simple term search; no range queries.

I took a different approach for an app I built. I indexed into separate
fields (i.e. isCool, gender, bloodType) so that I could analyze each of them
appropriately. But I did have to add a filter that basically collapsed all
position offsets within a value to zero, effectively nullifying my ability
to do a phrase query for a particular value. That was acceptable to me and
it can be ameliorated with shingling. Then at search time I used Span
queries and their unique ability to positionally query over more than one
field.  There were some edge conditions that were tricky to debug when I had
a null value, but it was at least fixable with a sentinel value kluge.

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3219352.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: changing the root directory where solrCloud stores info inside zookeeper File system

2011-08-02 Thread yatir
Thanks a lot Mark,
Since my SolrCloud code was old I tried downloading and building the newest
code from here https://svn.apache.org/repos/asf/lucene/dev/trunk/
I am using tomcat6
I manually created the sc sub-directory in my zooKeeper ensemble file-system
I used this connection String to my ZK ensemble
zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
but I still get the same problem
here is the entire catalina.out log with the exception

Using CATALINA_BASE:   /opt/tomcat6
Using CATALINA_HOME:   /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME:/usr/java/default/
Using CLASSPATH:   /opt/tomcat6/bin/bootstrap.jar
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory
(errno = 12).
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8983 Aug 2, 2011 4:28:46 AM
org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080 Aug 2, 2011 4:28:46 AM
org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 448 ms Aug 2, 2011 4:28:46 AM
org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29 Aug 2, 2011 4:28:46 AM
org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor solr1.xml Aug 2, 2011 4:28:46 AM
org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1 Aug 2, 2011 4:28:46 AM
org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1 Aug 2, 2011 4:28:46 AM
org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /home/tomcat/solrCloud1/solr.xml Aug 2, 2011
4:28:46 AM org.apache.solr.core.CoreContainer init
INFO: New CoreContainer 853527367
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1 Aug 2, 2011 4:28:46 AM
org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.cloud.SolrZkServerProps getProperties
INFO: Reading configuration from: /home/tomcat/solrCloud1/zoo.cfg Aug 2,
2011 4:28:46 AM org.apache.solr.core.CoreContainer initZooKeeper
INFO: Zookeeper client=zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:zookeeper.version=3.3.1-942149, built on 05/07/2010
17:14 GMT Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:host.name=ob1079.nydc1.outbrain.com
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.version=1.6.0_21 Aug 2, 2011 4:28:46 AM
org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.vendor=Sun Microsystems Inc.
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.class.path=/opt/tomcat6/bin/bootstrap.jar
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client
environment:java.library.path=/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.io.tmpdir=/opt/tomcat6/temp
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.compiler=NA Aug 2, 2011 4:28:46 AM
org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.name=Linux
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.arch=amd64
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.version=2.6.18-194.8.1.el5
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.name=tomcat Aug 2, 2011 4:28:46 AM
org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.home=/home/tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.dir=/opt/tomcat6 

Re: performance crossover between single index and sharding

2011-08-02 Thread David Smiley (@MITRE.org)
That's a fantastic answer, Shawn.

To more directly answer Bernd's question: Bernd, shard your data once
you've done reasonable performance optimizations to your single core index
setup (see Chapter 9 of my book) and the query response time isn't meeting
your requirements in spite of this.  Solr scales pretty darned well
horizontally --  so as you shard your data more and more, the query
responses will get faster.  At some extreme point there will be diminishing
returns and a performance decrease, but I wouldn't worry about that at all
until you've got many terabytes -- I don't know how many but don't worry
about it.

~ David

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-crossover-between-single-index-and-sharding-tp3218561p3219397.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance crossover between single index and sharding

2011-08-02 Thread Markus Jelsma
Actually, I do worry about it. It would be marvelous if someone could provide 
some metrics for an index of many terabytes.

 [..] At some extreme point there will be diminishing
 returns and a performance decrease, but I wouldn't worry about that at all
 until you've got many terabytes -- I don't know how many but don't worry
 about it.
 
 ~ David
 
 -
  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/performance-crossover-between-single-in
 dex-and-sharding-tp3218561p3219397.html Sent from the Solr - User mailing
 list archive at Nabble.com.


RE: lucene/solr, raw indexing/searching

2011-08-02 Thread Craig Stires

dhastings,

my recommendation for the approaches from both sides ...

Lucene:
try on a whitespace analyzer for size

   Analyzer an = new WhitespaceAnalyzer(Version.LUCENE_31);


Solr:
in your /index/solr/conf/schema.xml

   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       ...
     </analyzer>
   </fieldType>


-craig


-Original Message-
From: dhastings [mailto:dhasti...@wshein.com] 
Sent: Tuesday, 2 August 2011 10:14 PM
To: solr-user@lucene.apache.org
Subject: lucene/solr, raw indexing/searching

Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method.  I use Lucene in my indexers that write to an index on disk, and
Solr to search those indexes that I create, as creating the indexes without
Solr is much, much faster than using the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what it thinks I'm trying to do?  My
content is extremely specific and needs no interpretation or adjustment,
indexing or searching; it is a text field.

for example:

"203.1" seems to be indexed as "2031".  Searching for "203.1" I can get to work
correctly, but then it won't find what's indexed using 3.1's standard
analyzer.

If I have content that is:
this is rev. 23.302

I need it indexed EXACTLY as it appears:
this is rev. 23.302

I do not want any of Solr's or Lucene's attempts to fix my content or my
queries.  "rev." needs to stay "rev." and not turn into "rev", "23.302"
needs to stay as such, and NOT turn into "23302".  This is for BOTH indexing
and searching.

any hints?

right now for indexing i have:

Set<String> nostopwords = new HashSet<String>();
nostopwords.add("buahahahahahaha"); // dummy entry so the stop-word set is non-empty

Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);

writer.setUseCompoundFile(false);


and for searching i have in my schema :


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Thanks.  Very much appreciated.


--
View this message in context:
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219
277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: lucene/solr, raw indexing/searching

2011-08-02 Thread Jonathan Rochkind
In your solr schema.xml, are the fields you are using defined as text 
fields with analyzers? It sounds like you want no analysis at all, which 
probably means you don't want text fields either, you just want string 
fields. That will make it impossible to search for individual tokens,
though; searches will match only on complete matches of the value.


I'm not quite sure how to do what you want, it depends on exactly what 
you want. What kind of searching do you expect to support?  If you still 
do want tokenization, you'll still want some analysis... but I'm not 
quite sure how that corresponds to what you'd want to do on the lucene 
end.  What you're trying to do is going to be inevitably confusing, I 
think. Which doesn't mean it's not possible.  You might find it less 
confusing if you were willing to use Solr to index though, rather than 
straight lucene -- you could use Solr via the SolrJ java classes, rather 
than the HTTP interface.


On 8/2/2011 11:14 AM, dhastings wrote:

Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method.  I use Lucene in my indexers that write to an index on disk, and
Solr to search those indexes that I create, as creating the indexes without
Solr is much, much faster than using the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what it thinks I'm trying to do?  My
content is extremely specific and needs no interpretation or adjustment,
indexing or searching; it is a text field.

for example:

"203.1" seems to be indexed as "2031".  Searching for "203.1" I can get to work
correctly, but then it won't find what's indexed using 3.1's standard
analyzer.

If I have content that is:
this is rev. 23.302

I need it indexed EXACTLY as it appears:
this is rev. 23.302

I do not want any of Solr's or Lucene's attempts to fix my content or my
queries.  "rev." needs to stay "rev." and not turn into "rev", "23.302"
needs to stay as such, and NOT turn into "23302".  This is for BOTH indexing
and searching.

any hints?

right now for indexing i have:

 Set nostopwords = new HashSet(); nostopwords.add(buahahahahahaha);

Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer  = new IndexWriter(fsDir,an,MaxFieldLength.UNLIMITED);
writer.setUseCompoundFile(false) ;


and for searching i have in my schema :


  fieldType name=text class=solr.TextField positionIncrementGap=100
analyzer
 tokenizer class=solr.StandardTokenizerFactory/

 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.RemoveDuplicatesTokenFilterFactory/
   /analyzer
 /fieldType


Thanks.  Very much appreciated.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Jetty error message regarding EnvEntry in WebAppContext

2011-08-02 Thread Jonathan Rochkind

On 8/2/2011 11:42 AM, Marian Steinbach wrote:

 Can anyone tell me what a working configuration for Jetty 6.1.22 would have
 to look like?


You know that the Solr distro comes with a jetty with Solr in it as an 
example application, right? Even if you don't want to use it for some 
reason, that would probably be the best model to look at for a working 
jetty with solr.


Or is the problem that you want a different version of jetty?

As it happens, I just recently set up a jetty 6.1.26 for another 
project, not for solr. It was kind of a pain, not being too familiar with 
java deployment or jetty.  But I did get JNDI working, by following the 
jetty instructions here: http://docs.codehaus.org/display/JETTY/JNDI  
(It was a bit confusing to figure out what they were talking about, not 
being familiar with jetty, but eventually I got it, and the instructions 
were correct.)


But if I wanted to run Solr in jetty, I'd start with the jetty that is 
distributed with solr, rather than trying to build my own.
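For the record, the piece that usually trips people up is the EnvEntry itself.
A sketch for Jetty 6 (the solr/home path is illustrative, and the jetty-plus
naming classes must be on Jetty's classpath) goes in the webapp's
WEB-INF/jetty-env.xml:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN"
    "http://jetty.mortbay.org/configure.dtd">
<Configure class="org.mortbay.jetty.webapp.WebAppContext">
  <!-- Solr looks this up as java:comp/env/solr/home -->
  <New class="org.mortbay.jetty.plus.naming.EnvEntry">
    <Arg>solr/home</Arg>
    <Arg type="java.lang.String">/opt/solr/home</Arg>
    <Arg type="boolean">true</Arg>
  </New>
</Configure>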


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread Suk-Hyun Cho
I appreciate your replies and ideas.

SpanQuery would work, and I'll look into this further. However, what about
the original question? Is there no way to match documents on a per-element
basis against a multivalued field? If not, would it perhaps make sense to
create a feature request?

Also, regarding the join support you guys have mentioned: is it only on a
field within the same core, or is it across cores (as if cores are tables in
a database)? Joining across cores would eliminate most of the issues I'm having.
The examples I gave are simplified, but actually I have an entity A that has
entity B that has entity C, and I'm flattening out queryable fields of B and
C into the schema for A. This way, I can search for documents for the core A
that match some criteria for A, B, and/or C.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3219565.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread David Smiley (@MITRE.org)

On Aug 2, 2011, at 1:09 PM, Suk-Hyun Cho [via Lucene] wrote:

 I appreciate your replies and ideas. 
 
 SpanQuery would work, and I'll look into this further. However, what about 
 the original question? Is there no way to match documents on a per-element 
 basis against a multivalued field?

Correct; there is no way.  Aside from Solr 4's Join feature, everything else 
suggested is a hack / work-around for a fundamental limitation.  

 If not, would it perhaps make sense to create a feature request? 

You could, but I wouldn't bother, because it's unlikely to get any traction: 
it's a fundamental issue with Lucene, and at the Solr level there is a solution 
on the horizon.

 Also, regarding the join support you guys have mentioned: is it only on a 
 field within the same core, or is it across cores (as if cores are tables in 
 a database)? Joining on cores would eliminate most of the issues I'm having. 
 The examples I gave are simplified, but actually I have an entity A that has 
 entity B that has entity C, and I'm flattening out queriable fields of B and 
 C into the schema for A. This way, I can search for documents for the core A 
 that match some criteria for A, B, and/or C. 

The Join support works across cores.  See the wiki and associated JIRA issue 
for it.
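Roughly, the trunk syntax looks like this (core and field names invented for
illustration):

q={!join fromIndex=coreB from=parent_id to=id}details:xyz

This runs details:xyz against coreB, collects the parent_id values of the
hits, and returns the documents in the current core whose id matches one of
them.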

~ David Smiley



-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3219638.html
Sent from the Solr - User mailing list archive at Nabble.com.

how to get row no. of current record

2011-08-02 Thread Ranveer

Hi,

How do I know the row number of the current record?

i.e.: suppose we have 10 million records indexed. Currently I am on the 5th
record and the id of this record is XYZ00234; how do I know that the
current record's row number is 5?


thanks..

regards
Ranveer



RE: performance crossover between single index and sharding

2011-08-02 Thread Burton-West, Tom
Hi Markus,

Just as a data point for a very large sharded index, we have the full text of 
9.3 million books with an index size of about 6+ TB spread over 12 shards on 4 
machines. Each machine has 3 shards. The size of each shard ranges between 
475GB and 550GB.  We are definitely I/O bound. Our machines have 144GB of 
memory with about 16GB dedicated to the tomcat instance running the 3 Solr 
instances, which leaves about 120 GB (or 40GB per shard) for the OS disk cache. 
 We release a new index every morning and then warm the caches with several 
thousand queries.  I probably should add that our disk storage is a very high 
performance Isilon appliance that has over 500 drives and every block of every 
file is striped over no less than 14 different drives. (See blog for details *)

We have a very low number of queries per second (0.3-2 qps) and our modest 
response time goal is to keep 99th percentile response time for our application 
(i.e. Solr + application) under 10 seconds.

Our current performance statistics are:

average response time  300 ms
median response time   113 ms
90th percentile663 ms
95th percentile1,691 ms

We had plans to do some performance testing to determine the optimum shard size 
and optimum number of shards per machine, but that has remained on the back 
burner for a long time as other higher priority items keep pushing it down on 
the todo list.

We would be really interested to hear about the experiences of people who have 
so many shards that the overhead of distributing the queries, and 
consolidating/merging the responses becomes a serious issue.


Tom Burton-West

http://www.hathitrust.org/blogs/large-scale-search

* 
http://www.hathitrust.org/blogs/large-scale-search/scaling-large-scale-search-50-volumes-5-million-volumes-and-beyond

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, August 02, 2011 12:33 PM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Actually, i do worry about it. Would be marvelous if someone could provide 
some metrics for an index of many terabytes.

 [..] At some extreme point there will be diminishing
 returns and a performance decrease, but I wouldn't worry about that at all
 until you've got many terabytes -- I don't know how many but don't worry
 about it.
 
 ~ David
 
 -
  Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/performance-crossover-between-single-in
 dex-and-sharding-tp3218561p3219397.html Sent from the Solr - User mailing
 list archive at Nabble.com.


Re: performance crossover between single index and sharding

2011-08-02 Thread Jonathan Rochkind
What's the reasoning  behind having three shards on one machine, instead 
of just combining those into one shard? Just curious.  I had been 
thinking the point of shards was to get them on different machines, and 
there'd be no reason to have multiple shards on one machine.


On 8/2/2011 1:59 PM, Burton-West, Tom wrote:

Hi Markus,

Just as a data point for a very large sharded index, we have the full text of
9.3 million books with an index size of about 6+ TB spread over 12 shards on 4
machines. Each machine has 3 shards. [...]


Re: performance crossover between single index and sharding

2011-08-02 Thread Markus Jelsma
Hi Tom,

Very interesting indeed! But I keep wondering why some engineers choose to 
store multiple shards of the same index on the same machine; there must be 
significant overhead. The only reason I can think of is ease of maintenance in 
moving shards to a separate physical machine.
I know that rearranging the shard topology can be a real pain in a large 
existing cluster (e.g. consistent hashing is not consistent anymore and having 
to shuffle docs to their new shard); is this the reason you chose this 
approach?

Cheers,

 Hi Markus,
 
 Just as a data point for a very large sharded index, we have the full text
 of 9.3 million books with an index size of about 6+ TB spread over 12
 shards on 4 machines. Each machine has 3 shards. [...]


RE: performance crossover between single index and sharding

2011-08-02 Thread Burton-West, Tom
Hi Jonathan and Markus,

Why 3 shards on one machine instead of one larger shard per machine?

Good question!

We made this architectural decision several years ago and I'm not remembering 
the rationale at the moment. I believe we originally made the decision due to 
some tests showing a sweet spot for I/O performance for shards with 
500,000-600,000 documents, but those tests were made before we implemented 
CommonGrams and when we were still using attached storage.  I think we also 
might have had concerns about Java OOM errors with a really large shard/index, 
but we now know that we can keep memory usage under control by tweaking the 
amount of the terms index that gets read into memory.

We should probably do some tests and revisit the question.

The reason we don't have 12 shards on 12 machines is that current performance 
is good enough that we can't justify buying 8 more machines :)

Tom

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, August 02, 2011 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: performance crossover between single index and sharding

Hi Tom,

Very interesting indeed! But I keep wondering why some engineers choose to 
store multiple shards of the same index on the same machine [...]


Re: performance crossover between single index and sharding

2011-08-02 Thread Ken Krugler
With low qps and multi-core servers, I believe one reason to have multiple 
shards on one server is to provide better parallelism for a request, and thus 
reduce your response time.
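For example (host and core names invented), a single request fanned out over
three shards living on one box looks like:

http://host1:8983/solr/select?q=solr
    &shards=host1:8983/solr/shard1,host1:8983/solr/shard2,host1:8983/solr/shard3

Each shard searches its slice of the index in parallel, and the coordinating
core merges and re-sorts the partial results.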

-- Ken

On Aug 2, 2011, at 11:06am, Jonathan Rochkind wrote:

 What's the reasoning  behind having three shards on one machine, instead of 
 just combining those into one shard? Just curious.  I had been thinking the 
 point of shards was to get them on different machines, and there'd be no 
 reason to have multiple shards on one machine.
 
 On 8/2/2011 1:59 PM, Burton-West, Tom wrote:
 Hi Markus,
 
 Just as a data point for a very large sharded index, we have the full text
 of 9.3 million books with an index size of about 6+ TB spread over 12 shards
 on 4 machines. [...]

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions

Re: performance crossover between single index and sharding

2011-08-02 Thread Shawn Heisey

On 8/2/2011 12:06 PM, Jonathan Rochkind wrote:
What's the reasoning  behind having three shards on one machine, 
instead of just combining those into one shard? Just curious.  I had 
been thinking the point of shards was to get them on different 
machines, and there'd be no reason to have multiple shards on one 
machine.


I'd be interested in hearing Tom's answer as well, but my answer boils 
down to the time it takes to do a full index rebuild and worries about 
performance.


Because I'm in a virtualized environment, I effectively have three large 
shards on each machine even though they are logically separate.  When I 
first got involved, we had a distributed EasyAsk index on 20 separate 
low-end physical servers.  That evolved into basically the same solution 
with a smaller number of virtual machines, on a pair of very powerful 
physical hosts.  On this system, doing a full rebuild took nearly two 
days and wasn't an atomic operation.  The EasyAsk system (also based on 
Lucene) was unable to deal with more than about 4 million documents per 
machine (real or virtual).  The only way to get acceptable performance 
was distributed search.  The cost of providing redundancy was too high, 
so we didn't have any.


When we first started implementing Solr, we assumed from our previous 
experience that we'd need distributed search, especially if query volume 
were to go up.  For that reason, we continued our virtualization model, 
but with only seven shards - six large static shards and a smaller 
incremental shard to hold data less than a week old.  This is where we 
are now, and performance is MUCH better than the old solution.  The low 
shard count made redundancy affordable, so we now have that too.


At the time Solr was first implemented, we could rebuild the entire 
index in about two hours and swap it into place all at once.  Our index 
has grown enough since then that it takes a little less than three 
hours, which is still pretty quick for 60 million documents.


I did try some early tests with a single large index.  Performance was 
pretty decent once it got warmed up, but I was worried about how it 
would perform under a heavy load, and how it would cope with frequent 
updates.  I never really got very far with testing those fears, because 
the full rebuild time was unacceptable - at least 8 hours.  The source 
database can keep up with six DIH instances reindexing at once, which 
completes much quicker than a single machine grabbing the entire 
database.  I may increase the number of shards after I remove 
virtualization, but I'll need to fix a few limitations in my build system.


Thanks,
Shawn



Re: Query on multi valued field

2011-08-02 Thread Chris Hostetter

: The query is get only those documents which have multiple elements for
: that multivalued field.
: 
: I.e., docs 2 and 3 should be returned from the above set.

The only way to do something like this is to add a field, when you index 
your documents, that contains the number of values, and then filter on that 
field using a range query.

With an UpdateProcessor (or a ScriptTransformer in DIH) you can automate 
counting how many values there are -- but it has to be indexed to 
search/filter on it.
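A minimal sketch of the UpdateProcessor approach (the field and class names
are made up, and you would still need to register it via an
UpdateRequestProcessorFactory in solrconfig.xml):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Stores the number of values in "myField" into "myFieldCount" so queries
// can filter with e.g. fq=myFieldCount:[2 TO *]
public class ValueCountProcessor extends UpdateRequestProcessor {

  public ValueCountProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    if (doc.getFieldValues("myField") != null) {
      doc.setField("myFieldCount", doc.getFieldValues("myField").size());
    }
    super.processAdd(cmd); // hand the document on down the chain
  }
}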



-Hoss


Re: Why Slop doens't match anything?

2011-08-02 Thread Alexander Ramos Jardim
Hey dude,

Sorry for the long absence. (Need to check my personal email more times o0)

I am not using dismax. I didn't find the solution for the problem. I just
made a full-import and the problem ended. Still odd.

2011/7/27 Gora Mohanty g...@mimirtech.com

 On Wed, Jul 27, 2011 at 8:38 PM, Alexander Ramos Jardim
 alexander.ramos.jar...@gmail.com wrote:
  Hello pals,
 
  Using solr 1.4.0. Trying to understand something. When I run the query
  fieldA:"nokia c3", I get 5 results, all with "nokia c3", as expected. But
  when I run fieldA:"nokia c3"~100, I don't get any result!
 
  As far as I understand, the ~100 should make my query bring even more
  results, as not only documents with "nokia c3" in their fieldA will be
  found. Something like "nokia blue c3" should match too. Right?
 [...]

 That does seem odd. You are not using the dismax query handler by
 any chance, are you? If so, then the query slop needs to be specified
 by adding qs=100 to the query.

 Regards,
 Gora




-- 
Alexander Ramos Jardim


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread Suk-Hyun Cho
Thanks. I saw the related jira issue but didn't follow closely enough to see
the cross-core join being added later. Any idea/hint on when I can expect
Solr 4 to be released?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3220091.html
Sent from the Solr - User mailing list archive at Nabble.com.


TikaEntityProcessor is filling logs

2011-08-02 Thread O. Klein
I want to use TikaEntityProcessor for URLs defined in "link" from the parent
entity. This field can be empty as well. While the dataimport is working OK,
the logging is filling up with exceptions in case "link" is null. Is there a
way to prevent this?


<field column="id" xpath="/doc/id" />
<field column="text" xpath="/doc/text" />
<field column="link" xpath="/doc/link" />
<entity name="tika" processor="TikaEntityProcessor" url="${crawl.link}"
        dataSource="bin" onError="continue" format="text">
  <field column="text" />
</entity>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-is-filling-logs-tp3220100p3220100.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread David Smiley (@MITRE.org)
My best guess (and it is just a guess) is between December and March.  

The roots of Solr 4 which triggered the major version change is known as
flexible indexing (or just flex for short amongst developers).  The
genesis of it was posted to JIRA as a patch on 18 November 2008 --
LUCENE-1458 (almost 3 years ago!). About a year later it was committed into
a special flex branch that is probably gone now, and then around
April/early-May 2010, it went into trunk whereas the pre-flex code on trunk
went to a newly formed 3x branch. That is ancient history now, and there are
some amazing performance improvements tied to flex that haven't seen the
light of day in an official release. It's a shame, really. So it's been so
long that, well, after it dawns on everyone that the code is 3
friggin years old without a release -- it's time to get on with the show.  

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3220242.html
Sent from the Solr - User mailing list archive at Nabble.com.


MultiSearcher/ParallelSearcher - searching over multiple cores?

2011-08-02 Thread Ralf Musick

Hi *,

I searched the web for an answer whether it is possible in SOLR to make 
a query over several cores with all features (boosting, pagination, 
highlighting, and so on) out of the box.

In Lucene it is possible with MultiSearcher/ParallelSearcher.

I do not mean Distributed Search or merging several indexes together.
I mean a search over several cores with different types (different 
search fields).


It sounds quite difficult, so I think it is not an out-of-the-box SOLR 
feature and I have to implement it by hand.


Am I right?


Thanks in advance,
 Ralf




Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread eks dev
Well, Lucid released LucidWorks Enterprise
with "Complete Apache Solr 4.x Release Integrated and tested with
powerful enhancements".

Whatever that means for solr 4.0...



On Tue, Aug 2, 2011 at 11:10 PM, David Smiley (@MITRE.org)
dsmi...@mitre.org wrote:
 My best guess (and it is just a guess) is between December and March. [...]



Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread Smiley, David W.
LucidWorks Enterprise (which is more than Solr, and a modified Solr at that) 
isn't free; so you can't extract the Solr part of that package and use it 
unless you are willing to pay them.

Lucid's Certified Solr, on the other hand, is free.  But they have yet to 
bump that to trunk/4.x; it was only recently updated to 3.2.

On Aug 2, 2011, at 5:26 PM, eks dev wrote:

 Well, Lucid released LucidWorks Enterprise
 with   Complete Apache Solr 4.x Release Integrated and tested with
 powerful enhancements
 
 Whatever it means for solr 4.0
 
 
 
 On Tue, Aug 2, 2011 at 11:10 PM, David Smiley (@MITRE.org)
 dsmi...@mitre.org wrote:
 My best guess (and it is just a guess) is between December and March. [...]
 



Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread eks dev
Sure, I know...
The point I was trying to make: if someone serious like Lucid is
using solr 4.x as a core technology for their own customers, the trunk can't
be all that bad => release date maybe not as far off as 2012 :)


On Tue, Aug 2, 2011 at 11:33 PM, Smiley, David W. dsmi...@mitre.org wrote:
 LucidWorks Enterprise (which is more than Solr, and a modified Solr at 
 that) isn't free; so you can't extract the Solr part of that package and use 
 it unless you are willing to pay them.

 Lucid's Certified Solr, on the other hand, is free.  But they have yet to 
 bump that to trunk/4.x; it was only recently updated to 3.2.

 [...]





Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread Smiley, David W.
On Aug 2, 2011, at 5:47 PM, eks dev wrote:

 Sure, I know...,
 the point I was trying to make, if someone serious like Lucid  is
 using solr 4.x as a core technology for own customers, the trunk could
 not be all that bad = release date not as far as 2012 :)

Oh, the current trunk is most definitely *not* all that bad, as you say; that 
wasn't a point of discussion.  Code coverage is excellent, testing is rather 
extensive, and many folks like me use it in production.  But after nearly 3 
years of waiting, I wouldn't hold your breath on it getting released within 6 
months (before 2012).

~ David

Re: Solr with many indexes

2011-08-02 Thread Vikram Kumar
We have a multi-tenant Solr deployment with a core for each user.

Due to the limitations we are facing with the number of cores and
lazy-loading (and the associated warm-up times), we are researching
consolidating several users into one core, with queries limited by a
user-id field.

My question is about autosuggest.

1. Are there ways we can limit the autosuggest to only documents with
matching ids?

2. What other Solr operations like these need further
consideration when merging multiple indices and limiting by a field?

-- Vikram

On Sat, Jan 22, 2011 at 4:02 PM, Erick Erickson erickerick...@gmail.com wrote:
 See below.

 On Wed, Jan 19, 2011 at 7:26 PM, Joscha Feth jos...@feth.com wrote:

 Hello Erick,

 Thanks for your answer!

 But I question why you *require* many different indexes. [...] including
  isolating one
  users'
  data from all others, [...]


 Yes, that's exactly what I am after - I need to make sure that indexes don't
 mix, as every user shall only be able to query his own data (index).


 well, this can also be handled by simply appending the equivalent of
 +user:theuser
 to each query. This solution does have some interesting side effects
 though.
 In particular if you autosuggest based on combined documents, users will see
 terms NOT in documents they own.



 And even using lots of cores can be made to work if you don't pre-warm
  newly-opened
  cores, assuming that the response time when using cold searchers is
  adequate.
 

 Could you explain that further or point me to some documentation? Are you
 talking about: http://wiki.apache.org/solr/CoreAdmin#UNLOAD? if yes, LOAD
 does not seem to be implemented, yet. Or has this something to do with
 http://wiki.apache.org/solr/SolrCaching#autowarmCount only? About what
 time
 per X documents are we talking here for delay if auto warming is disabled?
 Is there more documentation about this setting?


 It's the autoWarm parameter. When you open a core the first few queries that
 run
 on it will pay some penalty for filling caches etc. If your cores are small
 enough,
 then this penalty may not be noticeable to your users, in which case you can
 just
 not bother autowarming (see firstSearcher , newSearcher). You might also
 be able to get away with having very small caches, it mostly depends on your
 usage patterns. If your pattern is that a user signs on, makes one search
 and signs off, there may not be much good in having large caches. On the
 other hand, if users sign on and search for hours continually, their
 experience may be enhanced by having significant caches. It all depends.

 Hope that helps
 Erick


 Kind regards,
 Joscha





-- 
- Vikram


Re: Matching queries on a per-element basis against a multivalued field

2011-08-02 Thread Suk-Hyun Cho
Thanks for the history and the current state of trunk, guys. It sounds like
it's rather stable for serious use... in which case it's probably ready for
a release, but let's not go back in circles. :) I'll give it a shot
sometime.

Thanks, again!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Matching-queries-on-a-per-element-basis-against-a-multivalued-field-tp3217432p3220449.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr with many indexes

2011-08-02 Thread Otis Gospodnetic
Hello,


From: Vikram Kumar vikrambku...@gmail.com

We have a multi-tenant Solr deployment with a core for each user.

Due to the limitations we are facing with number of cores,
lazy-loading (and associated warm-up times), we are researching about
consolidating several users into one core with queries limited by
user-id field.

My question is about autosuggest.

1. Are there ways we can limit the autosuggest to only documents with
matching ids?


Not sure about Solr's Suggester, but yes this and more is doable with 
Sematext's Autocomplete: http://sematext.com/products/autocomplete/index.html
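A common do-it-yourself alternative (just a sketch; the field name and user id
are made up) is prefix faceting constrained by a filter query, so suggestions
are drawn only from one user's documents:

http://localhost:8983/solr/select?q=*:*&rows=0&fq=userid:42
    &facet=true&facet.field=suggest_terms&facet.prefix=ipho&facet.limit=10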

2. What other SOLR operations like these which need further
consideration when merging multiple indices and limiting by a field?


Spellchecking is the first thing that comes to mind.  Not sure what else...

Otis


On Sat, Jan 22, 2011 at 4:02 PM, Erick Erickson erickerick...@gmail.com 
wrote:
 See below.

 [...]


 


SolrCloud: is there a programmatic way to create an ensemble

2011-08-02 Thread Yury Kats
I have multiple SolrCloud instances, each running its own Zookeeper
(Solr launched with -DzkRun).

I would like to create an ensemble out of them. I know about -DzkHost
parameter, but can I achieve the same programmatically? Either with
SolrJ or REST API?
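For reference, the documented non-programmatic way is at startup time, with
each instance running its embedded ZooKeeper and listing all ensemble members
(hosts and ports here are illustrative):

java -DzkRun -DzkHost=host1:9983,host2:9983,host3:9983 -jar start.jar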

Thanks,
Yury


Re: I can't pass the unit test when compile from apache-solr-3.3.0-src

2011-08-02 Thread Shawn Heisey

On 7/29/2011 5:26 PM, Chris Hostetter wrote:

Can you please be specific...
  * which test(s) fail for you?
  * what are the failures?

Any time a test fails, that info appears in the ant test output, and the
full details for all tests are written to build/test-results

you can run ant test-reports from the solr directory to generate an HTML
report of all the success/failure info.


I am also having a consistent build failure with the 3.3 source.  Some 
info from junit about the failure is below.  If you want something 
different I still have it in my session, let me know.


[junit] NOTE: reproduce with: ant test 
-Dtestcase=TestSqlEntityProcessorDelta 
-Dtestmethod=testNonWritablePersistFile 
-Dtests.seed=4609081405510352067:771607526385155597

[junit] NOTE: test params are: locale=ko_KR, timezone=Asia/Saigon
[junit] NOTE: all tests run in this JVM:
[junit] [TestCachedSqlEntityProcessor, TestClobTransformer, 
TestContentStreamDataSource, TestDataConfig, TestDateFormatTransformer, 
TestDocBuilder, TestDocBuilder2, TestEntityProcessorBase, 
TestErrorHandling, TestEvaluatorBag, 
TestFieldReader, 
TestFileListEntityProcessor, TestJdbcDataSource, 
TestLineEntityProcessor, TestNumberFormatTransformer, 
TestPlainTextEntityProcessor, TestRegexTransformer, 
TestScriptTransformer, TestSqlEntityProcessor, 
TestSqlEntityProcessor2
TestSqlEntityProcessorDelta]
[junit] NOTE: Linux 2.6.18-238.12.1.el5.centos.plusxen amd64/Sun 
Microsystems Inc. 1.6.0_26 
(64-bit)/cpus=3,threads=4,free=100917744,total=254148608



Here's what I did on the last run:

rm -rf lucene_solr_3_3
svn co 
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_3 
lucene_solr_3_3

cd lucene_solr_3_3/solr
ant clean test

Thanks,
Shawn



Re: IMP: indexing taking very long time

2011-08-02 Thread Naveen Gupta
Can somebody answer this?

What would be the best strategy for optimize (when we are indexing millions
of messages for a new registered user)?

Thanks
Naveen

On Tue, Aug 2, 2011 at 5:36 PM, Naveen Gupta nkgiit...@gmail.com wrote:

 Hi

 We have a requirement where we are indexing all the messages of a thread;
 a thread may have attachments too. We are adding them to Solr for indexing
 and searching, for applying a few business rules.

 For a user, we have a great many threads (almost 100k), and each thread
 may have 10-20 messages.

 Now what we are finding is that it is taking 30 mins to index all the
 threads.

 When we run optimize, it gets faster.

 The question here is how frequently this optimize should be called, and
 when?

 Please note that we are following a commit strategy (that is, after every
 10k threads a commit is called); we are not calling commit after every doc.

 Secondly, how can we use multi-threading, from the Solr perspective, in
 order to improve JVM and other utilization?


 Thanks
 Naveen
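A rough sketch of one multi-threaded option in SolrJ 3.x for the bulk-load
case (the URL, field names, and sizes are illustrative, not the poster's
actual setup):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // Buffers up to 10000 docs and drains the queue with 4 writer threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 10000, 4);

    for (int i = 0; i < 100000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "msg-" + i);
      doc.addField("text", "message body " + i);
      server.add(doc);
    }

    server.commit();   // one commit at the end, not one per batch
    server.optimize(); // expensive: at most once after the whole load, if at all
  }
}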



Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-02 Thread Satish Talim
I copied the file apache-solr-analysis-extras-3.3.0.jar into solr's lib
folder. Now the error is different -

SEVERE: java.lang.NoClassDefFoundError:
org/apache/solr/analysis/BaseTokenizerFactory

Please help.

Satish

On Tue, Aug 2, 2011 at 5:23 PM, Robert Muir rcm...@gmail.com wrote:

 did you add the analysis-extras jar itself? thats what has this factory.

 On Tue, Aug 2, 2011 at 5:03 AM, Satish Talim satish.ta...@gmail.com
 wrote:
  I am using Solr 3.3 on a Windows box.
 
  I want to use the solr.ICUTokenizerFactory in my schema.xml and added the
  fieldType name=text_icu as per the URL -
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory
 
  I also added the following files to my apache-solr-3.3.0\example\lib
 folder:
  lucene-icu-3.3.0.jar
  lucene-smartcn-3.3.0.jar
  icu4j-4_8.jar
  lucene-stempel-3.3.0.jar
 
  When I start my Solr server from apache-solr-3.3.0\example folder:
  java -jar start.jar
 
  I get the following errors:
 
  SEVERE: org.apache.solr.common.SolrException: Error loading class
  'solr.ICUTokenizerFactory'
 
  SEVERE: org.apache.solr.common.SolrException: analyzer without class or
  tokenizer  filter list
 
  SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype
 'text_icu'
  specified on field subject
 
  I tried adding various other jar files to the lib folder but it does not
  help.
 
  What am I doing wrong?
 
  Satish
 



 --
 lucidimagination.com



Re: how to get row no. of current record

2011-08-02 Thread Ranveer

any help..

On Tuesday 02 August 2011 11:22 PM, Ranveer wrote:

Hi,

How do I know the row number of the current record?

i.e.: suppose we have 10 million records indexed. Currently I am on the 
5th record and the id of this record is XYZ00234; how do I know 
that the current record's row number is 5?


thanks..






Re: how to get row no. of current record

2011-08-02 Thread Anshum
Hi Ranveer,
I'm not really sure if you mean lucene's docid (as that's the auto increment
id used here). Why would you need that in the first place? I'd suggest
you not expose that. Let me know in case you wanted something else. Also,
perhaps you could explain the exact usecase and one of us can give you a
better solution.
Hope that helps.
--
Anshum Gupta
http://ai-cafe.blogspot.com


On Tue, Aug 2, 2011 at 11:22 PM, Ranveer ranveer.s...@gmail.com wrote:

 Hi,

 How do I know the row number of the current record?

 i.e.: suppose we have 10 million records indexed. Currently I am on the 5th
 record and the id of this record is XYZ00234; how do I know that the
 current record's row number is 5?

 thanks..

 regards
 Ranveer




Re: how to get row no. of current record

2011-08-02 Thread Ranveer

Hi Anshum,

Thanks for reply.

My requirement is to get results starting from the current id. For this I need 
to set the start row.
I am looking something like Jonty's post : 
http://lucene.472066.n3.nabble.com/previous-and-next-rows-of-current-record-td3187935.html
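A sketch of the standard paging parameters, assuming the client computes the
offset itself (Solr does not return a row number):

http://localhost:8983/solr/select?q=*:*&sort=id+asc&start=4&rows=10

With a deterministic sort, start=4 skips the first four documents, so the
first row returned is the 5th.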


thanks
Ranveer



On Wednesday 03 August 2011 08:31 AM, Anshum wrote:

Hi Ranveer,
I'm not really sure if you mean lucene's docid (as that's the auto
increment id used here). [...]






PivotFaceting in solr 3.3

2011-08-02 Thread Isha Garg

Hi All!

  Can anyone tell me which patch I should apply to solr 3.3 to enable 
pivot faceting in it?


Thanks in advance!
Isha garg






Re: PivotFaceting in solr 3.3

2011-08-02 Thread Pranav Prakash
From what I know, this is a feature in Solr 4.0 marked as SOLR-792 in JIRA.
Is this what you are looking for?

https://issues.apache.org/jira/browse/SOLR-792
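For reference, on trunk the feature is driven by a request parameter (field
names here are illustrative):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.pivot=category,manufacturer

For each value of category this returns nested facet counts over manufacturer.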


*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


On Wed, Aug 3, 2011 at 10:16, Isha Garg isha.g...@orkash.com wrote:

 Hi All!

  Can anyone tell me which patch I should apply to solr 3.3 to enable pivot
 faceting in it?

 Thanks in advance!
 Isha garg







Re: PivotFaceting in solr 3.3

2011-08-02 Thread Isha Garg

Hi Pranav,

 I know pivot faceting is a feature in solr 4.0, but I want to know 
whether there is any patch that can make pivot faceting possible in solr 3.3.

Thanks!
Isha


On Wednesday 03 August 2011 10:23 AM, Pranav Prakash wrote:

 From what I know, this is a feature in Solr 4.0 marked as SOLR-792 in JIRA.
Is this what you are looking for?

https://issues.apache.org/jira/browse/SOLR-792

[...]




Re: Query on multi valued field

2011-08-02 Thread rajini maski
Thank you. This logic works for me.

Thanks a lot.


Regards,
Rajani Maski




On Wed, Aug 3, 2011 at 1:21 AM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : The query is get only those documents which have multiple elements for
 : that multivalued field.

 The only way to do something like this is to add a field, when you index
 your documents, that contains the number of values, and then filter on
 that field using a range query. [...]



 -Hoss