Re: Best way to check Solr index for completeness

2010-09-28 Thread Dennis Gearon
How soon do you need to know? Couldn't you just regenerate the index using some 
kind of 'nice' factor to not use too much processor/disk/etc?

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Tue, 9/28/10, dshvadskiy  wrote:

> From: dshvadskiy 
> Subject: Re: Best way to check Solr index for completeness
> To: solr-user@lucene.apache.org
> Date: Tuesday, September 28, 2010, 2:11 PM
> 
> That will certainly work for most recent updates but I need
> to compare entire
> index.
> 
> Dmitriy
> 
> Luke Crouch wrote:
> > 
> > Is there a 1:1 ratio of db records to solr documents?
> If so, couldn't you
> > simply select the most recent updated record from the
> db and check to make
> > sure the corresponding solr doc has the same
> timestamp?
> > 
> > -L
> > 
> > On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy
> > wrote:
> > 
> >> Hello,
> >> What would be the best way to check Solr index
> against original system
> >> (Database) to make sure index is up to date? I can
> use Solr fields like
> >> Id
> >> and timestamp to check against appropriate fields
> in database. Our index
> >> currently contains over 2 mln documents across
> several cores. Pulling all
> >> documents from Solr index via search (1000 docs at
> a time) is very slow.
> >> Is
> >> there a better way to do it?
> >>
> >> Thanks,
> >> Dmitriy
> >>
> > 
> > 
> 
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1598733.html
> Sent from the Solr - User mailing list archive at
> Nabble.com.
> 


deadlock in solrj?

2010-09-28 Thread Michal Stefanczak
Hello!

 

I'm using SolrJ 1.4.0 with Java 1.6. On two occasions when indexing
~18000 documents we ran into the following problem:

 

(trace from jconsole)

 

Name: pool-1-thread-1

State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@11e464a

Total blocked: 25  Total waited: 1

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:254)
org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer.request(StreamingUpdateSolrServer.java:196)
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

 

This is the code block that's used for indexing:

 

public UpdateResponse indexDocuments(Collection<SolrInputDocument> docs, int commitWithin) {
    UpdateResponse updated = null;
    if (docs.isEmpty()) {
        return null;
    }
    try {
        UpdateRequest req = new UpdateRequest();
        req.setCommitWithin(commitWithin);
        req.add(docs);
        updated = req.process(solr);
    } catch (SolrServerException e) {
        logger.error("Error while indexing documents [" + docs + "]", e);
    } catch (IOException e) {
        logger.error("IOException while indexing documents [" + docs + "]", e);
    }
    return updated;
}

 

 

The commitWithin value used in the application is 1 (milliseconds).

 

 

If I'm not wrong it's a deadlock. Is this a known issue? 

 

With regards

Michal Stefanczak



Re: multiple local indexes

2010-09-28 Thread Brent Palmer
 Thanks for your comments, Jonathan.  Here is some information that 
gives a brief overview of the eGranary Platform in order to quickly 
outline the need for a solution for bringing multiple indexes into one 
searchable collection.


http://www.widernet.org/egranary/info/multipleIndexes

Thanks,
Brent


On 9/28/2010 5:40 PM, Jonathan Rochkind wrote:

Honestly, I think just putting everything in the same index is your best bet.  Are you sure your 
"particular needs of your project" can't be served by one combined index?  You can certainly still 
query on just a portion of the index when needed using fq -- you can even create a request handler (or 
multiple request handlers) with "invariant" or "appends" to force that all queries 
through that request handler have a fixed fq.

From: Brent Palmer [br...@widernet.org]
Sent: Tuesday, September 28, 2010 6:04 PM
To: solr-user@lucene.apache.org
Subject: multiple local indexes

In our application, we need to be able to search across multiple local
indexes.  We need this not so much for performance reasons, but because
of the particular needs of our project.  But the indexes, while sharing
the same schema, can be very different in terms of size and distribution
of documents.  By that I mean that some indexes may have a lot more
documents about some topic while others will have more documents about
other topics.  We want to be able to add documents to the individual
indexes as well.  I can provide more detail about our project if
necessary.  Thus, the Distributed Search feature with shards in
different cores seems to be an obvious solution except for the
limitation of distributed idf.

First, I want to make sure my understanding about the distributed idf
limitation is correct:  If your documents are spread across your shards
evenly, then the distribution of terms across the individual shards can
be assumed to be even enough not to matter.  If, as in our case, the
shards are not very uniform, then this limitation is magnified.  Even
though simplistic, do I have the basic idea?

We have hacked together something that allows us to read from multiple
indexes, but it isn't really a long-term solution.  It's just sort of
shoe-horned in there.   Here are some notes from the programmer who
worked on this:
Two custom files: EgranaryIndexReaderFactory.java and
EgranaryIndexReader.java
EgranaryIndexReader.java
No real work is done here. This class extends
lucene.index.MultiReader and overrides the directory() and getVersion()
methods inherited from IndexReader.
These methods don't  make sense for a MultiReader as they only return
a single value. However, Solr expects Readers to have these methods.
directory() was
overridden to return a call to directory() on the first reader in the
subreader list. The same was done for getVersion(). This hack makes any
use of these methods
by Solr somewhat pointless.

EgranaryIndexReaderFactory.java
Overrides the newReader(Directory indexDir, boolean readOnly) method
The expected behavior of this method is to construct a Reader from
the index at indexDir.
However, this method ignores indexDir and reads a list of indexDirs
from the solrconfig.xml file.
These indices are used to create a list of lucene.index.IndexReader
classes. This list is then used to create the EgranaryIndexReader.

So the second question is: Does anybody have other ideas about how we
might solve this problem?  Is distributed search still our best bet?

Thanks for your thoughts!
Brent




Solr with example Jetty and score problem

2010-09-28 Thread Floyd Wu
Hi there

I have a problem. When I issue a query to a single instance, the Solr
response XML looks like the following; as you can see, the score is normal:
===
 

0
23

_l_title,score
0
_l_unique_key:12
*
true
999




1.9808292
GTest





12





===

But when I issue the query with shards (two instances), the response XML
looks like the following; as you can see, the score has been moved into a
child element of the doc:
===
 

0
64

localhost:8983/solr/core0,172.16.6.35:8983/solr
_l_title,score
0
_l_unique_key:12
*
true
999




Gtest

1.9808292






12





===
My schema.xml looks like the following:

   
   
   
   
   

   
 
 _l_unique_key
 _l_body

I don't really know what happened. Is it a problem with my schema, or is this
the expected behavior of Solr?
Please help with this.


Why the query performance is so different for queries?

2010-09-28 Thread newsam
Hi guys,

I have posted a thread "The search response time is too long".

The Solr searcher instance is deployed with Tomcat 5.5.21.
The index file is 8.2G and the doc count is 6110745. The Dell server has an
Intel(R) Xeon(TM) CPU (4 cores, 3.00GHz) and 6G RAM.

In the Solr back-end, "query=key:*" costs almost 60s while "query=*:*" only needs 
500ms. Another case is "query=product_name_title:*", which costs 7s. I am 
confused about the query performance. Do you have any suggestions?

BTW, the cache settings (size, initialSize, autowarmCount) are as follows:

filterCache: 256, 256, 0
queryResultCache: 1024, 512, 128
documentCache: 16384, 4096, n/a 

Thanks.




Re: How to tell whether a plugin is loaded?

2010-09-28 Thread Chris Hostetter

: then in method createParser() add the following:
: 
: req.getCore().getInfoRegistry().put(getName(), this);

that doesn't seem like a good idea -- createParser will be called every 
time a string needs to be parsed, so you're overwriting the same entry in the 
infoRegistry over and over and over again.

I would just put that logic in your init() method (make sure to put the 
QParserPlugin in the registry, not the individual QParser instances)

: I wonder though whether it'd be useful if Solr QParserPlugin did 
: implement SolrInfoMBean by default already...

I agree ... i think that was an oversight when QParser was added.

There's an open issue for it, but no one has had a chance to get around 
to it yet...

https://issues.apache.org/jira/browse/SOLR-1428



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Best way to check Solr index for completeness

2010-09-28 Thread Erick Erickson
Have you looked at Solr's TermsComponent? Assuming you have a unique key,
I think you could use TermsComponent to walk that field for comparing
against your database rather than getting all the documents.
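
A rough sketch of that approach (the TermsComponent registration below is the
usual one; the field name "Id" and the paging parameters are only placeholders):

   <searchComponent name="termsComponent"
                    class="org.apache.solr.handler.component.TermsComponent"/>

   <requestHandler name="/terms" class="solr.SearchHandler">
     <lst name="defaults">
       <bool name="terms">true</bool>
     </lst>
     <arr name="components">
       <str>termsComponent</str>
     </arr>
   </requestHandler>

You could then page through the unique-key field in chunks, e.g.
/terms?terms.fl=Id&terms.limit=1000&terms.lower=<last Id seen>, and diff the
returned terms against the ids in the database.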

HTH
Erick

On Tue, Sep 28, 2010 at 5:11 PM, dshvadskiy  wrote:

>
> That will certainly work for most recent updates but I need to compare
> entire
> index.
>
> Dmitriy
>
> Luke Crouch wrote:
> >
> > Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
> > simply select the most recent updated record from the db and check to
> make
> > sure the corresponding solr doc has the same timestamp?
> >
> > -L
> >
> > On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy
> > wrote:
> >
> >> Hello,
> >> What would be the best way to check Solr index against original system
> >> (Database) to make sure index is up to date? I can use Solr fields like
> >> Id
> >> and timestamp to check against appropriate fields in database. Our index
> >> currently contains over 2 mln documents across several cores. Pulling
> all
> >> documents from Solr index via search (1000 docs at a time) is very slow.
> >> Is
> >> there a better way to do it?
> >>
> >> Thanks,
> >> Dmitriy
> >>
> >
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1598733.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Re:The search response time is too loong

2010-09-28 Thread newsam
Thx. I will let you know the latest status.
>From: Lance Norskog 
>Reply-To: solr-user@lucene.apache.org
>To: solr-user@lucene.apache.org, newsam 
>Subject: Re: Re:The search response time is too loong
>Date: Tue, 28 Sep 2010 13:34:53 -0700
>
>Copy the index. Delete half of the documents. Optimize.
>Copy the index. Delete the other half of the documents. Optimize.
>
>2010/9/28 newsam
:
>> I guess you are correct. We used the default SOLR cache configuration. I 
>> will change the cache configuration.
>>
>> BTW, I want to deploy several shards from the existing 8G index file, such 
>> as 4G per shards. Is there any tool to generate two shards from one 8G index 
>> file?
>>
>>>From: kenf_nc

>>>Reply-To: solr-user@lucene.apache.org
>>>To: solr-user@lucene.apache.org
>>>Subject: Re: Re:The search response time is too loong
>>>Date: Mon, 27 Sep 2010 05:37:25 -0700 (PDT)
>>>
>>>
>>>"mem usage is over 400M", do you mean Tomcat mem size? If you don't give your
>>>cache sizes enough room to grow you will choke the performance. You should
>>>adjust your Tomcat settings to let the cache grow to at least 1GB or better
>>>would be 2GB. You may also want to look into
>>>http://wiki.apache.org/solr/SolrCaching warming the cache to make the first
>>>time call a little faster.
>>>
>>>For comparison, I also have about 8GB in my index but only 2.8 million
>>>documents. My search query times on a smaller box than you specify are 6533
>>>milliseconds on an unwarmed (newly rebooted) instance.
>>>--
>>>View this message in context: 
>>>http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
>>>Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>
>
>
>-- 
>Lance Norskog
>goks...@gmail.com
> 

SolrCore / Index Searcher Instances

2010-09-28 Thread entdeveloper

This may seem like a stupid question, but why on the info / stats pages do we
see two instances of SolrIndexSearcher?

The reason I ask is that we've implemented SOLR-465 to try and serve our
index from a RAMDirectory, but it appears that our index is being loaded
into memory twice, as our JVM heap size requirements are more than 2x our index
size on disk.

Does Solr actually create two instances of SolrCore / SolrIndexSearcher on
startup? If one is used for warming, why isn't it destroyed when it's
finished?

http://lucene.472066.n3.nabble.com/file/n1599373/Screen_shot_2010-09-28_at_4.40.19_PM.png
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCore-Index-Searcher-Instances-tp1599373p1599373.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Re: Solr Deduplication and Field Collpasing

2010-09-28 Thread Markus Jelsma
 Correction: Java heap size should be RAM buffer size, if I'm not mistaken.

 
-Original message-
From: Markus Jelsma 
Sent: Wed 29-09-2010 01:17
To: solr-user@lucene.apache.org; 
Subject: RE: Re: Solr Deduplication and Field Collpasing

If you can set the digest field for your `non-nutch` documents easily, that 
would indeed be a quicker approach. No need to create a custom update 
processor or anything like that. But to do so, you would have to reindex the 
whole bunch again. There is no way to update a document without completely 
reindexing that document.

Although you can test on a smaller index, indexing 3M documents shouldn't take 
too long, unless they all come from a Nutch instance running locally; that would 
take a while. Also, you can speed up indexing [1], but most tips won't help if 
you're updating through Solr's HTTP API; in that case, increasing the merge 
factor and assigning more RAM to the Java heap through -Xmx will have the most 
benefit.

 

[1]: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

 
-Original message-
From: Nemani, Raj 
Sent: Wed 29-09-2010 00:57
To: solr-user@lucene.apache.org; 
Subject: Re: Solr Deduplication and Field Collpasing

I have the digest field already in the schema because the index is shared 
between Nutch docs and others.  I do not know if the second approach is the 
quickest in my case.

I can set the digest value to something unique for non-Nutch documents easily (I 
have an Id field that I can use to populate the digest field during indexing of 
new non-Nutch documents. I have a custom tool that does the indexing of these 
docs). But I have more than 3 million documents in the index already, and I 
don't want to start over with new indexing again if I don't have to. Is there a 
way I can update the digest field with the value from the corresponding Id 
field using Solr?

Thanks
Raj

- Original Message -
From: Markus Jelsma 
To: solr-user@lucene.apache.org 
Sent: Tue Sep 28 18:19:17 2010
Subject: RE: Solr Deduplication and Field Collpasing

You could create a custom update processor that adds a digest field for newly 
added documents that do not have the digest field themselves. This way, the 
documents that are not added by Nutch get a proper non-empty digest field so 
the deduplication processor won't create the same empty hash and overwrite 
those. Or you could extend 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory so it skips 
documents with an empty digest field. I'd think the latter would be the 
quickest route but correct me if i'm wrong.

 

Cheers,
 
-Original message-
From: Nemani, Raj 
Sent: Tue 28-09-2010 23:28
To: solr-user@lucene.apache.org; 
Subject: Solr Deduplication and Field Collpasing

All,



I have set up Nutch to submit the crawl results to the Solr index.  I have
some duplicates in the documents generated by the Nutch crawl.  There is a
field 'digest' that Nutch generates that is the same for those documents
that are duplicates.  While setting up the dedupe processor in the
Solr config file, I have used this 'digest' field in the following
way (see below for config details).  Since my index has documents other
than the ones generated by Nutch, I cannot use 'overwritedupes=true',
because for non-Nutch generated documents the digest field will not be
populated, and I found that Solr deletes every one of those documents
that do not have the digest field populated. Probably because they all
will have the same 'sig' field value generated based on an 'empty'
digest field, forcing Solr to delete everything?

In any case, given the scenario, I thought I would set
'overwritedupes=false' and use field collapsing based on the digest or sig
field, but I could not get field collapsing to work.  Based on the wiki
documentation I was adding the query string
"&group=true&group.filed=sig&" "&group=true&group.filed=digest&" to my
overall query in the admin console and I still got the duplicate documents
in the results.  Is there anything special I need to do to get field
collapsing working?  I am running Solr 1.4.

All this is because Nutch thinks that (url *is* the unique id for the
nutch document)

http://mysite.mydomain.com/index.html and http://mysite/index.html (the
difference is only in the alias and for an internal site both are valid)
are different documents depending on how the link is set up.  This is the
reason for me to try deduplication.  I cannot submit the SolrDedup command
from Nutch because non-Nutch generated documents do not have the digest
field populated, and I read on the mailing lists that this will cause the
SolrDedup initiated from Nutch to fail.  This forced me to try
deduplication on the Solr side.



Thanks so much in advance for your help.





Here is my configuration:



SolrConfig.xml

               

   

RE: Re: Solr Deduplication and Field Collpasing

2010-09-28 Thread Markus Jelsma
If you can set the digest field for your `non-nutch` documents easily, that 
would indeed be a quicker approach. No need to create a custom update 
processor or anything like that. But to do so, you would have to reindex the 
whole bunch again. There is no way to update a document without completely 
reindexing that document.

Although you can test on a smaller index, indexing 3M documents shouldn't take 
too long, unless they all come from a Nutch instance running locally; that would 
take a while. Also, you can speed up indexing [1], but most tips won't help if 
you're updating through Solr's HTTP API; in that case, increasing the merge 
factor and assigning more RAM to the Java heap through -Xmx will have the most 
benefit.

 

[1]: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

 
-Original message-
From: Nemani, Raj 
Sent: Wed 29-09-2010 00:57
To: solr-user@lucene.apache.org; 
Subject: Re: Solr Deduplication and Field Collpasing

I have the digest field already in the schema because the index is shared 
between Nutch docs and others.  I do not know if the second approach is the 
quickest in my case.

I can set the digest value to something unique for non-Nutch documents easily (I 
have an Id field that I can use to populate the digest field during indexing of 
new non-Nutch documents. I have a custom tool that does the indexing of these 
docs). But I have more than 3 million documents in the index already, and I 
don't want to start over with new indexing again if I don't have to. Is there a 
way I can update the digest field with the value from the corresponding Id 
field using Solr?

Thanks
Raj

- Original Message -
From: Markus Jelsma 
To: solr-user@lucene.apache.org 
Sent: Tue Sep 28 18:19:17 2010
Subject: RE: Solr Deduplication and Field Collpasing

You could create a custom update processor that adds a digest field for newly 
added documents that do not have the digest field themselves. This way, the 
documents that are not added by Nutch get a proper non-empty digest field so 
the deduplication processor won't create the same empty hash and overwrite 
those. Or you could extend 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory so it skips 
documents with an empty digest field. I'd think the latter would be the 
quickest route but correct me if i'm wrong.

 

Cheers,
 
-Original message-
From: Nemani, Raj 
Sent: Tue 28-09-2010 23:28
To: solr-user@lucene.apache.org; 
Subject: Solr Deduplication and Field Collpasing

All,



I have set up Nutch to submit the crawl results to the Solr index.  I have
some duplicates in the documents generated by the Nutch crawl.  There is a
field 'digest' that Nutch generates that is the same for those documents
that are duplicates.  While setting up the dedupe processor in the
Solr config file, I have used this 'digest' field in the following
way (see below for config details).  Since my index has documents other
than the ones generated by Nutch, I cannot use 'overwritedupes=true',
because for non-Nutch generated documents the digest field will not be
populated, and I found that Solr deletes every one of those documents
that do not have the digest field populated. Probably because they all
will have the same 'sig' field value generated based on an 'empty'
digest field, forcing Solr to delete everything?

In any case, given the scenario, I thought I would set
'overwritedupes=false' and use field collapsing based on the digest or sig
field, but I could not get field collapsing to work.  Based on the wiki
documentation I was adding the query string
"&group=true&group.filed=sig&" "&group=true&group.filed=digest&" to my
overall query in the admin console and I still got the duplicate documents
in the results.  Is there anything special I need to do to get field
collapsing working?  I am running Solr 1.4.

All this is because Nutch thinks that (url *is* the unique id for the
nutch document)

http://mysite.mydomain.com/index.html and http://mysite/index.html (the
difference is only in the alias and for an internal site both are valid)
are different documents depending on how the link is set up.  This is the
reason for me to try deduplication.  I cannot submit the SolrDedup command
from Nutch because non-Nutch generated documents do not have the digest
field populated, and I read on the mailing lists that this will cause the
SolrDedup initiated from Nutch to fail.  This forced me to try
deduplication on the Solr side.



Thanks so much in advance for your help.





Here is my configuration:



SolrConfig.xml

               

               

               

               

               

                   

                     true

                     sig

                     false

                     org.apache.solr.update.processor.Lookup3Signature<

         

Re: Solr Deduplication and Field Collpasing

2010-09-28 Thread Nemani, Raj
I have the digest field already in the schema because the index is shared 
between Nutch docs and others.  I do not know if the second approach is the 
quickest in my case.

I can set the digest value to something unique for non-Nutch documents easily (I 
have an Id field that I can use to populate the digest field during indexing of 
new non-Nutch documents. I have a custom tool that does the indexing of these 
docs). But I have more than 3 million documents in the index already, and I 
don't want to start over with new indexing again if I don't have to. Is there a 
way I can update the digest field with the value from the corresponding Id 
field using Solr?

Thanks
Raj

- Original Message -
From: Markus Jelsma 
To: solr-user@lucene.apache.org 
Sent: Tue Sep 28 18:19:17 2010
Subject: RE: Solr Deduplication and Field Collpasing

You could create a custom update processor that adds a digest field for newly 
added documents that do not have the digest field themselves. This way, the 
documents that are not added by Nutch get a proper non-empty digest field so 
the deduplication processor won't create the same empty hash and overwrite 
those. Or you could extend 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory so it skips 
documents with an empty digest field. I'd think the latter would be the 
quickest route but correct me if i'm wrong.

 

Cheers,
 
-Original message-
From: Nemani, Raj 
Sent: Tue 28-09-2010 23:28
To: solr-user@lucene.apache.org; 
Subject: Solr Deduplication and Field Collpasing

All,



I have set up Nutch to submit the crawl results to the Solr index.  I have
some duplicates in the documents generated by the Nutch crawl.  There is a
field 'digest' that Nutch generates that is the same for those documents
that are duplicates.  While setting up the dedupe processor in the
Solr config file, I have used this 'digest' field in the following
way (see below for config details).  Since my index has documents other
than the ones generated by Nutch, I cannot use 'overwritedupes=true',
because for non-Nutch generated documents the digest field will not be
populated, and I found that Solr deletes every one of those documents
that do not have the digest field populated. Probably because they all
will have the same 'sig' field value generated based on an 'empty'
digest field, forcing Solr to delete everything?

In any case, given the scenario, I thought I would set
'overwritedupes=false' and use field collapsing based on the digest or sig
field, but I could not get field collapsing to work.  Based on the wiki
documentation I was adding the query string
"&group=true&group.filed=sig&" "&group=true&group.filed=digest&" to my
overall query in the admin console and I still got the duplicate documents
in the results.  Is there anything special I need to do to get field
collapsing working?  I am running Solr 1.4.

All this is because Nutch thinks that (url *is* the unique id for the
nutch document)

http://mysite.mydomain.com/index.html and http://mysite/index.html (the
difference is only in the alias and for an internal site both are valid)
are different documents depending on how the link is set up.  This is the
reason for me to try deduplication.  I cannot submit the SolrDedup command
from Nutch because non-Nutch generated documents do not have the digest
field populated, and I read on the mailing lists that this will cause the
SolrDedup initiated from Nutch to fail.  This forced me to try
deduplication on the Solr side.



Thanks so much in advance for your help.





Here is my configuration:



SolrConfig.xml

               

               

               

               

               

                   

                     true

                     sig

                     false

                     org.apache.solr.update.processor.Lookup3Signature<

               /str> 

                 digest

                 

                   

                   

                 

               

               

               

                  

                    dedupe

                  

                

               

               Schema.xml

               

               

               



Thanks so much for your help





RE: multiple local indexes

2010-09-28 Thread Jonathan Rochkind
Honestly, I think just putting everything in the same index is your best bet.  
Are you sure your "particular needs of your project" can't be served by one 
combined index?  You can certainly still query on just a portion of the index 
when needed using fq -- you can even create a request handler (or multiple 
request handlers) with "invariant" or "appends" to force that all queries 
through that request handler have a fixed fq. 
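
As a minimal sketch of the fixed-fq idea (the handler name and the "collection"
field and value are assumptions; any field that marks which logical index a
document belongs to would do):

   <requestHandler name="/topicA" class="solr.SearchHandler">
     <lst name="invariants">
       <!-- every query through this handler is silently restricted to topicA -->
       <str name="fq">collection:topicA</str>
     </lst>
   </requestHandler>

Clients that should only see topicA documents query /topicA; because the fq is
an invariant, it cannot be overridden from the request.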

From: Brent Palmer [br...@widernet.org]
Sent: Tuesday, September 28, 2010 6:04 PM
To: solr-user@lucene.apache.org
Subject: multiple local indexes

In our application, we need to be able to search across multiple local
indexes.  We need this not so much for performance reasons, but because
of the particular needs of our project.  But the indexes, while sharing
the same schema, can be very different in terms of size and distribution
of documents.  By that I mean that some indexes may have a lot more
documents about some topic while others will have more documents about
other topics.  We want to be able to add documents to the individual
indexes as well.  I can provide more detail about our project if
necessary.  Thus, the Distributed Search feature with shards in
different cores seems to be an obvious solution except for the
limitation of distributed idf.

First, I want to make sure my understanding about the distributed idf
limitation is correct:  If your documents are spread across your shards
evenly, then the distribution of terms across the individual shards can
be assumed to be even enough not to matter.  If, as in our case, the
shards are not very uniform, then this limitation is magnified.  Even
though simplistic, do I have the basic idea?

We have hacked together something that allows us to read from multiple
indexes, but it isn't really a long-term solution.  It's just sort of
shoe-horned in there.   Here are some notes from the programmer who
worked on this:
   Two custom files: EgranaryIndexReaderFactory.java and
EgranaryIndexReader.java
   EgranaryIndexReader.java
   No real work is done here. This class extends
lucene.index.MultiReader and overrides the directory() and getVersion()
methods inherited from IndexReader.
   These methods don't  make sense for a MultiReader as they only return
a single value. However, Solr expects Readers to have these methods.
directory() was
   overridden to return a call to directory() on the first reader in the
subreader list. The same was done for getVersion(). This hack makes any
use of these methods
   by Solr somewhat pointless.

   EgranaryIndexReaderFactory.java
   Overrides the newReader(Directory indexDir, boolean readOnly) method
   The expected behavior of this method is to construct a Reader from
the index at indexDir.
   However, this method ignores indexDir and reads a list of indexDirs
from the solrconfig.xml file.
   These indices are used to create a list of lucene.index.IndexReader
classes. This list is then used to create the EgranaryIndexReader.

So the second question is: Does anybody have other ideas about how we
might solve this problem?  Is distributed search still our best bet?

Thanks for your thoughts!
Brent


RE: Solr Deduplication and Field Collpasing

2010-09-28 Thread Markus Jelsma
You could create a custom update processor that adds a digest field for newly 
added documents that do not have the digest field themselves. This way, the 
documents that are not added by Nutch get a proper non-empty digest field so 
the deduplication processor won't create the same empty hash and overwrite 
those. Or you could extend 
org.apache.solr.update.processor.SignatureUpdateProcessorFactory so it skips 
documents with an empty digest field. I'd think the latter would be the 
quickest route but correct me if i'm wrong.

 

Cheers,
 
-Original message-
From: Nemani, Raj 
Sent: Tue 28-09-2010 23:28
To: solr-user@lucene.apache.org; 
Subject: Solr Deduplication and Field Collpasing

All,



I have set up Nutch to submit the crawl results to the Solr index.  I have
some duplicates in the documents generated by the Nutch crawl.  There is a
field 'digest' that Nutch generates that is the same for those documents
that are duplicates.  While setting up the dedupe processor in the
Solr config file, I have used this 'digest' field in the following
way (see below for config details).  Since my index has documents other
than the ones generated by Nutch, I cannot use 'overwritedupes=true',
because for non-Nutch generated documents the digest field will not be
populated, and I found that Solr deletes every one of those documents
that do not have the digest field populated. Probably because they all
will have the same 'sig' field value generated based on an 'empty'
digest field, forcing Solr to delete everything?

In any case, given the scenario, I thought I would set
'overwritedupes=false' and use field collapsing based on the digest or sig
field, but I could not get field collapsing to work.  Based on the wiki
documentation I was adding the query string
"&group=true&group.filed=sig&" "&group=true&group.filed=digest&" to my
overall query in the admin console and I still got the duplicate documents
in the results.  Is there anything special I need to do to get field
collapsing working?  I am running Solr 1.4.

All this is because Nutch thinks that (url *is* the unique id for the
nutch document)

http://mysite.mydomain.com/index.html and http://mysite/index.html (the
difference is only in the alias and for an internal site both are valid)
are different documents depending on how the link is set up.  This is the
reason for me to try deduplication.  I cannot submit the SolrDedup command
from Nutch because non-Nutch generated documents do not have the digest
field populated, and I read on the mailing lists that this will cause the
SolrDedup initiated from Nutch to fail.  This forced me to try
deduplication on the Solr side.



Thanks so much in advance for your help.





Here is my configuration:



SolrConfig.xml

               

               

               

               

               

                   

                     true

                     sig

                     false

                     org.apache.solr.update.processor.Lookup3Signature<

               /str> 

                 digest

                 

                   

                   

                 

               

               

               

                  

                    dedupe

                  

                

               

               Schema.xml

               

               

               



Thanks so much for your help





multiple local indexes

2010-09-28 Thread Brent Palmer
In our application, we need to be able to search across multiple local 
indexes.  We need this not so much for performance reasons, but because 
of the particular needs of our project.  But the indexes, while sharing 
the same schema, can be very different in terms of size and distribution 
of documents.  By that I mean that some indexes may have a lot more 
documents about some topic while others will have more documents about 
other topics.  We want to be able to add documents to the individual 
indexes as well.  I can provide more detail about our project if 
necessary.  Thus, the Distributed Search feature with shards in 
different cores seems to be an obvious solution except for the 
limitation of distributed idf.

First, I want to make sure my understanding about the distributed idf 
limitation is correct:  If your documents are spread across your shards 
evenly, then the distribution of terms across the individual shards can 
be assumed to be even enough not to matter.  If, as in our case, the 
shards are not very uniform, then this limitation is magnified.  Even 
though simplistic, do I have the basic idea?


We have hacked together something that allows us to read from multiple 
indexes, but it isn't really a long-term solution.  It's just sort of 
shoe-horned in there.   Here are some notes from the programmer who 
worked on this:
  Two custom files: EgranaryIndexReaderFactory.java and 
EgranaryIndexReader.java

  EgranaryIndexReader.java
  No real work is done here. This class extends 
lucene.index.MultiReader and overrides the directory() and getVersion() 
methods inherited from IndexReader.
  These methods don't  make sense for a MultiReader as they only return 
a single value. However, Solr expects Readers to have these methods. 
directory() was
  overridden to return a call to directory() on the first reader in the 
subreader list. The same was done for getVersion(). This hack makes any 
use of these methods

  by Solr somewhat pointless.

  EgranaryIndexReaderFactory.java
  Overrides the newReader(Directory indexDir, boolean readOnly) method
  The expected behavior of this method is to construct a Reader from 
the index at indexDir.
  However, this method ignores indexDir and reads a list of indexDirs 
from the solrconfig.xml file.
  These indices are used to create a list of lucene.index.IndexReader 
classes. This list is then used to create the EgranaryIndexReader.


So the second question is: Does anybody have other ideas about how we 
might solve this problem?  Is distributed search still our best bet?


Thanks for your thoughts!
Brent


Re: Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Ok, I created the issues:

IF function: SOLR-2136
AND, OR, NOT: SOLR-2137

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. sep. 2010, at 19.36, Yonik Seeley wrote:

> On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent
>  wrote:
>> Have anyone written any conditional functions yet for use in Function 
>> Queries?
> 
> Nope - but it makes sense and has been on my list of things to do for
> a long time.
> 
> -Y
> http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8



Re: Using separate Analyzers for querying and indexing.

2010-09-28 Thread James Norton

Excellent, exactly what I needed.

Thanks,

James

On Sep 28, 2010, at 4:28 PM, Luke Crouch wrote:

> Yeah. You can specify two analyzers in the same fieldType:
> 
> 
> 
> ...
> 
> 
> ...
> 
> 
> 
> -L
> 
> On Tue, Sep 28, 2010 at 2:31 PM, James Norton wrote:
> 
>> Hello,
>> 
>> I am migrating from a pure Lucene application to using solr.  For legacy
>> reasons I must support a somewhat obscure query feature: lowercase words in
>> the query should match lowercase or uppercase in the index, while uppercase
>> words in the query should only match uppercase words in the index.
>> 
>> To do this with Lucene we created a custom Analyzer and custom TokenFilter.
>> During indexing, the custom TokenFilter duplicates uppercase tokens as
>> lowercase ones and sets their offsets to make them appear in same position
>> as the upper case token, i.e., you get two tokens for every uppercase token.
>> Then at query time a normal (case sensitive) analyzer is used so that
>> lowercase tokens will match either upper or lower, while the uppercase will
>> only match uppercase.
>> 
>> I have looked through the documentation and I see how to specify the
>> Analyzer in the schema.xml file that is used for indexing, but I don't know
>> how to specify that a different Analyzer (the case sensitive one) should be
>> used for queries.
>> 
>> Is this possible?
>> 
>> Thanks,
>> 
>> James



Solr Deduplication and Field Collpasing

2010-09-28 Thread Nemani, Raj
All,

 

I have set up Nutch to submit the crawl results to the Solr index.  I have
some duplicates in the documents generated by the Nutch crawl.  There is a
field 'digest' that Nutch generates that is the same for those documents
that are duplicates.  While setting up the dedupe processor in the
Solr config file, I have used this 'digest' field in the following
way (see below for config details).  Since my index has documents other
than the ones generated by Nutch, I cannot use 'overwritedupes=true',
because for non-Nutch generated documents the digest field will not be
populated, and I found that Solr deletes every one of those documents
that do not have the digest field populated. Probably because they all
will have the same 'sig' field value generated based on an 'empty'
digest field, forcing Solr to delete everything?

In any case, given the scenario, I thought I would set
'overwritedupes=false' and use field collapsing based on the digest or sig
field, but I could not get field collapsing to work.  Based on the wiki
documentation I was adding the query string
"&group=true&group.filed=sig&" "&group=true&group.filed=digest&" to my
overall query in the admin console and I still got the duplicate documents
in the results.  Is there anything special I need to do to get field
collapsing working?  I am running Solr 1.4.

All this is because Nutch thinks that (url *is* the unique id for the
nutch document)

http://mysite.mydomain.com/index.html and http://mysite/index.html (the
difference is only in the alias and for an internal site both are valid)
are different documents depending on how the link is set up.  This is the
reason for me to try deduplication.  I cannot submit the SolrDedup command
from Nutch because non-Nutch generated documents do not have the digest
field populated, and I read on the mailing lists that this will cause the
SolrDedup initiated from Nutch to fail.  This forced me to try
deduplication on the Solr side.

 

Thanks so much in advance for your help.





Here is my configuration:

 

SolrConfig.xml













   (update processor chain XML stripped; surviving values: true, sig, false,
   org.apache.solr.update.processor.Lookup3Signature, digest, dedupe)



Schema.xml







 

Thanks so much for your help
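
For reference, the stock dedupe setup from the Solr wiki that matches the
surviving fragments above (sig, digest, overwriteDupes=false, Lookup3Signature,
a "dedupe" chain) looks roughly like the sketch below; treat it as an
illustration of the documented configuration, not necessarily the exact config
used here:

   <updateRequestProcessorChain name="dedupe">
     <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
       <bool name="enabled">true</bool>
       <str name="signatureField">sig</str>
       <bool name="overwriteDupes">false</bool>
       <str name="fields">digest</str>
       <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
     </processor>
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.RunUpdateProcessorFactory"/>
   </updateRequestProcessorChain>

   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
     <lst name="defaults">
       <str name="update.processor">dedupe</str>
     </lst>
   </requestHandler>

   <!-- schema.xml: the signature field so the computed hash can be stored -->
   <field name="sig" type="string" indexed="true" stored="true"/>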

 



Re: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Luke Crouch
I notice we don't have default=true; instead we manually specify
qt=dismax in our queries. HTH.

-L

On Tue, Sep 28, 2010 at 4:24 PM, Luke Crouch  wrote:

> What you have is exactly what I have on 1.4.0:
>
>
>   
> 
>  dismax
>
> And it has worked fine. We copied our solrconfig.xml from the examples and
> changed them for our purposes. You might compare your solrconfig.xml to some
> of the examples.
>
> -L
>
>
> On Tue, Sep 28, 2010 at 4:19 PM, Thumuluri, Sai <
> sai.thumul...@verizonwireless.com> wrote:
>
>> Can I please get some help here? I am in a tight timeline to get this
>> done - any ideas/suggestions would be greatly appreciated.
>>
>> -Original Message-
>> From: Thumuluri, Sai [mailto:sai.thumul...@verizonwireless.com]
>> Sent: Tuesday, September 28, 2010 12:15 PM
>> To: solr-user@lucene.apache.org
>> Subject: Dismax Request handler and Solrconfig.xml
>> Importance: High
>>
>> Hi,
>>
>> I am using Solr 1.4.1 with Nutch to index some of our intranet content.
>> In Solrconfig.xml, default request handler is set to "standard". I am
>> planning to change that to use dismax as the request handler but when I
>> set "default=true" for dismax - Solr does not return any results - I get
>> results only when I comment out "dismax".
>>
>> This works
>>  > default="true">
>>
>> 
>>   explicit
>>   10
>>   *
>>   title^20.0 pagedescription^15.0
>>   2.1
>> 
>>  
>>
>> DOES NOT WORK
>>  > default="true">
>>
>> dismax
>> explicit
>>
>> THIS WORKS
>>  > default="true">
>>
>> 
>> > name="echoParams">explicit
>>
>>
>>
>


Re: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Luke Crouch
What you have is exactly what I have on 1.4.0:

  

 dismax

And it has worked fine. We copied our solrconfig.xml from the examples and
changed them for our purposes. You might compare your solrconfig.xml to some
of the examples.

-L

On Tue, Sep 28, 2010 at 4:19 PM, Thumuluri, Sai <
sai.thumul...@verizonwireless.com> wrote:

> Can I please get some help here? I am in a tight timeline to get this
> done - any ideas/suggestions would be greatly appreciated.
>
> -Original Message-
> From: Thumuluri, Sai [mailto:sai.thumul...@verizonwireless.com]
> Sent: Tuesday, September 28, 2010 12:15 PM
> To: solr-user@lucene.apache.org
> Subject: Dismax Request handler and Solrconfig.xml
> Importance: High
>
> Hi,
>
> I am using Solr 1.4.1 with Nutch to index some of our intranet content.
> In Solrconfig.xml, default request handler is set to "standard". I am
> planning to change that to use dismax as the request handler but when I
> set "default=true" for dismax - Solr does not return any results - I get
> results only when I comment out "dismax".
>
> This works
>   default="true">
>
> 
>   explicit
>   10
>   *
>   title^20.0 pagedescription^15.0
>   2.1
> 
>  
>
> DOES NOT WORK
>   default="true">
>
> dismax
> explicit
>
> THIS WORKS
>   default="true">
>
> 
>  name="echoParams">explicit
>
>
>


RE: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Thumuluri, Sai
Can I please get some help here? I am in a tight timeline to get this
done - any ideas/suggestions would be greatly appreciated. 

-Original Message-
From: Thumuluri, Sai [mailto:sai.thumul...@verizonwireless.com] 
Sent: Tuesday, September 28, 2010 12:15 PM
To: solr-user@lucene.apache.org
Subject: Dismax Request handler and Solrconfig.xml
Importance: High

Hi,

I am using Solr 1.4.1 with Nutch to index some of our intranet content.
In solrconfig.xml, the default request handler is set to "standard". I am
planning to change that to use dismax as the request handler, but when I
set "default=true" for dismax, Solr does not return any results; I get
results only when I comment out "dismax". 

This works
  

 
   explicit
   10
   *
   title^20.0 pagedescription^15.0
   2.1
 
  

DOES NOT WORK
  

 dismax
 explicit

THIS WORKS
  


 explicit
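
For reference, a dismax handler marked as the default looks roughly like the
sketch below (the XML above was stripped; the qf boosts are taken from the
surviving fragments, everything else is illustrative). Note that only one
requestHandler should carry default="true", so the standard handler has to
drop it, as was done later in the thread:

   <requestHandler name="dismax" class="solr.SearchHandler" default="true">
     <lst name="defaults">
       <str name="defType">dismax</str>
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="fl">*</str>
       <str name="qf">title^20.0 pagedescription^15.0</str>
     </lst>
   </requestHandler>

Alternatively, the handler can be left non-default and selected per query with
qt=dismax, as suggested earlier in the thread.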




Re: is EmbeddedSolrServer thread safe ?

2010-09-28 Thread Reuben A Christie
 No, it is not the same for EmbeddedSolrServer; we learned it the hard way. I 
guess you would have also learned it by now.



at SolrJ wiki page : http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

"CommonsHttpSolrServer is thread-safe and if you are using the following 
constructor,
you *MUST* re-use the same instance for all requests. ..."

But is it the same for EmbeddedSolrServer ?

Best regards

Jean-François



--
Reuben Christie
 -^-
 °v°
/(_)\
 ^ ^



Re: Best way to check Solr index for completeness

2010-09-28 Thread dshvadskiy

That will certainly work for the most recent updates, but I need to compare the
entire index.

Dmitriy

Luke Crouch wrote:
> 
> Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
> simply select the most recent updated record from the db and check to make
> sure the corresponding solr doc has the same timestamp?
> 
> -L
> 
> On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy
> wrote:
> 
>> Hello,
>> What would be the best way to check Solr index against original system
>> (Database) to make sure index is up to date? I can use Solr fields like
>> Id
>> and timestamp to check against appropriate fields in database. Our index
>> currently contains over 2 mln documents across several cores. Pulling all
>> documents from Solr index via search (1000 docs at a time) is very slow.
>> Is
>> there a better way to do it?
>>
>> Thanks,
>> Dmitriy
>>
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1598733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Concurrent access to EmbeddedSolrServer

2010-09-28 Thread Reuben A Christie


  
  
We learned it the hard way. Wish I had read this before:
http://wiki.apache.org/solr/EmbeddedSolr

It is not thread-safe. We started seeing ConcurrentModificationExceptions within
the first 100 samples when loading it with more than one concurrent user (I
tested it using JMeter).

best,
Reuben

On 12/9/2009 12:47 PM, Jon Poulton wrote:

  Hi there,
I'm about to start implementing some code which will access a Solr instance via a ThreadPool concurrently. I've been looking at the solrj API docs ( particularly http://lucene.apache.org/solr/api/index.html?org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html )  and I just want to make sure what I have in mind makes sense. The Javadoc is a bit sparse, so I thought I'd ask a couple of questions here.


1)  I'm assuming that EmbeddedSolrServer can be accessed concurrently by several threads at once for add, delete and query operations (on the SolrServer parent interface). Is that right? I don't have to enforce single-threaded access?

2)  What happens if multiple threads simultaneously call commit?

3)  What happens if multiple threads simultaneously call optimize?

4)  Both commit and optimise have optional parameters called "waitFlush" and "waitSearcher". These are undocumented in the Javadoc. What do they signify?

Thanks in advance for any help.

Cheers

Jon




-- 
  
  



Re: Best way to check Solr index for completeness

2010-09-28 Thread Luke Crouch
Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
simply select the most recent updated record from the db and check to make
sure the corresponding solr doc has the same timestamp?

-L

On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy wrote:

> Hello,
> What would be the best way to check Solr index against original system
> (Database) to make sure index is up to date? I can use Solr fields like Id
> and timestamp to check against appropriate fields in database. Our index
> currently contains over 2 mln documents across several cores. Pulling all
> documents from Solr index via search (1000 docs at a time) is very slow. Is
> there a better way to do it?
>
> Thanks,
> Dmitriy
>


Best way to check Solr index for completeness

2010-09-28 Thread Dmitriy Shvadskiy
Hello,
What would be the best way to check Solr index against original system
(Database) to make sure index is up to date? I can use Solr fields like Id
and timestamp to check against appropriate fields in database. Our index
currently contains over 2 million documents across several cores. Pulling all
documents from Solr index via search (1000 docs at a time) is very slow. Is
there a better way to do it?

Thanks,
Dmitriy


Re: Search Interface

2010-09-28 Thread Lance Norskog
There is already a simple Velocity app. Just hit
http://localhost:8983/solr/browse.
You can configure some handy parameters to make walkable facets in
solrconfig.xml.
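
A minimal sketch of what such a handler can look like (the handler name, the
template names, and the facet field below are assumptions, and the Velocity
response writer comes from the velocity contrib, which has to be enabled
separately):

   <requestHandler name="/browse" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="wt">velocity</str>
       <str name="v.template">browse</str>
       <str name="v.layout">layout</str>
       <str name="facet">true</str>
       <str name="facet.field">category</str>
     </lst>
   </requestHandler>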

On Tue, Sep 28, 2010 at 5:23 AM, Antonio Calo'  wrote:
>  Hi
>
> You could try to use the Velocity framework to build GUIs in a quick and
> efficient manner.
>
> Solr comes with a Velocity handler already integrated, which could be the best
> solution in your case:
>
> http://wiki.apache.org/solr/VelocityResponseWriter
>
> Also take these hints on the same topic:
> http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/
>
> there is also a webinar about rapid prototyping with solr:
>
> http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681
>
> Hope this help
>
> Antonio
>
>
> Il 28/09/2010 4.35, Claudio Devecchi ha scritto:
>>
>> Hi everybody,
>>
>> I'm implementing my first Solr engine for conceptual tests. I'm crawling my
>> wiki intranet to make some searches. The engine is already working fine, but
>> I need some interface for making my searches.
>> Does anybody know where I can find a search interface I can use just for
>> customization?
>>
>> Tks
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Re:The search response time is too loong

2010-09-28 Thread Lance Norskog
Copy the index. Delete half of the documents. Optimize.
Copy the index. Delete the other half of the documents. Optimize.
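
A concrete sketch of that recipe, assuming the documents can be split on some
indexed field (the field name "id" and the ranges below are only placeholders);
the commands are posted to each copy's /update handler:

   <!-- copy 1: keep the upper half, so delete the lower half -->
   <delete><query>id:[* TO 3000000]</query></delete>
   <commit/>
   <optimize/>

   <!-- copy 2: keep the lower half, so delete the upper half -->
   <delete><query>id:[3000001 TO *]</query></delete>
   <commit/>
   <optimize/>

Each copy then serves as one shard of roughly half the original size.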

2010/9/28 newsam :
> I guess you are correct. We used the default SOLR cache configuration. I will 
> change the cache configuration.
>
> BTW, I want to deploy several shards from the existing 8G index file, such as 
> 4G per shard. Is there any tool to generate two shards from one 8G index 
> file?
>
>>From: kenf_nc 
>>Reply-To: solr-user@lucene.apache.org
>>To: solr-user@lucene.apache.org
>>Subject: Re: Re:The search response time is too loong
>>Date: Mon, 27 Sep 2010 05:37:25 -0700 (PDT)
>>
>>
>>"mem usage is over 400M", do you mean Tomcat mem size? If you don't give your
>>cache sizes enough room to grow you will choke the performance. You should
>>adjust your Tomcat settings to let the cache grow to at least 1GB or better
>>would be 2GB. You may also want to look into
>>http://wiki.apache.org/solr/SolrCaching warming the cache  to make the first
>>time call a little faster.
>>
>>For comparison, I also have about 8GB in my index but only 2.8 million
>>documents. My search query times on a smaller box than you specify are 6533
>>milliseconds on an unwarmed (newly rebooted) instance.
>>--
>>View this message in context: 
>>http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
>>



-- 
Lance Norskog
goks...@gmail.com


Re: Using separate Analyzers for querying and indexing.

2010-09-28 Thread Luke Crouch
Yeah. You can specify two analyzers in the same fieldType:
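
A minimal sketch of the idea (the fieldType name, the tokenizer, and the
com.example filter factory are placeholders; the point is simply the separate
type="index" and type="query" analyzers):

   <fieldType name="text_casepreserve" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- custom filter that also emits a lowercased copy of each uppercase
            token at the same position -->
       <filter class="com.example.CaseDuplicatingFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- no lowercasing here, so queries stay case sensitive -->
     </analyzer>
   </fieldType>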




-L

On Tue, Sep 28, 2010 at 2:31 PM, James Norton wrote:

> Hello,
>
> I am migrating from a pure Lucene application to using solr.  For legacy
> reasons I must support a somewhat obscure query feature: lowercase words in
> the query should match lowercase or uppercase in the index, while uppercase
> words in the query should only match uppercase words in the index.
>
> To do this with Lucene we created a custom Analyzer and custom TokenFilter.
>  During indexing, the custom TokenFilter duplicates uppercase tokens as
> lowercase ones and sets their offsets to make them appear in same position
> as the upper case token, i.e., you get two tokens for every uppercase token.
>  Then at query time a normal (case sensitive) analyzer is used so that
> lowercase tokens will match either upper or lower, while the uppercase will
> only match uppercase.
>
> I have looked through the documentation and I see how to specify the
> Analyzer in the schema.xml file that is used for indexing, but I don't know
> how to specify that a different Analyzer (the case sensitive one) should be
> used for queries.
>
> Is this possible?
>
> Thanks,
>
> James


Using separate Analyzers for querying and indexing.

2010-09-28 Thread James Norton
Hello,

I am migrating from a pure Lucene application to using solr.  For legacy 
reasons I must support a somewhat obscure query feature: lowercase words in the 
query should match lowercase or uppercase in the index, while uppercase words 
in the query should only match uppercase words in the index.

To do this with Lucene we created a custom Analyzer and custom TokenFilter.  
During indexing, the custom TokenFilter duplicates uppercase tokens as 
lowercase ones and sets their offsets to make them appear in same position as 
the upper case token, i.e., you get two tokens for every uppercase token.  Then 
at query time a normal (case sensitive) analyzer is used so that lowercase 
tokens will match either upper or lower, while the uppercase will only match 
uppercase.

I have looked through the documentation and I see how to specify the Analyzer 
in the schema.xml file that is used for indexing, but I don't know how to 
specify that a different Analyzer (the case sensitive one) should be used for 
queries.

Is this possible?

Thanks,

James

SolrException: Bad Request

2010-09-28 Thread Pavel Minchenkov
Hi,
I'm getting a rather strange exception after a long web server idle period (Tomcat
7.0.2). If I immediately run the same request again, no errors occur. What
may be the problem? All server settings are defaults.

Exception:


...
at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:173)
at
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:89)
at
org.apache.cxf.jaxws.JAXWSMethodInvoker.invoke(JAXWSMethodInvoker.java:60)
at
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:75)
at
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
org.apache.cxf.workqueue.SynchronousExecutor.execute(SynchronousExecutor.java:37)
at
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:106)
at
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:243)
at
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:110)
at
org.apache.cxf.transport.servlet.ServletDestination.invoke(ServletDestination.java:98)
at
org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:423)
at
org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:178)
at
org.apache.cxf.transport.servlet.AbstractCXFServlet.invoke(AbstractCXFServlet.java:142)
at
org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:179)
at
org.apache.cxf.transport.servlet.AbstractHTTPServlet.doPost(AbstractHTTPServlet.java:103)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
at
org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:159)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:243)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:201)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:163)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:108)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:556)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:401)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:242)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:267)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:245)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:260)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing
query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at
com.gramit.services.searching.SearchingService.search(SearchingService.java:186)
... 57 more
Caused by: org.apache.solr.common.SolrException: Bad Request

Bad Request

request: 
http://127.0.0.1/solr/select?q=кофе&fq=lat:[55.16728264288879
TO 56.437558186276114] AND lng:[36.47475305185914
TO
38.735977228049315]&spellcheck=true&spellcheck.count=1&spellcheck.collate=true&spellcheck.q=кофе
&start=0&rows=10&sort=dist(2,lat,lng,55.8076049,37.5869184)
asc&facet=true&facet.limit=5&facet.mincount=1&facet.field=marketplaceCfg_id&facet.field=productCfg_id&stats=true&stats.field=price&wt=javabin&version=1
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 59 more

Thanks.

-- 
Pavel Minchenkov


Re: Conditional Function Queries

2010-09-28 Thread Yonik Seeley
On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent
 wrote:
> Have anyone written any conditional functions yet for use in Function Queries?

Nope - but it makes sense and has been on my list of things to do for
a long time.

-Y
http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8


RE: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Thumuluri, Sai
I removed default=true from standard request handler
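
For reference, once only one handler carries default="true", a minimal dismax handler 
would look roughly like this (the qf boosts below are just the examples from this 
thread, not recommendations):

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <str name="qf">title^20.0 pagedescription^15.0</str>
  </lst>
</requestHandler>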

-Original Message-
From: Luke Crouch [mailto:lcro...@geek.net] 
Sent: Tuesday, September 28, 2010 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax Request handler and Solrconfig.xml

Are you removing the standard default requestHandler when you do this?
Or
are you specifying two requestHandler's with default="true" ?

-L

On Tue, Sep 28, 2010 at 11:14 AM, Thumuluri, Sai <
sai.thumul...@verizonwireless.com> wrote:

> Hi,
>
> I am using Solr 1.4.1 with Nutch to index some of our intranet
content.
> In Solrconfig.xml, default request handler is set to "standard". I am
> planning to change that to use dismax as the request handler but when
I
> set "default=true" for dismax - Solr does not return any results - I
get
> results only when I comment out "dismax".
>
> This works
>   default="true">
>
> 
>   explicit
>   10
>   *
>   title^20.0 pagedescription^15.0
>   2.1
> 
>  
>
> DOES NOT WORK
>   default="true">
>
> dismax
> explicit
>
> THIS WORKS
>   default="true">
>
> 
> 
name="echoParams">explicit
>
> Please let me know what I am doing wrong here.
>
> Sai Thumuluri
> Sr. Member - Application Staff
> IT Intranet & Knoweldge Mgmt. Systems
> 614 560-8041 (Desk)
> 614 327-7200 (Mobile)
>
>
>


Re: Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Luke Crouch
Are you removing the standard default requestHandler when you do this? Or
are you specifying two requestHandlers with default="true"?

-L

On Tue, Sep 28, 2010 at 11:14 AM, Thumuluri, Sai <
sai.thumul...@verizonwireless.com> wrote:

> Hi,
>
> I am using Solr 1.4.1 with Nutch to index some of our intranet content.
> In Solrconfig.xml, default request handler is set to "standard". I am
> planning to change that to use dismax as the request handler but when I
> set "default=true" for dismax - Solr does not return any results - I get
> results only when I comment out "dismax".
>
> This works
>   default="true">
>
> 
>   explicit
>   10
>   *
>   title^20.0 pagedescription^15.0
>   2.1
> 
>  
>
> DOES NOT WORK
>   default="true">
>
> dismax
> explicit
>
> THIS WORKS
>   default="true">
>
> 
>  name="echoParams">explicit
>
> Please let me know what I am doing wrong here.
>
> Sai Thumuluri
> Sr. Member - Application Staff
> IT Intranet & Knoweldge Mgmt. Systems
> 614 560-8041 (Desk)
> 614 327-7200 (Mobile)
>
>
>


Dismax Request handler and Solrconfig.xml

2010-09-28 Thread Thumuluri, Sai
Hi,

I am using Solr 1.4.1 with Nutch to index some of our intranet content.
In Solrconfig.xml, default request handler is set to "standard". I am
planning to change that to use dismax as the request handler but when I
set "default=true" for dismax - Solr does not return any results - I get
results only when I comment out "dismax". 

This works
  

 
   explicit
   10
   *
   title^20.0 pagedescription^15.0
   2.1
 
  

DOES NOT WORK
  

 dismax
 explicit

THIS WORKS
  


 explicit

Please let me know what I am doing wrong here. 

Sai Thumuluri
Sr. Member - Application Staff
IT Intranet & Knowledge Mgmt. Systems
614 560-8041 (Desk)
614 327-7200 (Mobile)




Conditional Function Queries

2010-09-28 Thread Jan Høydahl / Cominvent
Hi,

Has anyone written any conditional functions yet for use in Function Queries?

I see the use for a function which can run different sub functions depending on 
the value of a field.

Say you have three documents:
A: title=Sports car, color=red
B: title=Boring car, color=green
C: title=Big car, color=black

Now we have a requirement to boost red cars over green and green cars over 
black.

The only way I have found to do this today is (ab)using the map() function. 
DisMax syntax:
q=car&bf=sum(map(query($qr),0,0,0,100.0),map(query($qg),0,0,0,50.0))&qr=color:red&qg=color:green

But I suspect this is expensive in terms of two sub queries being applied and 
scored.

An elegant way to achieve the same would be through a new native if() or case() 
function:
q=car&bf=if(color=="red"; 100; if(color=="green"; 50; 0))
OR
q=car&bf=case(color, "red":100, "green":sum(30,20))

What do you think?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: What's the difference between TokenizerFactory, Tokenizer, & Analyzer?

2010-09-28 Thread Ahmet Arslan
> 1) KeywordTokenizerFactory seems to be a "tokenizer
> factory" while CJKTokenizer seems to be just a tokenizer.
> Are they the same type of things at all? 
> Could I just replace 
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> with
> <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
> ??


You should use org.apache.solr.analysis.CJKTokenizerFactory instead.
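
A minimal field type using it would look something like this (the fieldType name
is just an example):

<fieldType name="text_cjk" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>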


> 2) I'm also interested in trying out SmartChineseAnalyzer
> (http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html)
> However SmartChineseAnalyzer doesn't offer a separate
> tokenizer. It's just an analyzer and that's it. How do I use
> it in Solr?

You can use a Lucene analyzer directly in Solr (adjust the fieldType name to your schema):

<fieldType name="text_cn" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>


RE: Need help with spellcheck city name

2010-09-28 Thread Dyer, James
You might want to look at SOLR-2010.  This patch works with the "collation" 
feature, having it test the collations it returns to ensure they'll return 
hits.  So if a user types "san jos" it will know that the combination "san 
jose" is in the index and "san ojos" is not.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: Monday, September 27, 2010 7:45 PM
To: solr-user@lucene.apache.org
Cc: erickerick...@gmail.com
Subject: Re: Need help with spellcheck city name

No, I checked, there is a city called Swan in Iowa.  So it is coming from the 
city index, and so is Clark.  But why does it favor Swan over San?  Spellcheck gets 
weird after I treat the city name as one token.  If I do it the old way, it lets 
San go and corrects Jos as Ojos instead of Jose, because Ojos is ranked #1 and 
Jose is in the middle.  Any more suggestions?  Ranking by frequency first and then 
score doesn't work either.


 


From: Erick Erickson 
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 5:24:25 PM
Subject: Re: Need help with spellcheck city name

Hmmm, did you rebuild your spelling index after the config changes?

And it really looks like somehow you're getting results from a field other
than city. Are you also sure that your cityname field is of type
autocomplete1?

Shooting in the dark here, but these results are so weird that I suspect
it's
something fundamental

Best
Erick

On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett <
savannah_becket...@yahoo.com> wrote:

> No, it doesn't work, I got weird result. I set my city name field to be
> parsed
> as a token as following:
>
>         positionIncrementGap="100">
>          
>            
>            
>          
>          
>            
>            
>          
>        
>
> I got following result for spellcheck:
>
> 
> -    
> -        
>              1
>              0
>              3
> -            
>                  swan
>          
>      
> -        
>              1
>              4
>        8
>                
>          clark
>      
>      
>  
>
>
>
>
>
> 
> From: Tom Hill 
> To: solr-user@lucene.apache.org
> Sent: Mon, September 27, 2010 3:52:48 PM
> Subject: Re: Need help with spellcheck city name
>
> Maybe process the city name as a single token?
>
> On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett
>  wrote:
> > Hi,
> >  I have city name as a text field, and I want to do spellcheck on it.  I
> use
> > setting in http://wiki.apache.org/solr/SpellCheckComponent
> >
> > If I setup city name as text field and do spell check on "San Jos" for
> San
> >Jose,
> > I get suggestion for Jos as "ojos".  I checked the extendedresult and I
> found
> > that Jose is in the middle of all 10 suggestions in term of score and
> > frequency.  I then set city name as string field, and spell check again,
> I got
> > Van for San and Ross for Jos, which is weird because San is correct.
> >
> >
> > How do you setup spellchecker to spellcheck city names?  City name can
> have
> > multiple words.
> > Thanks.
> >
> >
> >
>
>
>
>
>



  


Re: Is Solr right for our project?

2010-09-28 Thread Jan Høydahl / Cominvent
Yes, in the latest released version (1.4.1), there is a shards= parameter but 
the client needs to fill it, i.e. the client needs to know what servers are 
indexers, searchers, shard masters and shard replicas...
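
For example, a plain distributed query in 1.4.1 looks roughly like this (host
names and ports are placeholders):

http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr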

The SolrCloud stuff is still not committed and only available as a patch right 
now. However, we encourage you to do a test install based on TRUNK+SOLR-1873 
and give it a try. But we cannot guarantee that the APIs will not change in the 
released version (hopefully 3.1 sometime this year).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. sep. 2010, at 10.44, Mike Thomsen wrote:

> Interesting. So what you are saying, though, is that at the moment it
> is NOT there?
> 
> On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent
>  wrote:
>> Solr will match this in version 3.1 which is the next major release.
>> Read this page: http://wiki.apache.org/solr/SolrCloud for feature 
>> descriptions
>> Coming to a trunk near you - see 
>> https://issues.apache.org/jira/browse/SOLR-1873
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 27. sep. 2010, at 17.44, Mike Thomsen wrote:
>> 
>>> (I apologize in advance if I missed something in your documentation,
>>> but I've read through the Wiki on the subject of distributed searches
>>> and didn't find anything conclusive)
>>> 
>>> We are currently evaluating Solr and Autonomy. Solr is attractive due
>>> to its open source background, following and price. Autonomy is
>>> expensive, but we know for a fact that it can handle our distributed
>>> search requirements perfectly.
>>> 
>>> What we need to know is if Solr has capabilities that match or roughly
>>> approximate Autonomy's Distributed Search Handler. What it does it
>>> acts as a front-end for all of Autonomy's IDOL search servers (which
>>> correspond in this scenario to Solr shards). It is configured to know
>>> what is on each shard, which servers hold each shard and intelligently
>>> farms out queries based on that configuration. There is no need to
>>> specify which IDOL servers to hit while querying; the DiSH just knows
>>> where to go. Additionally, I believe in cases where an index piece is
>>> mirrored, it also monitors server health and falls back intelligently
>>> on other backup instances of a shard/index piece based on that.
>>> 
>>> I'd appreciate it if someone can give me a frank explanation of where
>>> Solr stands in this area.
>>> 
>>> Thanks,
>>> 
>>> Mike
>> 
>> 



RE: Limitations of prohibited clausses in sub-expression - pure negative query

2010-09-28 Thread Patrick Sauts
Maybe the SOLR-80 Jira issue?

 

As written in the Solr 1.4 book, "pure negative query doesn't work correctly";
you have to add 'AND *:*'.
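
For example (color is a hypothetical field):

q=car OR (-color:red)            <- the pure negative sub-clause matches nothing
q=car OR (*:* AND -color:red)    <- works: the sub-clause now matches all non-red docs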

 

thx

 

 

 

From: Patrick Sauts [mailto:patrick.via...@gmail.com] 
Sent: mardi 28 septembre 2010 11:53
To: 'solr-user@lucene.apache.org'
Subject: Limitations of prohibited clausses in sub-expression - pure
negative query

 

I can't find the answer, but is this problem solved in Solr 1.4.1?

Thx for your answers.

 

 



Re: Limitations of prohibited clausses in sub-expression - pure negative query

2010-09-28 Thread Erick Erickson
Please explain what you want to *do*, your message is so terse it makes it
really hard to figure out what you're asking. A couple of example queries
would help a lot.

Best
Erick

On Tue, Sep 28, 2010 at 5:53 AM, Patrick Sauts wrote:

> I can find the answer but is this problem solved in Solr 1.4.1 ?
>
> Thx for your answers.
>
>
>
>
>
>


Re: Search Interface

2010-09-28 Thread Antonio Calo'

 Hi

You could try the Velocity framework to build GUIs in a quick
and efficient manner.


Solr comes with a Velocity response writer already integrated, which could be the
best solution in your case:


http://wiki.apache.org/solr/VelocityResponseWriter
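
A rough sketch of wiring it up in solrconfig.xml (double-check the class name,
handler path and template names against that wiki page for your version):

<queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>

<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
  </lst>
</requestHandler>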

Also take these hints on the same topic: 
http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/


there is also a webinar about rapid prototyping with solr:

http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681

Hope this help

Antonio


On 28/09/2010 4.35, Claudio Devecchi wrote:

Hi everybody,

I'm implementing my first Solr engine for conceptual tests. I'm crawling my
intranet wiki to make some searches, and the engine is already working fine, but
I need some interface to run my searches.
Does anybody know where I can find a search interface I can use as a starting
point for customization?

Tks




Re: is multi-threads searcher feasible idea to speed up?

2010-09-28 Thread Li Li
Yes, there is a MultiSearcher in Lucene, but its idf is not global across the
2 indexes. Maybe I can modify it, and also the index, like:
term1  df=5 doc1 doc3 doc5
term1  df=5 doc2 doc4

2010/9/28 Li Li :
> hi all
>    I want to speed up search time for my application. In a query, the
> time is largly used in reading postlist(io with frq files) and
> calculate scores and collect result(cpu, with Priority Queue). IO is
> hardly optimized or already part optimized by nio. So I want to use
> multithreads to utilize cpu. of course, it may be decrease QPS, but
> the response time will also decrease-- that what I want. Because cpu
> is easily obtained compared to faster hard disk.
>    I read the codes of searching roughly and find it's not an easy
> task to modify search process. So I want to use other easy method .
>    One is use solr distributed search and dispatch documents to many
> shards. but due to the network and global idf problem,it seems not a
> good method for me.
>    Another one is to modify the index structure and averagely
> dispatch frq files.
>    e.g    term1 -> doc1,doc2, doc3,doc4,doc5 in _1.frq
>    I create to 2 indexes with
>            term1->doc1,doc3,doc5
>            term1->doc2,doc4
>    when searching, I create 2 threads with 2 PriorityQueues to
> collect top N docs and merging their results
>    Is the 2nd idea feasible? Or any one has related idea? thanks.
>


Re: is multi-threads searcher feasible idea to speed up?

2010-09-28 Thread Michael McCandless
This is an excellent idea!

And, desperately needed.

It's high time Lucene can take advantage of concurrency when running a
single query.  Machines have tons of cores these days!  (My dev box
has 24!).

Note that one simple way to do this is use ParallelMultiSearcher: it
uses one thread per segment in your index.
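
A rough sketch of using it over two sub-indexes (Lucene 2.9/3.x API; the index
paths and field/term are placeholders):

import java.io.File;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

public class ParallelSearchSketch {
  public static void main(String[] args) throws Exception {
    // one sub-searcher per physical index; ParallelMultiSearcher queries them in parallel threads
    IndexSearcher s1 = new IndexSearcher(FSDirectory.open(new File("/data/index1")), true);
    IndexSearcher s2 = new IndexSearcher(FSDirectory.open(new File("/data/index2")), true);
    Searcher parallel = new ParallelMultiSearcher(new Searchable[] { s1, s2 });
    // top 10 hits merged across both sub-indexes (idf is still per-index, as noted in this thread)
    TopDocs top = parallel.search(new TermQuery(new Term("body", "term1")), 10);
    System.out.println("total hits: " + top.totalHits);
  }
}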

But, note that [perversely] this means if your index is optimized you
get no concurrency gain!  So, you have to create your index w/ a
carefully picked maxMergeDocs/MB to ensure you can use concurrency.

I don't like having concurrency tied to index structure.  So a better
approach would be to have each thread pull its own Scorer for the same
query, but then each one does a .advance to its "chunk" of the index,
and then iterates from there.  Then merge PQs in the end just like
MultiSearcher.

Mike

On Tue, Sep 28, 2010 at 7:24 AM, Li Li  wrote:
> hi all
>    I want to speed up search time for my application. In a query, the
> time is largly used in reading postlist(io with frq files) and
> calculate scores and collect result(cpu, with Priority Queue). IO is
> hardly optimized or already part optimized by nio. So I want to use
> multithreads to utilize cpu. of course, it may be decrease QPS, but
> the response time will also decrease-- that what I want. Because cpu
> is easily obtained compared to faster hard disk.
>    I read the codes of searching roughly and find it's not an easy
> task to modify search process. So I want to use other easy method .
>    One is use solr distributed search and dispatch documents to many
> shards. but due to the network and global idf problem,it seems not a
> good method for me.
>    Another one is to modify the index structure and averagely
> dispatch frq files.
>    e.g    term1 -> doc1,doc2, doc3,doc4,doc5 in _1.frq
>    I create to 2 indexes with
>            term1->doc1,doc3,doc5
>            term1->doc2,doc4
>    when searching, I create 2 threads with 2 PriorityQueues to
> collect top N docs and merging their results
>    Is the 2nd idea feasible? Or any one has related idea? thanks.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


is multi-threads searcher feasible idea to speed up?

2010-09-28 Thread Li Li
hi all
I want to speed up search time for my application. In a query, the
time is largely spent reading the postings list (IO with the frq files) and
calculating scores and collecting results (CPU, with a priority queue). The IO
is hard to optimize further, or is already partly optimized by NIO, so I want
to use multiple threads to utilize the CPU. Of course this may decrease QPS,
but the response time will also decrease -- that's what I want, because CPU is
easier to obtain than faster hard disks.
I read the search code roughly and found it's not an easy task to modify the
search process, so I want to use another, easier method.
One is to use Solr distributed search and dispatch documents to many shards,
but due to the network and the global idf problem it doesn't seem like a good
method for me.
Another is to modify the index structure and evenly dispatch the frq files.
e.g.  term1 -> doc1,doc2,doc3,doc4,doc5 in _1.frq
I create 2 indexes with
term1 -> doc1,doc3,doc5
term1 -> doc2,doc4
When searching, I create 2 threads with 2 PriorityQueues to collect the top N
docs and merge their results.
Is the 2nd idea feasible? Does anyone have a related idea? Thanks.


Limitations of prohibited clausses in sub-expression - pure negative query

2010-09-28 Thread Patrick Sauts
I can't find the answer, but is this problem solved in Solr 1.4.1?

Thx for your answers.

 

 



Re: Is Solr right for our project?

2010-09-28 Thread Mike Thomsen
Interesting. So what you are saying, though, is that at the moment it
is NOT there?

On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent
 wrote:
> Solr will match this in version 3.1 which is the next major release.
> Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptions
> Coming to a trunk near you - see 
> https://issues.apache.org/jira/browse/SOLR-1873
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 27. sep. 2010, at 17.44, Mike Thomsen wrote:
>
>> (I apologize in advance if I missed something in your documentation,
>> but I've read through the Wiki on the subject of distributed searches
>> and didn't find anything conclusive)
>>
>> We are currently evaluating Solr and Autonomy. Solr is attractive due
>> to its open source background, following and price. Autonomy is
>> expensive, but we know for a fact that it can handle our distributed
>> search requirements perfectly.
>>
>> What we need to know is if Solr has capabilities that match or roughly
>> approximate Autonomy's Distributed Search Handler. What it does it
>> acts as a front-end for all of Autonomy's IDOL search servers (which
>> correspond in this scenario to Solr shards). It is configured to know
>> what is on each shard, which servers hold each shard and intelligently
>> farms out queries based on that configuration. There is no need to
>> specify which IDOL servers to hit while querying; the DiSH just knows
>> where to go. Additionally, I believe in cases where an index piece is
>> mirrored, it also monitors server health and falls back intelligently
>> on other backup instances of a shard/index piece based on that.
>>
>> I'd appreciate it if someone can give me a frank explanation of where
>> Solr stands in this area.
>>
>> Thanks,
>>
>> Mike
>
>


What's the difference between TokenizerFactory, Tokenizer, & Analyzer?

2010-09-28 Thread Andy
Could someone help me to understand the differences between TokenizerFactory, 
Tokenizer, & Analyzer?

Specifically, I'm interested in implementing auto-complete for tags that could 
contain both English & Chinese. I read this article 
(http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/).
 In the article KeywordTokenizerFactory is used as tokenizer. I thought I'd try 
replacing that with CJKTokenizer. 2 questions:

1) KeywordTokenizerFactory seems to be a "tokenizer factory" while CJKTokenizer 
seems to be just a tokenizer. Are they the same type of things at all? 
Could I just replace 
<tokenizer class="solr.KeywordTokenizerFactory"/>
with
<tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>
??

2) I'm also interested in trying out SmartChineseAnalyzer 
(http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html)
However SmartChineseAnalyzer doesn't offer a separate tokenizer. It's just an 
analyzer and that's it. How do I use it in Solr?

Thanks.
Andy


  


Re: Re:The search response time is too loong

2010-09-28 Thread newsam
I guess you are correct. We used the default SOLR cache configuration. I will 
change the cache configuration.

BTW, I want to deploy several shards from the existing 8G index file, such as 
4G per shard. Is there any tool to generate two shards from one 8G index file?
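
One option might be MultiPassIndexSplitter from Lucene's contrib/misc, which can
split an existing index into N parts; the invocation is roughly as follows (jar
names are placeholders for whatever your version ships):

java -cp lucene-core.jar:lucene-misc.jar org.apache.lucene.index.MultiPassIndexSplitter \
  -out /path/to/split-output -num 2 /path/to/existing/index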

>From: kenf_nc 
>Reply-To: solr-user@lucene.apache.org
>To: solr-user@lucene.apache.org
>Subject: Re: Re:The search response time is too loong
>Date: Mon, 27 Sep 2010 05:37:25 -0700 (PDT)
>
>
>"mem usage is over 400M", do you mean Tomcat mem size? If you don't give your
>cache sizes enough room to grow you will choke the performance. You should
>adjust your Tomcat settings to let the cache grow to at least 1GB or better
>would be 2GB. You may also want to look into 
>http://wiki.apache.org/solr/SolrCaching warming the cache  to make the first
>time call a little faster. 
>
>For comparison, I also have about 8GB in my index but only 2.8 million
>documents. My search query times on a smaller box than you specify are 6533
>milliseconds on an unwarmed (newly rebooted) instance. 
>-- 
>View this message in context: 
>http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
>Sent from the Solr - User mailing list archive at Nabble.com.
>