RE: Question about autocommit

2008-11-19 Thread Nguyen, Joe
Could ramBufferSizeMB trigger the commit in this case?  

-Original Message-
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 8:36 Joe
To: solr-user@lucene.apache.org
Subject: Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to search
the wiki, but found only the standard maxDoc/time settings.
I have set the autocommit parameters in solrconfig.xml to 8000 docs and
30 millis.
Indexing at around 200 docs per second (from multiple processes, using
the CommonsHttpSolrServer class), I would have expected autocommits to
occur around every 40 seconds; however, the jvm log shows the following
- sometimes more than two calls per second:

$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush


Additionally, in the Solr admin page, the update handler reports as
many autocommits as commits - so I assume it is not some stray commit()
call lost in my code.

I actually get the feeling that the commits are triggered more and more
often - with a not-so-nice influence on indexing speed over time.
Restarting Resin seems to bring the commit rate back to its original
level. Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai


RE: Question about autocommit

2008-11-19 Thread Nguyen, Joe
As far as I know, a commit can be triggered:

Manually
1.  by invoking the commit() method
Automatically
2.  by maxDocs
3.  by maxTime

Since document size is arbitrary and some documents can be huge, could a
commit also be triggered by the size of the memory buffer?

-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 9:09 Joe
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

They are separate commits. ramBufferSizeMB controls when the underlying
Lucene IndexWriter flushes RAM to disk (this isn't the same as the
IndexWriter committing or closing). The Solr autocommit controls when
Solr asks the IndexWriter to commit what it has done so far.
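To make the two knobs concrete, here is a hedged sketch of where each lives in solrconfig.xml (Solr 1.3-era layout; the numeric values are illustrative or taken from the original post, not recommendations):

```xml
<!-- Lucene-level buffer: controls when the IndexWriter flushes RAM to
     disk. A flush is NOT a commit - flushed segments are not yet
     visible to searchers. -->
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>

<!-- Solr-level autocommit: controls when Solr asks the IndexWriter to
     commit, making changes visible to new searchers. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>8000</maxDocs>   <!-- doc count from the original post -->
    <maxTime>30000</maxTime>  <!-- milliseconds; assumed value -->
  </autoCommit>
</updateHandler>
```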

Nguyen, Joe wrote:
 Could ramBufferSizeMB trigger the commit in this case?  

 -Original Message-
 From: Nickolai Toupikov [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 19, 2008 8:36 Joe
 To: solr-user@lucene.apache.org
 Subject: Question about autocommit

 Hello,
 I would like some details on the autocommit mechanism. I tried to 
 search the wiki, but found only the standard maxDoc/time settings.
 i have set the autocommit parameters in solrconfig.xml to 8000 docs  
 and 30milis.
 Indexing at around  200 docs per second (from multiple processes, 
 using the CommonsHttpSolrServer class), i would have expected 
 autocommits to occur around every  40 seconds, however the jvm log 
 shows the following
 -  sometimes more than two calls per second:

 [timing log snipped - quoted in full above]


 additionally, in the solr admin page , the update handler reports as 
 many autocommits as commits - so i assume it is not some commit(); 
 line lost in my code.

 I actually get the feeling that the commits are triggered more and 
 more often - with not-so-nice influence on indexing speed over time.
 Restarting resin seems to get the commit rate to the original level.
 Optimizing has no effect.
 Is there some other parameter influencing autocommit?

 Thank you very much.

 Nickolai
   



RE: No search result behavior (a la Amazon)

2008-11-19 Thread Nguyen, Joe
Have a look at the DisMaxRequestHandler and play with mm (minimum
terms that should match):

http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28CategorySolrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5fe41d68f3910ed544311435393f5727408e61
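For reference, a hedged sketch of how mm might be configured on a dismax handler (the handler name, qf fields, and boosts are assumptions for illustration):

```xml
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="qf">title^2.0 body</str>
    <!-- require 100% of the query terms to match -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

The same parameter can also be passed per-request, e.g. &qt=dismax&mm=100%.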


-Original Message-
From: Caligula [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 11:11 Joe
To: solr-user@lucene.apache.org
Subject: No search result behavior (a la Amazon)


It appears to me that Amazon is using a 100% minimum-match policy. If
there are no matches, they break down the original search terms and give
suggested search results.

example:

http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0


Can Solr natively achieve something similar?  If not, can you suggest a
way to achieve this?  A custom RequestHandler?


Thanks!
--
View this message in context:
http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587024.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: No search result behavior (a la Amazon)

2008-11-19 Thread Nguyen, Joe

It seemed like its first search required matching all terms.
If it could not find a match, then, as you mentioned, it broke the query
down into multiple smaller term sets, ran a search to get the total hits
for each smaller set, sorted the results by total hits, and displayed a
summary page.

Searching for A B C would be:
1. q = +A +B +C   (match all terms)
2. q = +A +B -C   (match A and B but not C)
3. q = +A -B +C
4. q = 
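The decomposition sketched above can be generated mechanically. This is an illustrative helper only - not anything Amazon or Solr actually ships - showing one way to enumerate the +/- term combinations from strictest to loosest:

```python
from itertools import product

def fallback_queries(terms):
    """Generate +/- term combinations with at least one required ('+')
    term, ordered from most to fewest required terms."""
    queries = []
    for signs in product("+-", repeat=len(terms)):
        if "+" not in signs:
            continue  # skip the all-negative combination
        clause = " ".join(sign + term for sign, term in zip(signs, terms))
        queries.append(clause)
    # strictest first, mirroring the 100%-match-then-relax strategy
    queries.sort(key=lambda q: q.count("+"), reverse=True)
    return queries

print(fallback_queries(["A", "B", "C"])[0])  # +A +B +C
```

Each generated clause could then be issued as a separate q parameter and the hit counts merged by the caller.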

 

-Original Message-
From: Caligula [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 11:52 Joe
To: solr-user@lucene.apache.org
Subject: RE: No search result behavior (a la Amazon)


I understand how to do the 100% mm part.  It's the behavior when there
are no matches that i'm asking about :)



Nguyen, Joe-2 wrote:
 
 Have a look at DisMaxRequestHandler and play with mm (miminum terms 
 should match)
 
 http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28Category
 So
 lrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5f
 e4
 1d68f3910ed544311435393f5727408e61
 
 
 [earlier message snipped - quoted in full above]
 
 
 

--
View this message in context:
http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587896.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Query Response Doc Score - Int Value

2008-11-18 Thread Nguyen, Joe
You don't need to hack the code, since you can treat scores like
2.3518934 and 2.2173865 as if they were equal (ignoring digits after
the decimal point).

Score = original score (2.3518934) + function(date_created)

You can scale the value of function(date_created) so that the digits
after the decimal point in the original score do not significantly
influence the final score.

E.g.
To make the digits after the decimal point *insignificant*:
 Score = 2.3518934 + 10.00 = 12.3518934

To make them significant, make function(date_created) return a small
number:
 Score = 2.3518934 + 0.2   = 2.5518934

You can specify the function in the request URL
(http://wiki.apache.org/solr/FunctionQuery)
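The scaling arithmetic above can be sanity-checked directly; the boost magnitudes here are the ones from the example, and the two scores are the ones from the original question:

```python
# Original scores differ only after the decimal point.
score_a, score_b = 2.3518934, 2.2173865

# Large date boost: the fractional part of the relevance score becomes
# insignificant, so the date function dominates the final ordering.
big_boost_a, big_boost_b = 10.0, 20.0
assert score_b + big_boost_b > score_a + big_boost_a

# Small date boost: the original relevance ordering survives unless the
# boosts differ by more than the score gap (~0.13 here).
small_boost_a, small_boost_b = 0.2, 0.1
assert score_a + small_boost_a > score_b + small_boost_b
```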



  

-Original Message-
From: Derek Springer [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 18, 2008 8:39 Joe
To: solr-user@lucene.apache.org
Subject: Re: Query Response Doc Score - Int Value

Better yet, does anyone know where the method that writes the score
lives?
For instance, a getScore() method that writes the score out that I could
override and truncate? Thanks!

-Derek

On Mon, Nov 17, 2008 at 9:59 PM, Derek Springer [EMAIL PROTECTED]
wrote:

 Thanks for the heads up. Can anyone point me to (or provide me with) 
 an example of writing a function query?

 -Derek


 On Mon, Nov 17, 2008 at 8:17 PM, Yonik Seeley [EMAIL PROTECTED]
wrote:

 A function query is the likely candidate - no such quantization 
 function exists, but it would be relatively easy to write one.

 -Yonik

 On Mon, Nov 17, 2008 at 8:17 PM, Derek Springer [EMAIL PROTECTED]
wrote:
  Hello,
  I am currently performing a query to a Solr index I've set up and 
  I'm
 trying
  to 1) sort on the score and 2) sort on the date_created (a custom 
  field
 I've
  added). The sort command looks like:
sort=score+desc,created_date+desc.
 
  The gist of it is that I will 1) first return the most relevant 
  results
 then
  2) within those results, return the most recent results. However, 
  the
 issue
  I have is that the score is a decimal value that is far too precise
  (e.g. 2.3518934 vs 2.2173865) and will therefore never collide and
  trigger
 the
  secondary sort on the date.
 
  The question I am asking is if anyone knows a way to produce a 
  score
 that is
  more coarse, or if it is possible to force the score to return as 
  an integer. That way I could have the results collide on the score 
  more
 often
  and therefore sort on the date as well.
 
  Thanks!
  -Derek
 




 --
 Derek B. Springer
 Software Developer
 Mahalo.com, Inc.
 902 Colorado Ave.,
 Santa Monica, CA 90401
 [EMAIL PROTECTED]




--
Derek B. Springer
Software Developer
Mahalo.com, Inc.
902 Colorado Ave.,
Santa Monica, CA 90401
[EMAIL PROTECTED]


RE: abt Multicore

2008-11-17 Thread Nguyen, Joe
 
Any suggestions?
-Original Message-
From: Nguyen, Joe 
Sent: Monday, November 17, 2008 9:40 Joe
To: 'solr-user@lucene.apache.org'
Subject: RE: abt Multicore

Are all the documents in the same search space?  That is, for a given
query, could any of the 10MM docs be returned?

If so, I don't think you need to worry about multicore.  You may however
need to put part of the index on various machines:
http://wiki.apache.org/solr/DistributedSearch 

I am also trying to decide between multicore and distributed search. My
concerns are as follows:

Does that mean having a single big schema with a lot of fields?
Distributed search requires that each document have a unique key, and in
this case the unique key cannot be the primary key of a single table.

I wonder how Solr performs in each case (distributed search vs.
multicore):
1.  Distributed search
a.  All documents are in a single index. Would indexing a single
document lock the index and affect query performance?
b.  If multiple machines are used, Solr will need to query each machine
and merge the results. This could also impact performance.
c.  Supports MoreLikeThis queries given a document id.
2.  Multicore
a.  Each table would be associated with a single core. Indexing a single
document would lock only that core's index, so querying documents on
other cores would not be impacted.
b.  Querying documents across multiple cores must be handled by the
caller.
c.  Can't support MoreLikeThis queries, since a document id from one
core has no meaning on other cores.

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, November 17, 2008 6:09 Joe
To: solr-user@lucene.apache.org
Subject: Re: abt Multicore

Are all the documents in the same search space?  That is, for a given
query, could any of the 10MM docs be returned?

If so, I don't think you need to worry about multicore.  You may however
need to put part of the index on various machines:
http://wiki.apache.org/solr/DistributedSearch

ryan
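For reference, a hedged sketch of what a distributed query looks like with the shards parameter described on that wiki page (host names and the query are placeholders):

```text
http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=ipod&start=0&rows=10
```

The server receiving the request fans the query out to each shard and merges the results, which is why each document needs a globally unique key.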


On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:

 Hi,

 I have an app running on weblogic and oracle. Oracle DB is quite huge;

 say some 10 millions of records. I need to integrate Solr for this and

 I am planning to use multicore. How can the multicore feature be used
 to best effect?



 -Raghu




RE: Updating schema.xml without deleting index?

2008-11-17 Thread Nguyen, Joe
Don't know whether this would work... just speculating :-)

a.  You'll need to create a new schema with the new field, or you could
use a dynamic field in your current schema (assuming you have already
configured the default value to 0).
b.  Add a couple of new documents.
c.  Run the optimize script. Optimize will consolidate all segments into
a single segment, so at the end you'll have a single segment that
includes the new field.

Would that work?

-Original Message-
From: Jeff Lerman [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 17, 2008 12:45 Joe
To: solr-user@lucene.apache.org
Subject: Updating schema.xml without deleting index?

I've tried searching for this answer all over but have found no results
thus far.  I am trying to add a new field to my schema.xml with a
default value of 0.  I have a ton of data indexed right now and it would
be very hard to retrieve all of the original sources to rebuild my
index.  So my question is...is there any way to send a command to SOLR
that tells it to re-index everything it has and include the new field I
added?

 

Thanks,

 

Jeff



RE: Synonyms impacting the performance

2008-11-12 Thread Nguyen, Joe
Could you elaborate further? 20 synonyms would translate to 20
BooleanQuery clauses. Are you saying each clause requires a disk
access?

-Original Message-
From: Walter Underwood [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 12, 2008 7:46 Joe
To: solr-user@lucene.apache.org
Subject: Re: Synonyms impacting the performance

If there are twenty synonyms, then a one term query becomes a twenty
term query, and that means 20X more disk accesses.

wunder

On 11/12/08 7:08 AM, Erik Hatcher [EMAIL PROTECTED] wrote:

 
 On Nov 12, 2008, at 9:41 AM, Manepalli, Kalyan wrote:
 I did the index time synonyms and results do look much better than 
 the query time indexing.
 But is there a reason for the searches to be that slow. I understand 
 that we have a pretty long list of synonyms (one word contains 
 atleast 20 words as synonyms). Does this have such an adverse impact
 
 Apparently so :/
 
 Are there other components in your request handler that may also be
 (re)executing a query?   Does the debugQuery=true component timings
 point to any other bottlenecks?
 
 Erik
 



RE: Query Performance while updating the index

2008-11-12 Thread Nguyen, Joe
How about creating a new core, indexing the data, and then swapping the
cores? The old core is still available to handle queries until the new
core replaces it.
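The swap described above can be driven through the multicore CoreAdmin handler; this is a hedged sketch, with placeholder core names, assuming a Solr 1.3-style multicore setup:

```text
# 1. index everything into the "staging" core, commit, warm it up
# 2. then atomically swap it with the core serving live traffic:
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=staging
```

After the swap, queries against "live" hit the freshly built index, and the old index can be rebuilt or discarded.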

-Original Message-
From: Lance Norskog [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 12, 2008 11:16 Joe
To: solr-user@lucene.apache.org
Subject: RE: Query Performance while updating the index

Yes, this is the cache autowarming.

We turned this off and staged separate queries that pre-warm our
standard queries. We are looking at pulling the query server out of the
load balancer during this process; it is the most effective way to give
fixed response time.

Lance

-Original Message-
From: oleg_gnatovskiy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 12, 2008 11:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Query Performance while updating the index


The rsync seems to have nothing to do with the slowness: while the
rsync is going on, there isn't any reload occurring. Once the files are
on the system, it tries a curl request to reload the searcher, and at
that point the delays occur. The file transfer probably has nothing to
do with this. Does this mean that it happens during warming?



Yonik Seeley wrote:
 
 On Tue, Nov 11, 2008 at 9:31 PM, oleg_gnatovskiy 
 [EMAIL PROTECTED] wrote:
 Hello. We have an index with 15 million documents working on a 
 distributed environment, with an index distribution setup. While an 
 index on a slave server is being updated, query response times become

 extremely slow (upwards of 5 seconds). Is there any way to decrease 
 the hit query response times take while an index is being pushed?
 
 Can you tell why it's getting slow?  Is this during warming, or does 
 it begin during the actual transfer of the new index?
 
 One possibility is that the new index being copied forces out parts of

 the old index from the OS cache.  More memory would help in that 
 scenario.
 
 -Yonik
 
 

--
View this message in context:
http://www.nabble.com/Query-Performance-while-updating-the-index-tp20452835p20467099.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: FW: Score customization

2008-11-12 Thread Nguyen, Joe
You could use function query with standardRequestHandler to influence
the final score and sort result by score.  If you want to control how
much the function query would affect the original score, you could use
the linear function.

-Original Message-
From: lajkonik86 [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 12, 2008 2:55 Joe
To: solr-user@lucene.apache.org
Subject: RE: FW: Score customization



I effectively need to use a multiplication in the sorting of the items.
Something like score*popularity.
It seems the only way to do this is to use a bf parameter.
However how do you use bf in combination with the standard
requestHandler?


hossman wrote:
 
 
 : Now I need to know whether the FunctionQuery result is considered 
 during
 : the results sorting. That is, are search results sorted according to

 the
 : sum of the similarity and the FunctionQuery value or according to
 : similarity only?
 
 a function query in a larger query contributes to the score just like 
 any other clause ... if you sort by score, you are sorting by the 
 *whole* score ... if you sort by some other field then the *whole* 
 score is irrelevant.
 
 
 
 -Hoss
 
 
 


--
View this message in context:
http://www.nabble.com/Score-customization-tp13404845p20458084.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Query Performance while updating the index

2008-11-12 Thread Nguyen, Joe
Another way to handle this is not to run the commit script at peak
times (while still pulling snapshots periodically). Keep track of the
number of requests, resource utilization, etc., and if the number of
requests exceeds the threshold, don't commit.

Also, how many segments do you see under the index dir? A high number of
segments (implying more files need to be opened) would impact query
response time. In that case, you could run the optimize script to
consolidate them into a single segment.
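An optimize can also be issued directly as an update message rather than via the optimize script; this is the stock update-handler form (host and port are placeholders):

```xml
<!-- POST this body to http://localhost:8983/solr/update with
     Content-type: text/xml to merge all segments into one -->
<optimize/>
```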

[quoted messages snipped - duplicated in the earlier copy of this
thread]




RE: Handling proper names

2008-11-07 Thread Nguyen, Joe
Use synonyms.
Add these lines to your ../conf/synonyms.txt:
Stephen,Steven,Steve
Bobby,Bob,Robert
...
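For the synonyms file to take effect, the field's analyzer in schema.xml needs a SynonymFilterFactory; a typical index-time setup (the field type name and tokenizer choice are assumptions):

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" indexes every name in a synonym group, so
         "Steve" matches documents that said "Stephen" -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Expanding at index time (rather than query time) requires re-indexing when the synonym list changes, but keeps queries fast.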


-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Jon Drukman
Sent: Friday, November 07, 2008 3:19 Joe
To: solr-user@lucene.apache.org
Subject: Handling proper names

Is there any way to tell Solr that Stephen is the same as Steven and 
Steve?  Carl and Karl?  Bobby/Bob/Robert, and so on...

-jsd-



Bias score proximity for a given field

2008-11-05 Thread Nguyen, Joe
Hi,
Is there a way to specify range boosting for a numeric/date field?
Suppose I have articles whose published dates range over
2005,...,2008,...,2011. I want to boost the score of 2008 articles by
20%. Articles whose published dates are 3 years away from 2008 would be
boosted by 0%, e.g. 2005 and 2011 articles would get a 0% boost.

Any idea/suggestion on how I could implement this?
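One hedged way to pin down the "decay with distance from 2008" shape before translating it into a Solr function query - the 20% peak and 3-year window come from the question above; everything else is an assumption:

```python
def year_boost(year, peak_year=2008, peak_boost=0.20, window=3):
    """Linear falloff: full boost at peak_year, zero at +/- window years."""
    distance = abs(year - peak_year)
    if distance >= window:
        return 0.0
    return peak_boost * (1 - distance / window)

print(year_boost(2008))  # 0.2
print(year_boost(2005))  # 0.0 (3 years away)
```

The same triangular shape could then be approximated server-side with a FunctionQuery over the published-date field.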

Cheers
Joe


Changing field datatype

2008-10-28 Thread Nguyen, Joe
I have a Solr core with 2 million lengthy documents.

1.  If I modify the datatype of a field 'foo' from string to sint and
restart the server, what happens to the existing documents? And to
documents added with the new schema? At query time (sort=foo desc),
should I expect the documents to sort properly?

Do I need to re-index all documents?

2. If I add two additional fields, do I need to re-index again?

Thanks.


RE: Changing field datatype

2008-10-28 Thread Nguyen, Joe
Thanks for your quick reply.

What would be a reasonable way to handle this without affecting end
users?

Create a new core with the new schema, load documents into the new core,
then swap the cores? For some time, two mostly identical cores would
co-exist on the Solr server; would that impact query time?

-Original Message-
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2008 1:33 Joe
To: solr-user@lucene.apache.org
Subject: Re: Changing field datatype

On Wed, Oct 29, 2008 at 1:55 AM, Nguyen, Joe [EMAIL PROTECTED]
wrote:


 1.  If I modify datatype of a field 'foo' from string to a sint and
 restart the server, what would happen to the existing documents? And
 documents added with the new schema?  At query time (sort=foo desc),
 should I expect the documents sorted properly?

Do I need to re-index all documents?


The fields can't be converted automatically. Therefore, a sort on foo
will still be a lexical sort instead of a numerical sort. You'll have to
re-index to have foo desc give a numerically non-ascending sort order.


 2. If I add two additional fields, do I need to re-index again?


The old documents won't have any values for those fields of course, but
new documents will. It is best to re-index to avoid any inconsistencies.

-- 
Regards,
Shalin Shekhar Mangar.


Query integer type

2008-10-28 Thread Nguyen, Joe
SITE is defined as an integer. I wanted to select all documents with
SITE=3002, but the SITE values in the response were different.

<field name="SITE" type="integer" indexed="true" stored="true" required="true"/>


http://localhost:8080/solr/mysite/select?indent=on&qt=standard&fl=SITE&fq:SITE:3002

http://localhost:8080/solr/mysite/select?indent=on&qt=dismax&fl=SITE&fq:SITE:3002

http://localhost:8080/solr/mysite/select?indent=on&qt=standard&fl=SITE&SITE:3002




<result name="response" numFound="470" start="0">
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">1</int>
</doc>
<doc>
<int name="SITE">2</int>

Field Analysis

Index Analyzer
org.apache.solr.schema.FieldType$DefaultAnalyzer {}
term position   1
term text   3002
term type   word
source start,end0,4
payload 

Query Analyzer
org.apache.solr.schema.FieldType$DefaultAnalyzer {}
term position   1
term text   3002
term type   word
source start,end0,4
payload   

Should term type be integer?

Any suggestion?

Cheers
  


RE: Query integer type

2008-10-28 Thread Nguyen, Joe
Never mind.  I misused the syntax.  :-)

-Original Message-
From: Nguyen, Joe [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2008 7:00 Joe
To: solr-user@lucene.apache.org
Subject: Query integer type

[original message snipped - quoted in full above]


multicore admin interface

2008-10-23 Thread Nguyen, Joe
Hi,
I have two cores. When each core references the same dataDir, I can
access the core admin interface. However, when core1's dataDir
references one directory and core2's references another, I cannot
access the admin interface.

Any idea?

// each core references a different dir
<!-- <dataDir>${solr.data.dir:./solr/multicore/myCore1/data}</dataDir> -->

// both cores reference the same dir
<dataDir>${solr.data.dir:./solr/data}</dataDir>