Re: Solr 1.3 - response time very long

2008-12-04 Thread sunnyfr

Thanks a lot guys for your time,
I appreciate it.

I will follow all your advice.


Yonik Seeley wrote:
 
 On Wed, Dec 3, 2008 at 11:49 AM, sunnyfr [EMAIL PROTECTED] wrote:
 Sorry the request is more :

 /select?q=text:svr09\+tutorial+AND+status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
 or even I tried :
 
 There are a bunch of things you could try to speed things up a bit:
 1) optimize the index if you haven't
 2) use a faster response writer with a more compact format (i.e. add
 wt=javabin for a binary format or wt=json for JSON)
 3) use fl (field list) to restrict the results to only the fields you need
 4) never use debugQuery to benchmark performance (I don't think you
 actually did, but you did list it in the example dismax URL)
 5) pull out clauses that match many documents and that are common
 across many queries into filters.
 
 /select?q=text:svr09\+tutorial&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
 
 You can also use multiple filter queries for better caching if some of
 the clauses appear in smaller groups or in isolation.  If you can give
 more examples, we can tell what the common parts are.
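 As a rough sketch of tip 5 (the host, port, and clause grouping here are just the ones from the example above; `urllib` handles the escaping), pulling the status clauses out of q and into separate fq parameters looks like this when building the request URL:

```python
from urllib.parse import urlencode

# Main relevance query keeps only the user's search terms.
params = [
    ("q", "text:svr09\\ tutorial"),
    # Each common, high-document-count clause becomes its own fq so
    # Solr can cache it independently in the filterCache.
    ("fq", "status_published:1"),
    ("fq", "status_moderated:0 AND status_personal:0"),
    ("fl", "id,title"),  # only the fields we need (tip 3)
    ("wt", "json"),      # compact response format (tip 2)
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

 How the clauses are grouped into fq parameters should follow how they co-occur across real queries, so each cached filter gets reused.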
 
 -Yonik
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20829777.html
Sent from the Solr - User mailing list archive at Nabble.com.



changing schema is dynamic or not

2008-12-04 Thread Neha Bhardwaj
Hi,

Every time I make any change in the schema, I have to restart the server. Is
this because I have made a mistake, or is it always like this?

I mean: if we make any kind of change to schema.xml, do we need to restart
the server, or can we continue without restarting it?

 

 

 

 


DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


Re: changing schema is dynamic or not

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
you have to restart the server

You may also need to re-index the data if the changes are incompatible

On Thu, Dec 4, 2008 at 3:09 PM, Neha Bhardwaj
[EMAIL PROTECTED] wrote:
 Hi,

 Every time I make any change in schema , I have to restart the server. Is
 this because I have made some mistake or It is like this only



 I mean,

 I have this doubt that if we make any kind of changes to schema.xml , do we
 need to restart the server or we can continue without restarting the server.














-- 
--Noble Paul


RE: changing schema is dynamic or not

2008-12-04 Thread Neha Bhardwaj
Is there any way this can be avoided?

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 3:12 PM
To: solr-user@lucene.apache.org
Subject: Re: changing schema is dynamic or not

you have to restart the server

You may also need to re-index the data if the changes are incompatible



Re: changing schema is dynamic or not

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is possible: you can reload a core through the HTTP API.

But if the changes are incompatible, you will have to re-index the data.
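A sketch of what that reload call looks like (the host, port, and the core name "core0" are placeholder assumptions; this is the CoreAdmin RELOAD action, which applies to a multicore setup):

```python
from urllib.parse import urlencode

# CoreAdmin RELOAD picks up schema.xml/solrconfig.xml changes without
# restarting the servlet container.  Host and core name are placeholders.
params = {"action": "RELOAD", "core": "core0"}
reload_url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(reload_url)
# To actually trigger it, a plain HTTP GET is enough, e.g.:
#   urllib.request.urlopen(reload_url)
```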




On Thu, Dec 4, 2008 at 3:16 PM, Neha Bhardwaj
[EMAIL PROTECTED] wrote:
 Is there any way this can be avoided?





-- 
--Noble Paul


RE: changing schema is dynamic or not

2008-12-04 Thread Neha Bhardwaj
Could you briefly explain what exactly I need to do?


-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 3:47 PM
To: solr-user@lucene.apache.org
Subject: Re: changing schema is dynamic or not

It is possible
you can reload a core  through the http API

but if the changes are incompatible you will have to re-index the data






Re: changing schema is dynamic or not

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
http://wiki.apache.org/solr/CoreAdmin#head-3f125034c6a64611779442539812067b8b430930

On Thu, Dec 4, 2008 at 4:06 PM, Neha Bhardwaj
[EMAIL PROTECTED] wrote:
 Could you briefly explain what exactly I need to do?






-- 
--Noble Paul


Re: Multi Language Search

2008-12-04 Thread Grant Ingersoll


On Dec 2, 2008, at 4:52 AM, tushar kapoor wrote:


1. Russian Word 1 AND Russian Word 2


This is the way the query should look, but there's no reason why you
can't let your users input AND in Russian and then substitute it when
you create the query.





or rather,

2. Russian Word 1 (AND, written in Russian) Russian Word 2

Now over to the Solr-specific question: in case the answer to the above is
either 1 or 2, how does one do it using Solr? I tried using the language
analyzers, but I'm not too sure how exactly they work.



Just send the string, with AND into Solr and the default query parser  
will know what to do.
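A minimal sketch of the substitution Grant describes, done before the query string is sent to Solr. The assumption that users type the Russian conjunction "И" between terms is mine, not anything Solr defines:

```python
import re

def localize_operators(query: str) -> str:
    """Replace a standalone Russian 'and' conjunction typed by the user
    with the Lucene AND operator before building the Solr query.
    The token 'И' is an assumed user input convention."""
    # Only replace 'И' when it stands alone between whitespace, so words
    # that merely contain the letter are left untouched.
    return re.sub(r"(?<=\s)И(?=\s)", "AND", query)

print(localize_operators("Пушкин И Лермонтов"))  # -> Пушкин AND Лермонтов
```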


-Grant

--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Newbie question - using existing Lucene Index

2008-12-04 Thread Grant Ingersoll


On Dec 3, 2008, at 11:53 AM, Sudarsan, Sithu D. wrote:


Hi All,

Using Lucene, an index has been created; it has five different fields.

How can I just use that index from Solr for searching? I tried changing
the schema as in the tutorial and copied the index to the data directory,
but all searches return empty, with no error message!


You also need to make sure the analyzers in schema.xml are set up the
same way as the ones you originally indexed with.


You might also try going to wherever you are running Solr, e.g.
http://localhost:8983/solr/admin (or whatever your URL is), and entering
*:* into the query box there.  This should return all documents in the
index and doesn't require any analysis.
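The same sanity check can be done over plain HTTP rather than the admin page. A small sketch (the host and port are the example defaults; rows=0 keeps the response small while still reporting the total hit count):

```python
from urllib.parse import urlencode

# '*:*' matches every document without running any analyzer, so it is a
# good sanity check that Solr picked up the index at all.
check_url = "http://localhost:8983/solr/select?" + urlencode(
    {"q": "*:*", "rows": "0"}
)
print(check_url)
```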


Also, check your logs to see if there were any exceptions on startup.




Is there a sample project available which shows using Tomcat as the web
engine rather than Jetty?


Instructions should be on the Wiki: http://wiki.apache.org/solr





Your help is appreciated,
Sincerely,
Sithu D Sudarsan

ORISE Fellow, DESE/OSEL/CDRH
WO62 - 3209

GRA, UALR

[EMAIL PROTECTED]
[EMAIL PROTECTED]



--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: Solr 1.3 - response time very long

2008-12-04 Thread sunnyfr

Hi Yonik,

I've tried everything but it doesn't change anything; I tried the latest
trunk version as well, but nothing changed.
There is nothing I can do about the indexing... maybe I can optimize
something before searching?
I'm using Linux, Apache 5.5, and the latest Solr version.
Memory: 8G, Intel.

Do you think 7.6G is a lot for the index size, for 8.5M documents?

Any idea what I can do?
Thanks a lot for your time


Yonik Seeley wrote:
 
 On Wed, Dec 3, 2008 at 11:49 AM, sunnyfr [EMAIL PROTECTED] wrote:
 Sorry the request is more :

 /select?q=text:svr09\+tutorial+AND+status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
 or even I tried :
 
 There are a bunch of things you could try to speed things up a bit:
 1) optimize the index if you haven't
 2) use a faster response writer with a more compact format (i.e. add
 wt=javabin for a binary format or wt=json for JSON)
 3) use fl (field list) to restrict the results to only the fields you need
 4) never use debugQuery to benchmark performance (I don't think you
 actually did, but you did list it in the example dismax URL)
 5) pull out clauses that match many documents and that are common
 across many queries into filters.
 
 /select?q=text:svr09\+tutorial&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_explicit:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_read
 
 You can also use multiple filter queries for better caching if some of
 the clauses appear in smaller groups or in isolation.  If you can give
 more examples, we can tell what the common parts are.
 
 -Yonik
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20833091.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re[4]: solr performance

2008-12-04 Thread sunnyfr

Hi,
I was reading this post and I was wondering: how can I parallelize
document processing?
Thanks, Erik


Erik Hatcher wrote:
 
 
 On Feb 21, 2007, at 4:25 PM, Jack L wrote:
 couple of times today at around 158 documents / sec.

 This is not bad at all. How about search performance?
 How many concurrent queries have people been having?
 What does the response time look like?
 
 I'm the only user :)   What I've done is a proof-of-concept for our  
 library.  We have 3.7M records that I've indexed and faceted.  Search  
 performance (in my unrealistic single user scenario) is blazing (50ms  
 or so) for purely full-text queries.  For queries that return facets,  
 the response times are actually quite good too (~900ms, or less  
 depending on the request) - provided the filter cache is warmed and  
 large enough.  This is running on my laptop (MacBook Pro, 2GB RAM,  
 1.83GHz) - I'm sure on a beefier box it'll only get better.
 
 Thanks to the others that clarified.  I run my indexers in
 parallel... but a single instance of Solr (which in turn handles
 requests in parallel as well).

 Do you feel if multi-threaded posting is helpful?
 
 It depends.  If the data processing can be parallelized and your  
 hardware supports it, it can certainly make a big difference... it  
 did in my case.  Both CPUs were cooking during my parallel indexing  
 runs.
 
   Erik
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-performance-tp9055437p20833421.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr performance

2008-12-04 Thread sunnyfr

Hi,

When I check my CPU, not all of the CPUs are fully used; how can I change this?
Do I have to change a parameter?

Thanks a lot , 
Johanna


Walter Underwood wrote:
 
 Try running your submits while watching a CPU load meter.
 Do this on a multi-CPU machine.
 
 If all CPUs are busy, you are running as fast as possible.
 
 If one CPU is busy (around 50% usage on a dual-CPU system),
 parallel submits might help.
 
 If no CPU is 100% busy, the bottleneck is probably disk
 or network.
 
 wunder
 
 On 2/20/07 10:46 AM, Jack L [EMAIL PROTECTED] wrote:
 
 Thanks to all who replied. It's encouraging :)
 
 The numbers vary quite a bit though, from 13 docs/s (Burkamp)
 to 250 docs/s (Walter) to 1000 docs/s. I understand the results also
 depend on the doc size and hardware.
 
 I have a question for Erik: you mentioned single threaded indexer
 (below). I'm not familiar with solr at all and did a search on solr
 wiki for thread and didn't find anything. Is it so that I can
 actually configure solr to be single-threaded and multi-threaded?
 
 And I'm not sure what you meant by parallelizing the indexer?
 Running multiple instances of the indexer, or multiple instances
 of solr?
 
 Thanks,
 
 Jack
 
 My largest Solr index is currently at 1.4M and it takes a max of 3ms
 to add a document (according to Solr's console), most of them 1ms.
 My single threaded indexer is indexing around 1000 documents per
 minute, but I think I can get this number even faster by
 parallelizing the indexer.
 
 
 __
 Do You Yahoo!?
 Tired of spam?  Yahoo! Mail has the best spam protection around
 http://mail.yahoo.com 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-performance-tp9055437p20833521.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr performance

2008-12-04 Thread Mark Miller
Kick off some indexing more than once - e.g., post a folder of docs and,
while that's working, post another.

I've been thinking about a multi-threaded UpdateProcessor as well - that
could be interesting.


- Mark

sunnyfr wrote:

Hi,
I was reading this post and I wondering how can I parallelize document
processing??? 
Thanks Erik






Re: solr performance

2008-12-04 Thread sunnyfr

OK...
Actually my problem is more the multi-threaded case, which takes a long
time... like 3 sec at 100 threads/sec.
I thought that could have helped me, but actually it's not related :s
Sorry


markrmiller wrote:
 
 Kick off some indexing more than once - eg, post a folder of docs, and 
 while thats working, post another.
 
 I've been thinking about a multi threaded UpdateProcessor as well - that 
 could be interesting.
 
 - Mark
 

-- 
View this message in context: 
http://www.nabble.com/solr-performance-tp9055437p20833662.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr performance

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller [EMAIL PROTECTED] wrote:
 Kick off some indexing more than once - eg, post a folder of docs, and while
 thats working, post another.

 I've been thinking about a multi threaded UpdateProcessor as well - that
 could be interesting.

Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could quickly
add documents to, and it would manage a number of threads under the
covers to maximize throughput.  Not sure what would be best for error
handling, though - perhaps just polling (allow the user to ask for
failed or successful operations).

-Yonik


Re: solr performance

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 8:36 AM, sunnyfr [EMAIL PROTECTED] wrote:
 When I check my CPU, all my CPU are not full, how can I change this ?

If this is while you are indexing, then it simply means that you are
not feeding documents to Solr fast enough (use multiple threads to
send to Solr, and send multiple documents in each update request if
possible).  If CPU utilization is still low, then it means you are IO
(disk) bound... if you want to go faster, get faster disks.

-Yonik
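Yonik's two suggestions - several sender threads, and several documents per update request - can be sketched like this. `post_batch` is a stand-in for whatever HTTP client actually POSTs the update XML to /solr/update, so the sketch runs offline:

```python
from concurrent.futures import ThreadPoolExecutor

def to_update_xml(docs):
    # One <add> element carrying many <doc> elements: this is the
    # "multiple documents per update request" part.
    body = "".join(
        '<doc><field name="id">%s</field></doc>' % d["id"] for d in docs
    )
    return "<add>%s</add>" % body

sent = []  # records what each worker would have POSTed

def post_batch(xml):
    # Stand-in for an HTTP POST to /solr/update.
    sent.append(xml)

docs = [{"id": i} for i in range(100)]
batches = [docs[i:i + 25] for i in range(0, len(docs), 25)]

# The "multiple threads" part: four workers each send a batch concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda b: post_batch(to_update_xml(b)), batches))

print(len(sent))  # 4 batches
```

Batch size and worker count are tuning knobs: bigger batches amortize per-request overhead, more threads keep the CPUs busy while earlier requests are in flight.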

 Do I have to change a parameter ??

 Thanks a lot ,
 Johanna






Re: solr performance

2008-12-04 Thread sunnyfr

When I run my stress test, sending multi-threaded requests (around 100/sec),
I don't start indexing at all...?
Maybe my cache??? I will check that.


Yonik Seeley wrote:
 
 On Thu, Dec 4, 2008 at 8:36 AM, sunnyfr [EMAIL PROTECTED] wrote:
 When I check my CPU, all my CPU are not full, how can I change this ?
 
 If this is while you are indexing, then it simply means that you are
 not feeding documents to Solr fast enough (use multiple threads to
 send to Solr, and send multiple documents in each update request if
 possible).  If CPU utilization is still low, then it means you are IO
 (disk) bound... if you want to go faster, get faster disks.
 
 -Yonik
 

-- 
View this message in context: 
http://www.nabble.com/solr-performance-tp9055437p20833790.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr performance

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 8:52 AM, sunnyfr [EMAIL PROTECTED] wrote:

 When I run my stress test ..sending multi thread ... around 100/sec I don't
 start indexation at all ...

If you can't go higher than 100 requests/sec and the CPUs aren't at
100%, then the possibilities are:
- If the index is bigger than the free memory the OS can use for caching,
then cache misses (at the OS level) can lower CPU usage - these cache
misses are most likely to happen when retrieving stored fields for hits.
- You can also be network-IO bound if you are making requests from a
different machine.
- Internal locking contention: pretty much every system will reach a
peak number of requests/sec and then start declining as you add more
concurrent requests.

If you haven't yet, try a nightly build from December - the
index-level locking should be improved under high load for non-Windows
systems.

-Yonik


 maybe my cache ??? will check that


 Yonik Seeley wrote:

 On Thu, Dec 4, 2008 at 8:36 AM, sunnyfr [EMAIL PROTECTED] wrote:
 When I check my CPU, all my CPU are not full, how can I change this ?

 If this is while you are indexing, then it simply means that you are
 not feeding documents to Solr fast enough (use multiple threads to
 send to Solr, and send multiple documents in each update request if
 possible).  If CPU utilization is still low, then it means you are IO
 (disk) bound... if you want to go faster, get faster disks.

 -Yonik
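A minimal sketch of the multi-threaded, batched submission Yonik describes. This is not an existing Solr client API: `send_batch` is a hypothetical callable you would replace with a real HTTP POST of an `<add>` message, and the batch size and thread count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(docs, size):
    """Split docs into batches so each update request carries several documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches concurrently via send_batch; returns per-batch results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_batch, chunks(docs, batch_size)))

if __name__ == "__main__":
    # Stand-in sender that just counts documents per batch (no real Solr involved).
    docs = [{"id": i} for i in range(250)]
    sizes = index_parallel(docs, lambda batch: len(batch))
    print(sizes)  # three batches: [100, 100, 50]
```

The same shape works whether the sender posts XML to `/update` or uses a client library; the point is simply that several update requests, each carrying multiple documents, are in flight at once.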

 Do I have to change a parameter ??

 Thanks a lot ,
 Johanna


 Walter Underwood wrote:

 Try running your submits while watching a CPU load meter.
 Do this on a multi-CPU machine.

 If all CPUs are busy, you are running as fast as possible.

 If one CPU is busy (around 50% usage on a dual-CPU system),
 parallel submits might help.

 If no CPU is 100% busy, the bottleneck is probably disk
 or network.

 wunder

 On 2/20/07 10:46 AM, Jack L [EMAIL PROTECTED] wrote:

 Thanks to all who replied. It's encouraging :)

 The numbers vary quite a bit though, from 13 docs/s (Burkamp)
 to 250 docs/s (Walter) to 1000 docs/s. I understand the results also
 depend on the doc size and hardware.

 I have a question for Erik: you mentioned single threaded indexer
 (below). I'm not familiar with solr at all and did a search on solr
 wiki for thread and didn't find anything. Is it so that I can
 actually configure solr to be single-threaded and multi-threaded?

 And I'm not sure what you meant by parallelizing the indexer?
 Running multiple instances of the indexer, or multiple instances
 of solr?

 Thanks,

 Jack

 My largest Solr index is currently at 1.4M and it takes a max of 3ms
 to add a document (according to Solr's console), most of them 1ms.
 My single threaded indexer is indexing around 1000 documents per
 minute, but I think I can get this number even faster by
 parallelizing the indexer.















Re: Solr 1.3 - response time very long

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 8:13 AM, sunnyfr [EMAIL PROTECTED] wrote:

 Hi Yonik,

 I've tried everything but it doesn't change anything; I tried the
 last trunk version as well but nothing changed.
 There is nothing I can do about the indexing... maybe I can optimize
 something before searching???

Did you optimize the index (send in the optimize command) after
indexing but before searching?

curl http://localhost:8983/solr/update?optimize=true
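The optimize command can also be sent as an XML message POSTed to the update handler. A sketch below: the host, port, and core path are assumptions to adjust for your deployment, and `optimize()` obviously needs a running Solr to succeed, so the pure request-building step is split out.

```python
import urllib.request

def build_optimize_request(base_url="http://localhost:8983/solr"):
    """Build a POST of the <optimize/> message to Solr's update handler."""
    return urllib.request.Request(
        base_url + "/update",
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

def optimize(base_url="http://localhost:8983/solr"):
    """Send the optimize command and return the raw response body."""
    with urllib.request.urlopen(build_optimize_request(base_url)) as resp:
        return resp.read()
```

Optimizing merges the index down to a single segment, which is why it helps query speed but can take a while on a large index.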

 I'm using a Linux system, Apache Tomcat 5.5, and the latest Solr version.
 Memory: 8G, Intel

 Do you think 7.6G is a lot for an index of 8.5M documents?

So it could be due to the index being slightly too big - subtract out
memory for Solr and other stuff, and there's not enough left for
everything to be fully cached by the OS.

You can make it bigger or smaller depending on how you have the schema
configured.
The example schema isn't necessarily optimized for speed or size - it
serves as an example of many field types and operations.

Make sure you only index fields you need to search, sort, or facet on.
Make sure you only store fields (marked as stored in the schema) that
you really need returned in results.
The example schema has copyFields and default values that you don't
need - hopefully you've removed them.

What's your schema, and do you have more examples of URLs you are
sending to Solr (all the parameters)?

-Yonik


Response status

2008-12-04 Thread Robert Young
In the standard response format, what does the status mean? It always seems
to be 0.

Thanks
Rob


Re: Solr 1.3 - response time very long

2008-12-04 Thread sunnyfr

Huge thanks for your help Yonik,
I optimized the index, so I will try to reduce the size... as I explained,
I store text for all languages ... 
So I will reduce my stored data.
Cheers... I will let you know :)


 

-- 
View this message in context: 
http://www.nabble.com/Solr-1.3---response-time-very-long-tp20795134p20834935.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Response status

2008-12-04 Thread Erik Hatcher
It means the request was successful.  If the status is non-zero (err, 1)
then there was an error of some sort.


Erik
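A client-side check of that status field might look like the sketch below, assuming the JSON response format (`wt=json`); the sample response body is illustrative, but the `responseHeader`/`status` field names are Solr's standard ones.

```python
import json

def check_status(response_body):
    """Raise if the responseHeader status is non-zero (0 means success)."""
    data = json.loads(response_body)
    status = data["responseHeader"]["status"]
    if status != 0:
        raise RuntimeError("Solr request failed with status %d" % status)
    return data

# A successful response passes through; QTime is the server-side time in ms.
ok = check_status('{"responseHeader":{"status":0,"QTime":3}}')
print(ok["responseHeader"]["QTime"])  # 3
```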

On Dec 4, 2008, at 9:32 AM, Robert Young wrote:

In the standard response format, what does the status mean? It  
always seems

to be 0.

Thanks
Rob




Re: Response status

2008-12-04 Thread Robert Young
Thanks

On Thu, Dec 4, 2008 at 2:53 PM, Erik Hatcher [EMAIL PROTECTED]wrote:

 It means the request was successful.  If the status is non-zero (err, 1)
 then there was an error of some sort.

Erik


 On Dec 4, 2008, at 9:32 AM, Robert Young wrote:

  In the standard response format, what does the status mean? It always
 seems
 to be 0.

 Thanks
 Rob





Re: Solr 1.3 - response time very long

2008-12-04 Thread sunnyfr

Hi Yonik,

I will index my data again. Can you advise me on how to optimize my data,
and tell me if you see anything very wrong or bad for memory, given
that I just need to return the ID, that's it.
But I need to boost some fields ... like description .. description_country
according to the country ... 

Thanks a lot, I would appreciate it, 


 <fields>
   <field name="id"             type="sint"    indexed="true" stored="true"  omitNorms="true" />

   <field name="status_private" type="boolean" indexed="true" stored="false" omitNorms="true" />
   <field name="status_...          6 more like that

   <field name="duration"       type="sint"    indexed="true" stored="false" omitNorms="true" />
   <field name="created"        type="date"    indexed="true" stored="false" omitNorms="true" />
   <field name="modified"       type="date"    indexed="true" stored="false" omitNorms="true" />
   <field name="rating_binrate" type="sint"    indexed="true" stored="false" omitNorms="true" />
   <field name="user_id"        type="sint"    indexed="true" stored="false" omitNorms="true" />
   <field name="country"        type="string"  indexed="true" stored="false" omitNorms="true" />
   <field name="language"       type="string"  indexed="true" stored="false" omitNorms="true" />
   <field name="creative_type"  type="string"  indexed="true" stored="false" omitNorms="true" />

   <field name="rel_group_ids"  type="sint"    indexed="true" stored="false" omitNorms="true" multiValued="true" />
   ... 3 more like that 

   <field name="rel_featured_user_ids"  type="sint" indexed="true" stored="false" omitNorms="true" multiValued="true" />
   <field name="rel_featured_group_ids" type="sint" indexed="true" stored="false" omitNorms="true" multiValued="true" />

   <field name="stat_views"     type="sint"    indexed="true" stored="false" omitNorms="true" />
   ... 6 more like that ... 

   <field name="title"          type="text"    indexed="true" stored="false" />
   <field name="title_fr"       type="text_fr" indexed="true" stored="false" />
   <field name="title_en"       type="text_en" indexed="true" stored="false" />
   <field name="title_de"       type="text_de" indexed="true" stored="false" />
   <field name="title_es"       type="text_es" indexed="true" stored="false" />
   <field name="title_ru"       type="text_ru" indexed="true" stored="false" />
   <field name="title_pt"       type="text_pt" indexed="true" stored="false" />
   <field name="title_nl"       type="text_nl" indexed="true" stored="false" />
   <field name="title_el"       type="text_el" indexed="true" stored="false" />
   <field name="title_ja"       type="text_ja" indexed="true" stored="false" />
   <field name="title_it"       type="text_it" indexed="true" stored="false" />

   <field name="description"    type="text"    indexed="true" stored="false" />
   <field name="description_fr" type="text_fr" indexed="true" stored="false" />
   <field name="description_en" type="text_en" indexed="true" stored="false" />
   <field name="description_de" type="text_de" indexed="true" stored="false" />
   <field name="description_es" type="text_es" indexed="true" stored="false" />
   <field name="description_ru" type="text_ru" indexed="true" stored="false" />
   <field name="description_pt" type="text_pt" indexed="true" stored="false" />
   <field name="description_nl" type="text_nl" indexed="true" stored="false" />
   <field name="description_el" type="text_el" indexed="true" stored="false" />
   <field name="description_ja" type="text_ja" indexed="true" stored="false" />
   <field name="description_it" type="text_it" indexed="true" stored="false" />

   <field name="tag1"           type="string"  indexed="true" stored="false" omitNorms="true" />
   <field name="tag2"           type="string"  indexed="true" stored="false" omitNorms="true" />
   <field name="tag3"           type="string"  indexed="true" stored="false" omitNorms="true" />
   <field name="tag4"           type="string"  indexed="true" stored="false" omitNorms="true" />
   <field name="tags"           type="string"  indexed="true" stored="false" omitNorms="true" multiValued="true" termVectors="true" />
   <field name="owner_login"    type="string"  indexed="true" stored="false" omitNorms="true" />

   <field name="text"      type="text"      indexed="true" stored="false" multiValued="false"/>
   <field name="timestamp" type="date"      indexed="true" stored="true"  default="NOW" multiValued="false"/>
   <field name="spell"     type="textSpell" indexed="true" stored="false" multiValued="true"/>
   <dynamicField name="random*" type="random" />
 </fields>





Re: Solr 1.3 - response time very long

2008-12-04 Thread Yonik Seeley
Remove this entry from the example schema unless you need the
timestamp of when each document was indexed:

   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

Also, only index fields you really need to search separately.
For example, if the description field is also indexed into the
text field via copyField, and you only search it via the text
field, then don't store or index the description field.

Retrieving only ids is something that could be optimized in Solr, but
hasn't been done yet.

-Yonik


Re: solr performance

2008-12-04 Thread Mark Miller

Yonik Seeley wrote:
  
Not sure what would be the best

for error handling though - perhaps just polling (allow user to ask
for failed or successful operations).
  
That's how I've handled similar situations in the past. You're submitting a 
batch of data to be processed, and if you're so inclined to see how it 
went, you can inspect some kind of report object. If the batch process 
blocks, you could return the report object, or if not, you could return 
a batch/job id (with reports valid for x amount of time after they are 
done?).


It seems like a sound enough method to me, but it would be interesting 
to hear if someone has a better idea.


- Mark
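The report-object pattern Mark describes might look like the outline below. All names and structure here are purely illustrative (this is not an existing Solr API): a batch submission returns a job id, and the per-document outcomes can be polled afterwards.

```python
import itertools

class BatchIndexer:
    """Processes batches, records per-document outcomes, serves reports by job id."""

    def __init__(self, process_doc):
        self._process = process_doc      # callable that indexes one document
        self._ids = itertools.count(1)   # monotonically increasing job ids
        self._reports = {}               # job id -> report dict

    def submit(self, docs):
        """Process a batch and return a job id the client can poll later."""
        job_id = next(self._ids)
        succeeded, failed = [], []
        for doc in docs:
            try:
                self._process(doc)
                succeeded.append(doc["id"])
            except Exception as e:
                failed.append((doc["id"], str(e)))
        self._reports[job_id] = {"succeeded": succeeded, "failed": failed}
        return job_id

    def report(self, job_id):
        """Return the report for a finished job (a real system might expire these)."""
        return self._reports[job_id]
```

In a non-blocking variant, `submit` would enqueue the batch and return immediately; `report` would then say "pending" until the batch is done.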


Re: Solr 1.3 - response time very long

2008-12-04 Thread sunnyfr

Ok thanks a lot,

so I can remove all this part
<field name="title"       type="text"   indexed="true" stored="false" />
<field name="description" type="text"   indexed="true" stored="false" />

<field name="tag1"        type="string" indexed="true" stored="false" omitNorms="true" />
<field name="tag2"        type="string" indexed="true" stored="false" omitNorms="true" />
<field name="tag3"        type="string" indexed="true" stored="false" omitNorms="true" />
<field name="tag4"        type="string" indexed="true" stored="false" omitNorms="true" />
<field name="tags"        type="string" indexed="true" stored="false" omitNorms="true" multiValued="true" termVectors="true" />


and just keep ... :
  <copyField source="title"          dest="text"/>
  <copyField source="title_en"       dest="text"/> ..title_es ... 
  <copyField source="description"    dest="text"/>
  <copyField source="description_en" dest="text"/> ..title_en ... 
  <copyField source="tag1"           dest="text"/>
  <copyField source="tag2"           dest="text"/>
  <copyField source="tag3"           dest="text"/>
  <copyField source="tag4"           dest="text"/>

Just to be sure... I index title and description in case, for example, I need
to boost them separately... e.g. qf=title^2 description^1.5
The per-language ones need to be indexed to apply the analyzer/stemmer, and to
boost them differently according to the country.. but I copy them to be searchable.

thanks so much for your time .. again and again...



Re: Solr 1.3 - response time very long

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 11:41 AM, sunnyfr [EMAIL PROTECTED] wrote:

 Ok thanks a lot,

 so I can remove all this part

I wouldn't remove them if they are the source of a copyField (with the
destination being text).
Simply change them to indexed="false" stored="false",
otherwise you may get an undefined field exception.

-Yonik


Re: Solr 1.3 - response time very long

2008-12-04 Thread sunnyfr

right !!!



Re: Throughput Optimization

2008-12-04 Thread wojtekpia

It looks like file locking was the bottleneck - CPU usage is up to ~98% (from
the previous peak of ~50%). I'm running the trunk code from Dec 2 with the
faceting improvement (SOLR-475) turned off. Thanks for all the help!


Yonik Seeley wrote:
 
 FYI, SOLR-465 has been committed.  Let us know if it improves your
 scenario.
 
 -Yonik
 

-- 
View this message in context: 
http://www.nabble.com/Throughput-Optimization-tp20335132p20840017.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Throughput Optimization

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia [EMAIL PROTECTED] wrote:
 It looks like file locking was the bottleneck - CPU usage is up to ~98% (from
 the previous peak of ~50%).

Great to hear it!

 I'm running the trunk code from Dec 2 with the
 faceting improvement (SOLR-475) turned off. Thanks for all the help!

Is the new faceting stuff off because it didn't improve things in your case,
or because you didn't want to change that variable just now?

-Yonik


 Yonik Seeley wrote:

 FYI, SOLR-465 has been committed.  Let us know if it improves your
 scenario.

 -Yonik


 --
 View this message in context: 
 http://www.nabble.com/Throughput-Optimization-tp20335132p20840017.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Ordering updates

2008-12-04 Thread Shalin Shekhar Mangar
It is not clear how you are using Solr, i.e. distributed vs. a single index.

In short, Solr does not update documents in place. It overwrites the old
document with the new one if an old document with the same uniqueKey exists
in the index.

Does that answer your question?
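Since Solr 1.3 has no conditional update, this ordering has to be enforced on the client side. A sketch of one approach, under the assumption that each document version carries a monotonically increasing transaction id: keep the highest-seen id per uniqueKey and drop stale updates before posting. The in-memory dict here is illustrative; a real cluster would need shared bookkeeping.

```python
class OrderedUpdater:
    """Drop updates whose transaction id is not newer than the last one indexed."""

    def __init__(self, send_doc):
        self._send = send_doc   # callable that actually posts the doc to Solr
        self._latest = {}       # uniqueKey -> highest transaction id seen

    def update(self, key, txn_id, doc):
        """Send doc only if txn_id is newer than what was indexed for key."""
        if txn_id <= self._latest.get(key, -1):
            return False        # stale: an equal-or-newer version is already indexed
        self._latest[key] = txn_id
        self._send(doc)
        return True
```

With this guard in front of the indexer, an old version arriving late is simply dropped instead of overwriting the newer document.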

On Thu, Dec 4, 2008 at 1:46 AM, Laurence Rowe [EMAIL PROTECTED] wrote:

 Hi,

 Our CMS is distributed over a cluster and I was wondering how I can
 ensure that index records of newer versions of documents are never
 overwritten by older ones. Amazon AWS uses a timestamp on requests to
 ensure 'eventual consistency' of operations. Is there a way to supply
 a transaction ID with an update so an update is conditional on the
 supplied transaction id being greater than the existing indexed
 transaction id?

 Laurence




-- 
Regards,
Shalin Shekhar Mangar.


Re: new faceting algorithm

2008-12-04 Thread wojtekpia

I'm seeing some strange behavior with my garbage collector that disappears
when I turn off this optimization. I'm running load tests on my deployment.
For the first few minutes, everything is fine (and this patch does make
things faster - I haven't quantified the improvement yet). After that, the
garbage collector stops collecting. Specifically, the new generation part of
the heap is full, but never garbage collected, and the old generation is
emptied, then never gets anything more. This throttles Solr performance
(average response times that used to be ~500ms are now ~25s). 

I described my deployment scenario in an earlier post:
http://www.nabble.com/Throughput-Optimization-td20335132.html

Does it sound like the new faceting algorithm could be the culprit?


wojtekpia wrote:
 
 Definitely, but it'll take me a few days. I'll also report findings on
 SOLR-465. (I've been on holiday for a few weeks)
 
 
 Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 wojtek, you can report back the numbers if possible
 
 It would be nice to know how the new impl performs in real-world
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/new-faceting-algorithm-tp20674902p20840622.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Throughput Optimization

2008-12-04 Thread wojtekpia

New faceting stuff is off because I'm encountering some problems when I turn it
on; I posted the details:
http://www.nabble.com/new-faceting-algorithm-td20674902.html#a20840622


Yonik Seeley wrote:
 
 On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia [EMAIL PROTECTED] wrote:
 It looks like file locking was the bottleneck - CPU usage is up to ~98%
 (from
 the previous peak of ~50%).
 
 Great to hear it!
 
 I'm running the trunk code from Dec 2 with the
 faceting improvement (SOLR-475) turned off. Thanks for all the help!
 
 new faceting stuff off because it didn't improve things in your case,
 or because you didn't want to change that variable just now?
 
 -Yonik
 

-- 
View this message in context: 
http://www.nabble.com/Throughput-Optimization-tp20335132p20840668.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Throughput Optimization

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 2:30 PM, wojtekpia [EMAIL PROTECTED] wrote:

 New faceting stuff off because I'm encountering some problems when I turn it
 on, I posted the details:
 http://www.nabble.com/new-faceting-algorithm-td20674902.html#a20840622

Missed that, thanks... will respond there.

-Yonik


 Yonik Seeley wrote:

 On Thu, Dec 4, 2008 at 1:54 PM, wojtekpia [EMAIL PROTECTED] wrote:
 It looks like file locking was the bottleneck - CPU usage is up to ~98%
 (from
 the previous peak of ~50%).

 Great to hear it!

 I'm running the trunk code from Dec 2 with the
 faceting improvement (SOLR-475) turned off. Thanks for all the help!

 new faceting stuff off because it didn't improve things in your case,
 or because you didn't want to change that variable just now?

 -Yonik


 --
 View this message in context: 
 http://www.nabble.com/Throughput-Optimization-tp20335132p20840668.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: new faceting algorithm

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 2:28 PM, wojtekpia [EMAIL PROTECTED] wrote:
 I'm seeing some strange behavior with my garbage collector that disappears
 when I turn off this optimization. I'm running load tests on my deployment.
 For the first few minutes, everything is fine (and this patch does make
 things faster - I haven't quantified the improvement yet). After that, the
 garbage collector stops collecting. Specifically, the new generation part of
 the heap is full, but never garbage collected, and the old generation is
 emptied, then never gets anything more.

Are you doing commits at any time?
One possibility is the caching mechanism (weak-ref on the
IndexReader)... that's going to be changing soon hopefully.

-Yonik


 This throttles Solr performance
 (average response times that used to be ~500ms are now ~25s).

 I described my deployment scenario in an earlier post:
 http://www.nabble.com/Throughput-Optimization-td20335132.html

 Does it sound like the new faceting algorithm could be the culprit?


 wojtekpia wrote:

 Definitely, but it'll take me a few days. I'll also report findings on
 SOLR-465. (I've been on holiday for a few weeks)


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 wojtek, you can report back the numbers if possible

 It would be nice to know how the new impl performs in real-world






 --
 View this message in context: 
 http://www.nabble.com/new-faceting-algorithm-tp20674902p20840622.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: new faceting algorithm

2008-12-04 Thread wojtekpia


Yonik Seeley wrote:
 
 
 Are you doing commits at any time?
 One possibility is the caching mechanism (weak-ref on the
 IndexReader)... that's going to be changing soon hopefully.
 
 -Yonik
 


No commits during this test. Should I start looking into my heap size
distribution and garbage collector selection?
-- 
View this message in context: 
http://www.nabble.com/new-faceting-algorithm-tp20674902p20841219.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: new faceting algorithm

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 2:57 PM, wojtekpia [EMAIL PROTECTED] wrote:
 Yonik Seeley wrote:

 Are you doing commits at any time?
 One possibility is the caching mechanism (weak-ref on the
 IndexReader)... that's going to be changing soon hopefully.

 -Yonik


 No commits during this test. Should I start looking into my heap size
 distribution and garbage collector selection?

Hmmm, OK.  The other big difference would then be that retrieving the
top facets requires creating a Lucene TermEnum (not all facet values
are stored in memory).  The lucene version in Solr has changed since I
did long running tests... with various Lucene changes to thread-local
caching, etc.  I'll try and reproduce.  Or maybe this is somehow a GC
bug just tickled by the current caching mechanism? (weak hash map)

-Yonik


Is there a clean way to determine whether a core exists?

2008-12-04 Thread Dean Thompson
The ping command gives me a 500 status if the core exists, or a 404 if  
it doesn't.  For example, when I hit


http://doom:8983/solr/content_item_representations_20081201/admin/ping

I see

HTTP ERROR: 500

INTERNAL_SERVER_ERROR

RequestURI=/solr/admin/ping

Powered by Jetty://

(Obviously I'm using Jetty.  Moving to Tomcat is on our list.)

I could depend on this behavior, but that seems ugly, so I decided to  
try a probe query instead.


I am using the Solrj client library.  Unfortunately, for core non- 
existence or any other problem, Solrj uses an unhelpful catch-all  
exception:


/** Exception to catch all types of communication / parsing  
issues associated with talking to SOLR

  *
  * @version $Id: SolrServerException.java 555343 2007-07-11  
17:46:25Z hossman $

  * @since solr 1.3
  */
public class SolrServerException extends Exception {
...
}

So I have wound up, thus far, with the following code:

    private boolean solrCoreExists(String url)
            throws SolrException, MalformedURLException, SolrServerException
    {
        try {
            SolrQuery solrQuery = new SolrQuery();
            solrQuery.setQuery("xyzzy=plugh");
            new CommonsHttpSolrServer(url).query(solrQuery);
            return true;
        } catch (SolrServerException e) {
            if (e.getCause() != null
                    && e.getCause().getMessage().startsWith("Not Found")) {
                return false;
            } else {
                throw e;
            }
        }
    }

Hopefully there's a better solution?

Dean



Re: Is there a clean way to determine whether a core exists?

2008-12-04 Thread Ryan McKinley

what about just calling:
http://doom:8983/solr/content_item_representations_20081201/select

That should give you a 404 if it does not exist.

the admin stuff will behave funny if the core does not exist (perhaps  
you can file a JIRA issue for that)


ryan


On Dec 4, 2008, at 3:38 PM, Dean Thompson wrote:

The ping command gives me a 500 status if the core exists, or a 404  
if it doesn't.  For example, when I hit


   http://doom:8983/solr/content_item_representations_20081201/admin/ping

I see

   HTTP ERROR: 500

   INTERNAL_SERVER_ERROR

   RequestURI=/solr/admin/ping

   Powered by Jetty://

(Obviously I'm using Jetty.  Moving to Tomcat is on our list.)

I could depend on this behavior, but that seems ugly, so I decided  
to try a probe query instead.


I am using the Solrj client library.  Unfortunately, for core non- 
existence or any other problem, Solrj uses an unhelpful catch-all  
exception:


   /** Exception to catch all types of communication / parsing  
issues associated with talking to SOLR

 *
 * @version $Id: SolrServerException.java 555343 2007-07-11  
17:46:25Z hossman $

 * @since solr 1.3
 */
   public class SolrServerException extends Exception {
   ...
   }

So I have wound up, thus far, with the following code:

    private boolean solrCoreExists(String url)
            throws SolrException, MalformedURLException, SolrServerException
    {
        try {
            SolrQuery solrQuery = new SolrQuery();
            solrQuery.setQuery("xyzzy=plugh");
            new CommonsHttpSolrServer(url).query(solrQuery);
            return true;
        } catch (SolrServerException e) {
            if (e.getCause() != null
                    && e.getCause().getMessage().startsWith("Not Found")) {
                return false;
            } else {
                throw e;
            }
        }
    }

Hopefully there's a better solution?

Dean





Re: Is there a clean way to determine whether a core exists?

2008-12-04 Thread Dean Thompson

Thanks for the quick response, Ryan!

Actually, my admin/ping call gives me a 404 if the core doesn't exist,  
which seemed reasonable.  I get the 500 if the core *did* exist.


Thanks for the suggestion of using the select URL, but that gives me:

HTTP ERROR: 500

null

java.lang.NullPointerException
    at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)
    at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
    at org.apache.solr.search.QParser.getQuery(QParser.java:88)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:148)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


RequestURI=/solr/content_item_representations_20081201/select

On Dec 4, 2008, at 3:48 PM, Ryan McKinley wrote:


http://doom:8983/solr/content_item_representations_20081201/select




Re: Ordering updates

2008-12-04 Thread Laurence Rowe
Hi,

We currently have a single Solr server, with a single index. There are
a number of CMS processes distributed over a number of servers, with
each CMS process sending an update to the Solr index when changes are
made to a content object.

My concern is that a scenario is possible where a content object is
changed and reindexed concurrently by two CMS processes. The database
ensures consistency within the CMS; these transactions get committed as
T1 and T2. But I cannot see how to ensure that the reindexing
operations (that result in a delete and add for the document) are
processed in the order R1 then R2, rather than R2 then R1. In the
second case the index record is now inconsistent with the content
object in the database.

I would like to supply a transaction id with the reindex request, and
configure Solr such that a reindex operation is processed if and only
if the supplied transaction id is greater than the currently indexed
transaction id.

Otherwise the only way I can see to guarantee consistency is 1) have
index operations processed by a single writer, or 2) commit the index
operation between database prepare and commit statements.

The first is not desirable as we introduce a single point of failure
(in addition to the single Solr server) and delay updating the index.
The second is not desirable because it reduces the throughput of the
database, and with a distributed Solr setup would not solve the
problem.

From what I can tell this conditional indexing feature is not
supported by Solr. Might it be supported by Lucene but not exposed by
Solr?

Thanks,

Laurence

2008/12/4 Shalin Shekhar Mangar [EMAIL PROTECTED]:
 It is not clear how you are using Solr i.e. distributed vs single index.

 Summarily, Solr does not update documents. It overwrites the old document
 with the new one if an old document with the same uniqueKey exists in the
 index.

 Does that answer your question?

 On Thu, Dec 4, 2008 at 1:46 AM, Laurence Rowe [EMAIL PROTECTED] wrote:

 Hi,

 Our CMS is distributed over a cluster and I was wondering how I can
 ensure that index records of newer versions of documents are never
 overwritten by older ones. Amazon AWS uses a timestamp on requests to
 ensure 'eventual consistency' of operations. Is there a way to supply
 a transaction ID with an update so an update is conditional on the
 supplied transaction id being greater than the existing indexed
 transaction id?

 Laurence




 --
 Regards,
 Shalin Shekhar Mangar.



Re: Ordering updates

2008-12-04 Thread Shalin Shekhar Mangar
On Fri, Dec 5, 2008 at 2:42 AM, Laurence Rowe [EMAIL PROTECTED] wrote:


 We currently have a single Solr server, with a single index. There are
 a number of CMS processes distributed over a number of servers, with
 each CMS process sending an update to the Solr index when changes are
 made to a content object.

 My concern is that a scenario is possible where a content object is
 changed and reindexed concurrently by two CMS processes. The database
 ensures consistency within the CMS, these transactions get comitted as
 T1 and T2.


If each CMS process has a consistent view of the data and it wishes to
update Solr with that data, where is the question of inconsistency here?


 But I cannot see how to ensure that the reindexing
 operations (that result in a delete and add for the document) are
 processed in the order R1 then R2, rather than R2 then R1. In the
 second case the index record is now inconsistent with the content
 object in the database.


When you need to update a document in Solr, you send the complete document and
Solr automatically replaces the old one. The change becomes visible to
searchers when you call commit on Solr. From your CMS's perspective, it is a
single operation. I hope I am understanding your problem correctly.

I would like to supply a transaction id with the reindex request, and
 configure Solr such that a reindex operation is processed if and only
 if the supplied transaction id is greater than the currently indexed
 transaction id.

 Otherwise the only way I can see to guarantee consistency is 1) have
 index operations processed by a single writer, or 2) commit the index
 operation between database prepare and commit statements.

 The first is not desirable as we introduce a single point of failure
 (in addition to the single Solr server) and delay updating the index.
 The second is not desirable because it reduces the throughput of the
 database, and with a distributed Solr setup would not solve the
 problem.

 From what I can tell this conditional indexing feature is not
 supported by Solr. Might it be supported by Lucene but not exposed by
 Solr?


No, this is not supported by either Lucene or Solr.

-- 
Regards,
Shalin Shekhar Mangar.
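
For what it's worth, since Solr itself cannot do this, the conditional-update
semantics Laurence asks for can be approximated in the indexing client before
documents are sent to Solr. A minimal sketch (the class and method names are
hypothetical; this guard lives in the CMS-side indexer, not in Solr):

```java
import java.util.concurrent.ConcurrentHashMap;

public class VersionGate {
    // Last transaction id sent to Solr per uniqueKey. The guard drops
    // stale reindex requests so an older version never overwrites a
    // newer one. This only sketches the desired semantics -- Solr 1.3
    // has no conditional update of its own.
    private final ConcurrentHashMap<String, Long> indexedTxn = new ConcurrentHashMap<>();

    // Returns true if the update should be sent to Solr (txnId is newer).
    public boolean shouldIndex(String uniqueKey, long txnId) {
        while (true) {
            Long current = indexedTxn.get(uniqueKey);
            if (current == null) {
                if (indexedTxn.putIfAbsent(uniqueKey, txnId) == null) return true;
            } else if (txnId > current) {
                if (indexedTxn.replace(uniqueKey, current, txnId)) return true;
            } else {
                return false; // stale: an equal or newer version was already indexed
            }
        }
    }
}
```

Note this only works if all indexing requests pass through processes sharing
the gate (or one gate per document partition); across fully independent
writers it degrades to the single-writer option Laurence mentioned.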


Boost a query by field at query time - Standard Request Handler

2008-12-04 Thread ashokc

Here is the problem I am trying to solve. I have to use the Standard Request
Handler.

Query (can be quite complex, as it gets built from an advanced search form):
term1^2.0 OR term2 OR "term3 term4"

I have 3 fields - content (the default search field), title and url.

Any matches in the title or url fields should be weighted more heavily. I can specify
index time boosting for these two fields, but I would rather not, as it is a
heavy handed solution. I need to make it user configurable for advanced
search.

What should my query to SOLR be? Something like this?

content:term1^2.0 OR content:term2 OR content:"term3 term4" OR
title:term1^2.0 OR title:term2 OR title:"term3 term4" OR url:term1^2.0 OR
url:term2 OR url:"term3 term4"

Looks like it can get pretty long and error prone. With the 'dismax' handler
I can simply specify

qf=content title^2 url^2

no matter how complex the 'q' parameter is.

Is there a similar easier way I can do query time boosting with Standard
Request Handler, that I am missing?

Thanks for your help

- ashok

-- 
View this message in context: 
http://www.nabble.com/Boost-a-query--by-field-at-query-time---Standard-Request-Handler-tp20842675p20842675.html
Sent from the Solr - User mailing list archive at Nabble.com.
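
As a stopgap while staying on the standard handler, the per-field expansion can
at least be generated programmatically rather than typed by hand, which removes
the "long and error prone" part. A minimal sketch (class and method names are
made up for illustration, and it only expands a single term, not a full
boolean expression):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FieldBoostQuery {
    // Expand one term across several fields with query-time boosts,
    // producing standard-request-handler syntax such as:
    //   (content:term1 OR title:term1^2.0 OR url:term1^2.0)
    static String expand(String term, Map<String, Double> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            if (e.getValue() != 1.0) {
                clause += "^" + e.getValue();  // append the boost only when it differs from 1.0
            }
            clauses.add(clause);
        }
        return "(" + clauses + ")";
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("content", 1.0);
        boosts.put("title", 2.0);
        boosts.put("url", 2.0);
        System.out.println(expand("term1", boosts));
        // (content:term1 OR title:term1^2.0 OR url:term1^2.0)
    }
}
```

The user-configurable boosts then live in the map, much like dismax's qf
parameter, and the generated clauses are ORed together per user term.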



Standard request with functional query

2008-12-04 Thread Sammy Yu
Hi guys,
I have a standard query that searches across multiple text fields such as
q=title:iphone OR bodytext:iphone OR title:firmware OR bodytext:firmware

This comes back with documents that have iphone and firmware (I know I
can use dismax handler but it seems to be really slow), which is
great.  Now I want to give some more weight to more recent documents
(there is a dateCreated field in each document).

So I've modified the query as such:
(title:iphone OR bodytext:iphone OR title:firmware OR
bodytext:firmware) AND _val_:ord(dateCreated)^0.1
URLencoded to 
q=(title%3Aiphone+OR+bodytext%3Aiphone+OR+title%3Afirmware+OR+bodytext%3Afirmware)+AND+_val_%3Aord(dateCreated)^0.1

However, the results are not as one would expects.  The first few
documents only come back with the word iphone and appears to be sorted
by date created.  It seems to completely ignore the score and use the
dateCreated field for the score.

On a not directly related issue, it seems that if you put the weight
within the double quotes:
(title:iphone OR bodytext:iphone OR title:firmware OR
bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"

the parser complains:
org.apache.lucene.queryParser.ParseException: Cannot parse
'(title:iphone OR bodytext:iphone OR title:firmware OR
bodytext:firmware) AND _val_:"ord(dateCreated)^0.1"': Expected ',' at
position 16 in 'ord(dateCreated)^0.1'

Thanks,
Sammy


Re: Standard request with functional query

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 4:35 PM, Sammy Yu [EMAIL PROTECTED] wrote:
 bodytext:firmware) AND _val_:ord(dateCreated)^0.1': Expected ',' at
 position 16 in 'ord(dateCreated)^0.1'

^0.1 is not function query syntax, it's Lucene/Solr QueryParser
syntax.  Try _val_:"ord(dateCreated)"^0.1

-Yonik
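
To make the corrected form concrete, here is a small sketch that URL-encodes
Yonik's suggested query with the boost outside the quoted function (the field
names are just the ones from this thread):

```java
import java.net.URLEncoder;

public class BoostByRecency {
    public static void main(String[] args) throws Exception {
        // The ^0.1 boost is query-parser syntax, so it must sit outside
        // the quoted function rather than inside it.
        String q = "(title:iphone OR bodytext:iphone OR title:firmware OR "
                 + "bodytext:firmware) AND _val_:\"ord(dateCreated)\"^0.1";
        System.out.println("/select?q=" + URLEncoder.encode(q, "UTF-8"));
    }
}
```

The encoded string can then be appended to the select URL exactly as in the
original request.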


Merging Indices

2008-12-04 Thread ashokc

The SOLR wiki says

3. Make sure both indexes you want to merge are closed.

What exactly does 'closed' mean?

1. Do I need to stop SOLR search on both indexes before running the merge
command? So a brief downtime is required?
Or do I simply prevent any 'updates/deletes' to these indices during the
merge time so they can still serve up results (read only?) while I am
creating a new merged index?

2. Before the new index replaces the old index, do I need to stop SOLR for
that instance? Or can I simply move the old index out and place the new
index in the same place, without having to stop SOLR

3. If SOLR has to be stopped during the merge operation, can we work with a
redundant/failover instance and stagger the merge so the search service will
not go down? Any guidelines here are welcome.

Thanks

- ashok
-- 
View this message in context: 
http://www.nabble.com/Merging-Indices-tp20845009p20845009.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solrj queries

2008-12-04 Thread lupiss

Hi! I'm new to Solr; I use Solrj, and your question is how to query with
Solrj. Here is a short description of what you can do:
1. Build your JSP or PHP page, etc., where the user enters the first and
last name, then read those values, perhaps with a getText.
2. Then build the query:
Assuming that in your schema you defined the first-name field as name and
the last-name field as lname, and that in your JSP or PHP form the text
fields where the user enters the first and last name are called nam and
lnam respectively, you would do something like this:
String Consulta = "name:" + nam.getText() + " lname:" + lnam.getText();
So far you only have a string holding the query that you will send to
Solr. The query object takes this string as a parameter, so you
construct it:

SolrQuery query;
QueryResponse qrsp;

// query function
    public SolrDocumentList consultar(String Consulta) throws
SolrServerException {
        SolrDocumentList docs;
        query = new SolrQuery();
        query.setQuery(Consulta);
        qrsp = server.query(query);
        docs = qrsp.getResults();
        return docs;
    }

Then you would just call that function, passing as a parameter the query
string you built earlier:

consultar(Consulta)

-- 
View this message in context: 
http://www.nabble.com/Solrj-queries-tp20494859p20845145.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solrj queries

2008-12-04 Thread lupiss

I left out a ;
hehe
-- 
View this message in context: 
http://www.nabble.com/Solrj-queries-tp20494859p20845222.html
Sent from the Solr - User mailing list archive at Nabble.com.



Quick thanks for all the assistance getting me up to speed

2008-12-04 Thread Ian Connor
Hi Yonik Seeley, Erik Hatcher and others.

Thanks for all the help fixing the bugs I ran into using the new 1.3
distributed features with rails (shards).

I now have medline fully indexed in 7 solr shards (with 2 spare). Each
server has 8GB RAM and a Quad Core 2.4GHz. As a test, I ran about 2 million
queries over it last night (pubmed gets about 3 million per day) and the
response time was within a few seconds. This was also under write load as
well, which made me feel very confident in the scalability of Solr and
lucene.

-- 
Regards,

Ian Connor
pubget.com
Cambridge, MA
iconnor [at] mit.edu


Re: NIO not working yet

2008-12-04 Thread wojtekpia

I've updated my deployment to use NIOFSDirectory. Now I'd like to confirm
some previous results with the original FSDirectory. Can I turn it off with
a parameter? I tried:

java
-Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.FSDirectory
...

but that didn't work. 

-- 
View this message in context: 
http://www.nabble.com/NIO-not-working-yet-tp20468152p20845732.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr on Solaris

2008-12-04 Thread Kashyap, Raghu
We are running Solr on a Solaris box with 4 CPUs (8 cores) and 3GB RAM.
When we try to index, sometimes the HTTP connection just hangs and the
client which is posting documents to Solr doesn't get any response back.
We have since added timeouts to our HTTP requests from the clients.

 

I then get this error.

 

java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
of swap space?

java.lang.OutOfMemoryError: unable to create new native thread

Exception in thread "JmxRmiRegistryConnectionPoller"
java.lang.OutOfMemoryError: unable to create new native thread

 

We are running JDK 1.6_10 on the Solaris box. The weird thing is we
are running the same application on a Linux box with JDK 1.6 and we
haven't seen any problem like this.



Any suggestions?

 

-Raghu



Re: Solr on Solaris

2008-12-04 Thread Jon Baer

Just curious, is this off a zone by any chance?

- Jon

On Dec 4, 2008, at 10:40 PM, Kashyap, Raghu wrote:

We are running solr on a solaris box with 4 CPU's(8 cores) and  3GB  
Ram.

When we try to index sometimes the HTTP Connection just hangs and the
client which is posting documents to solr doesn't get any response  
back.
We since then have added timeouts to our http requests from the  
clients.




I then get this error.



java.lang.OutOfMemoryError: requested 239848 bytes for Chunk::new. Out
of swap space?

java.lang.OutOfMemoryError: unable to create new native thread

Exception in thread JmxRmiRegistryConnectionPoller
java.lang.OutOfMemoryError: unable to create new native thread



We are running JDK 1.6_10 on the solaris box. . The weird thing is we
are running the same application on linux box with JDK 1.6 and we
haven't seen any problem like this.



Any suggestions?



-Raghu





Re: Is there a clean way to determine whether a core exists?

2008-12-04 Thread Chris Hostetter

: Subject: Is there a clean way to determine whether a core exists?

doesn't the CoreAdminHandler's STATUS feature make this easy?




-Hoss



Re: Is there a clean way to determine whether a core exists?

2008-12-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
SOLR-880 has been raised for the same issue.

On Fri, Dec 5, 2008 at 10:16 AM, Chris Hostetter
[EMAIL PROTECTED] wrote:

 : Subject: Is there a clean way to determine whether a core exists?

 doesn't the CoreAdminHandler's STATUS feature make this easy?




 -Hoss





-- 
--Noble Paul


Re: Is there a clean way to determine whether a core exists?

2008-12-04 Thread Ryan McKinley


On Dec 4, 2008, at 3:57 PM, Dean Thompson wrote:


Thanks for the quick response, Ryan!

Actually, my admin/ping call gives me a 404 if the core doesn't  
exist, which seemed reasonable.  I get the 500 if the core *did*  
exist.


aaah -- check what ping query you have configured and make sure that  
is a valid query.  If you use the example one and then change your  
schema it may be referencing fields that don't exist and give you the  
500






Thanks for the suggestion of using the select URL, but that gives me:

java.lang.NullPointerException
at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)



This is a really stupid error we need to fix -- it should actually
say: "missing required parameter q"

https://issues.apache.org/jira/browse/SOLR-435


ryan


Re: Is there a clean way to determine whether a core exists?

2008-12-04 Thread Ryan McKinley

yes:
http://localhost:8983/solr/admin/cores?action=STATUS

will give you a list of running cores.  However, that is not easy to
check with a simple status != 404


see:
http://wiki.apache.org/solr/CoreAdmin


On Dec 4, 2008, at 11:46 PM, Chris Hostetter wrote:



: Subject: Is there a clean way to determine whether a core exists?

doesn't the CoreAdminHandler's STATUS feature make this easy?




-Hoss
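
Building on the STATUS suggestion, one way to test for a specific core is to
fetch the STATUS response and look for that core's entry. A sketch using only
the JDK's XML support; the response layout (a <lst name="status"> element
containing one <lst> per core) is an assumption taken from the CoreAdmin wiki
page, so verify it against your Solr version:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class CoreStatusCheck {

    // Returns true if the CoreAdmin STATUS response lists a core with
    // the given name. The XML shape checked here is an assumption --
    // compare it with what your /solr/admin/cores?action=STATUS returns.
    static boolean coreExists(String statusXml, String coreName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(statusXml.getBytes("UTF-8")));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/response/lst[@name='status']/lst[@name='" + coreName + "']";
        return xpath.evaluate(expr, doc, XPathConstants.NODE) != null;
    }

    public static void main(String[] args) throws Exception {
        // A trimmed-down example of what STATUS might return.
        String sample = "<response><lst name='status'>"
                + "<lst name='core0'><str name='name'>core0</str></lst>"
                + "</lst></response>";
        System.out.println(coreExists(sample, "core0")); // true
        System.out.println(coreExists(sample, "core1")); // false
    }
}
```

Unlike the probe-query approach, this never depends on parsing an exception
message, and a missing core is an ordinary false rather than a 404 or 500.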