org.apache.http.ParseException: Invalid content type - Solr distributed search 4.10.4
Hi, when doing a distributed query from Solr 4.10.4 we get the exception below:

org.apache.solr.common.SolrException: org.apache.http.ParseException: Invalid content type:
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
ariba.arches.search.ArchesSearcher.invokeSearch(ArchesSearcher.java:306)
ariba.arches.search.ArchesSearcher.search(ArchesSearcher.java:169)
ariba.arches.search.SearchManagerServlet.handleSelect(SearchManagerServlet.java:651)
ariba.arches.search.SearchManagerServlet.service(SearchManagerServlet.java:146)
javax.servlet.http.HttpServlet.service(HttpServlet.java:848)

The query is below:
http://:20042/ /search/select?q=(*:*)=xml=5=SupplierID,MarketPrice=:20042 /search/select/executeS2-63,:20022/ search/select/execute/S1-69

In the code, the method below in SolrCore is used to execute the query:
execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp)

Saw the same issue in https://lists.gt.net/lucene/java-dev/242650. If we test the distributed query in a standalone Solr as below, it works:
http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8984/solr&indent=true&q=ipod+solr

Any pointers to resolve this issue, please. Thank you, Raji
Re: Fwd: Issue with SOLR Distributed Search
On 12/18/2014 12:35 AM, rashi gandhi wrote: Also, as per our investigation currently there is work ongoing in SOLR community to support this concept of distributed/Global IDF. But, I wanted to know if there is any solution possible right now to manage/control the score of the documents during distributed search, so that the results seem more relevant. SOLR-1632 covers the distributed IDF issue. Plans right now are to include this in Solr 5.0 when it is released. https://issues.apache.org/jira/browse/SOLR-1632 The only way to have a reasonably accurate distributed score currently is to load your shards as evenly as possible. A good way to do this is to use the hash value of the uniqueKey field as the deciding factor for which shard gets the document. This is what SolrCloud does if you let it handle the routing. Thanks, Shawn
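A minimal sketch of the hash-based routing Shawn describes (not from the thread; SolrJ 3.6+/4.x's HttpSolrServer, the shard URLs, and the "id" uniqueKey field name are illustrative assumptions):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class HashRouter {
    private final HttpSolrServer[] shards;

    public HashRouter(String... shardUrls) {
        shards = new HttpSolrServer[shardUrls.length];
        for (int i = 0; i < shardUrls.length; i++) {
            shards[i] = new HttpSolrServer(shardUrls[i]);
        }
    }

    // Pick the shard from the uniqueKey hash so a given key always lands on the same
    // shard and documents spread evenly, which keeps per-shard IDF statistics comparable.
    public void add(SolrInputDocument doc) throws Exception {
        String id = (String) doc.getFieldValue("id");
        int shard = (id.hashCode() & Integer.MAX_VALUE) % shards.length;
        shards[shard].add(doc);
    }
}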
Fwd: Issue with SOLR Distributed Search
Hi, this is regarding the issue that we are facing with SOLR distributed search. In our application, we are managing multiple shards at the SOLR server to manage the load. But there is a problem with the order of results that we are going to return to the client during the search. For example: currently there are two shards on which data is randomly distributed. When I search something, it was observed that the results from one shard appear first and then the results from the other shard. Moreover, we are ordering results by applying two levels of sorting (also configurable per user): 1. Score 2. Modified Time. I investigated the above scenario and found that it is not necessary that documents coming from one shard will always have the same score as documents coming from the other shard, even if they are identical. I also went through the various SOLR documentation and links, and found that currently there is a limitation in distributed search in SOLR: inverse document frequency (IDF) calculations cannot be distributed, and TF/IDF computations are per shard. This issue is particularly visible when there is a significant difference between the number of documents indexed in each shard (for example, the first shard has 15000 docs and the second shard has 5000). Please review and let me know whether our findings for the above scenario are appropriate or not. Also, as per our investigation there is currently work ongoing in the SOLR community to support this concept of distributed/global IDF. But I wanted to know if there is any solution possible right now to manage/control the score of the documents during distributed search, so that the results seem more relevant. Thanks Rashi
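To see why identical documents can score differently per shard, here is a rough illustration using Lucene's DefaultSimilarity formula idf = 1 + ln(numDocs / (docFreq + 1)), computed from per-shard statistics. The shard sizes match the example above; the docFreq values are assumed for illustration:

public class PerShardIdf {
    // Lucene DefaultSimilarity: idf = 1 + ln(numDocs / (docFreq + 1)), per shard
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        System.out.println("shard1 (15000 docs, docFreq 300): idf = " + idf(15000, 300)); // ~4.91
        System.out.println("shard2 (5000 docs, docFreq 20):   idf = " + idf(5000, 20));   // ~6.47
        // The same term, and therefore the same document, gets a noticeably different
        // score depending on which shard happened to score it.
    }
}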
Re: Solr-Distributed search
Hi, Will this *shards* parameter also work in the near future with Solr 5? With Regards Aman Tandon On Thu, Jun 5, 2014 at 2:59 PM, Mahmoud Almokadem prog.mahm...@gmail.com wrote: Hi, you can search using this sample URL http://localhost:8080/solr/core1/select?q=*:*&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3 Mahmoud Almokadem On Thu, Jun 5, 2014 at 8:13 AM, Anurag Verma vermanur...@gmail.com wrote: Hi, Can you please help me with Solr distributed search in multicore? I would be very happy as I am stuck here. In Java code how do I implement distributed search? -- Thanks Regards Anurag Verma
Re: Solr-Distributed search
On 6/6/2014 6:25 AM, Aman Tandon wrote: Does this *shards* parameter will also work in near future with solr 5? I am not aware of any plan to deprecate or remove the shards parameter. My personal experience is with versions from 1.4.0 through 4.7.2. It works in all of those versions. Without SolrCloud, the shards parameter is the only way you can do a distributed search. Thanks, Shawn
Re: Solr-Distributed search
Thanks Shawn. In my organisation we also want to implement SolrCloud, but the problem is that we are using the master-slave architecture: we do all indexing on the master, and the master's hardware is lower-spec than the slaves. So if we implement SolrCloud in such a way that the master will be the leader and the slaves will be the replicas, then in the case of high load can the leader bear it? I guess every query first goes to the leader and then it distributes the request, as I noticed from the logs and blogs :) Also, the master is in NY and the slaves are in Dallas, which might also cause a latency issue and would instead defeat our purpose of faster query responses. So I thought to use this shards parameter so that we query only the replicas, not the leader, so that the leader just works fine. But we were not sure about this shards parameter. What do you think? What should we do about the latency issue and the shards parameter? With Regards Aman Tandon On Fri, Jun 6, 2014 at 7:24 PM, Shawn Heisey s...@elyograg.org wrote: On 6/6/2014 6:25 AM, Aman Tandon wrote: Does this *shards* parameter will also work in near future with solr 5? I am not aware of any plan to deprecate or remove the shards parameter. My personal experience is with versions from 1.4.0 through 4.7.2. It works in all of those versions. Without SolrCloud, the shards parameter is the only way you can do a distributed search. Thanks, Shawn
Re: Solr-Distributed search
On 6/6/2014 8:31 AM, Aman Tandon wrote: In my organisation we also want to implement the solrcloud, but the problem is that, we are using the master-slave architecture and on master we do all indexing, architecture of master is lower than the slaves. So if we implement the solrcloud in a fashion that master will be the leader, and slaves will be the replicas then in that case, in the case of high load leader can bear it, I guess every query firstly goes to leader then it distributes the request as i noticed from the logs and blogs :) As well as master is in NY and slaves are in Dallas, which also might cause latency issue and it will instead fail our purpose of faster query response. So i thought to use this shards parameter so that we query only from the replicas not to the leader so that leader just work fine. But we were not sure about this shards parameter, what do you think? what should we do with latency issue and shards parameter. SolrCloud does not yet have any way to prefer one set of replicas over the others, so if you just send it requests, they would be sent to both Dallas and New York, affecting search latency. Local replica preference is a desperately needed feature. Old-style distributed search with the shards parameter, combined with master/slave replication, is an effective way to be absolutely sure which servers you are querying. I would actually recommend that you get rid of replication and have your index updating software update each copy of the index independently. This is how I do my Solr install. It opens up a whole new set of possibilities -- you can change the schema and/or config on one set of servers, or upgrade any component -- Solr, Java, etc., without affecting the other set of servers at all. One note: in order for the indexing paradigm I've outlined to be actually effective, you must separately track which inserts/updates/deletes have been done for each server set. If you don't do that, they can get out of sync when you restart a server. Also, if you don't do this, having a server down for an extended period of time might cause all indexing activity to stop on BOTH server sets. Thanks, Shawn
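A rough sketch of the independent-indexing idea Shawn outlines, with per-set progress tracking so a down set can catch up later (the host names and helper methods are assumptions for illustration, not Shawn's code):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DualIndexer {
    // Two independent server sets; no replication between them.
    private final HttpSolrServer nySet = new HttpSolrServer("http://ny-solr:8080/solr/core1");
    private final HttpSolrServer dallasSet = new HttpSolrServer("http://dallas-solr:8080/solr/core1");

    public void index(SolrInputDocument doc) {
        // Each set gets its own try/catch and its own progress marker, so one set being
        // down does not block the other, and missed updates can be replayed later.
        try { nySet.add(doc); markDone("ny", doc); } catch (Exception e) { recordPending("ny", doc, e); }
        try { dallasSet.add(doc); markDone("dallas", doc); } catch (Exception e) { recordPending("dallas", doc, e); }
    }

    private void markDone(String set, SolrInputDocument doc) { /* persist per-set progress */ }
    private void recordPending(String set, SolrInputDocument doc, Exception e) { /* queue for later replay */ }
}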
Re: Solr-Distributed search
Thanks Shawn, I will try to think in that way too :) With Regards Aman Tandon
Solr-Distributed search
Hi, Can you please help me with Solr distributed search in multicore? I would be very happy as I am stuck here. In Java code how do I implement distributed search? -- Thanks Regards Anurag Verma
Re: Solr-Distributed search
Hi, you can search using this sample URL: http://localhost:8080/solr/core1/select?q=*:*&shards=localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3 Mahmoud Almokadem On Thu, Jun 5, 2014 at 8:13 AM, Anurag Verma vermanur...@gmail.com wrote: Hi, Can you please help me with Solr distributed search in multicore? I would be very happy as I am stuck here. In Java code how do I implement distributed search? -- Thanks Regards Anurag Verma
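For the Java part of the question, a minimal SolrJ sketch (assuming SolrJ 3.6+/4.x and the same core names as in Mahmoud's URL) that sets the shards parameter on a query; note the shard entries omit the http:// scheme:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearch {
    public static void main(String[] args) throws Exception {
        // Send the request to any one core; the shards parameter fans it out and merges results.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/core1");
        SolrQuery q = new SolrQuery("*:*");
        q.set("shards", "localhost:8080/solr/core1,localhost:8080/solr/core2,localhost:8080/solr/core3");
        QueryResponse rsp = solr.query(q);
        System.out.println("total hits across all cores: " + rsp.getResults().getNumFound());
    }
}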
Re: Solr Distributed Search vs Hadoop
Here is an example of schema design: a PDF file of 5 MB might have maybe 50 KB of actual text. The Solr ExtractingRequestHandler will find that text and index only that. If you set the field to stored=true, the 5 MB will be saved; if stored=false, the PDF is not saved. Instead, you would store a link to it. One problem with indexing is that Solr continually copies data into segments (index parts) while you index, so each 5 MB PDF might get copied 50 times during a full index job. If you can strip the index down to what you really want to search on, terabytes become gigabytes. Solr seems to handle 100-200 GB fine on modern hardware. Lance On Fri, Dec 23, 2011 at 1:54 AM, Nick Vincent n...@vtype.com wrote: For data of this size you may want to look at something like Apache Cassandra, which is made specifically to handle data at this kind of scale across many machines. You can still use Hadoop to analyse and transform the data in a performant manner, however it's probably best to do some research on this on the relevant technical forums for those technologies. Nick -- Lance Norskog goks...@gmail.com
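A sketch of the indexing pattern Lance describes: index only the extracted text (indexed, not stored) and store a small link field instead of the PDF itself. The field names and the extractText() helper are assumptions for illustration:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LinkOnlyIndexer {
    public static void index(HttpSolrServer solr, String id, String pdfUrl, byte[] pdfBytes) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("text", extractText(pdfBytes)); // schema: indexed="true" stored="false"
        doc.addField("url", pdfUrl);                 // small stored field pointing at the original PDF
        solr.add(doc);                               // only ~50 KB of text is written, not the 5 MB binary
    }

    // Hypothetical helper; in practice Tika or the ExtractingRequestHandler does the extraction.
    private static String extractText(byte[] pdfBytes) {
        return "";
    }
}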
Re: Solr Distributed Search vs Hadoop
This copying is a bit overstated here because of the way that small segments are merged into larger segments. Those larger segments are then copied much less often than the smaller ones. While you can wind up with lots of copying in certain extreme cases, it is quite rare. In particular, if you have one of the following cases, you won't see very many copies for any particular document: - you don't delete files one at a time (i.e. indexing only without updates or deletion) or - most documents that are going to be deleted are deleted as young documents or - the probability that any particular document will be deleted in a fixed period of time decreases exponentially with the age of the documents Any of these characteristics or many others will prevent a file from being copied very many times because as the document ages, it keeps company with similarly aged documents which are accordingly unlikely to have enough compatriots deleted to make their segment have a small number of live documents in it. Put another way, the intervals between merges that a particular document undergoes will become longer and longer as it ages and thus the total number of copies it can undergo cannot grow very fast. On Wed, Dec 28, 2011 at 7:53 PM, Lance Norskog goks...@gmail.com wrote: ... One problem with indexing is that Solr continally copies data into segments (index parts) while you index. So, each 5MB PDF might get copied 50 times during a full index job. If you can strip the index down to what you really want to search on, terabytes become gigabytes. Solr seems to handle 100g-200g fine on modern hardware.
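A back-of-envelope version of this argument (mergeFactor, flush size, and index size are assumed numbers): with geometric merging a document is rewritten roughly once per merge level, so the copy count grows logarithmically rather than reaching anything like 50:

public class MergeCopies {
    public static void main(String[] args) {
        double docs = 10000000.0;   // documents in the index (assumed)
        double flushSize = 1000.0;  // docs per freshly flushed segment (assumed)
        double mergeFactor = 10.0;  // segments merged at a time (assumed)
        // levels of merging a document passes through on its way into the largest segments
        double copies = Math.log(docs / flushSize) / Math.log(mergeFactor);
        System.out.printf("each doc is copied roughly %.0f times during a full index build%n", copies); // ~4
    }
}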
Re: Solr Distributed Search vs Hadoop
For data of this size you may want to look at something like Apache Cassandra, which is made specifically to handle data at this kind of scale across many machines. You can still use Hadoop to analyse and transform the data in a performant manner, however it's probably best to do some research on this on the relevant technical forums for those technologies. Nick
Solr Distributed Search vs Hadoop
Hi, I have a basic question, let's say we're going to have a very very huge set of data. In a way that for sure we will need many servers (tens or hundreds of servers). We will also need failover. Now the question is, if we should use Hadoop or using Solr Distributed Search with shards would be enough? I've read lots of articles like: http://www.lucidimagination.com/content/scaling-lucene-and-solr http://wiki.apache.org/solr/DistributedSearch But I'm still confused, Solr's distributed search seems to be able to handle splitting the queries and merging the result. So what's the point of using Hadoop? I'm pretty sure I'm missing something here. Can anyone suggest some links regarding this issue? Regards -- Alireza Salimi Java EE Developer
Re: Solr Distributed Search vs Hadoop
You didn't mention how big your data is or how you create it. Hadoop would mostly be used in the preparation of the data or the off-line creation of indexes.
Re: Solr Distributed Search vs Hadoop
Well, actually we haven't started the actual project yet. But it will probably have to handle the data of millions of users, and a rough estimate for each user's data would be something around 5 MB. The other problem is that this data will change very often. I hope I answered your question. Thanks On Tue, Dec 20, 2011 at 4:00 PM, Ted Dunning ted.dunn...@gmail.com wrote: You didn't mention how big your data is or how you create it. Hadoop would mostly be used in the preparation of the data or the off-line creation of indexes. -- Alireza Salimi Java EE Developer
Re: Solr Distributed Search vs Hadoop
Well, that begins to not look so much like a Solr/Lucene problem. The overall data is moderately large (TBs to tens of TBs) for Lucene, and the individual user profiles are distinctly large to be storing in Lucene. If there is part of the profile that you might want to search, that would be appropriate for Lucene. If you can split the user data into several components that are updated independently, then HBase might be appropriate, with different components in different column families. You aren't going to get a definitive answer on a mailing list, however. You are going to need somebody with a bit of experience to advise you directly and/or you are going to need to prototype test cases. On Tue, Dec 20, 2011 at 1:07 PM, Alireza Salimi alireza.sal...@gmail.com wrote: Well, actually we haven't started the actual project yet. But it will probably have to handle the data of millions of users, and a rough estimate for each user's data would be something around 5 MB. The other problem is that this data will change very often. I hope I answered your question. Thanks -- Alireza Salimi Java EE Developer
Re: Huge Performance: Solr distributed search
Interesting info. You should look into using Solid State Drives. I moved my search engine to SSD and saw dramatic improvements.
Re: Huge Performance: Solr distributed search
Hi all again. Thanks to all for your replies. Over the weekend I made some interesting tests, and I would like to share them with you.

First of all I made a speed test of my HDD:
root@LSolr:~# hdparm -t /dev/sda9
/dev/sda9: Timing buffered disk reads: 146 MB in 3.01 seconds = 48.54 MB/sec

Then with iperf I tested my network:
[ 4] 0.0-18.7 sec 2.00 GBytes 917 Mbits/sec

Then I tried to post my queries using the shards parameter with one shard, so my queries were like:
http://localhost:8080/solr1/select/?q=(test)&qt=requestShards
where requestShards is:
<requestHandler name="requestShards" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="shards">127.0.0.1:8080/solr1</str>
  </lst>
</requestHandler>

Maybe it's not correct, but:
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(genuflections)&qt=requestShards&rows=2000} status=0 QTime=6525
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(tunefulness)&qt=requestShards&rows=2000} status=0 QTime=20170
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(societal)&qt=requestShards&rows=2000} status=0 QTime=44958
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(euchre's)&qt=requestShards&rows=2000} status=0 QTime=32161
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(monogram's)&qt=requestShards&rows=2000} status=0 QTime=85252

When I posted similar queries directly to solr1 without requestShards I had:
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(reopening)&rows=2000} hits=712 status=0 QTime=10
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(housemothers)&rows=2000} hits=0 status=0 QTime=446
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(harpooners)&rows=2000} hits=76 status=0 QTime=399
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(coaxing)&rows=2000} hits=562 status=0 QTime=2820
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(superstar's)&rows=2000} hits=4748 status=0 QTime=672
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(sedateness's)&rows=2000} hits=136 status=0 QTime=923
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(petrolatum)&rows=2000} hits=8 status=0 QTime=6183
INFO: [] webapp=/solr1 path=/select/ params={fl=*,score&ident=true&start=0&q=(everlasting's)&rows=2000} hits=1522 status=0 QTime=2625

And finally I found a bug: https://issues.apache.org/jira/browse/SOLR-1524 Why is there no activity on it? Is it not actual?

Today I wrote a bash script:
#!/bin/bash
ds=$(date +%s.%N)
echo "START: $ds" >> ./data/east_2000
curl "http://127.0.0.1:8080/solr1/select/?fl=*,score&ident=true&start=0&q=(east)&rows=2000" -s -H 'Content-type:text/xml; charset=utf-8' >> ./data/east_2000
de=$(date +%s.%N)
ddf=$(echo "$de - $ds" | bc)
echo "END: $de" >> ./data/east_2000
echo "DIFF: $ddf" >> ./data/east_2000

Before running Tomcat I dropped the cache:
root@LSolr:~# echo 3 > /proc/sys/vm/drop_caches

Then I started Tomcat and ran the script. The result is below:
START: 1322476131.783146691
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">125</int><lst name="params"><str name="fl">*,score</str><str name="ident">true</str><str name="start">0</str><str name="q">(east)</str><str name="rows">2000</str></lst></lst><result name="response" numFound="21439" start="0" maxScore="4.387605"> ... </response>
END: 1322476180.262770244
DIFF: 48.479623553

The file size is:
root@LSolr:~# ls -l | grep east
-rw-r--r-- 1 root root 1063579 Nov 28 12:29 east_2000

I'm using nmon to monitor HDD activity. It was near 100% when I ran the script. But when I tried to run it again the result was:
DIFF: .063678709
and not much HDD activity in nmon.

I can't understand one thing: is it my hardware, such as a slow HDD, or is it a Solr problem? And why has there been no activity on bug https://issues.apache.org/jira/browse/SOLR-1524 since 27/Oct/09 07:19?

On 11/25/2011 10:02 AM, Dmitry Kan wrote: 45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and shard given 12GB of RAM max.
Re: Huge Performance: Solr distributed search
Problem has been resolved. My disk subsystem was the bottleneck for quick search. I put my indexes in RAM and I see very nice QTimes :) Sorry for your time, guys.
Re: Huge Performance: Solr distributed search
45 000 000 per shard approx, Tomcat, caching was tweaked in solrconfig and the shard given 12GB of RAM max.

<!-- Filter Cache: Cache used by SolrIndexSearcher for filters (DocSets), unordered sets of *all* documents that match a query. When a new searcher is opened, its caches may be prepopulated or "autowarmed" using data from caches in the old searcher. autowarmCount is the number of items to prepopulate. For LRUCache, the autowarmed items will be the most recently accessed items. Parameters: class - the SolrCache implementation (LRUCache or FastLRUCache); size - the maximum number of entries in the cache; initialSize - the initial capacity (number of entries) of the cache (see java.util.HashMap); autowarmCount - the number of entries to prepopulate from an old cache. -->
<filterCache class="solr.FastLRUCache" size="1200" initialSize="1200" autowarmCount="128"/>

<!-- Query Result Cache: Caches results of searches - ordered lists of document ids (DocList) based on a query, a sort, and the range of documents requested. -->
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

<!-- Document Cache: Caches Lucene Document objects (the stored fields for each document). Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<!-- Field Value Cache: Cache used to hold field values that are quickly accessible by document id. The fieldValueCache is created by default even if not configured here. -->
<!-- <fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128" showItems="32"/> -->

<!-- Custom Cache: Example of a generic cache. These caches may be accessed by name through SolrIndexSearcher.getCache(), cacheLookup(), and cacheInsert(). The purpose is to enable easy caching of user/application level data. The regenerator argument should be specified as an implementation of solr.CacheRegenerator if autowarming is desired. -->
<!-- <cache name="myUserCache" class="solr.LRUCache" size="4096" initialSize="1024" autowarmCount="1024" regenerator="com.mycompany.MyRegenerator"/> -->

<!-- Lazy Field Loading: If true, stored fields that are not requested will be loaded lazily. This can result in a significant speed improvement if the usual case is to not load all stored fields, especially if the skipped fields are large compressed text fields. -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>

<!-- Use Filter For Sorted Query: A possible optimization that attempts to use a filter to satisfy a search. If the requested sort does not include score, then the filterCache will be checked for a filter matching the query. If found, the filter will be used as the source of document ids, and then the sort will be applied to that. For most situations, this will not be useful unless you frequently get the same search repeatedly with different sort options, and none of them ever use score. -->
<!-- <useFilterForSortedQuery>true</useFilterForSortedQuery> -->

<!-- Result Window Size: An optimization for use with the queryResultCache. When a search is requested, a superset of the requested number of document ids are collected. For example, if a search for a particular query requests matching documents 10 through 19, and queryWindowSize is 50, then documents 0 through 49 will be collected and cached. Any further requests in that range can be satisfied via the cache. -->
<queryResultWindowSize>50</queryResultWindowSize>

<!-- Maximum number of documents to cache for any entry in the queryResultCache. -->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

In your case I would first check if the network throughput is a bottleneck. It would be nice if you could check the timestamps of completing a request on each of the shards and the arrival time (via some HTTP sniffer) at the frontend SOLR servers. Then you will see if it is the frontend taking so much time or whether it was a network issue. Are your shards, btw, well balanced?
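One way to do the per-shard timing check Dmitry suggests is to query each shard directly and compare the reported QTime with the wall-clock time seen by the client; a large gap points at the network or the aggregator. A sketch assuming SolrJ and the shard URLs mentioned in the thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardTiming {
    public static void main(String[] args) throws Exception {
        String[] shardUrls = {
            "http://192.168.1.85:8080/solr1", "http://192.168.1.85:8080/solr2" // ... all 30 shards
        };
        SolrQuery q = new SolrQuery("superstar");
        q.setRows(2000);
        for (String url : shardUrls) {
            HttpSolrServer shard = new HttpSolrServer(url);
            long start = System.currentTimeMillis();
            QueryResponse rsp = shard.query(q);
            long wall = System.currentTimeMillis() - start;
            // QTime is the shard-side processing time; wall includes network and response transfer.
            System.out.println(url + " QTime=" + rsp.getQTime() + "ms wall=" + wall + "ms");
        }
    }
}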
Re: Huge Performance: Solr distributed search
On 11/25/2011 3:13 AM, Mark Miller wrote: When you search each shard, are you positive that you are using all of the same parameters? You are sure you are hitting request handlers that are configured exactly the same and sending exactly the same queries? In my experience, the overhead for distrib search is usually very low. What types of queries are you trying? I'm using simple queries like this:
http://192.168.1.90:9090/solr/select/?fl=*,score&start=0&q=(superstar)&qt=requestShards&rows=2000
The requestShards handler is defined as:
<requestHandler name="requestShards" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.85:8080/solr6, 192.168.1.86:8080/solr7,192.168.1.86:8080/solr8,...,192.168.1.86:8080/solr12, ..., 192.168.1.89:8080/solr25,192.168.1.89:8080/solr26,...,192.168.1.89:8080/solr30</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
-- Best regards, Artem Lokotosh mailto:arco...@gmail.com
Re: Huge Performance: Solr distributed search
In general terms, when your Java heap is so large, it is beneficial to set -Xmx and -Xms to the same size.
Re: Huge Performance: Solr distributed search
Can you merge, e.g. 3 shards together or is it much effort for your team? Yes, we can merge. We'll try to do this and review how it works. Merging does not help :( I've tried to merge two shards into one, and three shards into one, but the results are similar to those of the first configuration with 30 shards. This solution also has one big minus: the optimization process may take more time. In our setup we currently have 16 shards with ~30GB each, but we rarely search in all of them at once. How many documents per shard are in your setup? Any difference between Tomcat, Jetty or other? Have you configured your servlet container more specifically than the default configuration? -- Best regards, Artem Lokotosh mailto:arco...@gmail.com
Re: Huge Performance: Solr distributed search
How big are the documents you return (how many fields, avg KB per doc, etc.)? I have the following schema in my Solr configuration:
<fields>
  <field name="field1" type="text" indexed="true" stored="false"/>
  <field name="field2" type="text" indexed="true" stored="true"/>
  <field name="field3" type="text" indexed="true" stored="true"/>
  <field name="field4" type="tlong" indexed="true" stored="true"/>
  <field name="field5" type="tdate" indexed="true" stored="true"/>
  <field name="field6" type="text" indexed="true" stored="true"/>
  <field name="field7" type="text" indexed="true" stored="true"/>
  <field name="field8" type="tlong" indexed="true" stored="true"/>
  <field name="field9" type="text" indexed="true" stored="true"/>
  <field name="field10" type="tdate" indexed="true" stored="true"/>
  <field name="field11" type="text" indexed="true" stored="true"/>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>
27M–30M docs and 12-15 GB for each shard, 0.5 KB per doc.
Does performance get much better if you only request the top 100, or top 10 documents instead of the top 1000?
rows         |   10 |   100 |  1000 |   2000
-------------|------|-------|-------|-------
MIN          |  124 |   146 |   237 |    747
AVG          |  832 |  4666 | 16130 |  72542
MAX          | 3602 | 30197 | 57339 | 159482
QUERIES/5MIN |   75 |    73 |    49 |     51
What if you only request a couple of fields, instead of fl=*? What if you only search 10 shards instead of 30? Results are similar to the table above; btw, I need to receive all fields from the shards.
Another problem: I use solrmeter or a simple bash script to check the search speed. I get QTime from 16K to 24K for the first ~20 queries, from 50K to 100K for the next ~20 queries, and so on until the servlet goes down.
On Wed, Nov 23, 2011 at 5:55 PM, Robert Stewart bstewart...@gmail.com wrote: If you request 1000 docs from each shard, then the aggregator is really fetching 30,000 total documents, which it must then merge (re-sort results, and take the top 1000 to return to the client). It's possible that SOLR's merging implementation needs to be optimized, but it does not seem like it could be that slow. How big are the documents you return (how many fields, avg KB per doc, etc.)? I would take a look at the network to make sure that is not some bottleneck, and also to make sure there is not some underlying issue making 30 concurrent HTTP requests from the aggregator. I am not an expert in Java, but under .NET there is a setting that limits concurrent out-going HTTP requests from a process that must be over-ridden via configuration, otherwise by default it is very limiting. Does performance get much better if you only request top 100, or top 10 documents instead of top 1000? What if you only request a couple fields, instead of fl=*? What if you only search 10 shards instead of 30? I would collect those numbers and try to determine if time increases linearly or not as you increase shards and/or # of docs.
On Wed, Nov 23, 2011 at 9:55 AM, Artem Lokotosh arco...@gmail.com wrote: If the response time from each shard shows decent figures, then the aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users? For now it is not a problem, but we expect from 1K to 10K concurrent users and maybe more. On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan dmitry@gmail.com wrote: If the response time from each shard shows decent figures, then the aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users? -- Best regards, Artem Lokotosh mailto:arco...@gmail.com
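A sketch of the experiment Robert Stewart suggests, varying rows and the field list against the aggregator to see how much of the latency comes from merging 30 x rows documents (SolrJ assumed; the host and handler name are taken from the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class RowsExperiment {
    public static void main(String[] args) throws Exception {
        HttpSolrServer aggregator = new HttpSolrServer("http://192.168.1.90:9090/solr");
        for (int rows : new int[]{10, 100, 1000, 2000}) {
            SolrQuery q = new SolrQuery("superstar");
            q.set("qt", "requestShards"); // handler that fans the query out to the 30 shards
            q.setRows(rows);
            q.setFields("id", "score");   // compare against fl=* to see how much is transfer/merge cost
            long qtime = aggregator.query(q).getQTime();
            System.out.println("rows=" + rows + " QTime=" + qtime + "ms");
        }
    }
}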
Re: Huge Performance: Solr distributed search
When you search each shard, are you positive that you are using all of the same parameters? You are sure you are hitting request handlers that are configured exactly the same and sending exactly the same queries? In my experience, the overhead for distrib search is usually very low. What types of queries are you trying? -- - Mark http://www.lucidimagination.com
Huge Performance: Solr distributed search
Hi! * Data: - Solr 3.4; - 30 shards ~ 13GB, 27-29M docs each shard. * Machine parameters (Ubuntu 10.04 LTS): user@Solr:~$ uname -a Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux user@Solr:~$ cat /proc/cpuinfo processor : 0 - 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz stepping: 2 cpu MHz : 3458.000 cache size : 12288 KB fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips: 6916.00 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: user@Solr:~$ cat /proc/meminfo MemTotal: 16992680 kB MemFree: 110424 kB Buffers:9976 kB Cached: 11588380 kB SwapCached:41952 kB Active: 9860764 kB Inactive:6198668 kB Active(anon):4062144 kB Inactive(anon): 398972 kB Active(file):5798620 kB Inactive(file): 5799696 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 46873592 kB SwapFree: 46810712 kB Dirty:36 kB Writeback: 0 kB AnonPages: 4424756 kB Mapped: 940660 kB Shmem:40 kB Slab: 362344 kB SReclaimable: 350372 kB SUnreclaim:11972 kB KernelStack:2488 kB PageTables:68568 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:55369932 kB Committed_AS:5740556 kB VmallocTotal: 34359738367 kB VmallocUsed: 350532 kB VmallocChunk: 34359384964 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M:17299456 kB - Apache Tomcat 6.0.32: !-- java arguments -- -XX:+DisableExplicitGC -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx12G -Xms3G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/opt/search/tomcat/logs/gc.log Out search schema is: - 5 servers with configuration above; - one tomcat6 application on each server with 6 solr applications. - Full addresses are: 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,http://192.168.1.85:8080/solr6 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,http://192.168.1.86:8080/solr12 ... 5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,http://192.168.1.89:8080/solr30 - At another server there is a additional common application with shards paramerter: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str str name=shards192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30/str int name=rows10/int /lst /requestHandler - schema and solrconfig are identical for all shards, for first shard see attach; - on these servers are only search, indexation is on another (optimized to 2 segments shards replicate with ssh/rsync scripts). So now the major problem is huge performance on distributed search. 
Take a look, for example, at these logs. This is on 30 shards:
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000} status=0 QTime=40712
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000} status=0 QTime=36097
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000} status=0 QTime=75756
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000} status=0 QTime=30342
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000} status=0 QTime=55690
Sometimes QTime is more than 15. But when we run identical queries on one shard separately, QTime is between 200 and 1500. Is distributed Solr search really this slow, or is our architecture non-optimal? Or do we maybe need to use some third-party applications? Thanks for any replies. -- Best regards, Artem
Re: Huge Performance: Solr distributed search
Hello, Is this log from the frontend SOLR (aggregator) or from a shard? Can you merge, e.g. 3 shards together or is it much effort for your team? In our setup we currently have 16 shards with ~30GB each, but we rarely search in all of them at once. Best, Dmitry On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com wrote: Hi! * Data: - Solr 3.4; - 30 shards ~ 13GB, 27-29M docs each shard. * Machine parameters (Ubuntu 10.04 LTS): user@Solr:~$ uname -a Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux user@Solr:~$ cat /proc/cpuinfo processor : 0 - 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz stepping: 2 cpu MHz : 3458.000 cache size : 12288 KB fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips: 6916.00 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: user@Solr:~$ cat /proc/meminfo MemTotal: 16992680 kB MemFree: 110424 kB Buffers:9976 kB Cached: 11588380 kB SwapCached:41952 kB Active: 9860764 kB Inactive:6198668 kB Active(anon):4062144 kB Inactive(anon): 398972 kB Active(file):5798620 kB Inactive(file): 5799696 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 46873592 kB SwapFree: 46810712 kB Dirty:36 kB Writeback: 0 kB AnonPages: 4424756 kB Mapped: 940660 kB Shmem:40 kB Slab: 362344 kB SReclaimable: 350372 kB SUnreclaim:11972 kB KernelStack:2488 kB PageTables:68568 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:55369932 kB Committed_AS:5740556 kB VmallocTotal: 34359738367 kB VmallocUsed: 350532 kB VmallocChunk: 34359384964 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M:17299456 kB - Apache Tomcat 6.0.32: !-- java arguments -- -XX:+DisableExplicitGC -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx12G -Xms3G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/opt/search/tomcat/logs/gc.log Out search schema is: - 5 servers with configuration above; - one tomcat6 application on each server with 6 solr applications. - Full addresses are: 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,..., http://192.168.1.85:8080/solr6 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,..., http://192.168.1.86:8080/solr12 ... 5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,..., http://192.168.1.89:8080/solr30 - At another server there is a additional common application with shards paramerter: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str str name=shards192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,..., 192.168.1.89:8080/solr30/str int name=rows10/int /lst /requestHandler - schema and solrconfig are identical for all shards, for first shard see attach; - on these servers are only search, indexation is on another (optimized to 2 segments shards replicate with ssh/rsync scripts). 
So now the major problem is the huge performance cost of distributed search. Take a look at, for example, these logs. This is on 30 shards:
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000} status=0 QTime=40712
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000} status=0 QTime=36097
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000} status=0 QTime=75756
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000} status=0 QTime=30342
INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000} status=0 QTime=55690
Sometimes QTime is more than 15. But when we run identical queries on one shard separately, QTime is between 200 and 1500. Is distributed solr search really this slow, or is it our architecture
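To narrow down where the time goes, one way to compare is to send the same query both to a single shard directly and to the aggregating application; the aggregator host below is a placeholder (it is not named in the post), while the shard address is taken from the setup above:

    http://192.168.1.85:8080/solr1/select?q=(barium)&fl=*,score&rows=2000
    http://<aggregator-host>:8080/solr/select?q=(barium)&fl=*,score&rows=2000

If the first call returns in a few hundred milliseconds and the second in tens of seconds, the extra time is being spent on the fan-out, on merging up to 30 x 2000 per-shard hits on the aggregator, or on transferring them over the network.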
Re: Huge Performance: Solr distributed search
Is this log from the frontend SOLR (aggregator) or from a shard?
From the aggregator.
Can you merge, e.g. 3 shards together or is it much effort for your team?
Yes, we can merge. We'll try to do this and review how it works. Thanks, Dmitry. Any other ideas?
On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan dmitry@gmail.com wrote: [...]
Re: Huge Performance: Solr distributed search
If the response time from each shard shows decent figures, then the aggregator seems to be the bottleneck. Do you btw have a lot of concurrent users?
On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote: [...]
Re: Huge Performance: Solr distributed search
If the response time from each shard shows decent figures, then the aggregator seems to be the bottleneck. Do you btw have a lot of concurrent users?
For now it is not a problem, but we expect from 1K to 10K concurrent users, and maybe more.
On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan dmitry@gmail.com wrote: [...]
-- Best regards, Artem Lokotosh mailto:arco...@gmail.com
Re: Huge Performance: Solr distributed search
If you request 1000 docs from each shard, then the aggregator is really fetching 30,000 total documents, which it must then merge (re-sort the results and take the top 1000 to return to the client). It's possible that SOLR's merging implementation needs optimizing, but it does not seem like it could be that slow. How big are the documents you return (how many fields, avg KB per doc, etc.)? I would take a look at the network to make sure that is not a bottleneck, and also make sure there is not some underlying issue with making 30 concurrent HTTP requests from the aggregator. I am not an expert in Java, but under .NET there is a setting that limits concurrent outgoing HTTP requests from a process and must be overridden via configuration, otherwise the default is very limiting. Does performance get much better if you only request the top 100, or top 10 documents instead of the top 1000? What if you only request a couple of fields, instead of fl=*? What if you only search 10 shards instead of 30? I would collect those numbers and try to determine whether the time increases linearly or not as you increase the number of shards and/or the number of docs.
On Wed, Nov 23, 2011 at 9:55 AM, Artem Lokotosh arco...@gmail.com wrote: [...]
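A sketch of the measurements suggested above, again with a placeholder aggregator host and placeholder field names; each variation changes one factor (rows, field list, number of shards) so their effects can be judged separately:

    http://<aggregator-host>:8080/solr/select?q=(barium)&fl=*,score&rows=10
    http://<aggregator-host>:8080/solr/select?q=(barium)&fl=*,score&rows=100
    http://<aggregator-host>:8080/solr/select?q=(barium)&fl=id,score&rows=1000
    http://<aggregator-host>:8080/solr/select?q=(barium)&fl=id,score&rows=1000&shards=192.168.1.85:8080/solr1,192.168.1.85:8080/solr2

Passing an explicit shards parameter on the request overrides the default configured in solrconfig.xml, so a subset of shards can be measured without changing the configuration.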
About solr distributed search
Hi all, Now I'm doing research on solr distributed search, and it is said that distributed search becomes reasonable at more than one million documents. So I want to know: does anyone have test results (such as time cost) comparing a single index and distributed search with more than one million documents? I need the test results very urgently, thanks in advance! Best Regards, Pengkai
RE: About solr distributed search
I am no expert, but here is my take and our situation. Firstly, are you asking what the minimum number of documents is before it makes *any* sense at all to use a distributed search, or are you asking what the maximum number of documents is before a distributed search is essentially required? The answers would be different. I get the feeling you are asking the second question, so I'll proceed under that assumption. I expect that in part the answer is "it depends". I expect that it is mostly a function of the size of the index (and the interaction between that and memory and search performance), which depends on both the number of documents and how much is stored for the documents. It would also depend upon your update load. If the documents are small and/or the amount of stuff you store per document is small, then until the number of documents and/or updates gets truly enormous a single machine will probably be fine. But if your documents (the amount stored per document) are very large, then at some point the index files get so large that performance on a single machine isn't adequate. Alternatively, if your update load is very, very large, you might need to spread that load among multiple servers to handle it without crippling your ability to respond to queries. As for a specific instance, we have a single index of 7 million documents (going on 28 million), with maybe 512 bytes of data stored for each document and maybe 26 or so indexed fields (we have a *lot* of copyField operations in order to index the data the way we want it, yet preserve the original data to return), and did not need to use distributed search. JRJ
-Original Message- From: Pengkai Qin [mailto:qin19890...@163.com] Sent: Thursday, September 29, 2011 5:15 AM To: solr-user@lucene.apache.org; d...@lucene.apache.org Subject: About solr distributed search [...]
Re: About solr distributed search
Hi Pengkai, my experience is based on http://www.findfiles.net/ which holds 700 million documents, each about 2kb in size. A single index containing that kind of data should hold fewer than 80 million documents. In case you have complex queries with lots of facets, sorting, or function queries, then even 50 million documents per index could be your upper limit. On very fast hardware and with a warmed index you might deliver results on average within 1 second. For documents above 5kb in size those numbers might not necessarily be the same. Try to test with your documents by creating (NOT COPYING) and indexing them in vast numbers. After every 10 million documents, test the average response time with the caches switched off. If the average response time hits your threshold, then the number of documents in the index is your limit per index. Scaling up is no problem. AFAIK 20 to 50 indexes should be fine within a distributed production system. Kind Regards Gregor
On 09/29/2011 12:14 PM, Pengkai Qin wrote: [...]
-- How to find files on the Internet? FindFiles.net http://findfiles.net!
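For the cache-off measurements described above, one way (a sketch against the stock example solrconfig.xml, not the only option) is to comment out the cache sections so the searcher runs without them; the sizes shown are just the example defaults:

    <!-- caches disabled for raw response-time testing
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    -->

The core has to be reloaded or restarted after editing solrconfig.xml for the change to take effect.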
Re: About solr distributed search
hi, I suggest you just set up an environment and test it yourself; one million documents is not a problem at all.
2011/9/30 秦鹏凯 qinpeng...@yahoo.cn: [...]
Re: solr distributed search don't work
<requestHandler name="MYREQUESTHANDLER" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="facet.method">enum</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">10</str>
    <str name="shards">192.168.1.6/solr/,192.168.1.7/solr/</str>
  </lst>
</requestHandler>
2011/8/19 Li Li fancye...@gmail.com: could you please show me your configuration in solrconfig.xml? [...]
solr distributed search don't work
hi all, I followed the wiki http://wiki.apache.org/solr/SpellCheckComponent but there is something wrong. The url given by the wiki is http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr but it does not work. I traced the code and found that qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell. After modifying the url, it returns all documents but nothing about spell check. I debugged it and found that AbstractLuceneSpellChecker.getSuggestions() is called.
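For reference, the handler registration matters here: the value passed as qt / shards.qt has to match the name under which the spellcheck-aware handler is registered in solrconfig.xml. A typical registration, sketched along the lines of the wiki example (names and dictionary may differ in a given setup), looks like:

    <requestHandler name="/spell" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck.dictionary">default</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

With a handler registered as "/spell", qt=/spell and shards.qt=/spell address it; if it were registered simply as "spell", qt=spell would be the matching form, which is what the posters go back and forth about in the replies below.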
Re: solr distributed search don't work
Hi, I do not use spell but I do use distributed search; using qt=spell is correct, you should not use qt=/spell. For shards, I specify it in solrconfig directly, not in the url, but it should work the same. Maybe there is an issue in your spell request handler.
2011/8/19 Li Li fancye...@gmail.com: [...]
Re: solr distributed search don't work
could you please show me your configuration in solrconfig.xml?
On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou olivier.sal...@gmail.com wrote: [...]
Re: a bug of solr distributed search
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote: And a third potential reason - it's arguably a feature instead of a bug for some applications. Depending on how I organize my shards, give me the most relevant document from each shard for this search seems like it could be useful. You can get that even if the shards scored equally, so it is a limitation, not a feature. I hope to find the time later this week to read some of the papers Andrzej was kind enough to point out, but it seems like I really need to do the heavy lifting of setting up comparisons for our own material. The problem is of course to judge the quality of the outputs, but setting the single index as the norm and plotting the differences in document positions in the result sets might provide some insight. Regards, Toke Eskildsen
Re: a bug of solr distributed search
Andrzej Bialecki wrote: On 2010-10-25 11:22, Toke Eskildsen wrote: On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: But itshows a problem of distrubted search without common idf. A doc will get different score in different shard. Bingo. I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice use sharding is given, it should be followed with a but be aware that it will make relevance ranking unreliable. The reason is twofold, I think: And a third potential reason - it's arguably a feature instead of a bug for some applications. Depending on how I organize my shards, give me the most relevant document from each shard for this search seems like it could be useful. * there is an exact solution to this problem, namely to make two distributed calls instead of one (first call to collect per-shard IDFs for given query terms, second call to submit a query rewritten with the global IDF-s). This solution is implemented in SOLR-1632, with some caching to reduce the cost for common queries. However, this means that now for every query you need to make two calls instead of one, which potentially doubles the time to return results (for simple common queries - for rare complex queries the time will be still dominated by the query runtime on shard servers). * another reason is that in many many cases the difference between using exact global IDF and per-shard IDFs is not that significant. If shards are more or less homogenous (e.g. you assign documents to shards by hash(docId)) then term distributions will be also similar. So then the question is whether you can accept an N% variance in scores across shards, or whether you want to bear the cost of an additional distributed RPC for every query... To summarize, I would qualify your statement with: ...if the composition of your shards is drastically different. Otherwise the cost of using global IDF is not worth it, IMHO.
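A schematic of the two-call approach quoted above, as hedged illustration code rather than the actual SOLR-1632 implementation; Shard, TermStats, termStats() and query() are made-up placeholder names standing in for whatever transport the real patch uses:

    // Placeholder types, not Solr APIs.
    class TermStats { Map<String, Long> df; long maxDoc; }
    interface Shard {
        TermStats termStats(Set<String> queryTerms);
        void query(String q, Map<String, Long> globalDf, long globalMaxDoc);
    }

    // pass 1: gather per-shard document frequencies and doc counts
    Map<String, Long> globalDf = new HashMap<String, Long>();
    long globalMaxDoc = 0;
    for (Shard shard : shards) {
        TermStats stats = shard.termStats(queryTerms);
        for (Map.Entry<String, Long> e : stats.df.entrySet()) {
            Long prev = globalDf.get(e.getKey());
            globalDf.put(e.getKey(), prev == null ? e.getValue() : prev + e.getValue());
        }
        globalMaxDoc += stats.maxDoc;
    }
    // pass 2: re-issue the query with the merged (global) statistics attached,
    // so every shard scores with the same IDF values
    for (Shard shard : shards) {
        shard.query(q, globalDf, globalMaxDoc);
    }

The extra pass is exactly what doubles the number of distributed round trips per query, which is the cost weighed in this thread.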
Re: a bug of solr distributed search
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: But itshows a problem of distrubted search without common idf. A doc will get different score in different shard. Bingo. I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice use sharding is given, it should be followed with a but be aware that it will make relevance ranking unreliable. Regards, Toke Eskildsen
Re: a bug of solr distributed search
On 2010-10-25 11:22, Toke Eskildsen wrote: On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: But itshows a problem of distrubted search without common idf. A doc will get different score in different shard. Bingo. I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice use sharding is given, it should be followed with a but be aware that it will make relevance ranking unreliable. The reason is twofold, I think: * there is an exact solution to this problem, namely to make two distributed calls instead of one (first call to collect per-shard IDFs for given query terms, second call to submit a query rewritten with the global IDF-s). This solution is implemented in SOLR-1632, with some caching to reduce the cost for common queries. However, this means that now for every query you need to make two calls instead of one, which potentially doubles the time to return results (for simple common queries - for rare complex queries the time will be still dominated by the query runtime on shard servers). * another reason is that in many many cases the difference between using exact global IDF and per-shard IDFs is not that significant. If shards are more or less homogenous (e.g. you assign documents to shards by hash(docId)) then term distributions will be also similar. So then the question is whether you can accept an N% variance in scores across shards, or whether you want to bear the cost of an additional distributed RPC for every query... To summarize, I would qualify your statement with: ...if the composition of your shards is drastically different. Otherwise the cost of using global IDF is not worth it, IMHO. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
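A minimal sketch of the homogeneous-shard assignment mentioned above (hashing the unique document id to pick a shard); this is plain illustration code, not a Solr API:

    // Deterministically route a document to one of numShards shards by its uniqueKey.
    // Masking with 0x7fffffff keeps the value non-negative even for Integer.MIN_VALUE.
    int shardFor(String uniqueKey, int numShards) {
        return (uniqueKey.hashCode() & 0x7fffffff) % numShards;
    }

Because every shard then holds an essentially random sample of the collection, per-shard term statistics stay close to the global ones, which is why the per-shard IDF error tends to be small under this scheme.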
Re: a bug of solr distributed search
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: * there is an exact solution to this problem, namely to make two distributed calls instead of one (first call to collect per-shard IDFs for given query terms, second call to submit a query rewritten with the global IDF-s). This solution is implemented in SOLR-1632, with some caching to reduce the cost for common queries. I must admit that I have not tried the patch myself. Looking at https://issues.apache.org/jira/browse/SOLR-1632 i see that the last comment is from LiLi with a failed patch, but as there are no further comments it is unclear if the problem is general or just with LiLi's setup. I might be a bit harsh here, but the other comments for the JIRA issue also indicate that one would have to be somewhat adventurous to run this in production. * another reason is that in many many cases the difference between using exact global IDF and per-shard IDFs is not that significant. If shards are more or less homogenous (e.g. you assign documents to shards by hash(docId)) then term distributions will be also similar. While I agree on the validity of the solution, it does put some serious constraints on the shard-setup. To summarize, I would qualify your statement with: ...if the composition of your shards is drastically different. Otherwise the cost of using global IDF is not worth it, IMHO. Do you know of any studies of the differences in ranking with regard to indexing-distribution by hashing, logical grouping and distributed IDF? Regards, Toke Eskildsen
Re: a bug of solr distributed search
On 2010-10-25 13:37, Toke Eskildsen wrote: On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: * there is an exact solution to this problem, namely to make two distributed calls instead of one (first call to collect per-shard IDFs for given query terms, second call to submit a query rewritten with the global IDF-s). This solution is implemented in SOLR-1632, with some caching to reduce the cost for common queries. I must admit that I have not tried the patch myself. Looking at https://issues.apache.org/jira/browse/SOLR-1632 i see that the last comment is from LiLi with a failed patch, but as there are no further comments it is unclear if the problem is general or just with LiLi's setup. I might be a bit harsh here, but the other comments for the JIRA issue also indicate that one would have to be somewhat adventurous to run this in production. Oh, definitely this is not production quality yet - there are known bugs, for example, that I need to fix, and then it needs to be forward-ported to trunk. It shouldn't be too much work to bring it back into a usable state. * another reason is that in many many cases the difference between using exact global IDF and per-shard IDFs is not that significant. If shards are more or less homogenous (e.g. you assign documents to shards by hash(docId)) then term distributions will be also similar. While I agree on the validity of the solution, it does put some serious constraints on the shard-setup. True. But this is the simplest setup that just may be enough. To summarize, I would qualify your statement with: ...if the composition of your shards is drastically different. Otherwise the cost of using global IDF is not worth it, IMHO. Do you know of any studies of the differences in ranking with regard to indexing-distribution by hashing, logical grouping and distributed IDF? Unfortunately, this information is surprisingly scarce - research predating year 2000 is often not applicable, and most current research concentrates on P2P systems, which are really a different ball of wax. Here are a few papers that I found that are related to this issue:
* Global Term Weights in Distributed Environments, H. Witschel, 2007 (Elsevier)
* KLEE: A Framework for Distributed Top-k Query Algorithms, S. Michel, P. Triantafillou, G. Weikum, VLDB'05 (ACM)
* Exploring the Stability of IDF Term Weighting, Xin Fu and Miao Chen, 2008 (Springer Verlag)
* A Comparison of Techniques for Estimating IDF Values to Generate Lexical Signatures for the Web, M. Klein, M. Nelson, WIDM'08 (ACM)
* Comparison of different Collection Fusion Models in Distributed Information Retrieval, Alexander Steidinger - this paper gives a nice comparison framework for different strategies for joining partial results; apparently we use the most primitive strategy explained there, based on raw scores...
These papers likely don't fully answer your question, but at least they provide a broader picture of the issue...
-- Best regards, Andrzej Bialecki (Information Retrieval, Semantic Web; Embedded Unix, System Integration) http://www.sigram.com Contact: info at sigram dot com
Re: a bug of solr distributed search
Good morning, https://issues.apache.org/jira/browse/SOLR-1632 - Mitch
Li Li wrote: where is the link of this patch? [...]
Re: a bug of solr distributed search
where is the link of this patch?
2010/7/24 Yonik Seeley yo...@lucidimagination.com: [...]
Re: a bug of solr distributed search
the solr version I used is 1.4
2010/7/26 Li Li fancye...@gmail.com: [...]
Re: a bug of solr distributed search
Okay, but then LiLi did something wrong, right? I mean, if the document exists only at one shard, it should get the same score whenever one requests it, no? Of course, this only applies if nothing gets changed between the requests. The only remaining problem here would be that you need distributed IDF (like in the mentioned JIRA issue) to normalize your results' scoring. But the problem mentioned in this mailing-list posting has nothing to do with that... Regards - Mitch
Re: a bug of solr distributed search
Yonik, why do we not send the output of TermsComponent from every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we do not even need Hadoop for this. After reducing, every node in the cluster gets the current values to compute the idf. We can store this information in a HashMap-based SolrCache (or something like that) to provide constant-time access. To keep the values up to date, we can repeat that every x minutes. If we got that, it would not matter whether we use doc_X from shard_A or shard_B, since they will all have got the same scores. Even if we have large indices with 10 million or more unique terms, this will only need a few megabytes of network traffic. Kind regards, - Mitch
Yonik Seeley-2-2 wrote: As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other suggestions? -Yonik http://www.lucidimagination.com
Re: a bug of solr distributed search
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote: why do we do not send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map-part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we even do not need Hadoop for this. After reducing, every node in the cluster gets the current values to compute the idf. We can store this information in a HashMap-based SolrCache (or something like that) to provide constant-time access. To keep the values up to date, we can repeat that after every x minutes. There's already a patch in JIRA that does distributed IDF. Hadoop wouldn't be the right tool for that anyway... it's for batch oriented systems, not low-latency queries. If we got that, it does not care whereas we use doc_X from shard_A or shard_B, since they will all have got the same scores. That only works if the docs are exactly the same - they may not be. -Yonik http://www.lucidimagination.com
Re: a bug of solr distributed search
... Additionally to my previous posting: To keep this in sync we could do two things: wait for every server to make sure that everyone uses the same values to compute the score, and then apply them. Or: let's say that we collect the new values every 15 minutes. To merge and send them over the network, we declare that this will need 3 additional minutes (we want to keep the network traffic for such actions very low, so we do not send everything instantly). Okay, and now we add 2 more minutes, in case 3 were not enough or something needs a little bit more time than we thought. After those 2 minutes, every node has to apply the new values. Pro: If one node gets broken, we do not delay the application of the new values. Con: We need two HashMaps and both will have roughly the same size. That means we will waste some RAM for this operation, if we do not write the values to disk (which I do not suggest). Thoughts? - Mitch
MitchK wrote: [...]
Re: a bug of solr distributed search
That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, shouldn't they?
Re: a bug of solr distributed search
On Fri, Jul 23, 2010 at 2:40 PM, MitchK mitc...@web.de wrote: That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, don't they? Documents aren't supposed to be duplicated across shards... so the presence of multiple docs with the same id is a bug anyway. We've chosen to try and handle it gracefully rather than fail hard. Some people have treated this as a feature - and that's OK as long as expectations are set appropriately. -Yonik http://www.lucidimagination.com
Re: a bug of solr distributed search
As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other suggestions? -Yonik http://www.lucidimagination.com
On Wed, Jul 21, 2010 at 3:13 AM, Li Li fancye...@gmail.com wrote: [...]
Re: a bug of solr distributed search
: As the comments suggest, it's not a bug, but just the best we can do : for now since our priority queues don't support removal of arbitrary FYI: I updated the DistributedSearch wiki to be more clear about this -- it previously didn't make it explicitly clear that docIds were supposed to be unique across all shards, and suggested that there was specific well-defined behavior when they weren't. -Hoss
a bug of solr distributed search
in QueryComponent.mergeIds. It will remove documents that have a duplicated uniqueKey. In the current implementation, it uses the first one encountered.
    String prevShard = uniqueDoc.put(id, srsp.getShard());
    if (prevShard != null) {
      // duplicate detected
      numFound--;
      collapseList.remove(id + "");
      docs.set(i, null);  // remove it
      // For now, just always use the first encountered since we can't currently
      // remove the previous one added to the priority queue. If we switched
      // to the Java5 PriorityQueue, this would be easier.
      continue;
      // make which duplicate is used deterministic based on shard
      // if (prevShard.compareTo(srsp.shard) >= 0) {
      //   TODO: remove previous from priority queue
      //   continue;
      // }
    }
It iterates over ShardResponse by
    for (ShardResponse srsp : sreq.responses)
But sreq.responses may come in a different order each time. That is, shard1's result and shard2's result may interchange positions. So when a uniqueKey (such as url) occurs in both shard1 and shard2, which one will be used is unpredictable. But the scores of these 2 docs are different because of different idf. So the same query will get different results. One possible solution is to sort ShardResponse srsp by shard name.
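A hedged sketch of that last suggestion (sorting the shard responses by shard name before merging, so which duplicate wins no longer depends on response arrival order); this is illustration only, meant to sit before the existing loop, not a tested patch:

    // Sort responses by shard name so the duplicate that is kept is always
    // the one from the lexicographically smallest shard name.
    Collections.sort(sreq.responses, new Comparator<ShardResponse>() {
        public int compare(ShardResponse a, ShardResponse b) {
            return a.getShard().compareTo(b.getShard());
        }
    });
    // then iterate as before:
    // for (ShardResponse srsp : sreq.responses) { ... }

This makes the result reproducible between identical queries, but the kept copy can still be the lower-scoring one, so the ranking concern discussed further down in the thread remains.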
Re: a bug of solr distributed search
Li Li, this is the intended behaviour, not a bug. Otherwise you could get back the same record several times in one response, which may not be intended by the user. Kind regards, - Mitch
Re: a bug of solr distributed search
But users will think there is something wrong with it when they search the same query but get different results.
2010/7/21 MitchK mitc...@web.de: [...]
Re: a bug of solr distributed search
Ah, okay. I understand your problem. Why should doc x be at position 1 when searching for the first time, and then occur at position 8 when I search for the 2nd time - right? I am not sure, but I think you can't prevent this without custom coding or making a document's occurrence unique. Kind regards, - Mitch
Re: a bug of solr distributed search
Yes. This will make users think our search engine has some bug. From the comments in the code, there is more to do:
    if (prevShard != null) {
      // For now, just always use the first encountered since we can't currently
      // remove the previous one added to the priority queue. If we switched
      // to the Java5 PriorityQueue, this would be easier.
      continue;
      // make which duplicate is used deterministic based on shard
      // if (prevShard.compareTo(srsp.shard) >= 0) {
      //   TODO: remove previous from priority queue
      //   continue;
      // }
    }
2010/7/21 MitchK mitc...@web.de: [...]
Re: a bug of solr distributed search
I don't know much about the code. Maybe you can tell me which file you are referring to? However, from the comments one can see that the problem is known, but it was decided to let it happen because of the Java version requirements. - Mitch
Re: a bug of solr distributed search
How about sorting over the score? Would that be possible?
On Jul 21, 2010, at 12:13 AM, Li Li wrote: [...]
Re: a bug of solr distributed search
It already was sorted by score. The problem here is the following: Shard_A and shard_B both contain doc_X. If you are querying for something, doc_X could have a score of 1.0 at shard_A and a score of 12.0 at shard_B. You can never be sure which doc Solr sees first. In the bad case, Solr sees doc_X first at shard_A and ignores it at shard_B. That means that the doc might occur at page 10 of the pagination, although it *should* occur at page 1 or 2. Kind regards, - Mitch
Re: a bug of solr distributed search
I think what Siva means is that when there are docs with the same url, keep the doc whose score is larger. This is the right solution. But it shows a problem of distributed search without common idf: a doc will get a different score in different shards.
2010/7/22 MitchK mitc...@web.de: [...]
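A rough sketch of that keep-the-larger-score idea (not what Solr does today; Doc here is just a stand-in holder with an id and a score, not a Solr class, and docsFromAllShards is hypothetical):

    class Doc { Object id; float score; }

    // Keep, for every uniqueKey, the copy with the highest score seen so far.
    Map<Object, Doc> best = new HashMap<Object, Doc>();
    for (Doc d : docsFromAllShards) {
        Doc prev = best.get(d.id);
        if (prev == null || d.score > prev.score) {
            best.put(d.id, d);
        }
    }

Even with this rule the two copies are still scored with their own shard's IDF, so the underlying per-shard scoring difference discussed above does not go away.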
Solr Distributed Search throws org.apache.solr.common.SolrException: Form_too_large Exception
Hi All, I am trying to do a distributed search and getting the below error. Please let me know if you know how to solve this issue.
18:20:28,200 ERROR [STDERR] org.apache.solr.client.solrj.SolrServerException: Error executing query
18:20:28,200 ERROR [STDERR] at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:96)
18:20:28,200 ERROR [STDERR] at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:109)
...
18:20:28,202 ERROR [STDERR] Caused by: org.apache.solr.common.SolrException: Form_too_large__javalangIllegalStateException_Form_too_large__at_orgmortbayjettyRequestextractParametersRequestjava1273__at_orgmortbayjettyRequestgetParameterMapRequestjava650__at_orgapachesolrrequestServletSolrParamsinitServletSolrParamsjava29__at_orgapachesolrservletStandardRequestParserparseParamsAndFillStreamsSolrRequestParsersjava392__at_orgapachesolrservletSolrRequestParsersparseSolrRequestParsersjava113__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava200__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava202__at_orgmortbayjettyHttpConnectionhandleHttpConnectionjava378__at_orgmortbayjettybioSocketConnector$ConnectionrunSocketConnectorjava226__at_orgmortbaythreadBoundedThreadPool$PoolThreadrunBoundedThreadPooljava442___Form_too_large__javalangIllegalStateException_Form_too_large__at_orgmortbayjettyRequestextractParametersRequestjava1273__at_orgmortbayjettyRequestgetParameterMapRequestjava650__at_orgapachesolrrequestServletSolrParamsinitServletSolrParamsjava29__at_orgapachesolrservletStandardRequestParserparseParamsAndFillStreamsSolrRequestParsersjava392__at_orgapachesolrservletSolrRequestParserspa
My code:
    String SOLR_SHARD1 = "ap1.corp.org.com:8983/solr/";
    String SOLR_SHARD2 = "ap2.corp.org.com:8983/solr/";
    String SOLR_SHARDS = SOLR_SHARD1 + "," + SOLR_SHARD2;
    QueryResponse response = null;
    SolrServer solr = new CommonsHttpSolrServer("http://ap1.corp.org.com:8983/solr/");
    String queryStr = ...;
    SolrQuery query = new SolrQuery();
    query.setQuery(queryStr);
    response = solr.query(query);
    SolrDocumentList docs = response.getResults();
    long docNum = docs.getNumFound();
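One detail in the snippet above: SOLR_SHARDS is built but never attached to the query. In SolrJ the shards list is just another request parameter, so presumably it would be passed along the lines of the following sketch (variable names taken from the code above):

    SolrQuery query = new SolrQuery();
    query.setQuery(queryStr);
    // SolrQuery extends ModifiableSolrParams, so arbitrary parameters can be set;
    // note that a very long shards list makes the request itself correspondingly large.
    query.set("shards", SOLR_SHARDS);
    QueryResponse response = solr.query(query);

If the shards list is configured in the handler defaults in solrconfig.xml instead (as mentioned elsewhere in this archive), it does not have to travel with every client request.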
Re: solr distributed search example - exception
Hi Mark,
I actually got this error because I was using an old version of Java; with a newer JVM the problem is solved. Thanks anyway.
Raakhi
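For anyone hitting the same startup failure: the libgcj.so frames in the trace below indicate the example was launched with the GCJ runtime rather than a standard JDK (Solr 1.3 requires Java 1.5 or later). A quick way to confirm which JVM Jetty is actually using is to print the runtime properties; this is a convenience sketch, not something from the thread:

// Minimal check of which JVM is running; not from the thread, just a convenience.
public class WhichJvm {
    public static void main(String[] args) {
        // On the failing setup this would report a GCJ/libgcj VM instead of a HotSpot VM.
        System.out.println(System.getProperty("java.vm.name"));
        System.out.println(System.getProperty("java.version"));
        System.out.println(System.getProperty("java.vm.vendor"));
    }
}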
Re: solr distributed search example - exception
Thanks for bringing closure to this, Raakhi.
- Mark
solr distributed search example - exception
Hi,
I was executing a simple example that demonstrates distributed search, provided at the following link: http://wiki.apache.org/solr/DistributedSearch. However, when I start up the servers on both ports (8983 and 7574), I get the following exception:

SEVERE: Could not start SOLR. Check solr/home property
java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.w3c.dom.NodeList
    at org.apache.solr.search.CacheConfig.getMultipleConfigs(CacheConfig.java:61)
    at org.apache.solr.core.SolrConfig.init(SolrConfig.java:131)
    at org.apache.solr.core.SolrConfig.init(SolrConfig.java:70)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at java.lang.reflect.Method.invoke(libgcj.so.7rh)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)

2009-06-08 18:36:28.016::WARN: failed SolrRequestFilter
java.lang.NoClassDefFoundError: org.apache.solr.core.SolrCore
    at java.lang.Class.initializeClass(libgcj.so.7rh)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:77)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at java.lang.reflect.Method.invoke(libgcj.so.7rh)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.core.JmxMonitoredMap not found in StartLoader[file:/home/ithurs/apache-solr-1.3.0/example7574/, file:/home/ithurs/apache-solr-1.3.0/example7574/lib/jetty-6.1.3.jar, file:/home/ithurs/apache-solr-1.3.0/example7574/lib/jetty-util-6.1.3.jar, file:/home/ithurs/apache-solr-1.3.0/example7574/lib/servlet-api-2.5-6.1.3.jar]
    at java.net.URLClassLoader.findClass(libgcj.so.7rh)
    at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
    at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
    at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375)
    at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337)
    at java.lang.Class.forName(libgcj.so.7rh)
    at java.lang.Class.initializeClass(libgcj.so.7rh)
    ... 22 more
2009-06-08
Re: solr distributed search example - exception
Hi Mark,
Yes, I would like to open a JIRA issue for it. How do I go about that?
Regards,
Raakhi

On Mon, Jun 8, 2009 at 7:58 PM, Mark Miller markrmil...@gmail.com wrote:
That is a very odd cast exception to get. Do you want to open a JIRA issue for this? It looks like an odd exception because the call is:

    NodeList nodes = (NodeList) solrConfig.evaluate(configPath, XPathConstants.NODESET); // cast exception if we get an ArrayList rather than NodeList

which leads to:

    Object o = xpath.evaluate(xstr, doc, type);

where type = XPathConstants.NODESET. So you get back an Object based on the XPathConstant passed. There does not appear to be a value that would return an ArrayList; using XPathConstants.NODESET gets you a NodeList according to the XPath API. I'm not sure what could cause this to happen.
- Mark
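To illustrate Mark's point: when the return type is XPathConstants.NODESET, the JAXP API specifies that evaluate() returns an org.w3c.dom.NodeList, so the cast in SolrConfig should never fail on a standards-compliant JVM. A minimal, self-contained sketch (the class name and sample XML are made up for illustration; only the XPath calls mirror what the Solr code does):

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Minimal sketch of the XPath usage described above; this is not Solr code.
public class XPathNodeSetDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<config><cache name='a'/><cache name='b'/></config>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // With XPathConstants.NODESET the JAXP contract is to return a NodeList,
        // so this cast is safe on a conforming implementation.
        NodeList nodes = (NodeList) xpath.evaluate("/config/cache", doc, XPathConstants.NODESET);
        System.out.println("matched nodes: " + nodes.getLength());
    }
}

The libgcj frames in the stack trace suggest the example was running on the GCJ runtime, whose XML stack may not match this JAXP behaviour, which is consistent with Raakhi's resolution of switching to a newer JVM.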
Re: Question on Solr Distributed Search
On Fri, Apr 10, 2009 at 7:50 AM, vivek sar vivex...@gmail.com wrote:
Just an update. I changed the schema to store the unique id field, but I still get the connection reset exception. I did notice that if there is no data in the core then it returns 0 results (no exception), but if there is data and you search using the shards parameter I get the connection reset exception. Can anyone provide some tips on where I can look for this problem?

Did you re-index after changing the field to stored?
--
Regards,
Shalin Shekhar Mangar.
Re: Question on Solr Distributed Search
Yes - these are all new indexes. I can search them individually, but adding shards throws a Connection Reset error. Is there any way I can debug this, or any other pointers?
-vivek

On Fri, Apr 10, 2009 at 4:49 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
Did you re-index after changing the field to stored?
--
Regards,
Shalin Shekhar Mangar.
Question on Solr Distributed Search
Hi,
I have another thread on multi-core distributed search, but I just wanted to put a simple question here on distributed search to get some response. I have a search query,

http://etsx19.co.com:8080/solr/20090409_9/select?q=usa

which returns 10 results. Now if I add the shards parameter to it,

http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa

it fails with org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset:

org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
    at ..
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
    ..
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)

Attached is my solrconfig.xml. Do I need a special RequestHandler for sharding? I haven't been able to make any distributed search work successfully. Any help is appreciated.
Note: I'm indexing using SolrJ - not sure if that makes any difference to the search part.
Thanks,
-vivek

<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with this
 work for additional information regarding copyright ownership. The ASF
 licenses this file to You under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 License for the specific language governing permissions and limitations
 under the License.
-->
<config>
  <!-- Used to specify an alternate directory to hold all index data other
       than the default ./data under the Solr home. If replication is in use,
       this should match the replication configuration. -->
  <!-- <dataDir>./solr/data</dataDir> -->
  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    <!-- <maxBufferedDocs>1</maxBufferedDocs> -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>
    <lockType>single</lockType>
  </indexDefaults>
  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    <!-- Tell Lucene when to flush documents to disk. Giving Lucene more memory
         for indexing means faster indexing at the cost of more RAM. If both
         ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush
         based on whichever limit is hit first. -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <!-- If true, unlock any held write or commit locks on startup. This
         defeats the locking mechanism that allows multiple processes to
         safely access a lucene index, and should be used with care. -->
    <unlockOnStartup>true</unlockOnStartup>
    <lockType>single</lockType>
  </mainIndex>
  <!-- the
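For reference, in Solr 1.3/1.4 a distributed query does not need a special request handler: the standard search handler accepts a shards parameter listing host:port/path entries without the http:// prefix, joined to the other parameters with '&'. A minimal SolrJ sketch under that assumption (the host names and core name below are placeholders, not vivek's actual setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Illustrative only: hypothetical hosts and core name.
public class DistributedQueryDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://host1:8080/solr/core_20090409_9");

        SolrQuery query = new SolrQuery("usa");
        // Comma-separated shard list; note there is no http:// prefix on the entries,
        // and on a raw URL the parameter must be separated from q with '&'.
        query.set("shards",
                "host1:8080/solr/core_20090409_9,host2:8080/solr/core_20090409_9");

        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}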
Re: Question on Solr Distributed Search
I think the reason behind the connection reset is the following. Looking at the code, it points to QueryComponent.mergeIds():

    resultIds.put(shardDoc.id.toString(), shardDoc);

It looks like the document unique id is coming back null. I'm not sure how that is possible, as it is a required field. Right now my unique id is not stored (only indexed) - does it have to be stored for distributed search?

HTTP Status 500 - null
java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)
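vivek's reading of QueryComponent.mergeIds() matches the documented requirement for distributed search in this Solr version: the uniqueKey field must be stored (not just indexed) so the coordinating node can read it back from each shard's response when merging results. A minimal schema.xml fragment, assuming the field is named id (the actual field name is not shown in the thread):

<!-- schema.xml fragment; illustrative only, the real field name may differ. -->
<fields>
  <!-- The uniqueKey field must be stored so the coordinating node can read it
       back from each shard's response in QueryComponent.mergeIds(). -->
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
</fields>

<uniqueKey>id</uniqueKey>

After changing stored="false" to stored="true" the documents have to be re-indexed, which is exactly what Shalin asks about elsewhere in this thread.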
Re: Question on Solr Distributed Search
Just an update. I changed the schema to store the unique id field, but I still get the connection reset exception. I did notice that if there is no data in the core then it returns 0 results (no exception), but if there is data and you search using the shards parameter I get the connection reset exception. Can anyone provide some tips on where I can look for this problem?

Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketException: Connection reset
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
    at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    ... 1 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)