Understanding RecoveryStrategy

2012-05-05 Thread Trym R. Møller

Hi

Using Solr trunk with the replica feature, I see the below exception 
repeatedly in the Solr log.
I have been looking into the code of RecoveryStrategy#commitOnLeader and 
read the code as follows:
1. sends a commit request (with COMMIT_END_POINT=true) to the Solr 
instance containing the leader of the slice
2. sends a commit request to the Solr instance containing the leader of 
the slice
The first results in a commit on the shards hosted in the single leader Solr 
instance; the second results in a commit on those shards plus on all other 
Solr instances hosting slices or replicas belonging to the collection.


I would expect that the first request is the relevant one (and enough to do 
a recovery of the specific replica).

Am I reading the second request wrong or is it a bug?

The code I'm referring to is
    UpdateRequest ureq = new UpdateRequest();
    ureq.setParams(new ModifiableSolrParams());
    ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
    ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
1.  ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(server);
2.  server.commit();

Thanks in advance for any input.

Best regards Trym R. Møller

Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: http://myIP:8983/solr/myShardId
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
        at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
        at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
        at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
        at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
        at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
        ... 8 more


Re: Facet and totaltermfreq

2012-05-05 Thread Dmitry Kan
The query:

http://localhost:8983/solr/select/?qt=tvrh&q=query:the&tv.fl=query&tv.all=true&f.id.tv.tf=true&facet.field=id&facet=true&facet.limit=-1&facet.mincount=1

Be careful with facet.limit=-1: it will pull every facet value matching the query.
Paging would probably make more sense in your case.
f.id.tv.tf=true makes sure term frequencies are output.

In my schema the field "query" contains some text data to search against
(don't get mixed up ;) ). You will also need termVectors="true",
termPositions="true" and termOffsets="true" on the field you search against.
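
For reference, roughly the same request built with SolrJ 3.x (just a sketch;
the base URL and the qt handler name are assumptions that must match your
solrconfig.xml, and the snippet assumes a surrounding method that throws Exception):

    // fragment; SolrJ classes from org.apache.solr.client.solrj and .impl
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("query:the");
    q.set("qt", "tvrh");              // route to the handler with the TermVectorComponent
    q.set("tv.fl", "query");          // term vectors only for the "query" field
    q.set("tv.all", true);
    q.set("f.id.tv.tf", true);        // per-document term frequencies
    q.setFacet(true);
    q.addFacetField("id");
    q.setFacetLimit(-1);              // careful: returns every matching value
    q.setFacetMinCount(1);
    QueryResponse rsp = server.query(q);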

Relevant excerpts from the response:

<result name="response" numFound="787" start="0">
  <doc>
    <str name="id">721</str>
    <str name="query">the doors the end</str>
    <date name="timestamp">2002-08-09T04:05:28Z</date>
    <str name="user">wifcbcdr</str>
  </doc>
  ...
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="id">
      <int name="721">1</int>
      ...
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
<lst name="termVectors">
  <lst name="doc-720"> <!-- Notice how strange: the doc id is less than the Solr
    uniqueKey id by one. I have no idea why it is like this -->
    <str name="uniqueKey">721</str>
    <lst name="query">
      <lst name="doors">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">4</int>
          <int name="end">9</int>
        </lst>
        <lst name="positions">
          <int name="position">1</int>
        </lst>
        <int name="df">37</int>
        <double name="tf-idf">0.02702702702702703</double>
      </lst>
      <lst name="end">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">15</int>
          <int name="end">18</int>
        </lst>
        <lst name="positions">
          <int name="position">3</int>
        </lst>
        <int name="df">43</int>
        <double name="tf-idf">0.023255813953488372</double>
      </lst>
      <lst name="the"> <!-- In-document term frequency -->
        <int name="tf">2</int>
        <lst name="offsets">
          <int name="start">0</int>
          <int name="end">3</int>
          <int name="start">11</int>
          <int name="end">14</int>
        </lst>
        <lst name="positions">
          <int name="position">0</int>
          <int name="position">2</int>
        </lst>
        <int name="df">787</int>
        <double name="tf-idf">0.0025412960609911056</double>
      </lst>
    </lst>
  </lst>
  ...
</lst>

-Dmitry

On Sat, May 5, 2012 at 12:05 AM, Jamie Johnson jej2...@gmail.com wrote:

 it might be...can you provide an example of the request/response?

 On Fri, May 4, 2012 at 3:31 PM, Dmitry Kan dmitry@gmail.com wrote:
  I have tried (as a test) combining facets and term vectors (
  http://wiki.apache.org/solr/TermVectorComponent ) in one query and was
 able
  to get a list of facets and for each facet there was a term freq under
  termVectors section. Not sure, if that's what you are trying to achieve.
 
  -Dmitry
 
  On Fri, May 4, 2012 at 8:37 PM, Jamie Johnson jej2...@gmail.com wrote:
 
  Is it possible when faceting to return not only the strings but also
  the total term frequency for those facets?  I am trying to avoid
  building a customized faceting component and making multiple queries.
  In our scenario we have multivalued fields which may have duplicates
  and I would like to be able to get a count of how many documents that
  term appears (currently what faceting does) but also how many times
  that term appears in general.
 
 
 
 
  --
  Regards,
 
  Dmitry Kan




-- 
Regards,

Dmitry Kan


Re: 1MB file to Zookeeper

2012-05-05 Thread Jan Høydahl
ZK is not really designed for keeping large data files, from 
http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
 ZooKeeper was not designed to be a general database or large object 
 store. If large data storage is needed, the usual pattern of dealing 
 with such data is to store it on a bulk storage system, such as NFS or HDFS, 
 and store pointers to the storage locations in ZooKeeper.

So perhaps we should think about adding K/V store support to ResourceLoader? If 
a file is larger than 1 MB, a reference to the file is stored in ZK under
the original resource name, in a way that ResourceLoader can tell that it is a 
reference, not the complete file. We then make a simple 
LargeObjectStoreInterface (with get/put/del) which ResourceLoader uses to fetch 
the complete file based on the reference. To start with we can make a
ZkLargeFileStoreImpl where the put(key,val) method chops up the file and stores 
it spanning multiple 1 MB ZK nodes, and the get(key) method
assembles all parts and returns the object. That would be good enough for most, 
but if you require something better you can easily implement
support for CouchDb, Voldemort or whatever.
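
To make that concrete, a rough sketch of what the ZK-backed impl could look like
(hypothetical code using only the plain ZooKeeper client API; error handling,
overwrite semantics and the ResourceLoader wiring are all left out):

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.ZooDefs.Ids;

    // Hypothetical sketch: put() chops the value into <=1MB child znodes,
    // get() reads the chunks back in order, delete() removes them.
    public class ZkLargeFileStore {
        private static final int CHUNK = 1024 * 1024;   // ZK's default znode size limit
        private final ZooKeeper zk;

        public ZkLargeFileStore(ZooKeeper zk) { this.zk = zk; }

        public void put(String path, byte[] data) throws Exception {
            zk.create(path, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            for (int i = 0, part = 0; i < data.length; i += CHUNK, part++) {
                byte[] chunk = Arrays.copyOfRange(data, i, Math.min(i + CHUNK, data.length));
                zk.create(path + "/part-" + String.format("%05d", part), chunk,
                          Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
        }

        public byte[] get(String path) throws Exception {
            List<String> parts = zk.getChildren(path, false);
            Collections.sort(parts);                     // part-00000, part-00001, ...
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (String part : parts) {
                out.write(zk.getData(path + "/" + part, false, null));
            }
            return out.toByteArray();
        }

        public void delete(String path) throws Exception {
            for (String part : zk.getChildren(path, false)) {
                zk.delete(path + "/" + part, -1);        // -1 = any version
            }
            zk.delete(path, -1);
        }
    }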

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 4. mai 2012, at 19:09, Yonik Seeley wrote:

 On Fri, May 4, 2012 at 12:50 PM, Mark Miller markrmil...@gmail.com wrote:
 And how should we detect if data is compressed when
 reading from ZooKeeper?
 
 I was thinking we could somehow use file extensions?
 
 eg synonyms.txt.gzip - then you can use different compression algs depending 
 on the ext, etc.
 
 We would want to try and make it as transparent as possible though...
 
 At first I thought about adding a marker to the beginning of a file, but
 file extensions could work too, as long as the resource loader made it
 transparent
 (i.e. code would just need to ask for synonyms.txt, but the resource
 loader would search
 for synonyms.txt.gzip, etc, if the original name was not found)
 
 Hmmm, but this breaks down for things like watches - I guess that's
 where putting the encoding inside the file would be a better option.
 
 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10
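
PS: a tiny illustration of the transparent fallback Yonik describes above (a
hypothetical helper, not actual Solr ResourceLoader code; whether the bytes come
from ZK or disk does not matter for the fallback logic, so this just uses files):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;

    // Callers keep asking for "synonyms.txt"; if only "synonyms.txt.gzip"
    // exists, it is decompressed on the fly.
    public class CompressedResourceHelper {
        public static InputStream open(File dir, String name) throws IOException {
            File plain = new File(dir, name);
            if (plain.exists()) {
                return new FileInputStream(plain);
            }
            File gz = new File(dir, name + ".gzip");
            if (gz.exists()) {
                return new GZIPInputStream(new FileInputStream(gz));  // transparent decompression
            }
            throw new FileNotFoundException(name);
        }
    }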



Re: 1MB file to Zookeeper

2012-05-05 Thread Yonik Seeley
On Sat, May 5, 2012 at 8:39 AM, Jan Høydahl jan@cominvent.com wrote:
 support for CouchDb, Voldemort or whatever.

Hmmm... Or Solr!

-Yonik


solr: adding a string on to a field via DIH

2012-05-05 Thread okayndc
Hello,

Is it possible to concatenate a field via DIH?
For example for the id field, in order to make it unique
I want to add 'project' to the beginning of the id field.
So the field would look like 'project1234'
Is this possible?

<field column="id" name="id" />

Thanks


Re: solr: adding a string on to a field via DIH

2012-05-05 Thread Michael Della Bitta
There might be a Solr way of accomplishing this, but I've always done
stuff like this in SQL (i.e. the CONCAT command). Doing it a Solr-native
way would probably be better in terms of bandwidth consumption, but just
giving you that option early in case there's not a better one.

Michael

On Sat, 2012-05-05 at 09:12 -0400, okayndc wrote:
 Hello,
 
 Is it possible to concatenate a field via DIH?
 For example for the id field, in order to make it unique
 I want to add 'project' to the beginning of the id field.
 So the field would look like 'project1234'
 Is this possible?
 
 <field column="id" name="id" />
 
 Thanks




Re: solr: adding a string on to a field via DIH

2012-05-05 Thread Jack Krupansky
Sounds like you need a Template Transformer: ... it helps to concatenate 
multiple values or add extra characters to field for injection.


<entity name="e" transformer="TemplateTransformer" ...>
  <field column="namedesc" template="hello${e.name},${eparent.surname}" />
  ...
</entity>

See:
http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

Or did you have something different in mind?

-- Jack Krupansky

-Original Message- 
From: okayndc

Sent: Saturday, May 05, 2012 9:12 AM
To: solr-user@lucene.apache.org
Subject: solr: adding a string on to a field via DIH

Hello,

Is it possible to concatenate a field via DIH?
For example for the id field, in order to make it unique
I want to add 'project' to the beginning of the id field.
So the field would look like 'project1234'
Is this possible?

 <field column="id" name="id" />

Thanks 



Re: solr: adding a string on to a field via DIH

2012-05-05 Thread okayndc
Thanks guys.  I had taken a quick look at
the TemplateTransformer and it looks like it does
what I need it to do ... I didn't see the 'hello' part
when reviewing it earlier.

On Sat, May 5, 2012 at 11:47 AM, Jack Krupansky j...@basetechnology.com wrote:

 Sounds like you need a Template Transformer: ... it helps to
 concatenate multiple values or add extra characters to field for injection.

 <entity name="e" transformer="TemplateTransformer" ...>
 <field column="namedesc" template="hello${e.name},${eparent.surname}" />
 ...
 </entity>

 See:
 http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

 Or did you have something different in mind?

 -- Jack Krupansky

 -Original Message- From: okayndc
 Sent: Saturday, May 05, 2012 9:12 AM
 To: solr-user@lucene.apache.org
 Subject: solr: adding a string on to a field via DIH


 Hello,

 Is it possible to concatenate a field via DIH?
 For example for the id field, in order to make it unique
 I want to add 'project' to the beginning of the id field.
 So the field would look like 'project1234'
 Is this possible?

  <field column="id" name="id" />

 Thanks



Re: Solr Merge during off peak times

2012-05-05 Thread Shawn Heisey

On 5/4/2012 8:10 PM, Lance Norskog wrote:

Optimize takes a 'maxSegments' option. This tells it to stop when
there are N segments instead of just one.

If you use a very high mergeFactor and then call optimize with a sane
number like 50, it only merges the little teeny segments.


When I optimize, I want only one segment.  My main concern in doing 
occasional optimizes is removing deleted documents.  Whatever speedup I 
get from having only one segment is just a nice bonus.


When it comes to only merging the small segments, I am concerned about 
that happening when regular indexing builds up enough segments to do a 
merge.  If I start with one large optimized segment, then do indexing 
operations such that I reach segmentsPerTier, will it leave the large 
segment alone and just work on the little ones?  I am using Solr 3.5 
with the following config:


<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>
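
For reference, the same settings expressed directly against Lucene 3.5 (a sketch
only; it assumes an existing Analyzer and IndexWriter, which Solr normally builds
for you from the XML above):

    // fragment; classes from org.apache.lucene.index and org.apache.lucene.util
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setMaxMergeAtOnce(35);
    mp.setSegmentsPerTier(35);
    mp.setMaxMergeAtOnceExplicit(105);
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35, analyzer);
    iwc.setMergePolicy(mp);
    // Lance's suggestion: stop an optimize at N segments instead of 1
    // writer.optimize(50);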

Thanks,
Shawn



Re: 1MB file to Zookeeper

2012-05-05 Thread Mark Miller

On May 5, 2012, at 8:39 AM, Jan Høydahl wrote:

 ZK is not really designed for keeping large data files, from 
 http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:
 ZooKeeper was not designed to be a general database or large object 
 store. If large data storage is needed, the usual pattern of dealing 
 with such data is to store it on a bulk storage system, such as NFS or HDFS, 
 and store pointers to the storage locations in ZooKeeper.
 

I don't really think it's that big a deal where Solr is concerned. We don't use 
ZooKeeper intensively - only for state changes, and initially loading config when 
a SolrCore starts or reloads. ZooKeeper perf should not be a big deal for Solr. I 
think the main issue is going to be that ZK wants everything in RAM - but 
again, a few couple-MB conf files should be no big deal AFAICT.

Other large scale applications that constantly and intensively use ZooKeeper 
are a different story - when Solr is in its running state, it doesn't do 
anything with ZooKeeper other than maintain its heartbeat.

- Mark Miller
lucidimagination.com


Re: Understanding RecoveryStrategy

2012-05-05 Thread Mark Miller

On May 5, 2012, at 2:37 AM, Trym R. Møller wrote:

 Hi
 
 Using Solr trunk with the replica feature, I see the below exception 
 repeatedly in the Solr log.
 I have been looking into the code of RecoveryStrategy#commitOnLeader and read 
 the code as follows:
 1. sends a commit request (with COMMIT_END_POINT=true) to the Solr instance 
 containing the leader of the slice
 2. sends a commit request to the Solr instance containing the leader of the 
 slice
 The first results in a commit on the shards in the single leader Solr 
 instance and the second results in a commit on the shards in the single 
 leader Solr plus on all other Solrs having slices or replica belonging to the 
 collection.
 
 I would expect that the first request is the relevant (and enough to do a 
 recovery of the specific replica).
 Am I reading the second request wrong or is it a bug?

You're right - that second server.commit() looks like a bug - I don't think it 
should be harmful - it would just send out a commit to the cluster - but it 
should not be there.

I'm not sure why that second commit is timing out though. I guess it may be 
that reopening the searcher so quickly twice in a row is taking longer than the 
30 second timeout. I guess we have to consider if we even need that commit to 
open a new searcher - but either way it should be a lot better once we remove 
that second commit.
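
In other words (just a sketch of the intent here, not the actual patch),
commitOnLeader would boil down to only the first request:

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParams(new ModifiableSolrParams());
    ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
    ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
    // commit on the leader only (waitFlush=false, waitSearcher=true)
    ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(server);
    // server.commit();  // the second, cluster-wide commit discussed above - dropped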

 
 The code I'm referring to is
UpdateRequest ureq = new UpdateRequest();
ureq.setParams(new ModifiableSolrParams());
ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
 1.ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, 
 true).process(
server);
 2.server.commit();
 
 Thanks in advance for any input.
 
 Best regards Trym R. Møller
 
 Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
 SEVERE: Error while trying to 
 recover:org.apache.solr.client.solrj.SolrServerException: 
 http://myIP:8983/solr/myShardId
at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
at 
 org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
at 
 org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
 org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
 org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
 org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
 org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
 org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
 org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
... 8 more

- Mark Miller
lucidimagination.com


Re: Understanding RecoveryStrategy

2012-05-05 Thread Mark Miller
https://issues.apache.org/jira/browse/SOLR-3437

On May 5, 2012, at 12:46 PM, Mark Miller wrote:

 
 On May 5, 2012, at 2:37 AM, Trym R. Møller wrote:
 
 Hi
 
 Using Solr trunk with the replica feature, I see the below exception 
 repeatedly in the Solr log.
 I have been looking into the code of RecoveryStrategy#commitOnLeader and 
 read the code as follows:
 1. sends a commit request (with COMMIT_END_POINT=true) to the Solr instance 
 containing the leader of the slice
 2. sends a commit request to the Solr instance containing the leader of the 
 slice
 The first results in a commit on the shards in the single leader Solr 
 instance and the second results in a commit on the shards in the single 
 leader Solr plus on all other Solrs having slices or replica belonging to 
 the collection.
 
 I would expect that the first request is the relevant (and enough to do a 
 recovery of the specific replica).
 Am I reading the second request wrong or is it a bug?
 
 You're right - that second server.commit() looks like a bug - I don't think it 
 should be harmful - it would just send out a commit to the cluster - but it 
 should not be there.
 
 I'm not sure why that second commit is timing out though. I guess it may be 
 that reopening the searcher so quickly twice in a row is taking longer than 
 the 30 second timeout. I guess we have to consider if we even need that 
 commit to open a new searcher - but either way it should be a lot better once 
 we remove that second commit.
 
 
 The code I'm referring to is
   UpdateRequest ureq = new UpdateRequest();
   ureq.setParams(new ModifiableSolrParams());
   ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
   ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
 1.ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, 
 true).process(
   server);
 2.server.commit();
 
 Thanks in advance for any input.
 
 Best regards Trym R. Møller
 
 Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
 SEVERE: Error while trying to 
 recover:org.apache.solr.client.solrj.SolrServerException: 
 http://myIP:8983/solr/myShardId
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
   at 
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
   at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
   at 
 org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
   at 
 org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
   at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
   at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 Caused by: java.net.SocketTimeoutException: Read timed out
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
   at 
 org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
   at 
 org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
   at 
 org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
   at 
 org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
   at 
 org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
   at 
 org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
   ... 8 more
 
 - Mark Miller
 lucidimagination.com

- Mark Miller
lucidimagination.com


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-05 Thread Erick Erickson
The first thing I'd check is if, in the log, there is a replication happening
immediately prior to the error. I confess I'm not entirely up on the
version thing, but is it possible you're replicating an index that
is built with some other version of Solr?

That would at least explain your statement that it runs OK, but then
fails sometime later.

Best
Erick

On Fri, May 4, 2012 at 1:50 PM, Ravi Solr ravis...@gmail.com wrote:
 Hello,
          Recently we migrated our SOLR 3.6 server OS from Solaris
 to CentOS, and from then on we started seeing Invalid version
 (expected 2, but 60) errors on one of the query servers (oddly, one
 other query server seems fine). If we restart the server having the issue
 everything is alright, but the next morning we get the same exception
 again. I made sure that all the client applications are using the
 SOLR 3.6 version.

 The Glassfish on which all the applications and SOLR are deployed uses
 Java 1.6.0_29. The only difference I could see:

 1. The process indexing to the server having issues is using Java 1.6.0_31.
 2. The process indexing to the server that DOES NOT have issues is
 using Java 1.6.0_29.

 Could the indexing process's Java minor version being greater than the
 SOLR instance's be the cause of this issue?

 Can anybody please help me debug this a bit more? What else can I
 look at to understand the underlying problem? The stack trace is given
 below.


 [#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
 org.apache.solr.client.solrj.SolrServerException: Error executing query
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
        at com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
        at com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
        at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
        at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
        at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
        at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
        at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
        at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
        at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)

Re: Single Index to Shards

2012-05-05 Thread Erick Erickson
Oh, isn't that easier! Need more coffee before suggesting things..

Thanks,
Erick

On Fri, May 4, 2012 at 8:16 PM, Lance Norskog goks...@gmail.com wrote:
 If you are not using SolrCloud, splitting an index is simple:
 1) copy the index
 2) remove what you do not want via delete-by-query
 3) Optimize!

 #2 brings up a basic design question: you have to decide which
 documents go to which shards. Mostly people use a value generated by a
 hash on the actual id- this allows you to assign docs evenly.

 http://wiki.apache.org/solr/UniqueKey

 On Fri, May 4, 2012 at 4:28 PM, Young, Cody cody.yo...@move.com wrote:
 You can also make a copy of your existing index, bring it up as a second 
 instance/core and then send delete queries to both indexes.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Friday, May 04, 2012 8:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Single Index to Shards

 There's no way to split an _existing_ index into multiple shards, although 
 some of the work on SolrCloud is considering being able to do this. You have 
 a couple of choices here:

 1) Just reindex everything from scratch into two shards, or
 2) delete all the docs from your index that will go into shard 2 and just
    index the docs for shard 2 in your new shard.

 But I want to be sure you're on the right track here. You only need to shard 
 if your index contains too many documents for your hardware to produce 
 decent query rates. If you are getting (and I'm picking this number out of 
 thin air) 50 QPS on your hardware (i.e. you're not stressing memory
 etc) and just want to get to 150 QPS, use replication rather than sharding.

 see: http://wiki.apache.org/solr/SolrReplication

 Best
 Erick

 On Fri, May 4, 2012 at 9:44 AM, michaelsever sever_mich...@bah.com wrote:
 If I have a single Solr index running on a Core, can I split it or
 migrate it into 2 shards?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.ht
 ml Sent from the Solr - User mailing list archive at Nabble.com.



 --
 Lance Norskog
 goks...@gmail.com


Re: Single Index to Shards

2012-05-05 Thread Lance Norskog
We did it at my last job. It took a few days to split a 500M-doc index.
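
For anyone attempting the same thing, a very rough SolrJ sketch of the delete
step (assuming a 2-way split, that the uniqueKey field is "id", and a simple
client-side hash for shard assignment; the URL, page size, and the idea of
paging by start are all placeholders, and the snippet assumes a surrounding
method that throws Exception):

    // fragment; SolrJ classes from org.apache.solr.client.solrj and .common
    SolrServer shard0 = new CommonsHttpSolrServer("http://host0:8983/solr");
    SolrQuery q = new SolrQuery("*:*").setFields("id").setRows(1000);
    List<String> toDelete = new ArrayList<String>();
    int start = 0;
    while (true) {
        SolrDocumentList page = shard0.query(q.setStart(start)).getResults();
        if (page.isEmpty()) break;
        for (SolrDocument doc : page) {
            String id = String.valueOf(doc.getFieldValue("id"));
            if (Math.abs(id.hashCode() % 2) != 0) {   // this doc belongs to shard 1
                toDelete.add(id);
            }
        }
        start += page.size();
    }
    if (!toDelete.isEmpty()) shard0.deleteById(toDelete);
    shard0.commit();
    shard0.optimize();   // step 3 in the procedure quoted below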

On Sat, May 5, 2012 at 9:55 AM, Erick Erickson erickerick...@gmail.com wrote:
 Oh, isn't that easier! Need more coffee before suggesting things..

 Thanks,
 Erick

 On Fri, May 4, 2012 at 8:16 PM, Lance Norskog goks...@gmail.com wrote:
 If you are not using SolrCloud, splitting an index is simple:
 1) copy the index
 2) remove what you do not want via delete-by-query
 3) Optimize!

 #2 brings up a basic design question: you have to decide which
 documents go to which shards. Mostly people use a value generated by a
 hash on the actual id- this allows you to assign docs evenly.

 http://wiki.apache.org/solr/UniqueKey

 On Fri, May 4, 2012 at 4:28 PM, Young, Cody cody.yo...@move.com wrote:
 You can also make a copy of your existing index, bring it up as a second 
 instance/core and then send delete queries to both indexes.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Friday, May 04, 2012 8:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Single Index to Shards

 There's no way to split an _existing_ index into multiple shards, although 
 some of the work on SolrCloud is considering being able to do this. You 
 have a couple of choices here:

 1) Just reindex everything from scratch into two shards, or
 2) delete all the docs from your index that will go into shard 2 and just
    index the docs for shard 2 in your new shard.

 But I want to be sure you're on the right track here. You only need to 
 shard if your index contains too many documents for your hardware to 
 produce decent query rates. If you are getting (and I'm picking this number 
 out of thin air) 50 QPS on your hardware (i.e. you're not stressing memory
 etc) and just want to get to 150 QPS, use replication rather than sharding.

 see: http://wiki.apache.org/solr/SolrReplication

 Best
 Erick

 On Fri, May 4, 2012 at 9:44 AM, michaelsever sever_mich...@bah.com wrote:
 If I have a single Solr index running on a Core, can I split it or
 migrate it into 2 shards?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.ht
 ml Sent from the Solr - User mailing list archive at Nabble.com.



 --
 Lance Norskog
 goks...@gmail.com



-- 
Lance Norskog
goks...@gmail.com


Re: Why would solr norms come up different from Lucene norms?

2012-05-05 Thread Lance Norskog
Which Similarity class do you use for the Lucene code? Solr has a custom one.

On Fri, May 4, 2012 at 6:30 AM, Benson Margulies bimargul...@gmail.com wrote:
 So, I've got some code that stores the same documents in a Lucene
 3.5.0 index and a Solr 3.5.0 instance. It's only five documents.

 For a particular field, the Solr norm is always 0.625, while the
 Lucene norm is .5.

 I've watched the code in NormsWriterPerField in both cases.

 In Solr we've got .577, in naked Lucene it's .5.

 I tried to check for boosts, and I don't see any non-1.0 document or
 field boosts.

 The Solr field is:

 <field name="bt_rni_NameHRK_encodedName" type="text_ws" indexed="true"
  stored="true" multiValued="false" />



-- 
Lance Norskog
goks...@gmail.com


Re: problem with date searching.

2012-05-05 Thread Lance Norskog
Use debugQuery=true to see exactly how the dismax parser sees this query.

Also, since this is a binary query, you can use filter queries
instead. Those use the Lucene syntax.
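
For example, roughly (a sketch only; the scanneddate field name and the dates
come from the thread, the URL is a placeholder, and the snippet assumes a
surrounding method that throws Exception):

    // fragment; SolrJ classes from org.apache.solr.client.solrj and .impl
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.addFilterQuery("scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]");
    q.set("debugQuery", true);   // shows how the parser actually sees the query
    QueryResponse rsp = server.query(q);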

On Fri, May 4, 2012 at 8:14 AM, Erick Erickson erickerick...@gmail.com wrote:
 Right, you need to do the explicit qualification of the date field.
 dismax parsing is intended to work with text-type fields, not
 numeric or date fields. If you attach debugQuery=on, you'll
 see that your scanneddate field is just dropped.

 Furthermore, dismax was never intended to work with range
 queries. Note this from the DisMaxQParserPlugin page:

  "extremely simplified subset of the Lucene QueryParser syntax"

 I'll expand on this a bit on the Wiki page.


 Best
 Erick

 On Fri, May 4, 2012 at 6:45 AM, Dmitry Kan dmitry@gmail.com wrote:
 Unless something else is wrong, my question would be: do you have
 documents in Solr stamped with these dates?
 Also, as a test, you could try specifying the field name directly:

 q=scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]

 Also, in your first e-mail you said you have used

 [*2012-02-02T01:30:52Z TO 2012-02-02T01:30:52Z*]

 with asterisks (*) - what scanneddate values did you get then?

 On Fri, May 4, 2012 at 1:37 PM, ayyappan ayyaba...@gmail.com wrote:

 Thanks for the quick response.

  I tried your advice, [2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z],
 but even so I am not getting any results.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,

 Dmitry Kan



-- 
Lance Norskog
goks...@gmail.com


Re: Why would solr norms come up different from Lucene norms?

2012-05-05 Thread Benson Margulies
On Sat, May 5, 2012 at 7:59 PM, Lance Norskog goks...@gmail.com wrote:
 Which Similarity class do you use for the Lucene code? Solr has a custom one.

I am embarrassed to report that I also have a custom similarity that I
didn't know about, and once I configured that into Solr all was well.
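
For anyone else chasing a norm mismatch: the Lucene-side Similarity has to be
wired into Solr as well, e.g. a class along these lines (purely illustrative;
the class name and the norm formula are made up) registered via a similarity
element in schema.xml:

    // illustrative only - Lucene/Solr 3.5 API
    public class MyCustomSimilarity extends org.apache.lucene.search.DefaultSimilarity {
        @Override
        public float computeNorm(String field, org.apache.lucene.index.FieldInvertState state) {
            // whatever the standalone Lucene code encodes; a flat norm here just as an example
            return state.getBoost() * 0.5f;
        }
    }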



 On Fri, May 4, 2012 at 6:30 AM, Benson Margulies bimargul...@gmail.com 
 wrote:
 So, I've got some code that stores the same documents in a Lucene
 3.5.0 index and a Solr 3.5.0 instance. It's only five documents.

 For a particular field, the Solr norm is always 0.625, while the
 Lucene norm is .5.

 I've watched the code in NormsWriterPerField in both cases.

 In Solr we've got .577, in naked Lucene it's .5.

 I tried to check for boosts, and I don't see any non-1.0 document or
 field boosts.

 The Solr field is:

 <field name="bt_rni_NameHRK_encodedName" type="text_ws" indexed="true"
  stored="true" multiValued="false" />



 --
 Lance Norskog
 goks...@gmail.com