Re: Need Help in Patching OPENNLP

2013-04-02 Thread karthicrnair
Thanks much !!

Explorer -- Internet Explorer :) Sorry for the miscommunication. Yeah let me
check it once again.

appreciate all the help :)

krn



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053094.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-02 Thread DC tech
OK - so I have my SOLR instance running on AWS. 
Any suggestions on how to safely share the link?  Right now, the whole SOLR 
instance is totally open. 



Gagandeep singh gagan.g...@gmail.com wrote:

Say debugQuery=true&mlt=true and see the scores for the MLT query, not a
sample query. You can use Amazon EC2 to bring up your Solr; you should be
able to get a micro instance for a free trial.
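
Something like this, for example (just a sketch -- the document id and the
/select handler are assumptions, adjust to your setup):

http://localhost:8983/solr/select?q=id:camry&mlt=true&mlt.fl=simi&mlt.mintf=1&mlt.mindf=1&debugQuery=true

mlt.mintf/mlt.mindf are lowered to 1 here because the index is tiny (181
docs), so the default cutoffs would drop most terms.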


On Mon, Apr 1, 2013 at 5:10 AM, dc tech dctech1...@gmail.com wrote:

 I did try the raw query against the *simi* field and those seem to return
 results in the order expected.
 For instance, Acura MDX has (large, SUV, 4WD & Luxury) in the simi field.
 Running a query with those words against the simi field returns the
 expected models (X5, Audi Q5, etc) and then the subsequent documents have
 decreasing relevance. So the basic query mechanism seems to be fine.

 The issue just seems to be with MoreLikeThis component and handler.
 I can post the index on a public SOLR instance - any suggestions? (or for
 hosting)


 On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh gagan.g...@gmail.com
 wrote:

  If you can bring up your Solr setup on a public machine then I'm sure a lot
  of debugging can be done. Without that, I think what you should look at is
  the tf-idf scores of the terms like "camry" etc. Usually idf is the
  deciding factor in which results show at the top (tf should be 1 for your
  data).
  Enable debugQuery=true and look at the explain section to see how the score
  is getting calculated.
 
  You should try giving different boosts to class, type, drive, size to
  control the results.
 
 
  On Sun, Mar 31, 2013 at 8:52 PM, dc tech dctech1...@gmail.com wrote:
 
  I am running some experiments on more like this and the results seem
  rather odd - I am doing something wrong but just cannot figure out what.
  Basically, the similarity results are decent - but not great.
 
  *Issue 1 = Quality*
  Toyota Camry: finds Altima (good) but then the next one is Camry Hybrid,
  whereas it should have found Accord.
  I have normalized the data into a simi field which has only the
  attributes that I care about.
  Without the simi field, I could not get mlt.qf boosts to work well enough
  to return results.

  *Issue 2*
  Some fields do not work at all. For instance, text+simi (in mlt.fl) works
  whereas just simi does not.
  So there is some weirdness that I am just not understanding.
 
  Would be grateful for your guidance !
 
 
  Here is the setup:
  *1. SOLR Version*
  solr-spec 4.2.0.2013.03.06.22.32.13
  solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
  lucene-spec 4.2.0
  lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
 
  *2. Machine Information*
  Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
  19.0-b09)
  Windows 7 Home 64 Bit with 4 GB RAM
 
  *3. Sample Data *
  I created this 'dummy' data of cars  - the idea being that these would
 be
  sufficient and simple to generate similarity and understand how it would
  work.
  There are 181 rows in the data set (I have attached it for reference in
  CSV format)
 
  [image: Inline image 1]
 
  *4. SCHEMA*
  *Field Definitions*
 <field name="id"      type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
 <field name="make"    type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
 <field name="model"   type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
 <field name="class"   type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
 <field name="type"    type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
 <field name="drive"   type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
 <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
 <field name="size"    type="string"       indexed="true" stored="true" termVectors="true" multiValued="false"/>
  *Copy Fields*
  <copyField source="make"    dest="make_en"/>    <!-- Search -->
  <copyField source="model"   dest="model_en"/>   <!-- Search -->
  <copyField source="class"   dest="class_en"/>   <!-- Search -->
  <copyField source="type"    dest="type_en"/>    <!-- Search -->
  <copyField source="drive"   dest="drive_en"/>   <!-- Search -->
  <copyField source="comment" dest="comment_en"/> <!-- Search -->
  <copyField source="size"    dest="size_en"/>    <!-- Search -->
  <copyField source="id"      dest="text"/>       <!-- Glob -->
  <copyField source="make"    dest="text"/>       <!-- Glob -->
  <copyField source="model"   dest="text"/>       <!-- Glob -->
  <copyField source="class"   dest="text"/>       <!-- Glob -->
  <copyField source="type"    dest="text"/>       <!-- Glob -->
  <copyField source="drive"   dest="text"/>       <!-- Glob -->
  <copyField source="comment" dest="text"/>       <!-- Glob -->
  <copyField source="size"    dest="text"/>       <!-- Glob -->
  <copyField source="class"   dest="simi_en"/>    <!-- similarity -->

java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Arkadi Colson

Hi

Recently solr crashed. I've found this in the error log.
My commit settings are looking like this:
  <autoCommit>
    <maxTime>1</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m

Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
        at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
        at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
        at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
        at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
        at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
        at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
        at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2952)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:368)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
        at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:249)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1353)
        ... 11 more
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
        ... 28 more


SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:541)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)




Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On some queries I get out of memory errors:

{error:{msg:java.lang.OutOfMemoryError: Java heap
space,trace:java.lang.RuntimeException:
java.lang.OutOfMemoryError: Java heap space\n\tat
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:462)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:290)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:365)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)\n\tat
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)\n\tat
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
java.lang.Thread.run(Thread.java:679)\nCaused by:
java.lang.OutOfMemoryError: Java heap space\n\tat
org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:178)\n\tat
org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:669)\n\tat
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:325)\n\tat
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)\n\tat
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)\n\tat
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:365)\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat

AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread André Widhani
Hi Arkadi,

this error usually indicates that virtual memory is not sufficient (should be 
unlimited).

Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168 

Regards,
André


Von: Arkadi Colson [ark...@smartbit.be]
Gesendet: Dienstag, 2. April 2013 10:24
An: solr-user@lucene.apache.org
Betreff: java.lang.OutOfMemoryError: Map failed

Hi

Recently solr crashed. I've found this in the error log.
My commit settings are looking like this:
   <autoCommit>
     <maxTime>1</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>

   <autoSoftCommit>
     <maxTime>2000</maxTime>
   </autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m

Versions
Solr: 4.2
Tomcat: 7.0.33
Java: 1.7

Anybody any idea?

Thx!

Arkadi

[...]

Re: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Arkadi Colson

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:

Hi Arkadi,

this error usually indicates that virtual memory is not sufficient (should be 
unlimited).

Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168

Regards,
André


[...]

AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread André Widhani
The output is from the root user. Are you running Solr as root?

If not, please try again using the operating system user that runs Solr.
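
A quick way to check that (a sketch -- "tomcat" is an assumed user name,
substitute whatever user actually runs Tomcat/Solr):

    sudo -u tomcat bash -c 'ulimit -v'

To make it permanent, the address-space limit can be set in
/etc/security/limits.conf, e.g. (again assuming a "tomcat" user):

    tomcat  soft  as  unlimited
    tomcat  hard  as  unlimited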

André

Von: Arkadi Colson [ark...@smartbit.be]
Gesendet: Dienstag, 2. April 2013 11:26
An: solr-user@lucene.apache.org
Cc: André Widhani
Betreff: Re: AW: java.lang.OutOfMemoryError: Map failed

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:
 [...]

Re: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Arkadi Colson

It is running as root:

root@solr01-dcg:~# ps aux | grep tom
root  1809 10.2 67.5 49460420 6931232 ?Sl   Mar28 706:29 
/usr/bin/java 
-Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -server 
-Xms2048m -Xmx6144m -XX:PermSize=64m -XX:MaxPermSize=128m -XX:+UseG1GC 
-verbose:gc -Xloggc:/solr/tomcat-logs/gc.log -XX:+PrintGCTimeStamps 
-XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF8 
-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc 
-DzkClientTimeout=2 
-DzkHost=solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181,solr02-dcg.intnet.smartbit.be:2181,solr02-gs.intnet.smartbit.be:2181,solr03-dcg.intnet.smartbit.be:2181,solr03-gs.intnet.smartbit.be:2181 
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath 
/usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat 
-Dcatalina.home=/usr/local/tomcat 
-Djava.io.tmpdir=/usr/local/tomcat/temp 
org.apache.catalina.startup.Bootstrap start


Arkadi

On 04/02/2013 11:29 AM, André Widhani wrote:

The output is from the root user. Are you running Solr as root?

If not, please try again using the operating system user that runs Solr.

André

Von: Arkadi Colson [ark...@smartbit.be]
Gesendet: Dienstag, 2. April 2013 11:26
An: solr-user@lucene.apache.org
Cc: André Widhani
Betreff: Re: AW: java.lang.OutOfMemoryError: Map failed

Hmmm I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote:

[...]

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 11:09 +0200, Dotan Cohen wrote:
 On some queries I get out of memory errors:
 
 {error:{msg:java.lang.OutOfMemoryError: Java heap
[...]
 org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
 org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:178)\n\tat
[...]

Yep, your OOM is due to faceting.

How many documents does your index have, how many fields do you facet on,
and approximately how many unique values do your facet fields have?

 I notice that this only occurs on queries that run facets. I start
 Solr with the following command:
 sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 

You are not specifying any maximum heap size (-Xmx), which you should do
in order to avoid unpleasant surprises. Facets and sorting are often
memory hungry, but your system seems to have 13GB free RAM so the easy
solution attempt would be to increase the heap until Solr serves the
facets without OOM.

- Toke Eskildsen, State and University Library, Denmark



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 12:59 PM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 How many documents does your index have, how many fields do you facet on,
 and approximately how many unique values do your facet fields have?


8971763 documents, growing at a rate of about 500 per minute. We
actually expect that to be ~5 per minute once we get out of
testing. Most documents are less than a KiB in the 'text' field, and
they have a few other fields which store short strings, dates, or
ints. You can think of these documents like tweets: short general
purpose text messages.

 I notice that this only occurs on queries that run facets. I start
 Solr with the following command:
 sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
 /opt/solr-4.1.0/example/start.jar 

 You are not specifying any maximum heap size (-Xmx), which you should do
 in order to avoid unpleasant surprises. Facets and sorting are often
 memory hungry, but your system seems to have 13GB free RAM so the easy
 solution attempt would be to increase the heap until Solr serves the
 facets without OOM.


Thanks, I will start with -Xmx8g and test.
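
That is, something like this (assuming an 8 GB heap still leaves room for
the OS disk cache on this machine):

sudo nohup java -Xmx8g -XX:NewRatio=1 -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
  -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar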

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: AW: AW: java.lang.OutOfMemoryError: Map failed

2013-04-02 Thread Per Steffensen
I have seen the exact same thing on Ubuntu Server 12.04. It helped to add
some swap space, but I do not understand why this is necessary, since the OS
ought to just use the actual memory-mapped files if there is not room in
(virtual) memory, swapping pages in and out on demand. Note that I saw
this for memory-mapped files opened for read+write - not in the exact
same context as you see it, where MMapDirectory is trying to map
memory-mapped files.


If you find a solution/explanation, please post it here. I really want
to know more about why FileChannel.map can cause OOM. I do not think the
OOM is a "real" OOM indicating no more space on the Java heap, but is more
an exception saying that the OS has no more memory (in some interpretation
of that).
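
One more thing worth checking (just a guess on my part): the kernel's limit
on the number of memory-map areas per process, which mmap-heavy setups can
exhaust:

cat /proc/sys/vm/max_map_count
# raise it if needed, e.g.:
sysctl -w vm.max_map_count=262144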


Regards, Per Steffensen

On 4/2/13 11:32 AM, Arkadi Colson wrote:

[...]

Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 12:16 +0200, Dotan Cohen wrote:
 8971763 documents, growing at a rate of about 500 per minute. We
 actually expect that to be ~5 per minute once we get out of
 testing.

9M documents in a heavily updated index with faceting. Maybe you are
committing faster than the faceting can be prepared?
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Regards,
Toke Eskildsen



Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Lukasz Kujawa
Hello,

I'm using the Solr Collections API to create a collection.

http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default

I'm expecting the new collection to be named test2; what I get instead is
test2_shard1_replica2. I don't want to tie my index name to any current
settings. Is there any way to set the collection name precisely?

Thank you,
Lukasz




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
The Collections API is a wrapper around the Cores API.

If you don't want the API to define the name for you, use the Cores API; there
you can define the collection name and the shard id:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&collection=collection1&shard=XX'

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 1:01 PM, Lukasz Kujawa wrote:

 Hello,
 
 I'm using the Solr Collections API to create a collection.
 
 http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default
 
 I'm expecting the new collection to be named test2; what I get instead is
 test2_shard1_replica2. I don't want to tie my index name to any current
 settings. Is there any way to set the collection name precisely?
 
 Thank you,
 Lukasz
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
 Sent from the Solr - User mailing list archive at Nabble.com 
 (http://Nabble.com).
 
 




Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Anshum Gupta
Also, I am assuming that the collection name in this case should be
'test2'. The replica names would be along the lines of what you've mentioned.
Is that not the case?



On Tue, Apr 2, 2013 at 5:31 PM, Lukasz Kujawa luk...@php.net wrote:

 Hello,

  I'm using the Solr Collections API to create a collection.


  http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default

  I'm expecting the new collection to be named test2; what I get instead is
  test2_shard1_replica2. I don't want to tie my index name to any current
  settings. Is there any way to set the collection name precisely?

 Thank you,
 Lukasz




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

Anshum Gupta
http://www.anshumgupta.net


Query using function query result

2013-04-02 Thread J Mohamed Zahoor
Hi


I want to query documents which match a certain dynamic criteria.
For example, how do I get all documents where sub(field1,field2) > 0?

I tried _val_:sub(field1,field2) and used fq=_val_:[0 TO *],
but it doesn't work.
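
Perhaps something like the frange function-range parser would do it? Just a
sketch, untested:

fq={!frange l=0 incl=false}sub(field1,field2)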

./Zahoor


Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
This link explains what is what:
http://wiki.apache.org/solr/SolrCloud#Glossary

A collection represents a single logical index. A SolrCore (AKA core)
encapsulates a single physical index; one or more cores make up a logical
shard, and shards make up a collection.

You can have a collection with the same name as the SolrCore if you want.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 1:53 PM, Anshum Gupta wrote:

  Also, I am assuming that the collection name in this case should be
  'test2'. The replica names would be along the lines of what you've mentioned.
  Is that not the case?
 
 
 
 On Tue, Apr 2, 2013 at 5:31 PM, Lukasz Kujawa luk...@php.net 
 (mailto:luk...@php.net) wrote:
 
  Hello,
  
  I'm using Solr collections API to create a collection.
  
  
   http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default
  
   I'm expecting the new collection to be named test2; what I get instead is
   test2_shard1_replica2. I don't want to tie my index name to any current
   settings. Is there any way to set the collection name precisely?
  
  Thank you,
  Lukasz
  
  
  
  
  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html
  Sent from the Solr - User mailing list archive at Nabble.com 
  (http://Nabble.com).
  
 
 
 
 
 -- 
 
 Anshum Gupta
 http://www.anshumgupta.net
 
 




Re: Top 10 Terms in Index (by date)

2013-04-02 Thread Tomás Fernández Löbbe
Oh, I see, essentially you want to get the sum of the term frequencies for
every term in a subset of documents (instead of the document frequency as
the FacetComponent would give you). I don't know of an easy/out of the box
solution for this. I know the TermVectorComponent will give you the tf for
every term in a document, but I'm not sure if you can filter or sort on it.
Maybe you can do something like:
https://issues.apache.org/jira/browse/LUCENE-2393
or what's suggested here:
http://search-lucene.com/m/of5Fn1PUOHU/
but I have never used something like that.
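
For reference, a TermVectorComponent request looks roughly like this (a
sketch -- it assumes the example /tvrh handler from the stock solrconfig.xml
and your field names):

http://localhost:8983/solr/tvrh?q=dateCreated:[2013-03-01T00:00:00Z TO 2013-04-01T00:00:00Z]&tv=true&tv.tf=true&tv.fl=content&fl=id

You would still have to sum the per-document tf values on the client side.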

Tomás



On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler andy.pick...@gmail.com wrote:

 I need the total number of occurrences across all documents for each term.
 Imagine this...

 Post #1: "I think, therefore I am like you"
 Reply #1: "You think too much"
 Reply #2: "I think that I think much as you"

 Each of those documents is put into 'content'.  Pretending I don't have
 stop words, the top-term query (not considering dateCreated in this
 example) would result in something like...

 think: 4
 I: 4
 you: 3
 much: 2
 ...

 Thus, a just-count-the-documents approach doesn't work, because if a word
 occurs more than one time in a document it needs to be counted that many
 times.  That seemed to rule out faceting like you mentioned, as well as the
 TermsComponent (which as I understand also only counts documents).

 Thanks,
 Andy Pickler

 On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe 
 tomasflo...@gmail.com
  wrote:

  So you have one document per user comment? Why not use faceting plus
  filtering on the dateCreated field? That would count number of
  documents for each term (so, in your case, if a term is used twice in
 one
  comment it would only count once). Is that what you are looking for?
 
  Tomás
 
 
  On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com
  wrote:
 
   Our company has an application that is Facebook-like for usage by
   enterprise customers.  We'd like to do a report of top 10 terms
 entered
  by
   users over (some time period).  With that in mind I'm using the
   DataImportHandler to put all the relevant data from our database into a
   Solr 'content' field:
  
    <field name="content" type="text_general" indexed="true" stored="false"
     multiValued="false" required="true" termVectors="true"/>
   
    Along with the content is the 'dateCreated' for that content:
   
    <field name="dateCreated" type="tdate" indexed="true" stored="false"
     multiValued="false" required="true"/>
  
    I'm struggling with the TermVectorComponent documentation to understand how
    I can put together a query that answers the 'report' mentioned above.  For
    each document I need each term counted however many times it is entered
    (content of "I think what I think" would report 'think' as used twice).
    Does anyone have any insight as to whether I'm headed in the right
    direction and then what my query would be?
  
   Thanks,
   Andy Pickler
  
 



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 2:41 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 9M documents in a heavily updated index with faceting. Maybe you are
 committing faster than the faceting can be prepared?
 https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F


Thank you Toke, this is exactly on my list of things to learn about
Solr. We do get the error mentioned and we cannot reduce the amount
of commits. Also, I do believe that we have the necessary server
resources (16 GiB RAM).

I have increased maxWarmingSearchers to 4, let's see how this goes.
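
That is, in solrconfig.xml (assuming the stock element):

<maxWarmingSearchers>4</maxWarmingSearchers>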

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Slaves always replicate entire index Index versions

2013-04-02 Thread yayati
I moved from Solr 4.1 to Solr 4.2 on one of the slave servers. Earlier my
index directory had index.<timestamp>, but now it has only an index folder
with no timestamp. Is this a bug? The size of the index is the same as on
the master, and the dashboard shows replication running with both master
and slave versions. What happened to the timestamp in the index directory?


index.<timestamp>  -- earlier, with 4.1

index  -- this is the new folder

Please reply asap.

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4053179.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Top 10 Terms in Index (by date)

2013-04-02 Thread Andy Pickler
A key problem with those approaches as well as Lucene's HighFreqTerms class
(
http://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/HighFreqTerms.html)
is that none of them seem to have the ability to combine with a date range
query...which is key in my scenario.  I'm kinda thinking that what I'm
asking to do just isn't supported by Lucene or Solr, and that I'll have to
pursue another avenue.  If anyone has any other suggestions, I'm all ears.
I'm starting to wonder if I need to have some nightly batch job that
executes against my database and builds up that day's top terms in a
table or something.

Thanks,
Andy Pickler

On Tue, Apr 2, 2013 at 7:16 AM, Tomás Fernández Löbbe tomasflo...@gmail.com
 wrote:

 [...]



performance on concurrent search request

2013-04-02 Thread Anatoli Matuskova
In this thread about performance on concurrent search requests, Otis said:
http://lucene.472066.n3.nabble.com/how-to-improve-concurrent-request-performance-and-stress-testing-td496411.html

Imagine this type of code:

synchronized (someGlobalObject) {
  // search
}

What happens when 100 threads hit this spot?  The first one to get there
gets in and runs the search and 99 of them wait.
What happens if that // search also involves expensive operations, lots
of IO, warming up, cache population, etc.?  Those 99 threads will have to
wait a while :)

That's why it is recommended to warm up the searcher ahead of time before
exposing it to real requests.  However, even if you warm things up, that
sync block will remain there, and at some point this will become a
bottleneck.  What that point is depends on the hardware, index size, query
complexity and rate, even JVM.

-- Otis

I'm wondering if this synchronized block is still an issue in Solr 4.x? Is it
because of how Solr deals with the index searcher, or because of how it is
implemented in Lucene?
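
For context, this is roughly how a request seems to obtain a searcher in
Solr 4.x, as far as I can tell from the code (so treat this sketch as an
assumption on my part):

RefCounted<SolrIndexSearcher> holder = core.getSearcher();
try {
  SolrIndexSearcher searcher = holder.get();
  // run the query here; there is no global synchronized block around it
} finally {
  holder.decref();  // release the reference so old searchers can be closed
}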



--
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-on-concurrent-search-request-tp4053182.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Flow Chart of Solr

2013-04-02 Thread Koji Sekiguchi

(13/04/02 21:45), Furkan KAMACI wrote:

Is there any documentation, something like a flow chart of Solr? i.e.
documents come into Solr (maybe indicating which classes get documents) and
go through the parsing process (i.e. stemming processes etc.), and then
inverted indexes are built, and so on?



There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:

[Toke: maxWarmingSearchers limit exceeded?]

 Thank you Toke, this is exactly on my list of things to learn about
 Solr. We do get the error mentioned and we cannot reduce the amount
 of commits. Also, I do believe that we have the necessary server
 resources (16 GiB RAM).

Memory does not help you if you commit too frequently. If you commit
every X seconds and warming takes X+Y seconds, then you will run out of
memory at some point.

 I have increased maxWarmingSearchers to 4, let's see how this goes.

If you still get the error with 4 concurrent searchers, you will have to
either speed up warmup time or commit less frequently. You should be
able to reduce facet startup time by switching to segment based faceting
(at the cost of worse search-time performance) or maybe by using
DocValues. Some of the current threads on the solr-user list are about
these topics.
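
As an illustration, switching a single-valued facet field to per-segment
faceting is just a parameter change (a SolrJ sketch; the field name is made
up):

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");
    // "fcs" = per-segment field cache faceting for single-valued fields:
    // cheaper to warm after a commit, somewhat slower per request
    q.set("facet.method", "fcs");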

How often do you commit and how many unique values does your facet
fields have?

Regards,
Toke Eskildsen



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:

 [Toke: maxWarmingSearchers limit exceeded?]

 Thank you Toke, this is exactly on my list of things to learn about
 Solr. We do get the error mentioned and we cannot reduce the amount
 of commits. Also, I do believe that we have the necessary server
 resources (16 GiB RAM).

 Memory does not help you if you commit too frequently. If you commit
 each X seconds and warming takes X+Y seconds, then you will run out of
 memory at some point.

 I have increased maxWarmingSearchers to 4, let's see how this goes.

 If you still get the error with 4 concurrent searchers, you will have to
 either speed up warmup time or commit less frequently. You should be
 able to reduce facet startup time by switching to segment based faceting
 (at the cost of worse search-time performance) or maybe by using
 DocValues. Some of the current threads on the solr-user list are about
 these topics.

 How often do you commit and how many unique values does your facet
 fields have?

 Regards,
 Toke Eskildsen




-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Flow Chart of Solr

2013-04-02 Thread Andre Bois-Crettez


On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:

(13/04/02 21:45), Furkan KAMACI wrote:

Is there any documentation, something like a flow chart of Solr? i.e.
documents come into Solr (maybe indicating which classes get the documents),
go through the parsing process (i.e. stemming etc.), and then inverted
indexes are built, and so on?


There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji


I like this one; it is a bit more detailed:

http://www.cominvent.com/2011/04/04/solr-architecture-diagram/

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/




Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
Actually, maybe one of the most important core things is the Analysis part in
that last diagram, but there is nothing about it (i.e. stemming, lemmatizing
etc.) in any of them.


2013/4/2 Andre Bois-Crettez andre.b...@kelkoo.com

 I like this one; it is a bit more detailed:

 http://www.cominvent.com/2011/04/04/solr-architecture-diagram/


Re: Slaves always replicate entire index Index versions

2013-04-02 Thread Arkadi Colson
The index folder is indeed gone but it seems to work. Maybe just a 
structural change...


Met vriendelijke groeten

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81

On 04/02/2013 04:08 PM, yayati wrote:

I moved from Solr 4.1 to Solr 4.2 on one of the slave servers. Earlier my
index directory had index.<timestamp>, but now it has only an index folder
with no timestamp. Is this a bug? The size of the index is the same as on the
master, and the dashboard shows replication running with both master and
slave versions. What happened to the timestamp in the index directory?


index.<timestamp>  -- earlier with 4.1

index  -- this is the new folder

Please reply asap.

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4053179.html
Sent from the Solr - User mailing list archive at Nabble.com.






Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote:
 Memory does not help you if you commit too frequently. If you commit
 each X seconds and warming takes X+Y seconds, then you will run out of
 memory at some point.


How might I time the warming? I've been googling "warming" since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


 I have increased maxWarmingSearchers to 4, let's see how this goes.

 If you still get the error with 4 concurrent searchers, you will have to
 either speed up warmup time or commit less frequently. You should be
 able to reduce facet startup time by switching to segment based faceting
 (at the cost of worse search-time performance) or maybe by using
 DocValues. Some of the current threads on the solr-user list are about
 these topics.

 How often do you commit and how many unique values does your facet
 fields have?


Batches of 20-50 results are added to Solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true

Should I remove commit=true and run a cron job to commit once per minute?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
 How often do you commit and how many unique values does your facet
 fields have?


Most of the time I facet on one field that has about twenty unique
values. However, once per day I would like to facet on the text field,
which is a free-text field usually around 1 KiB (about 100 words), in
order to determine what the top keywords / topics are. That query
would take up to 200 seconds to run, but it does not have to return
the results in real-time (the output goes to another process, not to a
waiting user).
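
(Concretely, the daily query is essentially something like
http://127.0.0.1:8983/solr/core/select?q=*:*&rows=0&facet=true&facet.field=text&facet.limit=100
- the facet field name here is illustrative.)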

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Flow Chart of Solr

2013-04-02 Thread Yago Riveiro
For beginners it is complicated to understand the complexity of Solr /
Lucene. I'm trying to develop a custom search component and it's too hard to
keep the flow, inheritance and interaction between classes in mind. I think
there is a gap between developer docs and user docs, or maybe I don't search
enough T_T. The Javadoc is not always clear.

The fact that I'm a beginner in the Solr world doesn't help.

Either way, this thread was very helpful, I found some very good resources
here :)

Regards,

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:

 Actually, maybe one of the most important core things is the Analysis part
 in that last diagram, but there is nothing about it (i.e. stemming,
 lemmatizing etc.) in any of them.




Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
You are right to distinguish developer docs from user docs. Users split along
that line: some use Solr for indexing and monitoring via the admin interface,
and that is quite enough for them, while others want to modify it, so it
would be nice if there were some documentation for the developer side too.


2013/4/2 Yago Riveiro yago.rive...@gmail.com

 For beginners it is complicated to understand the complexity of Solr /
 Lucene. I'm trying to develop a custom search component and it's too hard
 to keep the flow, inheritance and interaction between classes in mind. I
 think there is a gap between developer docs and user docs, or maybe I
 don't search enough T_T. The Javadoc is not always clear.

 The fact that I'm a beginner in the Solr world doesn't help.

 Either way, this thread was very helpful, I found some very good resources
 here :)

 Regards,

 --
 Yago Riveiro





Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Ryan Ernst
Please add RyanErnst to the contributors group.  Thanks!


On Mon, Apr 1, 2013 at 7:04 PM, Steve Rowe sar...@gmail.com wrote:

 On Apr 1, 2013, at 9:40 PM, Vaillancourt, Tim tvaillanco...@ea.com
 wrote:
  I would also like to contribute to SolrCloud's wiki where possible.
 Please add myself (TimVaillancourt) when you have a chance.

 Added to solr wiki ContributorsGroup.


Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Steve Rowe
On Apr 2, 2013, at 11:23 AM, Ryan Ernst r...@iernst.net wrote:
 Please add RyanErnst to the contributors group.  Thanks!

Added to solr wiki ContributorsGroup.


Re: Out of memory on some faceting queries

2013-04-02 Thread Andre Bois-Crettez

On 04/02/2013 05:04 PM, Dotan Cohen wrote:

How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


warmupTime is available on the admin page for each type of cache (in
milliseconds) :
http://solr-box:8983/solr/#/core1/plugins/cache

Or if you are only interested in the total :
http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher
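
A rough SolrJ equivalent, if you want to poll the total programmatically (a
sketch, untested):

    HttpSolrServer server = new HttpSolrServer("http://solr-box:8983/solr/core1");
    SolrQuery q = new SolrQuery();
    q.setRequestHandler("/admin/mbeans");
    q.set("stats", "true");
    q.set("key", "searcher");
    QueryResponse rsp = server.query(q);
    // warmupTime (in ms) is under solr-mbeans > CORE > searcher > stats
    System.out.println(rsp.getResponse());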


Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true Should I
remove commit=true and run a cron job to commit once per minute?


Even better, it sounds like a job for CommitWithin :
http://wiki.apache.org/solr/CommitWithin
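
If you index through SolrJ, the same thing can be set per request (a minimal
sketch):

    HttpSolrServer server = new HttpSolrServer("http://127.0.0.1:8983/solr/core");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(60000);  // let Solr commit within 60s, no commit=true
    req.process(server);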


André



Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Furkan KAMACI
Hi;

Please add FurkanKAMACI to the group.

Thanks;
Furkan KAMACI


2013/4/2 Steve Rowe sar...@gmail.com

 On Apr 2, 2013, at 11:23 AM, Ryan Ernst r...@iernst.net wrote:
  Please add RyanErnst to the contributors group.  Thanks!

 Added to solr wiki ContributorsGroup.



Job: Apache solr (Recruiting)

2013-04-02 Thread jessica katz
We have openings for Middleware architects (Apache solr)
*Locations:* Mountain View, California,New York City, NY, Houston, TEXAS

Mail me your resumes to jess...@kudukisgroup.com.
We can discuss more over the phone.

Thanks,
Jessica


Re: [ANNOUNCE] Solr wiki editing change

2013-04-02 Thread Steve Rowe
On Apr 2, 2013, at 11:28 AM, Furkan KAMACI furkankam...@gmail.com wrote:
 Please add FurkanKAMACI to the group.

Added to solr wiki ContributorsGroup.



Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Elodie Sannier

Hello,

I am using the new collection alias feature, and it seems the
CloudSolrServer class (solrj 4.2.0) does not allow using it, either for
update or select.

When I'm requesting the CloudSolrServer with a collection alias name, I
have the error:
org.apache.solr.common.SolrException: Collection not found:
aliasedCollection

The collection alias cannot be found because, in the
CloudSolrServer#getCollectionList method (line 319), the alias variable
is always empty.

When I'm requesting the CloudSolrServer, the connect method is called
and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method.
In the ZkStateReader#createClusterStateWatchersAndUpdate method, the
aliases are not loaded.

line 295, the data from /clusterstate.json are loaded :
ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet);
this.clusterState = clusterState;

Should we have the same data loading from /aliases.json, in order to
fill the aliases field ?
At line 299, a Watcher for aliases is created but does not seem to be used.


As a workaround to avoid the error, I have to force the aliases loading
at my application start and when the aliases are updated:
CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
solrServer.setDefaultCollection("aliasedCollection");
solrServer.connect();
solrServer.getZkStateReader().updateAliases();

Is there a better way to use collection aliases with solrj ?

Elodie Sannier



Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Lukasz Kujawa
If I use the core admin API instead of the Collections API then, according to
my understanding, the new core will only be available on that server; if I
query a different Solr server I will get an error. If I use the Collections
API and query a server which doesn't physically hold the data, I will still
get results. Creating cores manually across all Solr servers doesn't feel
like the right way to go.
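
(With the core admin API I would have to issue something like this on every
box - a sketch, all names made up:
http://host:8983/solr/admin/cores?action=CREATE&name=mycollection_shard1_replica1&collection=mycollection&shard=shard1
)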



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155p4053230.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr URL uses non-standard format with pound sign

2013-04-02 Thread Dennis Haller
The Solr URL in Solr 4.2 for my localhost installation looks like this:
http://localhost:8883/solr/#/development_shard1_replica1

This URL, when constructed dynamically in Ruby, will not validate with the
Ruby URI::HTTP class because of the # sign in the path. This is a
non-standard URL as per RFC 1738.

Here is the error message:

#<URI::InvalidComponentError: bad component(expected absolute path
component): /solr/#/development_shard1_replica1>


Is there another way to access the Solr URL without using the # sign?

Thanks,
Dennis Haller


Re: Solr URL uses non-standard format with pound sign

2013-04-02 Thread Chris Hostetter

: The Solr URL in Solr 4.2 for my localhost installation looks like this:
: http://localhost:8883/solr/#/development_shard1_replica1
: 
: This URL when constructed dynamically in Ruby will not validate with the
: Ruby URI:HTTP class because of the # sign in the path. This is a
: non-standard URL as per RFC1738.

1) RFC 1738 is antiquated. Among other things, RFC 3986 is much more relevant
and clarifies that "#" is a fragment identifier.

2) the URL you are referring to is a *UI* view, and the fragment
(/development_shard1_replica1) is dealt with entirely by your web browser 
via javascript.

3) for dealing with Solr's HTTP APIs programmatically, the type of base url
you want will either be "http://localhost:8883/solr/" or
"http://localhost:8883/solr/development_shard1_replica1", depending on
whether your client code is expecting a base url for the entire server (to
query multiple SolrCores), or a base url for a single SolrCore.


-Hoss


Re: Flow Chart of Solr

2013-04-02 Thread Alexandre Rafalovitch
I think there is a gap in the support of one's path of learning Solr . I'll
try to describe it based on my own experience. Hopefully, it is helpful.

At First, there is a Solr is a blackbox stage, where the person may not
know Java and is just using out of the box components. Wiki is reasonably
helpful there and there are other resources (blogs, etc). At this point,
Lucene is a black box within the black box and is something that is safely
ignored.

At the second stage, one hits the period where he/she understands what is
going on in their basic scenario and is trying to get into more advanced
case. This could be putting together a complex analyzer chain, trying to
use Update Request Processors or optimizing slow/OOM imports or doing
complex queries. Suddenly, they are pointed directly at Javadocs and have
to figure out the way around Java-based instructions. A Java programmer can
bridge that gap and get over the curve, but I suspect others get lost very
quickly and get stuck even when they don't need to be good programmers. An
example in my mind would be something like RegexReplaceProcessor. One has
to climb up and down the inheritance chain of the Javadoc to figure out
what can be done and what the parameters are. And the parameters syntax is
Java regular expressions rather than something used in copyField, so they
need to jump over and figure that out. So, it is fairly hard to envisage
those pieces and how they can combine together. Similarly, some of the
stuff is described in Jira requests, but also in a way that requires a
programmer's mind-set to parse it out. I think a lot of people drop out at
this stage and fall-back to 'black-box' view of Solr. Most of the questions
I see on Stack Overflow are conceptual troubles at this stage.

And then, those who get to the third stage, jump to the advanced level
where one could just read the source code to figure out what is going on. I
found www.grepcode.com to be useful (though it is quite slow now and is a
bit behind for Solr). Somewhere around here, one also starts to realize the
fuzzy relation between the Lucene and Solr code and becomes somewhat
clearer what Solr's benefits actually are (as opposed to bare Lucene's).
This also generates its own frustration and confusion of course, because
suddenly one starts to wish for Lucene's features that Solr does not use
(e.g. split/sync analyzer chains, some alternative facet implementation
features, etc).

And finally (at the end of the beginning), you become the contributor
and become very familiar with subversion/ant/etc. Though, I suspect, the
contributors become more specialized and actually understand less about
other parts of the system (e.g. Is anyone still fully understanding DIH?).

I am not blaming anyone with this story for the lack of support. I think
Solr is - in many ways - better documented than many other open source
projects. And the new manual being contributed to replace Wiki will (soon?)
make this even better. And, of course, this mailing list
is indescribably awesome. I am just trying to provide a fresh view of what
I went through and where I see people getting stuck.

I think a bit more effort in documenting that second stage would bring more
people to the community. I am trying to do my share through Wiki updates,
questions here, Jira issues, my upcoming book and some other little things.
I see others do the same. Perhaps, the diagram is something that we should
explicitly try to do. Though, I think it would be more fun to do it as a
Scrollorama Inception Explained style (
http://www.inception-explained.com/). :-)

Regards,
   Alex.


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Apr 2, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 You are right to distinguish developer docs from user docs. Users split
 along that line: some use Solr for indexing and monitoring via the admin
 interface, and that is quite enough for them, while others want to modify
 it, so it would be nice if there were some documentation for the developer
 side too.

Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
Solr 4.2 implements a feature to proxy requests if the core does not exist on
the requested node: https://issues.apache.org/jira/browse/SOLR-4210

There is currently a bug in this mechanism:
https://issues.apache.org/jira/browse/SOLR-4584

Without the proxy feature, whether you create the cores manually or
automatically, you can only query the collection on nodes that have at least
1 replica of the collection.

If you have a Solr cluster with 4 nodes and the collection only has 2 shards
without replicas, then you can only query the collection on 50% of the
cluster (assuming the proxy request mechanism doesn't work properly).

When I said to create the collection manually, I meant that you need to
manually create all the shards that form the collection and the replicas on
the other nodes of the cluster. It takes work, but if you want some control
you have to pay the price.

If it is possible to control the shard names with the Collections API, the
documentation doesn't say how.
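
For reference, the Collections API call I have in mind looks roughly like
this (a sketch; collection name and counts are made up):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2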



-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)





Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Lukasz Kujawa
Thank you for your answers Yriveiro. I'm trying to use Solr for a big SaaS
platform. The reason why I want everything dynamic is that each user will get
their own Solr collection. It looks like there are still many issues with the
distributed computing side. I hope 4.3 will arrive soon ;-) Anyway.. once
again thank you for your time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155p4053245.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Flow Chart of Solr

2013-04-02 Thread Yago Riveiro
Alexandre,   

You describe the normal path when a beginner tries to use a source of code
they don't understand: black-box, reading code, hacking, ok now I know 10% of
the project, with luck :p.

First of all, the Solr community is fantastic and always helps when I need
it. IMHO the devel documentation is dispersed across a lot of sources: blogs,
the wiki, the LucidWorks wiki (I know that this wiki was donated to Apache
and is in the process of being presented to the world as part of the
project).

The curve for doing fun things with Solr at the source level is steep; I see
a lot of webinars teaching how to deploy and use Solr, but not how to develop
a ResponseWriter or a SearchComponent.

Unfortunately I don't have the knowledge to contribute properly; in the
future … we will see.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)





Re: Collection name via Collections API (Solr 4.x)

2013-04-02 Thread Yago Riveiro
I use Solr for a similar purpose; I understand that you want to have control
over how the sharding is done :)

Regards.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 5:54 PM, Lukasz Kujawa wrote:

 Thank you for your answers Yriveiro. I'm trying to use Solr for a big SaaS
 platform. The reason why I want everything dynamic is each user will get own
 Solr collection. It looks like there are still many issues with the
 distributed computing. I hope 4.3 will arrive soon ;-) Anyway.. once again
 thank you for your time.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155p4053245.html
 Sent from the Solr - User mailing list archive at Nabble.com 
 (http://Nabble.com).
 
 




Re: Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Mark Miller
Answers inline:

On Apr 2, 2013, at 11:45 AM, Elodie Sannier elodie.sann...@kelkoo.fr wrote:

 Hello,
 
 I am using the new collection alias feature, and it seems
 CloudSolrServer class (solrj 4.2.0) does not allow to use it, either for
 update or select.
 
 When I'm requesting the CloudSolrServer with a collection alias name, I
 have the error:
 org.apache.solr.common.SolrException: Collection not found:
 aliasedCollection
 
 The collection alias cannot be found because, in
 CloudSolrServer#getCollectionList (line 319) method, the alias variable
 is always empty.
 
 When I'm requesting the CloudSolrServer, the connect method is called
 and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method.
 In the ZkStateReader#createClusterStateWatchersAndUpdate method, the
 aliases are not loaded.
 
 line 295, the data from /clusterstate.json are loaded :
 ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet);
 this.clusterState = clusterState;
 
 Should we have the same data loading from /aliases.json, in order to
 fill the aliases field ?
 line 299, a Watcher for aliases is created but does not seem used.

The Watcher is used. It updates the Aliases if they changed - there is some lag 
time though. There is some work that tries to avoid the lag in the update being 
a problem, but I'm guessing somehow it's not covering your case. 

It wouldn't hurt to add the updateAliases call automatically on ZkStateReader 
init. If the watcher was indeed not being used, that would not solve things 
though - the client still needs to be able to detect alias additions and 
changes.

Your best bet is to file a JIRA issue so we can work on a test that mimics what 
you are seeing.

- Mark

 
 
 As a workaround to avoid the error, I have to force the aliases loading
 at my application start and when the aliases are updated:
 CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
 solrServer.setDefaultCollection("aliasedCollection");
 solrServer.connect();
 solrServer.getZkStateReader().updateAliases();
 
 Is there a better way to use collection aliases with solrj ?
 
 Elodie Sannier
 



A request handler that manipulated the index

2013-04-02 Thread Benson Margulies
I am thinking about trying to structure a problem as a Solr plugin. The
nature of the plugin is that it would need to read and write the lucene
index to do its work. It could not be cleanly split into URP 'over here'
and a Search Component 'over there'.

Are there invariants of Solr that would preclude this, like assumptions in
the implementation of the cache?


Re: Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Mark Miller
I think the current tests probably build the CloudSolrServer before creating
the aliases - sounds like we need some that create the CloudSolrServer after.

- Mark

On Apr 2, 2013, at 1:31 PM, Mark Miller markrmil...@gmail.com wrote:

 Answers inline:
 
 
 The Watcher is used. It updates the Aliases if they changed - there is some 
 lag time though. There is some work that tries to avoid the lag in the update 
 being a problem, but I'm guessing somehow it's not covering your case. 
 
 It wouldn't hurt to add the updateAliases call automatically on ZkStateReader 
 init. If the watcher was indeed not being used, that would not solve things 
 though - the client still needs to be able to detect alias additions and 
 changes.
 
 Your best bet is to file a JIRA issue so we can work on a test that mimics 
 what you are seeing.
 
 - Mark
 
 
 
 



Re: Flow Chart of Solr

2013-04-02 Thread Alexandre Rafalovitch
Yago,

My point - perhaps lost in too much text - was that Solr is presented - and
can function - as a black-box. Which makes it different from more
traditional open-source project. So, the stage-2 happens exactly when the
non-programmers have to cross the boundary from the black-box into
code-first approach and the hand-off is not particularly smooth. Or even
when - say - php or .Net programmer  tries to get beyond the basic
operations their client library and has the understand the server-side
aspects of Solr.

Regards,
   Alex.

On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 Alexandre,

  You describe the normal path when a beginner tries to use a source of code
  they don't understand: black-box, reading code, hacking, ok now I know
  10% of the project, with luck :p.



Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


WADL for REST service?

2013-04-02 Thread Peter Schütt
Hello,

does a WADL exist for the REST service of Solr?

Ciao
  Peter Schütt



Re: Solrj 4.2 - CloudSolrServer aliases are not loaded

2013-04-02 Thread Mark Miller
I've created https://issues.apache.org/jira/browse/SOLR-4664

- Mark

On Apr 2, 2013, at 2:07 PM, Mark Miller markrmil...@gmail.com wrote:

 I think the current tests probably build the CloudSolrServer before
 creating the aliases - sounds like we need some that create the
 CloudSolrServer after.
 
 - Mark
 



RE: Confusion over Solr highlight hl.q parameter

2013-04-02 Thread Van Tassell, Kristian
Thanks Koji, this helped with some of our problems, but it is still not perfect.

This query, for example, returns no highlighting:

?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax

But this one does (when it is, in effect, the same query):

?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT

I've tried many combinations but can't seem to get the right one to work. Is 
this possibly a bug? 

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Saturday, March 16, 2013 6:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Confusion over Solr highlight hl.q parameter

(13/03/16 4:08), Van Tassell, Kristian wrote:
 Hello everyone,
 
 If I search for a term “baz” and tell it to highlight it, it highlights just 
 fine.
 
 If, however, I search for “foo bar” using the q parameter, which appears in 
 that same document/same field, and use the hl.q parameter to search and 
 highlight “baz”, I get no highlighting results for “baz”.
 
 ?q=パーツにおける機能強化
 qf=text_ja_JP
 defType=edismax
 hl=true
 hl.simple.pre=<em>
 hl.simple.post=</em>
 hl.fl=text_ja_JP
 
 The above highlights query term just fine.
 
 ?q=1234
 hl.q=パーツにおける機能強化
 qf=id
 defType=edismax
 hl=true
 hl.simple.pre=<em>
 hl.simple.post=</em>
 hl.fl=text_ja_JP
 
 This one returns zero highlighting hits.

I'm just guessing, Solr highlighter tries to highlight パーツにおける機能強化 in your 
default search field? Can you try hl.q=text_ja_JP:パーツにおける機能強化 .

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: Flow Chart of Solr

2013-04-02 Thread Furkan KAMACI
I'll take myself as an example. I have been researching Solr for just a few
weeks. I have learned Solr and its related projects; my next step is writing
down the main steps of Solr. We have separated the learning curve of Solr
into two main categories: the first is people who use it as an
out-of-the-box component; the second is the developer side.

Actually, the developer side branches into two paths.

The first is the general steps, i.e. a document comes into Solr (e.g.
crawled data from Nutch), which analysis processes are going to be applied
(stemming etc.), and what happens after parsing, step by step. When a search
query comes in, what happens step by step, at which step scores are
calculated, and so on.

The second is more code specific, i.e. which handlers receive the data that
is going to be indexed (no need to explain every handler at this step),
which are the analyzer and tokenizer classes and what is the flow between
them, how response handlers work and what they are.

Explaining the cloud side is another task.

Some explanations are currently present in the wiki (but some of them are in
very deep places in the wiki and it is not easy to find the parent topic of
them; maybe starting the wiki from a top page and branching all other topics
from it could be better).

If we could show the big picture, and beside it the smaller pictures within
it, it would be great (if you know the main parts it is easy to go deep into
the code, i.e. you don't need to explain every handler; if you show the way
to the developer he/she can debug and find what is needed).

When I think about myself as an example: I had to write down the steps of
Solr in some detail, and even though I read many wiki pages and a book about
it, I see that it is not easy even to write down the big picture of the
developer side.


2013/4/2 Alexandre Rafalovitch arafa...@gmail.com

 Yago,

 My point - perhaps lost in too much text - was that Solr is presented - and
 can function - as a black-box. Which makes it different from more
 traditional open-source project. So, the stage-2 happens exactly when the
 non-programmers have to cross the boundary from the black-box into
 code-first approach and the hand-off is not particularly smooth. Or even
 when - say - php or .Net programmer  tries to get beyond the basic
 operations their client library and has the understand the server-side
 aspects of Solr.

 Regards,
Alex.



 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
I am currently looking at moving our Solr cluster to 4.2 and noticed a
strange issue while testing today.  Specifically the replica has a higher
version than the master which is causing the index to not replicate.
 Because of this the replica has fewer documents than the master.  What
could cause this and how can I resolve it short of taking down the index
and scping the right version in?

MASTER:
Last Modified:about an hour ago
Num Docs:164880
Max Doc:164880
Deleted Docs:0
Version:2387
Segment Count:23

REPLICA:
Last Modified: about an hour ago
Num Docs:164773
Max Doc:164773
Deleted Docs:0
Version:3001
Segment Count:30

in the replicas log it says this:

INFO: Creating new http client,
config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync

INFO: PeerSync: core=dsc-shard5-core2
url=http://10.38.33.17:7577/solr START replicas=[
http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions

INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions

INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our
versions are newer. ourLowThreshold=1431233788792274944
otherHigh=1431233789440294912

Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync

INFO: PeerSync: core=dsc-shard5-core2
url=http://10.38.33.17:7577/solr DONE. sync succeeded


which again seems to point that it thinks it has a newer version of the
index so it aborts.  This happened while having 10 threads indexing 10,000
items writing to a 6 shard (1 replica each) cluster.  Any thoughts on this
or what I should look for would be appreciated.


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Mark Miller
I don't think the versions you are thinking of apply here. Peersync does not 
look at that - it looks at version numbers for updates in the transaction log - 
it compares the last 100 of them on leader and replica. What it's saying is 
that the replica seems to have versions that the leader does not. Have you 
scanned the logs for any interesting exceptions?

Did the leader change during the heavy indexing? Did any zk session timeouts 
occur?

- Mark

On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote:

 I am currently looking at moving our Solr cluster to 4.2 and noticed a
 strange issue while testing today.  Specifically the replica has a higher
 version than the master which is causing the index to not replicate.
 Because of this the replica has fewer documents than the master.



Re: Add fuzzy to edismax specs?

2013-04-02 Thread Jan Høydahl
Note that the pf field already parses this syntax as of 4.0, but then it is 
used as a phrase-slop value. You could probably use the same parsing code for qf.
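
For reference, on 3.x each fuzzy term in such a spec would expand to roughly
this kind of Lucene query (a sketch; field and term are made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;

    // title~0.75^4 applied to the user term "netflix" (illustrative):
    FuzzyQuery fq = new FuzzyQuery(new Term("title", "netflix"), 0.75f);
    fq.setBoost(4.0f);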

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

29. mars 2013 kl. 18:33 skrev Walter Underwood wun...@wunderwood.org:

 I've implemented this for the second time, so it is probably time to 
 contribute it. I find it really useful.
 
 I've extended the query spec parser for edismax to also accept a tilde and to 
 generate a FuzzyQuery. I used this at Netflix (on 1.3 with dismax), and 
 re-implemented it for 3.3 here at Chegg. We've had it in production for 
 nearly a year. I'll need to re-port this as part of our move to 4.x.
 
 Here is what the spec looks like. This expands to a fuzzy search on title 
 with a similarity of 0.75, and so on.
 
   <str name="qf">title~0.75^4 long_title^4 title_stem^2 author~0.75</str>
 
 I'm not 100% sure I understand the spec parser in edismax, so I'd like some 
 review when this is ready. I'd probably only do it for edismax.
 
 See: https://issues.apache.org/jira/browse/SOLR-629
 
 wunder
 --
 Walter Underwood
 wun...@wunderwood.org
 Search Guy, Chegg.com
 



Re: Solr Phonetic Search Highlight issue in search results

2013-04-02 Thread Jan Høydahl
If you want to highlight, you need to turn on highlighting for the actual field 
you search, and that field needs to be stored, i.e. hl.fl=ContentSearchPhonetic
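
i.e. something like this, once the field is also stored="true" (a sketch):

/select?q=ContentSearchPhonetic:fakt&hl=true&hl.fl=ContentSearchPhonetic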

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

1. apr. 2013 kl. 14:16 skrev Erick Erickson erickerick...@gmail.com:

 Good question, you're causing me to think... about code I know very
 little about <G>.
 
 So rather than spouting off, I tried it and.. it works fine for me, either 
 with
 or without using fast vector highlighter on, admittedly, a very simple test.
 
 So I think I'd try peeling off all the extra stuff you've put into your 
 configs
 (sorry, I don't have time right now to try to reproduce) and get the very
 simple case working, then build the rest back up and see where the
 problem begins.
 
 Sorry for the mis-direction!
 
 Erick
 
 
 
 On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar
 soumyanayan@rebaca.com wrote:
 Hi Erick,
 
 Thanks for the reply. But help me understand this: If Solr is able to
 isolate the two documents which contain the term "fact" as the phonetic
 equivalent of the search term "fakt", then why would it be unable to
 highlight the terms based on the same logic it uses to search the documents.
 
 Also, it is correctly highlighting the results in other searches which are
 also approximate searches and not exact ones for eg. Fuzzy or Synonym
 search. In these cases also the highlights in the search results are far
 from the actual search term but still they are getting correctly
 highlighted.
 
 Maybe I am getting it completely wrong but it looks like there is something
 wrong with my implementation.
 
 Thanks  Regards,
 
 Soumya.
 
 
 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: 27 March 2013 06:07 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Phonetic Search Highlight issue in search results
 
 How would you expect it to highlight successfully? The term is "fakt";
 there's nothing built in (and, indeed, there couldn't be) to un-phoneticize
 it into "fact" and apply that to the Content field. The whole point of phonetic
 processing is to do a lossy translation from the word into some variant,
 losing precision all the way.
 
 So this behavior is unsurprising...
 
 Best
 Erick
 
 
 
 
 On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar soumyanayan@rebaca.com
 wrote:
 
 When we are issuing a query with Phonetic Search, it is returning the
 correct documents but not returning the highlights. When we use
 Stemming or Synonym searches we are getting the proper highlights.
 
 
 
 For example, when we execute a phonetic query for the term
 "fakt" (ContentSearchPhonetic:fakt) in the Solr Admin interface, it
 returns two documents containing the term "fact" (the phonetic token
 equivalent), but the list of highlights is empty as shown in the
 response below.
 
 
 
   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">16</int>
       <lst name="params">
         <str name="q">ContentSearchPhonetic:fakt</str>
         <str name="wt">xml</str>
       </lst>
     </lst>
     <result name="response" numFound="2" start="0">
       <doc>
         <long name="DocId">1</long>
         <str name="DocTitle">Doc 1</str>
         <str name="Content">Anyway, this game was excellent and was
           well worth the time. The graphics are truly amazing and the sound
           track was pretty pleasant also. The preacher was in fact a
           thief.</str>
         <long name="_version_">1430480998833848320</long>
       </doc>
       <doc>
         <long name="DocId">2</long>
         <str name="DocTitle">Doc 2</str>
         <str name="Content">stunning. The preacher was in fact an
           excellent thief who had stolen the original manuscript of Hamlet
           from an exhibit on the Riviera, where he also acquired his
           remarkable and tan.</str>
         <long name="_version_">1430480998841188352</long>
       </doc>
     </result>
     <lst name="highlighting">
       <lst name="1"/>
       <lst name="2"/>
     </lst>
   </response>
 
 
 
 Relevant section of Solr schema:
 
 
 
   <field name="DocId" type="long" indexed="true" stored="true"
          required="true"/>
   <field name="DocTitle" type="string" indexed="false" stored="true"
          required="true"/>
   <field name="Content" type="text_general" indexed="false" stored="true"
          required="true"/>

   <field name="ContentSearch" type="text_general" indexed="true"
          stored="false" multiValued="true"/>
   <field name="ContentSearchStemming" type="text_stem" indexed="true"
          stored="false" multiValued="true"/>
   <field name="ContentSearchPhonetic" type="text_phonetic" indexed="true"
          stored="false" multiValued="true"/>
   <field name="ContentSearchSynonym" type="text_synonym" indexed="true"
          stored="false" multiValued="true"/>

   <uniqueKey>DocId</uniqueKey>

   <copyField source="Content" dest="ContentSearch"/>
   <copyField source="Content" dest="ContentSearchStemming"/>
   <copyField source="Content" dest="ContentSearchPhonetic"/>
   <copyField source="Content" dest="ContentSearchSynonym"/>
 
 
 

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
Looking at the master it looks like at some point there were shards that
went down.  I am seeing things like what is below.

INFO: A cluster state change: WatchedEvent state:SyncConnected
type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
nodes size: 12)
Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
INFO: Updating live nodes... (9)
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
runLeaderProcess
INFO: Running the leader process.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
shouldIBeLeader
INFO: Checking if I should try and be the leader.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
runLeaderProcess
INFO: I may be the new leader - try and sync



On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.com wrote:

 I don't think the versions you are thinking of apply here. Peersync does
 not look at that - it looks at version numbers for updates in the
 transaction log - it compares the last 100 of them on leader and replica.
 What it's saying is that the replica seems to have versions that the leader
 does not. Have you scanned the logs for any interesting exceptions?

 Did the leader change during the heavy indexing? Did any zk session
 timeouts occur?

 - Mark

 On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote:

  I am currently looking at moving our Solr cluster to 4.2 and noticed a
  strange issue while testing today.  Specifically the replica has a higher
  version than the master which is causing the index to not replicate.
  Because of this the replica has fewer documents than the master.  What
  could cause this and how can I resolve it short of taking down the index
  and scping the right version in?
 
  MASTER:
  Last Modified:about an hour ago
  Num Docs:164880
  Max Doc:164880
  Deleted Docs:0
  Version:2387
  Segment Count:23
 
  REPLICA:
  Last Modified: about an hour ago
  Num Docs:164773
  Max Doc:164773
  Deleted Docs:0
  Version:3001
  Segment Count:30
 
  in the replicas log it says this:
 
  INFO: Creating new http client,
 
 config:maxConnectionsPerHost=20maxConnections=1connTimeout=3socketTimeout=3retry=false
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
 
  INFO: PeerSync: core=dsc-shard5-core2
  url=http://10.38.33.17:7577/solrSTART replicas=[
  http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
 
  INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
  Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
 
  INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our
  versions are newer. ourLowThreshold=1431233788792274944
  otherHigh=1431233789440294912
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
 
  INFO: PeerSync: core=dsc-shard5-core2
  url=http://10.38.33.17:7577/solrDONE. sync succeeded
 
 
  which again seems to point that it thinks it has a newer version of the
  index so it aborts.  This happened while having 10 threads indexing
 10,000
  items writing to a 6 shard (1 replica each) cluster.  Any thoughts on
 this
  or what I should look for would be appreciated.
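
A toy sketch of the PeerSync idea Mark describes above: each node reports
the versions of its most recent transaction-log updates, and if the replica
holds versions the leader has never seen, a plain peer sync cannot
reconcile them. This is an illustration only, not Solr's actual code; all
names here are made up.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PeerSyncSketch {

    // Versions the replica has that are missing on the leader.
    static Set<Long> replicaOnlyVersions(List<Long> leaderVersions,
                                         List<Long> replicaVersions) {
        Set<Long> missing = new HashSet<Long>(replicaVersions);
        missing.removeAll(leaderVersions); // what remains is unknown to the leader
        return missing;
    }

    public static void main(String[] args) {
        List<Long> leader  = Arrays.asList(100L, 101L, 102L);
        List<Long> replica = Arrays.asList(101L, 102L, 103L); // 103 never reached the leader
        System.out.println("replica-only versions: "
                + replicaOnlyVersions(leader, replica));
    }
}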




Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
here is another one that looks interesting

Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the
leader, but locally we don't think so
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)



On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote:

 Looking at the master it looks like at some point there were shards that
 went down.  I am seeing things like what is below.

 INFO: A cluster state change: WatchedEvent state:SyncConnected
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
 nodes size: 12)
 Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
 INFO: Updating live nodes... (9)
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: Running the leader process.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: Checking if I should try and be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: My last published State was Active, it's okay to be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: I may be the new leader - try and sync



 On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.com wrote:

 I don't think the versions you are thinking of apply here. Peersync does
 not look at that - it looks at version numbers for updates in the
 transaction log - it compares the last 100 of them on leader and replica.
 What it's saying is that the replica seems to have versions that the leader
 does not. Have you scanned the logs for any interesting exceptions?

 Did the leader change during the heavy indexing? Did any zk session
 timeouts occur?

 - Mark

 On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote:

  I am currently looking at moving our Solr cluster to 4.2 and noticed a
  strange issue while testing today.  Specifically the replica has a
 higher
  version than the master which is causing the index to not replicate.
  Because of this the replica has fewer documents than the master.  What
  could cause this and how can I resolve it short of taking down the index
  and scping the right version in?
 
  MASTER:
  Last Modified:about an hour ago
  Num Docs:164880
  Max Doc:164880
  Deleted Docs:0
  Version:2387
  Segment Count:23
 
  REPLICA:
  Last Modified: about an hour ago
  Num Docs:164773
  Max Doc:164773
  Deleted Docs:0
  Version:3001
  Segment Count:30
 
  in the replicas log it says this:
 
  INFO: Creating new http client,
 
  config:maxConnectionsPerHost=20&maxConnections=1&connTimeout=3&socketTimeout=3&retry=false
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
 
  INFO: PeerSync: core=dsc-shard5-core2
   url=http://10.38.33.17:7577/solr START replicas=[
  http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
 
  INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr
  Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
 
  INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our
  versions are newer. ourLowThreshold=1431233788792274944
  otherHigh=1431233789440294912
 
  Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
 
  INFO: PeerSync: core=dsc-shard5-core2
   url=http://10.38.33.17:7577/solr DONE. sync succeeded
 
 
  which again seems to point that it thinks it has a newer version of the
  index so it aborts.  This happened while having 10 threads indexing
 10,000
  items writing to a 6 shard (1 replica each) cluster.  Any thoughts on
 this
  or what I should look for would be appreciated.

Lengthy description is converted to hash symbols

2013-04-02 Thread Danny Watari
Hi, I have a field that is defined to be of type text_en.  Occasionally, I
notice that lengthy strings are converted to hash symbols.  Here is a
snippet of my field type:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="description" type="text_en" indexed="true" stored="true"
required="false"/>

Here is an example of the field's value:
<str name="description">################################################################################</str>


Any ideas why this might be happening?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
sorry for spamming here

shard5-core2 is the instance we're having issues with...

Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
SEVERE: shard update error StdNode:
http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
status:503, message:Service Unavailable
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com wrote:

 here is another one that looks interesting

 Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the
 leader, but locally we don't think so
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
 at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
 at
 org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)



 On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote:

 Looking at the master it looks like at some point there were shards that
 went down.  I am seeing things like what is below.

 INFO: A cluster state change: WatchedEvent state:SyncConnected
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
 nodes size: 12)
 Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
 process
 INFO: Updating live nodes... (9)
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: Running the leader process.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: Checking if I should try and be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: My last published State was Active, it's okay to be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: I may be the new leader - try and sync



 On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.comwrote:

 I don't think the versions you are thinking of apply here. Peersync does
 not look at that - it looks at version numbers for updates in the
 transaction log - it compares the last 100 of them on leader and replica.
 What it's saying is that the replica seems to have versions that the leader
 does not. Have you scanned the logs for any interesting exceptions?

 Did the leader change during the heavy indexing? Did any zk session
 timeouts occur?

 - Mark

 On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote:

  I am currently looking at moving our Solr cluster to 4.2 and noticed a
  strange issue while testing today.  Specifically the replica has a
 higher
  version than the master which is causing the index to not replicate.
  Because of this the replica has fewer documents than the master.  What
  could cause this and how can I resolve it short of taking down the
 index
  and scping the right version in?
 
  MASTER:
  Last Modified:about an hour ago
  Num Docs:164880
  Max 

Re: Lengthy description is converted to hash symbols

2013-04-02 Thread Jack Krupansky
Can you enter the text on the Solr Admin UI Analysis page? Then you could
tell at which stage the issue occurs.
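
If the Admin UI is hard to reach, the same check can be made over HTTP via
the field analysis handler; host, port, and the exact long string below are
placeholders:

http://localhost:8983/solr/analysis/field?analysis.fieldtype=text_en&analysis.showmatch=true&analysis.fieldvalue=<the lengthy string>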


StandardTokenizer has a default token length limit of 255. You can override
it with the maxTokenLength attribute:


   <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="1024"/>


See:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizerFactory.html

But the # sounds like a bug.

-- Jack Krupansky

-Original Message- 
From: Danny Watari

Sent: Tuesday, April 02, 2013 5:45 PM
To: solr-user@lucene.apache.org
Subject: Lengthy description is converted to hash symbols

Hi, I have a field that is defined to be of type text_en.  Occasionally, I
notice that lengthy strings are converted to hash symbols.  Here is a
snippet of my field type:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="description" type="text_en" indexed="true" stored="true"
required="false"/>

Here is an example of the field's value:
<str name="description">################################################################################</str>


Any ideas why this might be happening?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Lengthy description is converted to hash symbols

2013-04-02 Thread Chris Hostetter

: Here is an example of the field's value:
: <str name="description">################################################################################</str>

where are you getting that <str ... /> from? if that's what you see when 
you do a search for a document, then it has nothing to do with your 
fieldType or analyzer -- the strings returned from searches are the 
stored values, which are not modified by the analyzer at all.

What does your indexing code/process look like?
Do you have any custom UpdateProcessors?

details, details, details.

-Hoss


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
Sorry I didn't ask the obvious question.  Is there anything else that I
should be looking for here and is this a bug?  I'd be happy to troll
through the logs further if more information is needed, just let me know.

Also what is the most appropriate mechanism to fix this.  Is it required to
kill the index that is out of sync and let solr resync things?


On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson jej2...@gmail.com wrote:

 sorry for spamming here

 shard5-core2 is the instance we're having issues with...

 Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
 SEVERE: shard update error StdNode:
 http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
 Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
 status:503, message:Service Unavailable
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
 at
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)


 On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com wrote:

 here is another one that looks interesting

 Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
 the leader, but locally we don't think so
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
 at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
 at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
 at
 org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)



 On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote:

 Looking at the master it looks like at some point there were shards that
 went down.  I am seeing things like what is below.

 INFO: A cluster state change: WatchedEvent state:SyncConnected
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
 nodes size: 12)
 Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
 process
 INFO: Updating live nodes... (9)
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: Running the leader process.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: Checking if I should try and be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: My last published State was Active, it's okay to be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: I may be the new leader - try and sync



 On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.comwrote:

 I don't think the versions you are thinking of apply here. Peersync
 does not look at that - it looks at version numbers for updates in the
 transaction log - it compares the last 100 of them on leader and replica.
 What it's saying is that the replica seems to have versions that the leader
 does not. Have you scanned the logs for any interesting exceptions?

 Did the leader change during the heavy indexing? Did any zk session
 timeouts occur?

 - Mark

 On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote:

  I am currently looking 

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Mark Miller
It would appear it's a bug given what you have said.

Any other exceptions would be useful. Might be best to start tracking in a JIRA 
issue as well.

To fix, I'd bring the behind node down and back again.

Unfortunately, I'm pressed for time, but we really need to get to the bottom of 
this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now).

- Mark

On Apr 2, 2013, at 7:21 PM, Jamie Johnson jej2...@gmail.com wrote:

 Sorry I didn't ask the obvious question.  Is there anything else that I
 should be looking for here and is this a bug?  I'd be happy to troll
 through the logs further if more information is needed, just let me know.
 
 Also what is the most appropriate mechanism to fix this.  Is it required to
 kill the index that is out of sync and let solr resync things?
 
 
 On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson jej2...@gmail.com wrote:
 
 sorry for spamming here
 
 shard5-core2 is the instance we're having issues with...
 
 Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
 SEVERE: shard update error StdNode:
 http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException:
 Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
 status:503, message:Service Unavailable
at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
at
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 
 
 On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com wrote:
 
 here is another one that looks interesting
 
 Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
 the leader, but locally we don't think so
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
at
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
at
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at
 org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
 
 
 
 On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote:
 
 Looking at the master it looks like at some point there were shards that
 went down.  I am seeing things like what is below.
 
 INFO: A cluster state change: WatchedEvent state:SyncConnected
 type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live
 nodes size: 12)
 Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
 process
 INFO: Updating live nodes... (9)
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: Running the leader process.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: Checking if I should try and be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 shouldIBeLeader
 INFO: My last published State was Active, it's okay to be the leader.
 Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext
 runLeaderProcess
 INFO: I may be the new leader - try and sync
 
 
 
 On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.comwrote:
 
 I don't think the versions you are thinking of apply here. Peersync
 does not look at that - it looks at 

RequestHandler.. Conditional components

2013-04-02 Thread venkata
In our use cases, for certain query terms we want to redirect the query
processing to an external system; for the rest of the keywords, we want to
continue with the query component, facets, etc.

Is it possible, based on some condition, to skip some components in a
request handler?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/RequestHandler-Conditional-components-tp4053381.html
Sent from the Solr - User mailing list archive at Nabble.com.
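
One direction sometimes used for this kind of routing is a custom
SearchComponent registered early in the handler's chain. Below is a
minimal, untested sketch: the class name, the shouldDelegate() check and
the flag it sets are all hypothetical. It shows where such a decision could
be made, not a complete answer to the question.

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class QueryRouterComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    String q = rb.req.getParams().get("q");
    if (q != null && shouldDelegate(q)) {
      // Record the decision so a custom request handler (or a later
      // component) can return early instead of running query/facets.
      rb.rsp.add("routedExternally", true);
    }
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // no-op in this sketch
  }

  // Placeholder condition for "certain query terms".
  private boolean shouldDelegate(String q) {
    return q.startsWith("external:");
  }

  @Override
  public String getDescription() {
    return "routes certain queries to an external system (sketch)";
  }

  @Override
  public String getSource() {
    return "";
  }
}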


RE: MoreLikeThis - Odd results - what am I doing wrong?

2013-04-02 Thread David Parks
Isn't this an AWS security groups question? You should probably post this 
question on the AWS forums, but for the moment, here's the basic reading 
material - go set up your EC2 security groups and lock down your systems.


http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html

If you just want to password protect Solr here are the instructions:

http://wiki.apache.org/solr/SolrSecurity

But I most certainly would not leave it open to the world even with a password 
(note that basic password authentication sends passwords in clear text if 
you're not using HTTPS; best to lock the thing down behind a firewall).

Dave


-Original Message-
From: DC tech [mailto:dctech1...@gmail.com] 
Sent: Tuesday, April 02, 2013 1:02 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis - Odd results - what am I doing wrong?

OK - so I have my SOLR instance running on AWS. 
Any suggestions on how to safely share the link?  Right now, the whole SOLR 
instance is totally open. 



Gagandeep singh gagan.g...@gmail.com wrote:

say debugQuery=true&mlt=true and see the scores for the MLT query, not 
a sample query. You can use Amazon ec2 to bring up your solr, you 
should be able to get a micro instance for free trial.


On Mon, Apr 1, 2013 at 5:10 AM, dc tech dctech1...@gmail.com wrote:

 I did try the raw query against the *simi* field and those seem to 
 return results in the order expected.
 For instance, Acura MDX has (large, SUV, 4WD & Luxury) in the simi field.
 Running a query with those words against the simi field returns the 
 expected models (X5, Audi Q5, etc) and then the subsequent documents 
 have decreasing relevance. So the basic query mechanism seems to be fine.

 The issue just seems to be with MoreLikeThis component and handler.
 I can post the index on a public SOLR instance - any suggestions? (or 
 for
 hosting)


 On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh 
 gagan.g...@gmail.com
 wrote:

  If you can bring up your solr setup on a public machine then im 
  sure a
 lot
  of debugging can be done. Without that, i think what you should 
  look at
 is
  the tf-idf scores of the terms like camry etc. Usually idf is the 
  deciding factor into which results show at the top (tf should be 1 
  for
 your
  data).
  Enable debugQuery=true and look at explain section to see show 
  score is getting calculated.
 
  You should try giving different boosts to class, type, drive, size 
  to control the results.
 
 
  On Sun, Mar 31, 2013 at 8:52 PM, dc tech dctech1...@gmail.com wrote:
 
  I am running some experiments on more like this and the results 
  seem rather odd - I am doing something wrong but just cannot figure out 
  what.
  Basically, the similarity results are decent - but not great.
 
  *Issue 1  = Quality*
  Toyota Camry : finds Altima (good) but then next one is Camry 
  Hybrid whereas it should have found Accord.
  I have normalized the data into a simi field which has only the 
  attributes that I care about.
  Without the simi field, I could not get mlt.qf boosts to work well
 enough
  to return results
 
  *Issue 2*
  Some fields do not work at all. For instance, text+simi (in 
  mlt.fl)
 works
  whereas just simi does not.
  So some weirdness that am just not understanding.
 
  Would be grateful for your guidance !
 
 
  Here is the setup:
  *1. SOLR Version*
  solr-spec 4.2.0.2013.03.06.22.32.13
  solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
  lucene-spec 4.2.0
  lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
 
  *2. Machine Information*
  Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23
  19.0-b09)
  Windows 7 Home 64 Bit with 4 GB RAM
 
  *3. Sample Data *
  I created this 'dummy' data of cars  - the idea being that these 
  would
 be
  sufficient and simple to generate similarity and understand how it 
  would work.
  There are 181 rows in the data set (I have attached it for 
  reference in CSV format)
 
  [image: Inline image 1]
 
  *4. SCHEMA*
  *Field Definitions*
 <field name="id" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
 <field name="make" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
 <field name="model" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
 <field name="class" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
 <field name="type" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
 <field name="drive" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
 <field name="comment" type="text_general" indexed="true" stored="true"
  termVectors="true" multiValued="true"/>
 <field name="size" type="string" indexed="true" stored="true"
  termVectors="true" multiValued="false"/>
  *Copy Fields*
  <copyField source="make" dest="make_en"/>   <!-- Search -->
  <copyField source="model" dest="model_en"/> <!-- Search -->
  <copyField
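
Following Gagandeep's earlier suggestion, the debugQuery=true&mlt=true run
can also be scripted with SolrJ 4.x. This is a sketch only -- the URL, the
seed id, and the simi field come from this thread's examples and are not
verified against the poster's setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MltDebug {
  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("id:camry"); // seed document
    q.set("mlt", true);                      // enable the MoreLikeThis component
    q.set("mlt.fl", "simi");                 // similarity field from the thread
    q.set("debugQuery", true);               // include scoring explanations
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getDebugMap());   // tf-idf explain info to compare
  }
}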

Re: Confusion over Solr highlight hl.q parameter

2013-04-02 Thread Koji Sekiguchi
(13/04/03 5:27), Van Tassell, Kristian wrote:
 Thanks Koji, this helped with some of our problems, but it is still not 
 perfect.
 
 This query, for example, returns no highlighting:
 
 ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax
 
 But this one does (when it is, in effect, the same query):
 
 ?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT
 
 I've tried many combinations but can't seem to get the right one to work. Is 
 this possibly a bug?

As hl.q doesn't respect the defType parameter but does support localParams,
can you try putting {!edismax} in the hl.q parameter?
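
For example, with the parameters from the earlier query (a guess at the
intended form, untested):

?q=id:abc123&hl=true&hl.fl=text_it_IT&hl.q={!edismax}text_it_IT:l'assieme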

koji
-- 
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
I brought the bad one down and back up and it did nothing.  I can clear the
index and try 4.2.1. I will save off the logs and see if there is anything
else odd
On Apr 2, 2013 9:13 PM, Mark Miller markrmil...@gmail.com wrote:

 It would appear it's a bug given what you have said.

 Any other exceptions would be useful. Might be best to start tracking in a
 JIRA issue as well.

 To fix, I'd bring the behind node down and back again.

 Unfortunately, I'm pressed for time, but we really need to get to the
 bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
 to mirrors now).

 - Mark

 On Apr 2, 2013, at 7:21 PM, Jamie Johnson jej2...@gmail.com wrote:

  Sorry I didn't ask the obvious question.  Is there anything else that I
  should be looking for here and is this a bug?  I'd be happy to troll
  through the logs further if more information is needed, just let me know.
 
  Also what is the most appropriate mechanism to fix this.  Is it required
 to
  kill the index that is out of sync and let solr resync things?
 
 
  On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson jej2...@gmail.com wrote:
 
  sorry for spamming here
 
  shard5-core2 is the instance we're having issues with...
 
  Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
  SEVERE: shard update error StdNode:
 
 http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException
 :
  Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok
  status:503, message:Service Unavailable
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
 at
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
 at
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 
 
  On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
  here is another one that looks interesting
 
  Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
  the leader, but locally we don't think so
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
 at
 
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at
 
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
 at
  org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
 
 
 
  On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
  Looking at the master it looks like at some point there were shards
 that
  went down.  I am seeing things like what is below.
 
  INFO: A cluster state change: WatchedEvent state:SyncConnected
  type:NodeChildrenChanged path:/live_nodes, has occurred - updating...
 (live
  nodes size: 12)
  Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
  process
  INFO: Updating live nodes... (9)
  Apr 2, 2013 8:12:52 PM
 org.apache.solr.cloud.ShardLeaderElectionContext
  runLeaderProcess
  INFO: Running the leader process.
  Apr 2, 2013 8:12:52 PM
 org.apache.solr.cloud.ShardLeaderElectionContext
  shouldIBeLeader
  INFO: Checking if I should try and be the leader.
  Apr 2, 2013 8:12:52 PM
 org.apache.solr.cloud.ShardLeaderElectionContext
  shouldIBeLeader
  INFO: My last published State was Active, it's okay to 

Re: Solr 4.2 Cloud Replication Replica has higher version than Master?

2013-04-02 Thread Jamie Johnson
Mark
Is there a particular JIRA issue that you think may address this? I read
through it quickly but didn't see one that jumped out.
On Apr 2, 2013 10:07 PM, Jamie Johnson jej2...@gmail.com wrote:

 I brought the bad one down and back up and it did nothing.  I can clear
 the index and try 4.2.1. I will save off the logs and see if there is
 anything else odd
 On Apr 2, 2013 9:13 PM, Mark Miller markrmil...@gmail.com wrote:

 It would appear it's a bug given what you have said.

 Any other exceptions would be useful. Might be best to start tracking in
 a JIRA issue as well.

 To fix, I'd bring the behind node down and back again.

 Unfortunately, I'm pressed for time, but we really need to get to the
 bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
 to mirrors now).

 - Mark

 On Apr 2, 2013, at 7:21 PM, Jamie Johnson jej2...@gmail.com wrote:

  Sorry I didn't ask the obvious question.  Is there anything else that I
  should be looking for here and is this a bug?  I'd be happy to troll
  through the logs further if more information is needed, just let me
 know.
 
  Also what is the most appropriate mechanism to fix this.  Is it
 required to
  kill the index that is out of sync and let solr resync things?
 
 
  On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
  sorry for spamming here
 
  shard5-core2 is the instance we're having issues with...
 
  Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
  SEVERE: shard update error StdNode:
 
 http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException
 :
  Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non
 ok
  status:503, message:Service Unavailable
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
 at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
 at
 
 org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
 at
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
  java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
 at
  java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 
 
  On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
  here is another one that looks interesting
 
  Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.solr.common.SolrException: ClusterState says we are
  the leader, but locally we don't think so
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
 at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
 at
 
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
 at
 
 org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
 at
  org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
 
 
 
  On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com
 wrote:
 
  Looking at the master it looks like at some point there were shards
 that
  went down.  I am seeing things like what is below.
 
  INFO: A cluster state change: WatchedEvent state:SyncConnected
  type:NodeChildrenChanged path:/live_nodes, has occurred -
 updating... (live
  nodes size: 12)
  Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3
  process
  INFO: Updating live nodes... (9)
  Apr 2, 2013 8:12:52 PM
 org.apache.solr.cloud.ShardLeaderElectionContext
  runLeaderProcess
  INFO: Running the leader process.
  Apr 2, 2013 8:12:52 PM
 org.apache.solr.cloud.ShardLeaderElectionContext
  

Re: WADL for REST service?

2013-04-02 Thread Otis Gospodnetic
Hi Peter,

I'm afraid we don't have anything that formal... almost empty:
http://search-lucene.com/?q=wadl&fc_project=Solr

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 2, 2013 at 6:38 AM, Peter Schütt newsgro...@pstt.de wrote:
 Hello,

 does a WADL exist for the REST service of SOLR?

 Ciao
   Peter Schütt



solr scores remain the same for exact match and nearly exact match

2013-04-02 Thread amit

Below is my query:
http://localhost:8983/solr/select/?q=subject:session management in
php&fq=category:[*%20TO%20*]&fl=category,score,subject

The result is like below

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">983</int>
  <lst name="params">
    <str name="fq">category:[* TO *]</str>
    <str name="q">subject:session management in php</str>
    <str name="fl">category,score,subject</str>
  </lst>
</lst>
<result name="response" maxScore="0.8770298" start="0" numFound="2">
  <doc>
    <float name="score">0.8770298</float>
    <str name="category">Annapurnap</str>
    <str name="subject">session management in asp.net</str>
  </doc>
  <doc>
    <float name="score">0.8770298</float>
    <str name="category">Annapurnap</str>
    <str name="subject">session management in PHP</str>
  </doc>
</result>
</response>

The question is: how come both have the same score when one is an exact
match and the other isn't?
This is the schema:
<field name="subject" type="text_en_splitting" indexed="true"
stored="true"/>
<field name="category" type="text_general" indexed="true" stored="true"/>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solre-scores-remains-same-for-exact-match-and-nearly-exact-match-tp4053406.html
Sent from the Solr - User mailing list archive at Nabble.com.
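
Per the debugQuery advice earlier in this digest, adding debugQuery=true to
the same request will show the explain section -- which terms actually
matched each document and with what weights -- which should reveal why the
two scores tie. The URL below is the poster's own example with the flag
appended:

http://localhost:8983/solr/select/?q=subject:session management in php&fq=category:[*%20TO%20*]&fl=category,score,subject&debugQuery=true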


RE: Solr Phonetic Search Highlight issue in search results

2013-04-02 Thread Soumyanayan Kar
Thanks a lot Erick for trying this out.

Will wait for a reply from your end.

Thanks & Regards,

Soumya.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 01 April 2013 05:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Phonetic Search Highlight issue in search results

Good question, you're causing me to think... about code I know very little
about <G>.

So rather than spouting off, I tried it and... it works fine for me, either
with or without using fast vector highlighter on, admittedly, a very simple
test.

So I think I'd try peeling off all the extra stuff you've put into your
configs (sorry, I don't have time right now to try to reproduce) and get the
very simple case working, then build the rest back up and see where the
problem begins.

Sorry for the mis-direction!

Erick



On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar soumyanayan@rebaca.com
wrote:
 Hi Erick,

 Thanks for the reply. But help me understand this: If Solr is able to 
 isolate the two documents which contain the term fact being the 
 phonetic equivalent of the search term fakt, then why will it be 
 unable to highlight the terms based on the same logic it uses to search
the documents.

 Also, it is correctly highlighting the results in other searches which 
 are also approximate searches and not exact ones, e.g. Fuzzy or 
 Synonym search. In these cases also the highlights in the search 
 results are far from the actual search term but still they are getting 
 correctly highlighted.

 Maybe I am getting it completely wrong but it looks like there is 
 something wrong with my implementation.

 Thanks & Regards,

 Soumya.


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: 27 March 2013 06:07 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Phonetic Search Highlight issue in search results

 How would you expect it to highlight successfully? The term is fakt, 
 there's nothing built in (and, indeed couldn't be) to un-phoneticize 
 it into fact and apply that to the Content field. The whole point of 
 phonetic processing is to do a lossy translation from the word into 
 some variant, losing precision all the way.

 So this behavior is unsurprising...

 Best
 Erick




 On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar 
 soumyanayan@rebaca.com
 wrote:

 When we are issuing a query with Phonetic Search, it is returning the 
 correct documents but not returning the highlights. When we use 
 Stemming or Synonym searches we are getting the proper highlights.



 For example, when we execute a phonetic query for the term
 fakt (ContentSearchPhonetic:fakt) in the Solr Admin interface, it 
 returns two documents containing the term fact (phonetic token 
 equivalent), but the list of highlights is empty as shown in the 
 response below.



 <response>

 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">16</int>
   <lst name="params">
     <str name="q">ContentSearchPhonetic:fakt</str>
     <str name="wt">xml</str>
   </lst>
 </lst>

 <result name="response" numFound="2" start="0">
   <doc>
     <long name="DocId">1</long>
     <str name="DocTitle">Doc 1</str>
     <str name="Content">Anyway, this game was excellent and was well
     worth the time. The graphics are truly amazing and the sound track
     was pretty pleasant also. The preacher was in fact a thief.</str>
     <long name="_version_">1430480998833848320</long>
   </doc>
   <doc>
     <long name="DocId">2</long>
     <str name="DocTitle">Doc 2</str>
     <str name="Content">stunning. The preacher was in fact an excellent
     thief who had stolen the original manuscript of Hamlet from an
     exhibit on the Riviera, where he also acquired his remarkable and
     tan.</str>
     <long name="_version_">1430480998841188352</long>
   </doc>
 </result>

 <lst name="highlighting">
   <lst name="1"/>
   <lst name="2"/>
 </lst>

 </response>



 Relevant section of Solr schema:



 <field name="DocId" type="long" indexed="true" stored="true"
  required="true"/>
 <field name="DocTitle" type="string" indexed="false" stored="true"
  required="true"/>
 <field name="Content" type="text_general" indexed="false" stored="true"
  required="true"/>

 <field name="ContentSearch" type="text_general" indexed="true"
  stored="false" multiValued="true"/>
 <field name="ContentSearchStemming" type="text_stem" indexed="true"
  stored="false" multiValued="true"/>
 <field name="ContentSearchPhonetic" type="text_phonetic" indexed="true"
  stored="false" multiValued="true"/>
 <field name="ContentSearchSynonym" type="text_synonym" indexed="true"
  stored="false" multiValued="true"/>

 <uniqueKey>DocId</uniqueKey>

 <copyField source="Content" dest="ContentSearch"/>
 <copyField source="Content" dest="ContentSearchStemming"/>
 <copyField source="Content" dest="ContentSearchPhonetic"/>
 <copyField source="Content" dest="ContentSearchSynonym"/>

 <fieldType name="text_stem" class="solr.TextField">

   <analyzer>