Re: SolrCloud

2012-04-02 Thread asia
Thanks for replying.
So if I make a replica of each shard, should I run ZooKeeper for every shard
and replica, or only for the replicas? One more question I want to ask: I am
using Solr in a Tomcat and Eclipse environment with SolrJ, so I am a bit
confused about how to use ZooKeeper with it alongside Tomcat. I have downloaded
the ZooKeeper jar files as well, but I need a little help with this.
-Asia



Re: SolrCloud new....

2012-04-02 Thread asia
Hello,
I am working on the same thing. I have tried the wiki example but I am
getting errors. I want to use ZooKeeper with SolrJ in Eclipse, running under
Tomcat, and need a little help on how to integrate ZooKeeper into Eclipse for
SolrCloud.



Re: Responding to Requests with Chunks/Streaming

2012-04-02 Thread Mikhail Khludnev
Hello,

Small update - reading streamed response is done via callback. No
SolrDocumentList in memory.
https://github.com/m-khl/solr-patches/tree/streaming
here is the test
https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138
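
For anyone wanting to try the callback path from a client, a minimal hedged
sketch using the queryAndStreamResponse()/StreamingResponseCallback API (the
SOLR-2112 work mentioned later in this thread; class names, the URL and the
field are illustrative and assume a recent SolrJ):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.StreamingResponseCallback;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class StreamingCallbackSketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("q", "*:*");
            params.set("sort", "_docid_ asc"); // the streaming work above implies index order

            // Documents are handed to the callback one by one as they are read
            // from the wire; no SolrDocumentList is accumulated in memory.
            server.queryAndStreamResponse(params, new StreamingResponseCallback() {
                @Override
                public void streamDocListInfo(long numFound, long start, Float maxScore) {
                    System.out.println("numFound=" + numFound);
                }

                @Override
                public void streamSolrDocument(SolrDocument doc) {
                    System.out.println(doc.getFieldValue("id"));
                }
            });
        }
    }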

no progress in distributed search via streaming yet.

Pls let me know if you don't want to have updates from my playground.

Regards

On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 @All
 Why does nobody want such a pretty cool feature?

 Nicholas,
 I have made a tiny bit of progress: I'm able to stream in javabin codec format
 while searching. It implies sorting by _docid_.

 here is the diff

 https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95

 The current issue is that SolrJ reads the whole response at once.
 Reading via callback is supported by EmbeddedServer only. Anyway, it should
 not be a big deal; ResponseStreamingTest.java somehow works.
 I'm stuck on introducing response streaming in distributed search, which is
 actually more challenging - RespStreamDistributedTest fails.

 Regards


 On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball 
 nicholas.b...@nodelay.com wrote:


 Mikhail & Ludovic,

 Thanks for both your replies, very helpful indeed!

 Ludovic, I was actually looking into just that and did some tests with
 SolrJ. It does work well but needs some changes on the Solr server if we
 want to send out individual documents at various times. This could be done
 with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I
 therefore think that a combination of this and Mikhail's solution would
 work best!

 Mikhail, you mention that your solution doesn't currently work and I'm not
 sure why this is the case, but could it be that you haven't flushed the
 data (os.flush()) you've written in the collect method of DocSetStreamer? I
 think placing the output stream into the SolrQueryRequest is the way to go,
 so that we can access it and write to it how we intend. However, I think
 using the JavaBinCodec would be ideal so that we can work with SolrJ
 directly, and not mess around with the encoding of the docs/data etc...

 At the moment the entry point to JavaBinCodec is through the
 BinaryResponseWriter which calls the highest level marshal() method which
 decodes and sends out the entire SolrQueryResponse (line 49 @
 BinaryResponseWriter). What would be ideal is to be able to break up the
 response and call the JavaBinCodec for pieces of it with a flush after each
 call. Did a few tests with a simple Thread.sleep and a flush to see if this
 would actually work and it looks like it's working out perfectly. Just trying
 to figure out the best way to actually do it now :) any ideas?

 On another note, for a solution to work with chunked transfer encoding
 (and therefore web browsers), a lot more development is going to be needed.
 Not sure if it's worth trying yet, but I might look into it later down the
 line.

 Nick

 On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
 mkhlud...@griddynamics.com wrote:
  Ludovic,
 
  I looked through it. First of all, it seems to me you don't amend the regular
  servlet Solr server, but only the embedded one.
  Anyway, the difference is that you stream a DocList via callback, but that
  means you've instantiated it in memory and keep it there until it will
  be completely consumed. Think about a billion numFound. The core idea of my
  approach is to keep almost zero memory for the response.
 
  Regards
 
  On Fri, Mar 16, 2012 at 12:12 AM, lboutros boutr...@gmail.com wrote:
 
  Hi,
 
  I was looking for something similar.
 
  I tried this patch :
 
  https://issues.apache.org/jira/browse/SOLR-2112
 
  it's working quite well (I've back-ported the code in Solr 3.5.0...).
 
  Is it really different from what you are trying to achieve ?
 
  Ludovic.
 
  -
  Jouve
  France.
  --
  View this message in context:
 

 http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




 --
 Sincerely yours
 Mikhail Khludnev
 ge...@yandex.ru

 http://www.griddynamics.com
  mkhlud...@griddynamics.com




-- 
Sincerely yours
Mikhail Khludnev
ge...@yandex.ru

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: pattern error in PatternReplaceCharFilterFactory

2012-04-02 Thread OliverS
Hi

It seems to be an unrecognisable pattern. This is from the log; the last
paragraph says "unknown character block name". The Java version is
1.6.0_31:

***
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType:Plugin init failure for [schema.xml]
analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
org.apache.solr.analysis.PatternReplaceCharFilterFactory
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
at
org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:357)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:106)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:756)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:473)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:296)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:99)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at
org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
at
org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
at
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
at
org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
at
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
at
org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
at
org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
at
org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at
org.apache.catalina.core.StandardService.start(StandardService.java:525)
at
org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/charFilter:Configuration Error: 'pattern' can not be
parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
at
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:290)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
... 33 more
Caused by: java.lang.RuntimeException: Configuration Error: 'pattern' can
not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
at
org.apache.solr.analysis.PatternReplaceCharFilterFactory.init(PatternReplaceCharFilterFactory.java:54)
at
org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:278)
at
org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:268)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:149)
... 37 more
Caused by: java.util.regex.PatternSyntaxException: Unknown character block
name {Latin-1_Supplement} near index 23
\p{InLatin-1_Supplement}
   ^
at java.util.regex.Pattern.error(Pattern.java:1713)
at
java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)
at java.util.regex.Pattern.family(Pattern.java:2408)
at java.util.regex.Pattern.sequence(Pattern.java:1831)
at 
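
For reference, the failure can be reproduced outside Solr: java.util.regex
resolves \p{In...} through Character.UnicodeBlock.forName(), which accepts the
official block name with spaces removed or the constant field name, but not the
underscored spelling above. A small hedged check (illustrative only, not the
schema fix itself):

    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class BlockNameCheck {
        public static void main(String[] args) {
            // The spelling from the schema: rejected by the JDK, same exception as in the log.
            try {
                Pattern.compile("\\p{InLatin-1_Supplement}");
            } catch (PatternSyntaxException e) {
                System.out.println("rejected: " + e.getDescription());
            }
            // Spellings Character.UnicodeBlock.forName() should accept.
            Pattern.compile("\\p{InLatin-1Supplement}");   // official name, spaces removed
            Pattern.compile("\\p{InLATIN_1_SUPPLEMENT}");  // constant field name
            System.out.println("alternative spellings compiled");
        }
    }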

RE: Distributed grouping issue

2012-04-02 Thread fbrisbart
Hi,

when you write "I get xxx results", does that come from 'numFound'? Or
do you really display xxx results?
When using both field collapsing and sharding, 'numFound' may be
wrong. In that case, think about using the 'shards.rows' parameter with a
high value (be careful, it's bad for performance).
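
For illustration only, a hedged SolrJ sketch of a grouped, sharded query with
shards.rows raised (host, core and field names are made up, not taken from this
thread; 'HttpSolrServer', 'SolrQuery' and 'QueryResponse' come from the usual
org.apache.solr.client.solrj packages):

    SolrServer server = new HttpSolrServer("http://host1:8983/solr/core0"); // illustrative URL
    SolrQuery query = new SolrQuery("*:*");
    query.set("group", "true");
    query.set("group.field", "group_field");
    query.set("group.ngroups", "true");       // group counts are also affected by sharding
    query.set("shards", "host1:8983/solr/core0,host2:8983/solr/core1"); // illustrative shard list
    query.set("shards.rows", "1000");         // ask each shard for more rows before the merge
    QueryResponse rsp = server.query(query);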

If the problem is really about the returned results, it may be because
of several documents having the same unique key document_id in
different shards.

Hope it helps,
Franck



On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:
 I forgot to mention, I can see the distributed requests happening in the logs:
 
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core2] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=2
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core4] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=1
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core1] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=1
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core3] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=1
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core0] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=1
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core6] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=0
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core7] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=3
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core5] webapp=/solr path=/select 
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=1
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core4] webapp=/solr path=/select 
 params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=2
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core6] webapp=/solr path=/select 
 params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
  status=0 QTime=2
 Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
 INFO: [core4] webapp=/solr path=/select 
 

Re: Solr caching memory consumption Problem

2012-04-02 Thread Suneel
Hello friends,

I am using DIH for Solr indexing. I have 60 million records in SQL which
need to be uploaded to Solr. When I start caching it works smoothly and memory
consumption is normal, but after some time memory consumption keeps growing
and the process reaches more than 6 GB. That is the reason I am not able
to cache all my data.
Please advise me if anything needs to be done in the Solr configuration or in
the Tomcat configuration.

This will be very helpful for me.


-
Regards,

Suneel Pandey
Sr. Software Developer


Re: Virtual Memory very high

2012-04-02 Thread Suneel
Hello Everyone,

This is on a Windows server.

I am facing the same problem: during indexing my memory consumption goes very
high. Based on the discussion above I checked my solrconfig.xml file and found
that directoryFactory is not configured yet. Will configuring the
directoryFactory help me reduce the memory consumption?

I think the configuration below is meant for a Linux server:

<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>

What would be the best option for a Windows server to solve my problem?

Please advise.






-
Regards,

Suneel Pandey
Sr. Software Developer


Using UIMA in Solr behind a firewall

2012-04-02 Thread kodo
Hi!

I'm desperately trying to work out how to configure Solr in order to allow
it to make calls to the Alchemy service through the UIMA analysis engines.
Is there anybody who has been able to accomplish this?

Cheers



Re: Empty facet counts

2012-04-02 Thread Youri Westerman
Alright, well, I discovered that PHP converts '.' in a variable name to '_',
causing my request to contain a variable pointing at a non-existent facet field.

2012/3/30 William Bell billnb...@gmail.com

 Can you also include a /select?q=*:*&wt=xml

 ?

 On Thu, Mar 29, 2012 at 11:47 AM, Erick Erickson
 erickerick...@gmail.com wrote:
  Hmmm, looking at your schema, faceting on a uniqueKey really doesn't
 make
  all that much sense, there will always be exactly one of them. At
  least it's highly
  questionable.
 
  But that's not your problem and what's wrong isn't at all obvious. Can
 you try
  pasting the results of adding debugQuery=on?
 
  Best
  Erick
 
  On Thu, Mar 29, 2012 at 11:12 AM, Youri Westerman yo...@pluxcustoms.nl
 wrote:
  The version is 3.5.0.2011.11.22.14.54.38. I did not apply any patches,
 but
  then again it is not my server.
  Do you have a clue on what is going wrong here?
 
  Regards,
 
  Youri
 
 
  2012/3/29 Bill Bell billnb...@gmail.com
 
  Send schema.xml and did you apply any patches? What version of Solr?
 
  Bill Bell
  Sent from mobile
 
 
  On Mar 29, 2012, at 5:26 AM, Youri Westerman yo...@pluxcustoms.nl
 wrote:
 
   Hi,
  
   I'm currently learning how to use solr and everything seems pretty
   straight
   forward. For some reason when I use faceted queries it returns only
   empty
   sets in the facet_count section.
  
   The get params I'm using are:
    ?q=*:*&rows=0&facet=true&facet.field=urn
  
   The result:
facet_counts: {
  
facet_queries: { },
facet_fields: { },
facet_dates: { },
facet_ranges: { }
  
}
  
   The urn field is indexed and there are enough entries to be counted.
   When
   adding facet.method=Enum, nothing changes.
   Does anyone know why this is happening? Am I missing something?
  
   Thanks in advance!
  
   Youri
 
 



 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
The matches element in the response should return the number of documents
that matched the query, not the number of groups.
Did you encounter this issue with other Solr versions as well (3.5 or
another nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote:

 Hi,

 when you write I get xxx results, does it come from 'numFound' ? Or
 you really display xxx results ?
 When using both field collapsing and sharding, the 'numFound' may be
 wrong. In that case, think about using 'shards.rows' parameter with a
 high value (be careful, it's bad for performance).

 If the problem is really about the returned results, it may be because
 of several documents having the same unique key document_id in
 different shards.

 Hope it helps,
 Franck



 On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:
  I forgot to mention, I can see the distributed requests happening in the
 logs:
 
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core2] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=2
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core4] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core1] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core3] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core0] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core6] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=0
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core7] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=3
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core5] webapp=/solr path=/select
 params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core4] webapp=/solr path=/select
 params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 QTime=2
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core6] webapp=/solr path=/select
 params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true}
 status=0 

Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Manoj Saini
Hello Guys,

I am using Apache Solr 3.3.0 with Tika 1.0.

I have PDF files which I am pushing into Solr for content searching. Solr is
indexing the PDF files and I can see them in the Solr admin interface for
search. But the issue is that Solr is not indexing the whole file content;
it only indexes up to a limited size.

Am I missing something, some configuration, or is this the expected behavior
of Solr?

I have tried to update solrconfig.xml. I have updated ramBufferSizeMB and
maxFieldLength.

Thanks
Manoj Saini

 

 

Thanks,

Best Regards,

 

Manoj Saini | Sr. Software Engineer  | Stigasoft

m: +91 98 1034 1281 | 

e: manoj.sa...@stigasoft.com | w: http://www.stigasoft.com

 



Re: How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-04-02 Thread Stefan Matheis


On Friday, March 30, 2012 at 11:33 PM, vybe3142 wrote:

 When I paste the relevant part of the query into the SOLR admin UI query
 interface, 
 {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any
 documents

Just go and paste the raw content into the form, then you'll get the expected
result. If you put in + characters they will be escaped and result in %3B (as
Erick already said)





Re: How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-04-02 Thread Stefan Matheis
On Saturday, March 31, 2012 at 6:01 PM, Yonik Seeley wrote:
 Shouldn't that be the other way? The admin UI should do any necessary
 escaping, so those + chars should instead be spaces?


We can, but is this really what you'd expect? 


Re: How do I use localparams/joins using SolrJ and/or the Admin GUI

2012-04-02 Thread Stefan Matheis
On Monday, April 2, 2012 at 2:00 PM, Stefan Matheis wrote:
 On Friday, March 30, 2012 at 11:33 PM, vybe3142 wrote:
  When I paste the relevant part of the query into the SOLR admin UI query
  interface, 
  {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any
  documents
 
 Just go and paste the raw content into the form, then you'll get the expected 
 result. If you put in + characters they will be escaped and result in %3B 
 (as Erick already said) 

Sorry, perhaps not clear enough .. raw content in this case means:

{!join from=join_id to=id}attributes_AUTHORS.4:4

.. space as space and not already escaped as +
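
The same applies from SolrJ, for what it's worth: pass the raw string with real
spaces and let the client do the URL encoding. A minimal hedged sketch (the
'server' instance is assumed to be an existing SolrServer):

    // SolrJ URL-encodes parameters itself, so the local-params prefix is passed verbatim.
    SolrQuery q = new SolrQuery("{!join from=join_id to=id}attributes_AUTHORS.4:4");
    QueryResponse rsp = server.query(q);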


Re: SolrCloud

2012-04-02 Thread Erick Erickson
No, you don't have to run zookeeper on each replica. Zookeeper
is a repository for your system (cluster) information. It knows
about each replica, but ZK does not need to run on each shard.

You can run one zookeeper instance for your entire cluster, no matter
how many shards/replicas you have.

Here's a good place to get started understanding ZK:
http://zookeeper.apache.org/

Internally, SolrCloud uses ZooKeeper to understand what to do
with update and search requests. In effect, it asks ZK
"How many shards are there and what is the address of each
leader?" and does the right thing with the results...

My suggestion is that you pretty much forget ZK exists until
you get a bit more comfortable with SolrCloud. Run it embedded
in a single instance (and do NOT shut that instance down!).

From there, you should see SolrCloud just work and it'll at least
get you started.
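
If it helps, a minimal SolrJ sketch of that setup: the client talks to
ZooKeeper (here the embedded instance, conventionally on the Solr port plus
1000), not to any particular Tomcat instance. The ZK address and collection
name are placeholders for your own setup:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CloudClientSketch {
        public static void main(String[] args) throws Exception {
            // Connect to ZooKeeper, which knows where every shard and replica lives.
            CloudSolrServer server = new CloudSolrServer("localhost:9983");
            server.setDefaultCollection("collection1");  // placeholder collection name

            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound=" + rsp.getResults().getNumFound());
            server.shutdown();
        }
    }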

Best
Erick

On Mon, Apr 2, 2012 at 1:59 AM, asia asia.k...@lntinfotech.com wrote:
 Thanks for replying,
 So if i will make a replica of each shard,then should I use zookeeper for
 every shards and replica or only for the replica.! more question i want to
 ask is that I am using solr in tomcat and eclipse environment using solrj.so
 I am a bit confuse as to how to use zookeeper in it along with tomcat.I have
 downloaded zookeeper jar files also but need little help in it.
 -Asia



Re: Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-02 Thread Erick Erickson
How often are you committing index updates? This kind of thing
can happen if you commit too often. Consider setting
commitWithin to something like, say, 5 minutes. Or doing the
equivalent with the autoCommit parameters in solrconfig.xml
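
For illustration, a hedged SolrJ sketch of the commitWithin route, assuming
UpdateRequest.setCommitWithin is available in the SolrJ version in use (the
5-minute value and the field are just examples):

    // Let the server decide when to commit, at most 5 minutes after the add;
    // this avoids issuing an explicit commit for every update.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");            // example field value
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setCommitWithin(5 * 60 * 1000);     // milliseconds
    req.process(server);                    // 'server' is an existing SolrServer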

If that isn't relevant, you need to provide some more details
about what you're doing and how you're using Solr

Best
Erick

On Sun, Apr 1, 2012 at 10:47 PM, Gopal Patwa gopalpa...@gmail.com wrote:
 I am using a Solr 4.0 nightly build with NRT and I often get the
 error "Too many open files" during auto commit. I have searched this forum
 and what I found is that it is related to the OS ulimit setting; please see
 my ulimit settings below. I am not sure what ulimit setting I should have for
 open files - ulimit -n unlimited?

 Even if I set it to a higher number, it will just delay the issue until it
 reaches the new open-file limit. What I have seen is that Solr keeps deleted
 index files open in the java process, which prevents our application server
 (JBoss) from shutting down gracefully because of these open files.

 I have seen recently that this issue was resolved in Lucene - is that true?

 https://issues.apache.org/jira/browse/LUCENE-3855


 I have 3 cores with index sizes: Core1 - 70GB, Core2 - 50GB and Core3
 - 15GB, with a single shard.

 We update the index every 5 seconds, soft commit every 1 second and hard
 commit every 15 minutes.

 Environment: JBoss 4.2, JDK 1.6 64 bit, CentOS, JVM Heap Size = 24GB


 ulimit:

 core file size          (blocks, -c) 0

 data seg size           (kbytes, -d) unlimited

 scheduling priority             (-e) 0

 file size               (blocks, -f) unlimited

 pending signals                 (-i) 401408

 max locked memory       (kbytes, -l) 1024

 max memory size         (kbytes, -m) unlimited

 open files                      (-n) 4096

 pipe size            (512 bytes, -p) 8

 POSIX message queues     (bytes, -q) 819200

 real-time priority              (-r) 0

 stack size              (kbytes, -s) 10240

 cpu time               (seconds, -t) unlimited

 max user processes              (-u) 401408

 virtual memory          (kbytes, -v) unlimited

 file locks                      (-x) unlimited


 ERROR:

 2012-04-01 20:08:35,323 [] priority=ERROR app_name= thread=pool-10-thread-1
 location=CommitTracker line=93 auto commit
 error...:org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException:
 /opt/mci/data/srwp01mci001/inventory/index/_4q1y_0.tip (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
        at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:161)
        at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
        at 
 

Re: default operation for a field

2012-04-02 Thread Erick Erickson
You can't set the default operator for a single field. This implies
you're using edismax? If that's the case, your app layer can
massage the query to something like
term1 term2 term3 field_x:(term1 AND term2 AND term3). In which
case field_x probably should not be in your qf parameter.
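
A rough sketch of that app-layer rewrite (the field name and the input string
are of course placeholders):

    // Turn "term1 term2 term3" into an all-terms-required clause for one field,
    // while leaving the original input for the qf fields.
    String userInput = "term1 term2 term3";
    String allTerms = userInput.trim().replaceAll("\\s+", " AND ");
    String q = userInput + " field_x:(" + allTerms + ")";
    // q is now: term1 term2 term3 field_x:(term1 AND term2 AND term3)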

Best
Erick

On Mon, Apr 2, 2012 at 2:05 AM, Alexander Aristov
alexander.aris...@gmail.com wrote:
 Hi,

 Just curious whether it's possible to set the default operator for a single
 field, not for the whole application. I have a field and I want it to always
 use the AND operator. Is it feasible?

 Users don't enter any operators for this field, only one term or several
 separated by spaces. But if the default operator is set to OR then the
 field doesn't work as I expect; I need AND only. Maybe another solution is
 possible?

 Best Regards
 Alexander Aristov


Re: default operation for a field

2012-04-02 Thread Alexander Aristov
Ok. got it. thanks

Best Regards
Alexander Aristov


On 2 April 2012 16:37, Erick Erickson erickerick...@gmail.com wrote:

 You can't set the default operator for a single field. This implies
 you're using edismax? If that's the case, your app layer can
 massage the query to something like
 term1 term2 term3 field_x:(term1 AND term2 AND term3). In which
 case field_x probably should not be in your qf parameter.

 Best
 Erick

 On Mon, Apr 2, 2012 at 2:05 AM, Alexander Aristov
 alexander.aris...@gmail.com wrote:
  Hi,
 
  Just curious if it's possible to set default operator for a field, not
 for
  all application. I have a field and I want it always had AND operation.
 Is
  it feasible?
 
  Users don't enter any opeartors for this field. Only one term or several
  separated by empty spaces. But if default operation is set to OR then the
  field doesn't work as I expect. I need only AND. Maybe another solution
 is
  possible?
 
  Best Regards
  Alexander Aristov



Re: Virtual Memory very high

2012-04-02 Thread Erick Erickson
Why do you care about virtual memory? It is, after all, virtual. You can
allocate as much as you want.

For instance, MMapDirectory maps a load of virtual memory, but that
has little relation to how much physical memory is being used. Consider
looking at your app with something like jConsole and seeing how much
physical memory is being used before you worry about this issue.

Best
Erick

On Mon, Apr 2, 2012 at 4:56 AM, Suneel pandey.sun...@gmail.com wrote:
 Hello Everyone,

 On window server.

 I am facing same problem during indexing my memory consumption going very
 high based on above discussion i checked in my Solrconfig.xml file and found
 that directoryFactory not configured yet. if i configuring
 directoryfactory then its will help me reduce the consumption of memory.

 i think below configuration used for linex server.

  <directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/>

 what will be best option for window server which solve my problem.

 Please suggest me.






 -
 Regards,

 Suneel Pandey
 Sr. Software Developer
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Virtual-Memory-very-high-tp3574817p3877097.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Virtual Memory very high

2012-04-02 Thread Michael McCandless
Are you seeing a real problem here, besides just being alarmed by the
big numbers from top?

Consumption of virtual memory by itself is basically harmless, as long
as you're not running up against any of the OS limits (and, you're
running a 64 bit JVM).

This is just top telling you that you've mapped large files into the
virtual memory space.

It's not telling you that you don't have any RAM left... virtual
memory is different from RAM.

In my tests, generally MMapDirectory gives faster search performance
than NIOFSDirectory... so unless there's an actual issue, I would
recommend sticking with MMapDirectory.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Dec 9, 2011 at 11:54 PM, Rohit ro...@in-rev.com wrote:
 Hi All,



 Don't know if this question is directly related to this forum, I am running
 Solr in Tomcat on linux server. The moment I start tomcat the virtual memory
 shown using TOP command goes to its max 31.1G and then remains there.



 Is this the right behaviour, why is the virtual memory usage so high. I have
 36GB of ram on the server.



 Tasks: 309 total,   1 running, 308 sleeping,   0 stopped,   0 zombie

 Cpu(s): 19.1%us,  0.2%sy,  0.0%ni, 79.3%id,  1.2%wa,  0.0%hi,  0.2%si,
 0.0%st

 Mem:  49555260k total, 36152224k used, 13403036k free,   121612k buffers

 Swap:   999416k total,        0k used,   999416k free,  5409052k cached



  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

 2741 mysql     20   0 6412m 5.8g 6380 S  182 12.3 108:07.45 mysqld

 2814 root      20   0 31.1g  22g 9716 S  100 46.6 375:51.70 java

 1765 root      20   0 12.2g 285m 9488 S    2  0.6   3:52.59 java

 3591 root      20   0 19352 1576 1068 R    0  0.0   0:00.24 top

    1 root      20   0 23684 1908 1276 S    0  0.0   0:06.21 init



 Regards,

 Rohit





Re: Apache solr not indexing complete pdf file using tikka

2012-04-02 Thread Erick Erickson
You can index 2B tokens, so upping maxFieldLength should have
fixed your problem at least as far as Solr is concerned. How
many tokens get indexed? I'm not as familiar with Tika, but
there may be some kind of parameter there (although I
don't remember this coming up before)...

Did you restart Solr after making the change to solrconfig.xml?

If you're seeing 10,000 tokens or so, that's the default for
maxFieldLength

I'd recommend stopping Solr, rm -rf <solr home>/data/index
and restarting Solr just to be sure you're not seeing leftover
junk, you'll have to re-index your docs after changing
the maxLength param.


Best
Erick


On Mon, Apr 2, 2012 at 7:19 AM, Manoj Saini manoj.sa...@stigasoft.com wrote:
 Hello Guys,

 I am using apache solr 3.3.0 with Tikka 1.0.

 I have pdf files which I am pushing into solr for conent searching. Apache
 solr is indexing pdf files and I can see them in apache solr admin interface
 for search. But the issue is apache solr is not indexing whole file content.
 It is indexing upto only limited size.

 Am I missing something, some configuration, or this is the behavior of
 apache solr?

 I have tried to update solrconfig.xml. I have updated ramBufferSizeMB,
 maxFieldLength.

 Thanks
 Manoj Saini





 Thanks,

 Best Regards,



 Manoj Saini | Sr. Software Engineer  | Stigasoft

 m: +91 98 1034 1281 |

 e:  mailto:nseh...@stigasoft.com manoj.sa...@stigasoft.com | w:
 http://www.stigasoft.com www.stigasoft.com





Problems with indexing of huge textfiles (drupal/tika/solr)

2012-04-02 Thread Sandro Feuillet
Hi,

We are having trouble indexing big text files with Solr.
We extract PDF files with Tika and try to index them with Solr,
but Solr doesn't index the entire text. As soon as a certain amount of text
is reached, Solr stops indexing the rest. We haven't found a setting or
parameter which defines the amount of text to index per node/document. Where
is this limit set, or how can we increase it?
At the moment the limit is somewhere around 40k characters or 69 KB of text.

Best Regards,
Sandro

-- 
.. .
Sandro Feuillet

zehnplus GmbH
Binzmühlestrasse 210
CH-8050 Zürich

Telefon:  +41 43 288 58 49
Mobil:+41 76 422 30 22
E-Mail:   sandro.feuil...@zehnplus.ch
Internet: http://www.zehnplus.ch
.. .


A little mild abuse of SearchHandler

2012-04-02 Thread Benson Margulies
I've got a prototype of a RequestHandler that embeds, within itself, a
SearchHandler. Yes, I read the previous advice to be a query
component, but I found it a lot easier to chart my course.

I'm having some trouble with sorting. I came up with the following.
'args' is the usual Map<String, String[]>. firstpassSort is an array
containing "score desc" and "myfieldname asc". Sorting isn't happening. The
QParser does not seem to be seeing my sort spec, as if something is
trimming it out of the params. Is there something here I'm missing?

 args.put(CommonParams.SORT, firstpassSort);
 LocalSolrQueryRequest lsqr = new LocalSolrQueryRequest(req.getCore(),
  bqString, standard, 0, rows, args);

  SolrQueryResponse localRes = new SolrQueryResponse();

  srh.handleRequest(lsqr, localRes); // ok, let the regular processor
do the job.


Re: A little mild abuse of SearchHandler

2012-04-02 Thread Benson Margulies
I've answered my own question, but it left me with a lot of curiosity.

Why is the convention to build strings joined with commas (e.g. in
SolrQuery.addValueToParam) rather than to use the array option? All
these params are Map<String, String[]>, so why cram multiples into the
first slot with commas?
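
To make that convention concrete, a hedged sketch against the snippet above
(whether this exact change is what fixed the prototype isn't stated here):

    // The whole sort spec goes into one comma-joined string, i.e. the first
    // (and only) element of the String[] value, rather than one element per clause.
    args.put(CommonParams.SORT, new String[] { "score desc,myfieldname asc" });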


Re: Solr caching memory consumption Problem

2012-04-02 Thread Shawn Heisey

On 3/31/2012 4:30 AM, Suneel wrote:

Hello friends,

I am using DIH for solr indexing. I have 60 million records in SQL which
need to upload on solr. i started caching its smoothly working and memory
consumption is normal, But after some time incrementally memory consumption
going high and process reach more then 6 gb. that the reason i am not able
to caching my data.
please advise me if anything need to be done in configuration or in tomcat
configuration.


I saw your later message about virtual memory and the directoryFactory - 
most of the time it is best to go with the default 
(solr.StandardDirectoryFactory), which you can do by specifying it 
explicitly or by leaving that configuration out.


When you talk about caching, are you talking about Solr's caches or 
OS/process memory and disk cache? If you are talking about the caches 
that you can configure in solrconfig.xml (filterCache, queryResultCache, 
and documentCache), you should not be trying to cache large portions of 
your index there.  I have over 11 million documents in each of my index 
shards (68 million for the whole index) and my numbers for those three 
caches are 64, 512, and 16384, with autoWarm counts of 4 and 32, since 
the documentCache doesn't directly support warming.


If you are talking about how much memory Windows says the Java process 
says it is taking up, take a look at the replies you have already gotten 
on your Virtual Memory message.  As Erick and Michael told you, if you 
are using the latest version (3.5) with the standard directoryFactory 
config, most of the memory that you are seeing there is because the OS 
is memory mapping your entire on-disk index, taking advantage of the OS 
disk cache to speed up disk access without actually allocating the 
memory involved.  This is a good thing, even though the process numbers 
look bad.  JConsole or another java memory tool can show you the true 
picture.


With 60 million records, even if those records are small, your Solr 
index will probably grow to several gigabytes.  For the best 
performance, your server must have enough memory so that the entire 
index can fit into RAM, after discounting memory usage for the OS itself 
and the java process that contains Solr.  If you can get MOST of the 
index into RAM, performance will likely still be acceptable.


You message implies that 6GB worries you very much, so I am guessing 
that your server has somewhere in the range of 4GB to 8GB of RAM, but 
your index is very much larger than this.  You don't actually say 
whether you lose performance.  Do you, or are you just worried about the 
memory usage?  If Solr's query times start increasing, that is usually a 
good indicator that it is not healthy.


Thanks,
Shawn



Re: Problems with indexing of huge textfiles (drupal/tika/solr)

2012-04-02 Thread Erick Erickson
And probably 10,000 tokens (words). See maxFieldLength
in solrconfig.xml.

Best
Erick

On Mon, Apr 2, 2012 at 8:57 AM, Sandro Feuillet
sandro.feuil...@zehnplus.ch wrote:
 Hi,

 We have troubles indexing big text files with Solr.
 We extract PDF files with Tika and try to index them with Solr.
 But Solr doesn't index the entire text. As soon as a certain amount of text
 is reached Solr stopps indexing the rest. We haven't found a setting or
 parameter wich defines the amount of text to index per Node/Document. Wher
 is this limit set or how can we increase it?
 At the Moment the Limit is somwhere around 40k Characters or 69kb Text.

 Best Regards,
 Sandro

 --
 .. .
 Sandro Feuillet

 zehnplus GmbH
 Binzmühlestrasse 210
 CH-8050 Zürich

 Telefon:  +41 43 288 58 49
 Mobil:    +41 76 422 30 22
 E-Mail:   sandro.feuil...@zehnplus.ch
 Internet: http://www.zehnplus.ch
 .. .


How to determine memory consumption per core

2012-04-02 Thread Martin Grotzke
Hi,

is it possible to determine the memory consumption (heap space) per core
in solr trunk (4.0-SNAPSHOT)?

I just unloaded a core and saw the difference in memory usage, but it
would be nice to have a smoother way of getting the information without
core downtime.

It would also be interesting to know which caches are the biggest ones, to
decide which one should/might be reduced.
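
One possibility short of unloading the core, offered as a hedged sketch: the
/admin/mbeans handler (if enabled) reports per-cache statistics such as entry
counts, hit ratios and evictions without any downtime, though not exact heap
bytes. Host and core name below are placeholders:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CacheStatsSketch {
        public static void main(String[] args) throws Exception {
            // stats=true adds the statistics block for each registered cache/component.
            URL url = new URL("http://localhost:8983/solr/core0/admin/mbeans?cat=CACHE&stats=true&wt=json");
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }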

Thanx & cheers,
Martin





RE: ExtractingRequestHandler

2012-04-02 Thread spring
  Solr Cell is great for proof-of-concept, but for heavy-duty applications,
  you're offloading all the processing on the Solr server, which can be a
  problem.

Good point!

Thank you
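
For reference, a hedged sketch of doing the extraction client-side with the
Tika facade and plain SolrJ, so only plain fields reach the Solr server (field
names and the file path are illustrative; on SolrJ 3.x the client class is
CommonsHttpSolrServer rather than HttpSolrServer):

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class ClientSideExtractSketch {
        public static void main(String[] args) throws Exception {
            // Extract text locally so the Solr server only has to index plain fields.
            Tika tika = new Tika();
            String text = tika.parseToString(new File("/path/to/document.pdf"));

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "document.pdf");   // illustrative field names
            doc.addField("content", text);

            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            server.add(doc);
            server.commit();
        }
    }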



Thanks All, that worked (both via SOLRJ and the admin UI)

2012-04-02 Thread vybe3142
The query in question should be:





RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
In the case of group=false:

numFound=26

In the case of group=true:

<int name="matches">34000</int>

As a note, the grouped number changes when I hit refresh. It seems to display 
the count from any single shard. (The top match also changes).

I haven't tried this in other versions of solr.

All documents of a group exist on a single shard, there are no cross-shard 
groups.

Thanks,
Cody 

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 3:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

The matches element in the response should return the number of documents 
that matched with the query and not the number of groups.
Did you encountered this issue also with other Solr versions (3.5 or another 
nightly build)?

Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote:

 Hi,

 when you write I get xxx results, does it come from 'numFound' ? Or 
 you really display xxx results ?
 When using both field collapsing and sharding, the 'numFound' may be 
 wrong. In that case, think about using 'shards.rows' parameter with a 
 high value (be careful, it's bad for performance).

 If the problem is really about the returned results, it may be because 
 of several documents having the same unique key document_id in 
 different shards.

 Hope it helps,
 Franck



 On Friday, 30 March 2012 at 23:52 +, Young, Cody wrote:
  I forgot to mention, I can see the distributed requests happening in 
  the
 logs:
 
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core2] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core2NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=2
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core4] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core4NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core1] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core1NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core3] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core3NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core0] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core0NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core6] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core6NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=0
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core7] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core7NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=3
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core5] webapp=/solr path=/select
 params={group.distributed.first=truedistrib=falsewt=javabinrows=10
 version=2fl=document_id,scoreshard.url=localhost:8086/solr/core5NOW
 =1333151353217start=0q=*:*group.field=group_fieldgroup=trueisShar
 d=true}
 status=0 QTime=1
  Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
  INFO: [core4] webapp=/solr path=/select
 params={distrib=falsegroup.distributed.second=truewt=javabinversion
 =2rows=10group.topgroups.group_field=4183765296group.topgroups.grou
 p_field=4608765424group.topgroups.group_field=3524954944group.topgro
 ups.group_field=4182445488group.topgroups.group_field=4213143392grou
 p.topgroups.group_field=4328299312group.topgroups.group_field=4206259
 648group.topgroups.group_field=3465497912group.topgroups.group_field
 =3554417600group.topgroups.group_field=3140802904fl=document_id,scor
 eshard.url=localhost:8086/solr/core4NOW=1333151353217start=0q=*:*
 group.field=group_fieldgroup=trueisShard=true}
 status=0 

Re: Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-02 Thread Gopal Patwa
Here is my solrconfig.xml. I am using Lucene NRT; we update the index every 5
seconds, soft commit every 1 second and hard commit every 15 minutes.

 SolrConfig.xml:


<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <ramBufferSizeMB>4096</ramBufferSizeMB>
  <maxThreadStates>10</maxThreadStates>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <double name="forceMergeDeletesPctAllowed">0.0</double>
    <double name="reclaimDeletesWeight">10.0</double>
  </mergePolicy>

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="keepOptimizedOnly">false</str>
    <str name="maxCommitsToKeep">0</str>
  </deletionPolicy>

</indexDefaults>


<updateHandler class="solr.DirectUpdateHandler2">
  <maxPendingDeletes>1000</maxPendingDeletes>
  <autoCommit>
    <maxTime>90</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${inventory.solr.softcommit.duration:1000}</maxTime>
  </autoSoftCommit>

</updateHandler>

On Sun, Apr 1, 2012 at 7:47 PM, Gopal Patwa gopalpa...@gmail.com wrote:

 I am using Solr 4.0 nightly build with NRT and I often get this
 error during auto commit Too many open files. I have search this forum
 and what I found it is related to OS ulimit setting, please see below my
 ulimit settings. I am not sure what ulimit setting I should have for open
 file? ulimit -n unlimited?.

 Even if I set to higher number, it will just delay the issue until it
 reach new open file limit. What I have seen that Solr has kept deleted
 index file open by java process, which causing issue for our application
 server jboss to shutdown gracefully due this open files by java process.

 I have seen recently this issue was resolved in lucene, is it TRUE?

 https://issues.apache.org/jira/browse/LUCENE-3855


 I have 3 core with index size : core1 - 70GB, Core2 - 50GB and Core3
 - 15GB, with Single shard

 We update the index every 5 seconds, soft commit every 1 second and hard
 commit every 15 minutes

 Environment: Jboss 4.2, JDK 1.6 64 bit, CentOS , JVM Heap Size = 24GB*


 ulimit:

 core file size  (blocks, -c) 0

 data seg size   (kbytes, -d) unlimited

 scheduling priority (-e) 0

 file size   (blocks, -f) unlimited

 pending signals (-i) 401408

 max locked memory   (kbytes, -l) 1024

 max memory size (kbytes, -m) unlimited

 open files  (-n) 4096

 pipe size(512 bytes, -p) 8

 POSIX message queues (bytes, -q) 819200

 real-time priority  (-r) 0

 stack size  (kbytes, -s) 10240

 cpu time   (seconds, -t) unlimited

 max user processes  (-u) 401408

 virtual memory  (kbytes, -v) unlimited

 file locks  (-x) unlimited


 ERROR:*

 *2012-04-01* *20:08:35*,*323* [] *priority=ERROR* *app_name=* 
 *thread=pool-10-thread-1* *location=CommitTracker* *line=93* *auto* *commit* 
 *error...:org.apache.solr.common.SolrException:* *Error* *opening* *new* 
 *searcher*
   *at* 
 *org.apache.solr.core.SolrCore.openNewSearcher*(*SolrCore.java:1138*)
   *at* *org.apache.solr.core.SolrCore.getSearcher*(*SolrCore.java:1251*)
   *at* 
 *org.apache.solr.update.DirectUpdateHandler2.commit*(*DirectUpdateHandler2.java:409*)
   *at* 
 *org.apache.solr.update.CommitTracker.run*(*CommitTracker.java:197*)
   *at* 
 *java.util.concurrent.Executors$RunnableAdapter.call*(*Executors.java:441*)
   *at* 
 *java.util.concurrent.FutureTask$Sync.innerRun*(*FutureTask.java:303*)
   *at* *java.util.concurrent.FutureTask.run*(*FutureTask.java:138*)
   *at* 
 *java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301*(*ScheduledThreadPoolExecutor.java:98*)
   *at* 
 *java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run*(*ScheduledThreadPoolExecutor.java:206*)
   *at* 
 *java.util.concurrent.ThreadPoolExecutor$Worker.runTask*(*ThreadPoolExecutor.java:886*)
   *at* 
 *java.util.concurrent.ThreadPoolExecutor$Worker.run*(*ThreadPoolExecutor.java:908*)
   *at* *java.lang.Thread.run*(*Thread.java:662*)*Caused* *by:* 
 *java.io.FileNotFoundException:* 
 */opt/mci/data/srwp01mci001/inventory/index/_4q1y_0.tip* (*Too many open 
 files*)
   *at* *java.io.RandomAccessFile.open*(*Native* *Method*)
   *at* *java.io.RandomAccessFile.**init*(*RandomAccessFile.java:212*)
   *at* 
 

Re: Open deleted index file failing jboss shutdown with Too many open files Error

2012-04-02 Thread Michael McCandless
Hmm, unless the ulimits are low, or the default mergeFactor was
changed, or you have many indexes open in a single JVM, or you keep
too many IndexReaders open, even in an NRT or frequent commit use
case, you should not run out of file descriptors.

Frequent commit/reopen should be perfectly fine, as long as you close
the old readers...

Mike McCandless

http://blog.mikemccandless.com

On Mon, Apr 2, 2012 at 8:35 AM, Erick Erickson erickerick...@gmail.com wrote:
 How often are you committing index updates? This kind of thing
 can happen if you commit too often. Consider setting
 commitWithin to something like, say, 5 minutes. Or doing the
 equivalent with the autoCommit parameters in solrconfig.xml

 If that isn't relevant, you need to provide some more details
 about what you're doing and how you're using Solr

 Best
 Erick
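
(A rough SolrJ illustration of the commitWithin suggestion above; the core URL
and field names are placeholders, and a recent SolrJ release is assumed for the
add(doc, commitWithinMs) overload and the HttpSolrServer class:)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        // Let Solr commit this update (and anything else pending) within
        // 5 minutes instead of committing explicitly after every add.
        server.add(doc, 5 * 60 * 1000);
    }
}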

 On Sun, Apr 1, 2012 at 10:47 PM, Gopal Patwa gopalpa...@gmail.com wrote:
 I am using a Solr 4.0 nightly build with NRT, and I often get this
 "Too many open files" error during auto commit. I have searched this forum,
 and what I found is that it is related to the OS ulimit setting; please see my
 ulimit settings below. I am not sure what ulimit setting I should have for open
 files. ulimit -n unlimited?

 Even if I set it to a higher number, it will just delay the issue until it reaches
 the new open file limit. What I have seen is that Solr keeps deleted index files
 open in the java process, which prevents our application server JBoss
 from shutting down gracefully because of these open files.

 I have seen that this issue was recently resolved in Lucene; is that true?

 https://issues.apache.org/jira/browse/LUCENE-3855


 I have 3 cores with index sizes: Core1 - 70GB, Core2 - 50GB and Core3
 - 15GB, with a single shard.

 We update the index every 5 seconds, soft commit every 1 second and hard
 commit every 15 minutes

 Environment: JBoss 4.2, JDK 1.6 64-bit, CentOS, JVM Heap Size = 24GB


 ulimit:

 core file size          (blocks, -c) 0

 data seg size           (kbytes, -d) unlimited

 scheduling priority             (-e) 0

 file size               (blocks, -f) unlimited

 pending signals                 (-i) 401408

 max locked memory       (kbytes, -l) 1024

 max memory size         (kbytes, -m) unlimited

 open files                      (-n) 4096

 pipe size            (512 bytes, -p) 8

 POSIX message queues     (bytes, -q) 819200

 real-time priority              (-r) 0

 stack size              (kbytes, -s) 10240

 cpu time               (seconds, -t) unlimited

 max user processes              (-u) 401408

 virtual memory          (kbytes, -v) unlimited

 file locks                      (-x) unlimited


 ERROR:

 2012-04-01 20:08:35,323 [] priority=ERROR app_name=
 thread=pool-10-thread-1 location=CommitTracker line=93 auto
 commit error...:org.apache.solr.common.SolrException: Error
 opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException:
 /opt/mci/data/srwp01mci001/inventory/index/_4q1y_0.tip (Too many
 open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
        at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
        at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:161)
        at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
        at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
        at 
 

Re: Merging results from two queries

2012-04-02 Thread John Chee
Karthick,

The solution that I use for this problem is to perform query1 and
query2 together and boost results matching query1. Then Solr takes care of all
the deduplication (not necessarily merging) automatically. Would this
work for your situation?

I stole this idea from this slide deck:

Make sure all relevant documents match... Make sure the best matching
documents score highest... --
http://www.lucidimagination.com/files/relevancy-ranking-meetup-presentation-14-dec-10.pptx
(page 19)
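
One way to express that in a single request is to make query2 the main query
and add query1 as a boost query; a rough SolrJ sketch, with made-up field
names and weights:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostedMergeExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery();
        q.set("defType", "edismax");
        // query2: the broader query that should match every candidate document
        q.setQuery("title:boat OR description:boat");
        // query1: documents matching this float to the top; since it is all
        // one result set there is nothing to deduplicate afterwards
        q.set("bq", "tag:boat^10");
        QueryResponse rsp = server.query(q);
        System.out.println("matches: " + rsp.getResults().getNumFound());
    }
}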

On Mon, Apr 2, 2012 at 7:28 AM, Karthick Duraisamy Soundararaj
karthick.soundara...@gmail.com wrote:
 Hi all,
        I am finding a need to merge the results of multiple queries to
 accomplish a functionality similar to this :

                     1. Make query 1
                     2. If results returned by query1 is less than a
 certain threshold, then Make query 2

 Extending this idea, I want to be able to create a query chain, i.e,
 provide a functionality where you could specify n queries and n-1
 thresholds in a single url. Start querying in the order from 1 to n until
 one of them produces results that exceed the threshold.

 PS: These n queries and n thresholds are passed in a single url, and each of
 them could use different request handlers and therefore take a different
 set of parameters.

 Any suggestions/thoughts/pointers as to where to begin looking will be of
 great help!

 Thanks,
 Karthick


SolrJ updating indexed documents?

2012-04-02 Thread Mike O'Leary
I am working on a component for indexing documents from a database that 
contains medical records. The information is organized across several tables 
and I am supposed to index records for varying sizes of sets of patients for 
others to do IR experiments with. Each patient record has one or more main 
documents associated with it, and each main document has zero or more addenda 
associated with it. (The main documents and addenda are treated alike for the 
most part, except for a parent record field that is null for main documents and 
has the number of a main document for addenda. Addenda cannot have addenda.) 
Also, each main document has one or more diagnosis records. I am trying to 
figure out the best performing way to select all of the records for each 
patient, including the main documents, addenda and diagnoses.

I tried indexing sets of these records using DataImportHandler and nested 
Entity blocks in a way similar to the Full Import example on the 
http://wiki.apache.org/solr/DataImportHandler page, with a select for all 
patients and main records in a data set, and nested selects that get all of the 
addenda and all of the diagnoses for each patient, but it didn't run very fast 
and a database resource person who looked into it with me said that issuing a 
million SQL queries for addenda and a million queries for diagnoses, one each 
for the million patient documents in a typical set of 10,000 patients, was very 
inefficient, and I should look for a different way of getting the data.

I switched to using SolrJ, and I am trying to figure out which of two ways to 
use to index this data. One would be to use one large SQL statement to get all 
of the data for a patient set. The results would contain duplication due to the 
way tables are joined together that I would need to sort out in the Java code, 
but that is doable.

The other way would be to

1.   Get all of the main document data with one SQL query, create index 
documents with the data that they contain and store them in the index,

2.   Issue another SQL query that gets all of the addenda for all of the 
patients in the data set and an id number for each one that tells which main 
document an addendum belongs with, retrieve the main documents from the index, 
add the addenda fields to the document and put them back in the index

3.   Do the same with diagnosis data.

It would be great to be able to keep the main document data that is retrieved 
from the database in a hash table, update each of those objects with addenda 
and diagnoses, and write completely filled out documents to the index once, but 
I don't have enough memory available to do this for the patient sets I am 
working with now, and they want this indexing process to scale up to patient 
sets that are ten times as large and eventually much larger than that.

Essentially for the second approach I am wondering if a Lucene index can be 
made to serve as a hash table for storing intermediate results, and whether 
SolrJ has an API for retrieving individual index documents so they can be 
updated. Basically it would be shifting from iterating over SQL queries to 
iterating over Lucene index updates. If this way of doing things is also likely 
to be slow, or the SolrJ API doesn't provide a way to do this, or there are 
other problems with it, I can go with selecting all of the data in one large 
query and dealing with the duplication.
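
In other words, something along these lines, if SolrJ can support it -- a rough
sketch with made-up field names, relying on every field I care about being
stored, since re-adding a document replaces it completely:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class AddendumUpdateExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/patients");

        // Fetch the previously indexed main document by its unique key.
        SolrDocument stored = server.query(new SolrQuery("doc_id:12345"))
                                    .getResults().get(0);

        // Copy the stored fields into a new input document
        // (multi-valued fields would need getFieldValues instead).
        SolrInputDocument updated = new SolrInputDocument();
        for (String field : stored.getFieldNames()) {
            updated.addField(field, stored.getFieldValue(field));
        }

        // Append the addendum data and re-add; this replaces the old document,
        // so any field that was not stored is lost at this point.
        updated.addField("addendum_text", "text of the addendum");
        server.add(updated);
    }
}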
Thanks,
Mike


Re: viewing the terms indexed for a specific document

2012-04-02 Thread karthik
A few more details to this thread -

When I try the analysis tab from the admin console, I see that the synonym
is kicking in and it's matching the text in the document that I am expecting
to see as part of the results. However, the actual search is not returning
that document.

Also, I used the TermsComponent and tried to see how many docs match the
synonym term, and I don't see the term at all.

So I am not sure how to check whether this is working or not.

Thanks,
Karthik

On Mon, Apr 2, 2012 at 3:41 PM, karthik kmoha...@gmail.com wrote:

 Hi,

 I am trying to view what terms are getting indexed for a specific field in
 a specific document. How can I view this information?

 I tried the Luke handler and it's not showing me what I am looking for. I am
 using Solr 3.1.0.

 I am using index-time synonym expansion and saw that one of my synonyms was
 not working. In general synonyms are working, since there are many other
 cases where they are working. So to debug this issue I wanted to see if the
 synonym for the word is stored within the field for a given document inside
 the index. Luke showed me the actual string from the document but not the
 synonym.

 I tested Luke on a different document which gets returned while using a
 synonym, and I don't see the synonym term in the <str name="value">
 or <str name="internal"> elements of the Luke handler response.

 Any pointers on how to view the actual indexed term would be helpful.

 Thanks,
 Karthik



Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen

 All documents of a group exist on a single shard, there are no cross-shard
 groups.

You only have to partition documents by group when the groupCount and some
other features need to be accurate. For the matches this is not
necessary. The matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I
have here. I have two Solr cores with a simple schema. Each core has 3
documents. When grouping, the matches element returns 6. I'm running on
trunk that I updated 30 minutes ago. Can you try to isolate the
problem by testing with a small subset of your data?

Martijn


RE: Distributed grouping issue

2012-04-02 Thread Young, Cody
Okay, I've played with this a bit more. Found something interesting:

When the groups returned do not include results from a core, then the core is 
excluded from the count. (I have 1 group, 2 documents per core)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
<lst name="group_field">
<int name="matches">2</int>

Then, just by changing rows=2

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
<lst name="group_field">
<int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody 

-Original Message-
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of 
Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue


 All documents of a group exist on a single shard, there are no 
 cross-shard groups.

 You only have to partition documents by group when the groupCount and some 
 other features need to be accurate. For the matches this is not necessary. 
 The matches are summed up while merging the shard responses.

 I can't reproduce the error you are describing on a small local setup I have 
 here. I have two Solr cores with a simple schema. Each core has 3 documents. 
 When grouping, the matches element returns 6. I'm running on trunk that I 
 updated 30 minutes ago. Can you try to isolate the problem by testing with a 
 small subset of your data?

Martijn


Re: pattern error in PatternReplaceCharFilterFactory

2012-04-02 Thread Chris Hostetter

: It seems to be an unrecognisable pattern, this is from the log, last
: paragraph says unknown character block name. The java version is
: 1.6.0_31:

Did you read the rest of my reply? about testing if java recognizes your 
block name independent of Solr ... because that error is coming directly 
from the java regex engine...

: Caused by: java.util.regex.PatternSyntaxException: Unknown character block
: name {Latin-1_Supplement} near index 23
: \p{InLatin-1_Supplement}
:^
: at java.util.regex.Pattern.error(Pattern.java:1713)
: at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)

Why are you using an _ at all? Isn't \p{InLatin-1 Supplement} (or 
\p{InLatin-1Supplement}) what you mean? Either of those works for me, and 
matches the javadocs for what block names are supported in the JVM...

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#ubc
 The block names supported by Pattern are the valid block names accepted 
 and defined by UnicodeBlock.forName.

http://docs.oracle.com/javase/6/docs/api/java/lang/Character.UnicodeBlock.html#forName%28java.lang.String%29
 This method accepts block names in the following forms:
 
   1. Canonical block names as defined by the Unicode Standard. For
   example, the standard defines a Basic Latin block. Therefore, this
   method accepts Basic Latin as a valid block name. The documentation
   of each UnicodeBlock provides the canonical name.
   2. Canonical block names with all spaces removed. For example,
   BasicLatin is a valid block name for the Basic Latin block.
   ...
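
A quick way to check a block name outside of Solr is to ask the JDK directly,
e.g. (the accented test character is just an arbitrary Latin-1 Supplement
code point):

import java.util.regex.Pattern;

public class BlockNameTest {
    public static void main(String[] args) {
        // Does the JDK know the block name at all?
        System.out.println(Character.UnicodeBlock.forName("Latin-1 Supplement"));
        System.out.println(Character.UnicodeBlock.forName("Latin-1Supplement"));
        // Does the regex engine accept the corresponding \p{In...} form?
        System.out.println(Pattern.compile("\\p{InLatin-1Supplement}").matcher("é").find());
        // The underscore form from the original config fails here with
        // "Unknown character block name":
        // Pattern.compile("\\p{InLatin-1_Supplement}");
    }
}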



-Hoss


Re: Merging results from two queries

2012-04-02 Thread Erick Erickson
Part of it depends on what you mean by threshold. If it's
just the number of matches, then fine. But if you're talking score
here, be very, very careful. Scores are not an absolute measure
of anything; they only tell you that for _this_ query, the docs
should be ordered this way.

So I'd advise against any query chain based on scores
as the threshold, if that's what you mean by threshold.
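
If it really is just the match count, the client-side version of the chain is
simple enough; a rough SolrJ sketch, with placeholder queries and threshold
(doing it server-side in a single request would likely mean a custom request
handler or SearchComponent, which is a separate discussion):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryChainExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        String[] queries = { "title:boat", "description:boat", "text:boat" };
        long threshold = 10;   // minimum number of matches to stop the chain

        QueryResponse rsp = null;
        for (String query : queries) {
            rsp = server.query(new SolrQuery(query));
            // Stop at the first query whose match count clears the threshold;
            // the last query's results are kept regardless.
            if (rsp.getResults().getNumFound() >= threshold) {
                break;
            }
        }
        System.out.println("kept " + rsp.getResults().getNumFound() + " matches");
    }
}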

Best
Erick

On Mon, Apr 2, 2012 at 10:28 AM, Karthick Duraisamy Soundararaj
karthick.soundara...@gmail.com wrote:
 Hi all,
        I am finding a need to merge the results of multiple queries to
 accomplish a functionality similar to this :

                     1. Make query 1
                     2. If results returned by query1 is less than a
 certain threshold, then Make query 2

 Extending this idea, I want to be able to create a query chain, i.e,
 provide a functionality where you could specify n queries and n-1
 thresholds in a single url. Start querying in the order from 1 to n until
 one of them produces results that exceed the threshold.

 PS: These n queries and n thresholds are passed in a single url, and each of
 them could use different request handlers and therefore take a different
 set of parameters.

 Any suggestions/thoughts/pointers as to where to begin looking will be of
 great help!

 Thanks,
 Karthick


Re: viewing the terms indexed for a specific document

2012-04-02 Thread Erick Erickson
If you add explainOther=<some id> to the query, see:
http://wiki.apache.org/solr/SolrRelevancyFAQ

you might get some hints. You can use the TermsComponent
to see if the synonyms are getting in the index, but you'll
have to have a very restricted input set (like one doc) for that
to be helpful for a specific document.
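
From SolrJ, that TermsComponent check looks roughly like this (a sketch only:
the /terms handler from the example solrconfig.xml, the field name and the
prefix are all assumptions; CommonsHttpSolrServer is the 3.x client class):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CommonParams;

public class TermsCheckExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery();
        q.set(CommonParams.QT, "/terms");   // route to the TermsComponent handler
        q.set("terms", true);
        q.set("terms.fl", "title");         // field whose indexed terms to list
        q.set("terms.prefix", "boa");       // narrow to the synonym you expect
        QueryResponse rsp = server.query(q);
        // The "terms" section maps each field to its (term, docFreq) pairs.
        System.out.println(rsp.getResponse().get("terms"));
    }
}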

Ahhh, try getting the stand-alone Luke program, it allows
a lower-level exploration of the index, see:
http://code.google.com/p/luke/
The LukeRequestHandler is based on Luke, but Luke
itself is more flexible.

When are you putting synonyms in? Index time? Query time?
Both? Showing your schema.xml fragment for the field
in question would help diagnose the problem, as would
showing the results of attaching debugQuery=on to the
URL.

Best
Erick

On Mon, Apr 2, 2012 at 4:26 PM, karthik kmoha...@gmail.com wrote:
 A few more details to this thread -

 When I try the analysis tab from the admin console, I see that the synonym
 is kicking in and it's matching the text in the document that I am expecting
 to see as part of the results. However, the actual search is not returning
 that document.

 Also, I used the TermsComponent and tried to see how many docs match the
 synonym term, and I don't see the term at all.

 So I am not sure how to check whether this is working or not.

 Thanks,
 Karthik

 On Mon, Apr 2, 2012 at 3:41 PM, karthik kmoha...@gmail.com wrote:

 Hi,

 I am trying to view what terms are getting indexed for a specific field in
 a specific document. How can I view this information?

 I tried the Luke handler and it's not showing me what I am looking for. I am
 using Solr 3.1.0.

 I am using index-time synonym expansion and saw that one of my synonyms was
 not working. In general synonyms are working, since there are many other
 cases where they are working. So to debug this issue I wanted to see if the
 synonym for the word is stored within the field for a given document inside
 the index. Luke showed me the actual string from the document but not the
 synonym.

 I tested Luke on a different document which gets returned while using a
 synonym, and I don't see the synonym term in the <str name="value">
 or <str name="internal"> elements of the Luke handler response.

 Any pointers on how to view the actual indexed term would be helpful.

 Thanks,
 Karthik



Re: Tags and Folksonomies

2012-04-02 Thread Chris Hostetter

: Suppose I have content which has title and description. Users can tag content
: and search content based on tag, title and description. Tag has more
: weightage.
: 
: Any inputs on how indexing and retrieval will work given there is content
: and tags using Solr? Has anyone implemented search based on collaborative
: tagging?

simple stuff would be to have your 3 fields, and search them with a 
weighted boosting -- giving more importance to the tag field.
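
with (e)dismax that weighting is just the qf parameter; a rough sketch, field
names and weights made up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TagBoostExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("boat");
        q.set("defType", "edismax");
        // a hit in the tag field counts five times as much as one in title,
        // and title twice as much as description
        q.set("qf", "tag^5 title^2 description^1");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}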

where things get more complicated is when you want docA to score 
higher for the query "boat" than docB because 100 users have tagged docA 
with "boat", but only 5 users have tagged docB with "boat".

The canonical way to deal with this would be using payloads to boost the 
weight of a term -- the DelimitedPayloadTokenFilterFactory can help with 
this at index time, but off the top of my head I don't think any of the 
existing Solr QParsers will build the necessary PayloadTermQuery, so you 
might have to roll your own -- there are a few Jira issues with patches 
that you might be able to re-use or get inspired by...

https://issues.apache.org/jira/browse/SOLR-1485




-Hoss


Re: Merging results from two queries

2012-04-02 Thread Karthick Duraisamy Soundararaj
@Erick
By threshold, all I mean is the count of the documents returned; I am
not going to play with the score. So if I have to commit my code to svn, what's
the best way to go about it? I know I have to discuss my design here, which
would take at least a couple of days. But are there special instructions that
I need to follow in order to stay on a path from which I could eventually
commit my code?


@John
Yes, that's definitely a solution, but I don't want to make two different
HTTP requests. I want to make one request within which everything I mentioned
happens.



On Mon, Apr 2, 2012 at 7:28 PM, Erick Erickson erickerick...@gmail.com wrote:

 Part of it depends on what you mean by threshold. If it's
 just the number of matches, then fine. But if you're talking score
 here, be very, very careful. Scores are not an absolute measure
 of anything, they only tell you that for _this_ query, the docs
 should be ordered this way.

 So I'd advise against any query chain based on scores
 as the threshold, if that's what you mean by threshold.

 Best
 Erick

 On Mon, Apr 2, 2012 at 10:28 AM, Karthick Duraisamy Soundararaj
 karthick.soundara...@gmail.com wrote:
  Hi all,
 I am finding a need to merge the results of multiple queries to
  accomplish a functionality similar to this :
 
  1. Make query 1
  2. If results returned by query1 is less than a
  certain threshold, then Make query 2
 
  Extending this idea, I want to be able to create a query chain, i.e,
  provide a functionality where you could specify n queries and n-1
  thresholds in a single url. Start querying in the order from 1 to n until
  one of them produces results that exceed the threshold.
 
  PS: These n queries and n thresholds are passed in a single url, and each
  of them could use different request handlers and therefore take a different
  set of parameters.
 
  Any suggestions/thoughts/pointers as to where to begin looking will be
  of great help!
 
  Thanks,
  Karthick



Re: Distributed grouping issue

2012-04-02 Thread Martijn v Groningen
I tried to reproduce this. However, the matches element always returns 4 in my
case (when using rows=1 and rows=2).
In your case the 2 documents on each core do belong to the same group,
right?

I did find something else. If I use rows=0 then an error occurs. I think we
need to further investigate this.
Can you open an issue in Jira? I'm a bit busy today. We can then further
look into this in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody cody.yo...@move.com wrote:

 Okay, I've played with this a bit more. Found something interesting:

 When the groups returned do not include results from a core, then the core
 is excluded from the count. (I have 1 group, 2 documents per core)

 Example:


 http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

 <lst name="grouped">
 <lst name="group_field">
 <int name="matches">2</int>

 Then, just by changing rows=2


 http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

 <lst name="grouped">
 <lst name="group_field">
 <int name="matches">4</int>

 Let me know if you have any luck reproducing.

 Thanks,
 Cody

 -Original Message-
 From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On
 Behalf Of Martijn v Groningen
 Sent: Monday, April 02, 2012 1:48 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Distributed grouping issue

 
  All documents of a group exist on a single shard, there are no
  cross-shard groups.
 
 You only have to partition documents by group when the groupCount and some
 other features need to be accurate. For the matches this is not
 necessary. The matches are summed up while merging the shard responses.

 I can't reproduce the error you are describing on a small local setup I
 have here. I have two Solr cores with a simple schema. Each core has 3
 documents. When grouping, the matches element returns 6. I'm running on
 trunk that I updated 30 minutes ago. Can you try to isolate the
 problem by testing with a small subset of your data?

 Martijn




-- 
Kind regards,

Martijn van Groningen