Re: SolrCloud
Thanks for replying. So if I make a replica of each shard, should I use ZooKeeper for every shard and replica, or only for the replicas? One more question: I am using Solr in a Tomcat and Eclipse environment via SolrJ, so I am a bit confused about how to use ZooKeeper in it along with Tomcat. I have downloaded the ZooKeeper jar files but need a little help with it. -Asia
Re: SolrCloud new....
Hello, I am working on the same thing. I have tried the wiki example, but I am getting errors. I want to use ZooKeeper with SolrJ in Eclipse using Tomcat, and need a little help on how to integrate ZooKeeper in Eclipse for SolrCloud.
Re: Responding to Requests with Chunks/Streaming
Hello, Small update - reading a streamed response is now done via callback, with no SolrDocumentList in memory. https://github.com/m-khl/solr-patches/tree/streaming Here is the test: https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138 No progress in distributed search via streaming yet. Please let me know if you don't want to have updates from my playground. Regards

On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: @All Why does nobody desire such a pretty cool feature? Nicholas, I have made a tiny bit of progress: I'm able to stream in javabin codec format while searching. It implies sorting by _docid_. Here is the diff: https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95 The current issue is that SolrJ reads the response as a whole; reading by callback is supported by EmbeddedServer only. Anyway, it should not be a big deal. ResponseStreamingTest.java somehow works. I'm stuck on introducing response streaming in distributed search; it's actually more challenging - RespStreamDistributedTest fails. Regards

On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball nicholas.b...@nodelay.com wrote: Mikhail and Ludovic, Thanks for both your replies, very helpful indeed! Ludovic, I was actually looking into just that and did some tests with SolrJ. It does work well, but needs some changes on the Solr server if we want to send out individual documents at various times. This could be done with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I therefore think that a combination of this and Mikhail's solution would work best! Mikhail, you mention that your solution doesn't currently work and you are not sure why this is the case, but could it be that you haven't flushed the data (os.flush()) you've written in the collect method of DocSetStreamer? I think placing the output stream into the SolrQueryRequest is the way to go, so that we can access it and write to it how we intend. However, I think using the JavaBinCodec would be ideal so that we can work with SolrJ directly, and not mess around with the encoding of the docs/data etc... At the moment the entry point to JavaBinCodec is through the BinaryResponseWriter, which calls the highest-level marshal() method that encodes and sends out the entire SolrQueryResponse (line 49 @ BinaryResponseWriter). What would be ideal is to be able to break up the response and call the JavaBinCodec for pieces of it, with a flush after each call. I did a few tests with a simple Thread.sleep and a flush to see if this would actually work, and it looks like it's working out perfectly. Just trying to figure out the best way to actually do it now :) any ideas? On another note: for a solution to work with chunked transfer encoding (and therefore web browsers), a lot more development is going to be needed. Not sure if it's worth trying yet, but I might look into it later down the line. Nick

On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Ludovic, I looked through. First of all, it seems to me you don't amend the regular servlet Solr server, but only the embedded one. Anyway, the difference is that you stream DocList via callback, but that means you've instantiated it in memory and keep it there until it has been completely consumed. Think about a billion numFound. The core idea of my approach is to keep almost zero memory for the response.
Regards

On Fri, Mar 16, 2012 at 12:12 AM, lboutros boutr...@gmail.com wrote: Hi, I was looking for something similar. I tried this patch: https://issues.apache.org/jira/browse/SOLR-2112 It's working quite well (I've back-ported the code to Solr 3.5.0...). Is it really different from what you are trying to achieve? Ludovic. - Jouve France.

-- Sincerely yours Mikhail Khludnev ge...@yandex.ru http://www.griddynamics.com mkhlud...@griddynamics.com
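To make the flush-per-document idea in this thread concrete, here is a minimal illustrative sketch; ChunkStreamer and Serializer are hypothetical names standing in for the per-document JavaBin marshal call and the output stream held by the SolrQueryRequest, not actual Solr APIs:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Iterator;

    // Hypothetical sketch: serialize each document and flush immediately, so the
    // client can consume results incrementally and no full SolrDocumentList is
    // ever held on the server.
    class ChunkStreamer<T> {
        interface Serializer<T> { byte[] serialize(T doc) throws IOException; }

        void stream(Iterator<T> docs, OutputStream os, Serializer<T> ser) throws IOException {
            while (docs.hasNext()) {
                os.write(ser.serialize(docs.next()));
                os.flush(); // the key step: push each piece out as soon as it is written
            }
        }
    }

The memory behaviour is the point: the server touches one document at a time, so a billion-hit result set costs no more heap than a ten-hit one.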
Re: pattern error in PatternReplaceCharFilterFactory
Hi It seems to be an unrecognisable pattern; this is from the log, and the last paragraph says "unknown character block name". The Java version is 1.6.0_31:

SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Plugin init failure for [schema.xml] analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:357)
    at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:106)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:756)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:473)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:296)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:99)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
    at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
    at org.apache.catalina.core.StandardService.start(StandardService.java:525)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:290)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
    ... 33 more
Caused by: java.lang.RuntimeException: Configuration Error: 'pattern' can not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
    at org.apache.solr.analysis.PatternReplaceCharFilterFactory.init(PatternReplaceCharFilterFactory.java:54)
    at org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:278)
    at org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:268)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:149)
    ... 37 more
Caused by: java.util.regex.PatternSyntaxException: Unknown character block name {Latin-1_Supplement} near index 23
\p{InLatin-1_Supplement}
                       ^
    at java.util.regex.Pattern.error(Pattern.java:1713)
    at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)
    at java.util.regex.Pattern.family(Pattern.java:2408)
    at java.util.regex.Pattern.sequence(Pattern.java:1831)
    at
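The block name itself is the root cause: java.util.regex accepts Unicode block names only in the forms Character.UnicodeBlock.forName() recognises - the canonical name ("Latin-1 Supplement"), the canonical name with spaces removed ("Latin-1Supplement"), or the constant name ("LATIN_1_SUPPLEMENT") - and "Latin-1_Supplement" is none of these. A minimal check of the corrected spelling (the class name here is just for illustration):

    import java.util.regex.Pattern;

    public class BlockNameCheck {
        public static void main(String[] args) {
            // Throws PatternSyntaxException: the underscore form is not a valid block name.
            // Pattern.compile("\\p{InLatin-1_Supplement}");

            // Compiles: canonical block name "Latin-1 Supplement" with the space removed.
            Pattern ok = Pattern.compile("\\p{InLatin-1Supplement}+");
            System.out.println(ok.matcher("éàü").matches()); // true: all three are in U+0080..U+00FF
        }
    }

So in schema.xml the charFilter's pattern should use \p{InLatin-1Supplement} rather than the underscore form.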
RE: Distributed grouping issue
Hi, when you write "I get xxx results", does it come from 'numFound'? Or do you really display xxx results? When using both field collapsing and sharding, the 'numFound' may be wrong. In that case, think about using the 'shards.rows' parameter with a high value (be careful, it's bad for performance). If the problem is really about the returned results, it may be because several documents have the same unique key (document_id) in different shards. Hope it helps, Franck

Le vendredi 30 mars 2012 à 23:52 +0000, Young, Cody a écrit : I forgot to mention, I can see the distributed requests happening in the logs:

Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core2&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core1] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core1&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core3] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core3&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core0] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core0&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=0
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core7] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core7&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=3
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core5] webapp=/solr path=/select params={group.distributed.first=true&distrib=false&wt=javabin&rows=10&version=2&fl=document_id,score&shard.url=localhost:8086/solr/core5&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=1
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core4&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core6] webapp=/solr path=/select params={distrib=false&group.distributed.second=true&wt=javabin&version=2&rows=10&group.topgroups.group_field=4183765296&group.topgroups.group_field=4608765424&group.topgroups.group_field=3524954944&group.topgroups.group_field=4182445488&group.topgroups.group_field=4213143392&group.topgroups.group_field=4328299312&group.topgroups.group_field=4206259648&group.topgroups.group_field=3465497912&group.topgroups.group_field=3554417600&group.topgroups.group_field=3140802904&fl=document_id,score&shard.url=localhost:8086/solr/core6&NOW=1333151353217&start=0&q=*:*&group.field=group_field&group=true&isShard=true} status=0 QTime=2
Mar 30, 2012 4:49:13 PM org.apache.solr.core.SolrCore execute
INFO: [core4] webapp=/solr path=/select
Re: Solr caching memory consumption Problem
Hello friends, I am using DIH for Solr indexing. I have 60 million records in SQL which need to be uploaded to Solr. When I start indexing, it works smoothly and memory consumption is normal, but after some time memory consumption incrementally grows and the process reaches more than 6 GB. That is the reason I am not able to cache my data. Please advise me if anything needs to be done in the Solr configuration or in the Tomcat configuration. This would be very helpful for me. - Regards, Suneel Pandey Sr. Software Developer
Re: Virtual Memory very high
Hello Everyone, On a Windows server I am facing the same problem: during indexing my memory consumption goes very high. Based on the above discussion I checked my solrconfig.xml file and found that directoryFactory is not configured yet. If I configure the directoryFactory, will it help me reduce the memory consumption? I think the configuration below is used for Linux servers: <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NIOFSDirectoryFactory}"/> What will be the best option for a Windows server to solve my problem? Please advise. - Regards, Suneel Pandey Sr. Software Developer
Using UIMA in Solr behind a firewall
Hi! I'm desperately trying to work out how to configure Solr in order to allow it to make calls to the Alchemy service through the UIMA analysis engines. Is there anybody who has been able to accomplish this? Cheers
Re: Empty facet counts
Alright, well, I discovered that PHP converts '.' in a variable name to '_', causing my request to contain a variable for a non-existent facet_field.

2012/3/30 William Bell billnb...@gmail.com: Can you also include a /select?q=*:*&wt=xml ?

On Thu, Mar 29, 2012 at 11:47 AM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, looking at your schema, faceting on a uniqueKey really doesn't make all that much sense; there will always be exactly one of them. At least it's highly questionable. But that's not your problem, and what's wrong isn't at all obvious. Can you try pasting the results of adding &debugQuery=on? Best Erick

On Thu, Mar 29, 2012 at 11:12 AM, Youri Westerman yo...@pluxcustoms.nl wrote: The version is 3.5.0.2011.11.22.14.54.38. I did not apply any patches, but then again it is not my server. Do you have a clue what is going wrong here? Regards, Youri

2012/3/29 Bill Bell billnb...@gmail.com: Send schema.xml, and did you apply any patches? What version of Solr? Bill Bell Sent from mobile

On Mar 29, 2012, at 5:26 AM, Youri Westerman yo...@pluxcustoms.nl wrote: Hi, I'm currently learning how to use Solr and everything seems pretty straightforward. For some reason, when I use faceted queries they return only empty sets in the facet_counts section. The GET params I'm using are: ?q=*:*&rows=0&facet=true&facet.field=urn The result: facet_counts: { facet_queries: { }, facet_fields: { }, facet_dates: { }, facet_ranges: { } } The urn field is indexed and there are enough entries to be counted. When adding facet.method=Enum, nothing changes. Does anyone know why this is happening? Am I missing something? Thanks in advance! Youri

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Distributed grouping issue
The matches element in the response should return the number of documents that matched the query, not the number of groups. Did you encounter this issue with other Solr versions as well (3.5 or another nightly build)? Martijn

On 2 April 2012 09:41, fbrisbart fbrisb...@bestofmedia.com wrote: [...]
Apache Solr not indexing complete PDF file using Tika
Hello Guys, I am using Apache Solr 3.3.0 with Tika 1.0. I have PDF files which I am pushing into Solr for content searching. Apache Solr is indexing the PDF files and I can see them in the Apache Solr admin interface for search. But the issue is that Apache Solr is not indexing the whole file content; it indexes only up to a limited size. Am I missing something, some configuration, or is this the behavior of Apache Solr? I have tried to update solrconfig.xml: I have updated ramBufferSizeMB and maxFieldLength. Thanks Manoj Saini

Thanks, Best Regards, Manoj Saini | Sr. Software Engineer | Stigasoft m: +91 98 1034 1281 | e: manoj.sa...@stigasoft.com | w: http://www.stigasoft.com
Re: How do I use localparams/joins using SolrJ and/or the Admin GUI
On Friday, March 30, 2012 at 11:33 PM, vybe3142 wrote: When I paste the relevant part of the query into the SOLR admin UI query interface, {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any documents

Just paste the raw content into the form, then you'll get the expected result. If you put in '+' characters they will be escaped and result in %2B (as Erick already said).
Re: How do I use localparams/joins using SolrJ and/or the Admin GUI
On Saturday, March 31, 2012 at 6:01 PM, Yonik Seeley wrote: Shouldn't that be the other way around? The admin UI should do any necessary escaping, so those + chars should instead be spaces?

We can, but is this really what you'd expect?
Re: How do I use localparams/joins using SolrJ and/or the Admin GUI
On Monday, April 2, 2012 at 2:00 PM, Stefan Matheis wrote: On Friday, March 30, 2012 at 11:33 PM, vybe3142 wrote: When I paste the relevant part of the query into the SOLR admin UI query interface, {!join+from=join_id+to=id}attributes_AUTHORS.4:4, I fail to retrieve any documents Just paste the raw content into the form, then you'll get the expected result. If you put in '+' characters they will be escaped and result in %2B (as Erick already said).

Sorry, perhaps that was not clear enough .. raw content in this case means: {!join from=join_id to=id}attributes_AUTHORS.4:4 .. space as space, and not already escaped as +
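For the SolrJ side of the subject line, a minimal sketch (assuming SolrJ 3.x and a server at http://localhost:8983/solr): pass the local-params query with literal spaces and let the client do the URL encoding.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class JoinQueryExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery();
            // Literal spaces inside the local params; do NOT pre-escape them as '+'.
            query.setQuery("{!join from=join_id to=id}attributes_AUTHORS.4:4");
            QueryResponse rsp = server.query(query);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }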
Re: SolrCloud
No, you don't have to run ZooKeeper on each replica. ZooKeeper is a repository for your system (cluster) information. It knows about each replica, but ZK does not need to run on each shard. You can run one ZooKeeper instance for your entire cluster, no matter how many shards/replicas you have. Here's a good place to get started understanding ZK: http://zookeeper.apache.org/ Internally, SolrCloud uses ZooKeeper to understand what to do with update and search requests. In effect, it asks ZK "How many shards are there, and what is the address of each leader?" and does the right thing with the results... My suggestion is that you pretty much forget ZK exists until you get a bit more comfortable with SolrCloud. Run it embedded in a single instance (and do NOT shut that instance down!). From there, you should see SolrCloud just work and it'll at least get you started. Best Erick

On Mon, Apr 2, 2012 at 1:59 AM, asia asia.k...@lntinfotech.com wrote: [...]
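For reference, the embedded-ZooKeeper setup Erick describes was, in the SolrCloud wiki example of the era, driven by JVM system properties along these lines (paths, ports and the config name are illustrative; with Tomcat the same -D flags go into JAVA_OPTS rather than a start.jar command):

    # Start Solr with embedded ZooKeeper (-DzkRun), uploading the local config
    # set to ZK on first start (bootstrap_confdir / collection.configName).
    java -Dbootstrap_confdir=./solr/conf \
         -Dcollection.configName=myconf \
         -DzkRun \
         -DnumShards=2 \
         -jar start.jar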
Re: Open deleted index file failing jboss shutdown with Too many open files Error
How often are you committing index updates? This kind of thing can happen if you commit too often. Consider setting commitWithin to something like, say, 5 minutes, or doing the equivalent with the autoCommit parameters in solrconfig.xml. If that isn't relevant, you need to provide some more details about what you're doing and how you're using Solr. Best Erick

On Sun, Apr 1, 2012 at 10:47 PM, Gopal Patwa gopalpa...@gmail.com wrote: I am using a Solr 4.0 nightly build with NRT, and I often get this error during auto commit: "Too many open files". I have searched this forum, and what I found is that it is related to the OS ulimit setting; please see my ulimit settings below. I am not sure what ulimit setting I should have for open files - ulimit -n unlimited? Even if I set it to a higher number, it will just delay the issue until it reaches the new open-file limit. What I have seen is that Solr keeps deleted index files open in the java process, which causes an issue for our application server (JBoss) shutting down gracefully due to these open files. I have seen that this issue was recently resolved in Lucene - is that TRUE? https://issues.apache.org/jira/browse/LUCENE-3855

I have 3 cores with index sizes: core1 - 70GB, core2 - 50GB and core3 - 15GB, with a single shard. We update the index every 5 seconds, soft commit every 1 second and hard commit every 15 minutes.

Environment: JBoss 4.2, JDK 1.6 64 bit, CentOS, JVM Heap Size = 24GB

ulimit:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 401408
max locked memory       (kbytes, -l) 1024
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 401408
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

ERROR:
2012-04-01 20:08:35,323 [] priority=ERROR app_name= thread=pool-10-thread-1 location=CommitTracker line=93 auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /opt/mci/data/srwp01mci001/inventory/index/_4q1y_0.tip (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:161)
    at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:92)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:475)
    at
Re: default operation for a field
You can't set the default operator for a single field. This implies you're using edismax? If that's the case, your app layer can massage the query from something like "term1 term2 term3" into "field_x:(term1 AND term2 AND term3)", in which case field_x probably should not be in your qf parameter. Best Erick

On Mon, Apr 2, 2012 at 2:05 AM, Alexander Aristov alexander.aris...@gmail.com wrote: Hi, Just curious whether it's possible to set the default operator for a field, not for the whole application. I have a field and I want it to always use the AND operation. Is that feasible? Users don't enter any operators for this field - only one term, or several separated by spaces. But if the default operator is set to OR, then the field doesn't work as I expect; I need only AND. Maybe another solution is possible? Best Regards Alexander Aristov
Re: default operation for a field
Ok, got it. Thanks. Best Regards Alexander Aristov

On 2 April 2012 16:37, Erick Erickson erickerick...@gmail.com wrote: [...]
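As a rough illustration of the app-layer massaging Erick suggests (a sketch only: the class and field names are made up, and real user input would also need query-syntax escaping):

    public class FieldAndQuery {
        // AND-join whitespace-separated user terms for one field, so that field
        // gets AND semantics regardless of the global default operator.
        static String requireAllTerms(String field, String userInput) {
            StringBuilder sb = new StringBuilder(field).append(":(");
            String[] terms = userInput.trim().split("\\s+");
            for (int i = 0; i < terms.length; i++) {
                if (i > 0) sb.append(" AND ");
                sb.append(terms[i]);
            }
            return sb.append(")").toString();
        }

        public static void main(String[] args) {
            // Prints: field_x:(term1 AND term2 AND term3)
            System.out.println(requireAllTerms("field_x", "term1 term2 term3"));
        }
    }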
Re: Virtual Memory very high
Why do you care about virtual memory? It is, after all, virtual; you can allocate as much as you want. For instance, MMapDirectory maps a load of virtual memory, but that has little relation to how much physical memory is being used. Consider looking at your app with something like JConsole and seeing how much physical memory is being used before you worry about this issue. Best Erick

On Mon, Apr 2, 2012 at 4:56 AM, Suneel pandey.sun...@gmail.com wrote: [...]
Re: Virtual Memory very high
Are you seeing a real problem here, besides just being alarmed by the big numbers from top? Consumption of virtual memory by itself is basically harmless, as long as you're not running up against any of the OS limits (and you're running a 64-bit JVM). This is just top telling you that you've mapped large files into the virtual memory space. It's not telling you that you don't have any RAM left... virtual memory is different from RAM. In my tests, MMapDirectory generally gives faster search performance than NIOFSDirectory... so unless there's an actual issue, I would recommend sticking with MMapDirectory. Mike McCandless http://blog.mikemccandless.com

On Fri, Dec 9, 2011 at 11:54 PM, Rohit ro...@in-rev.com wrote: Hi All, I don't know if this question is directly related to this forum; I am running Solr in Tomcat on a Linux server. The moment I start Tomcat, the virtual memory shown using the top command goes to its max of 31.1g and then remains there. Is this the right behaviour? Why is the virtual memory usage so high? I have 36GB of RAM on the server.

Tasks: 309 total, 1 running, 308 sleeping, 0 stopped, 0 zombie
Cpu(s): 19.1%us, 0.2%sy, 0.0%ni, 79.3%id, 1.2%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 49555260k total, 36152224k used, 13403036k free, 121612k buffers
Swap: 999416k total, 0k used, 999416k free, 5409052k cached

 PID USER  PR NI  VIRT  RES  SHR  S %CPU %MEM   TIME+ COMMAND
2741 mysql 20  0 6412m 5.8g 6380 S  182 12.3 108:07.45 mysqld
2814 root  20  0 31.1g  22g 9716 S  100 46.6 375:51.70 java
1765 root  20  0 12.2g 285m 9488 S    2  0.6   3:52.59 java
3591 root  20  0 19352 1576 1068 R    0  0.0   0:00.24 top
   1 root  20  0 23684 1908 1276 S    0  0.0   0:06.21 init

Regards, Rohit
Re: Apache Solr not indexing complete PDF file using Tika
You can index 2B tokens, so upping maxFieldLength should have fixed your problem, at least as far as Solr is concerned. How many tokens get indexed? I'm not as familiar with Tika, but there may be some kind of parameter there (although I don't remember this coming up before)... Did you restart Solr after making the change to solrconfig.xml? If you're seeing 10,000 tokens or so, that's the default for maxFieldLength. I'd recommend stopping Solr, rm -rf <solr home>/data/index and restarting Solr, just to be sure you're not seeing leftover junk. You'll have to re-index your docs after changing the maxFieldLength param. Best Erick

On Mon, Apr 2, 2012 at 7:19 AM, Manoj Saini manoj.sa...@stigasoft.com wrote: [...]
Problems with indexing of huge textfiles (drupal/tika/solr)
Hi, We have trouble indexing big text files with Solr. We extract PDF files with Tika and try to index them with Solr, but Solr doesn't index the entire text. As soon as a certain amount of text is reached, Solr stops indexing the rest. We haven't found a setting or parameter which defines the amount of text to index per node/document. Where is this limit set, or how can we increase it? At the moment the limit is somewhere around 40k characters or 69kB of text. Best Regards, Sandro

-- Sandro Feuillet zehnplus GmbH Binzmühlestrasse 210 CH-8050 Zürich Telefon: +41 43 288 58 49 Mobil: +41 76 422 30 22 E-Mail: sandro.feuil...@zehnplus.ch Internet: http://www.zehnplus.ch
A little mild abuse of SearchHandler
I've got a prototype of a RequestHandler that embeds, within itself, a SearchHandler. Yes, I read the previous advice to be a query component, but I found it a lot easier to chart my course this way. I'm having some trouble with sorting. I came up with the following. 'args' is the usual Map<String, String[]>; firstpassSort is an array of { "score desc", "myfieldname asc" }. Sorting isn't happening: the QParser does not seem to be seeing my sort spec, as if something is trimming it out of the params. Is there something here I'm missing?

    args.put(CommonParams.SORT, firstpassSort);
    LocalSolrQueryRequest lsqr = new LocalSolrQueryRequest(req.getCore(), bqString, "standard", 0, rows, args);
    SolrQueryResponse localRes = new SolrQueryResponse();
    srh.handleRequest(lsqr, localRes); // ok, let the regular processor do the job.
Re: A little mild abuse of SearchHandler
I've answered my own question, but it left me with a lot of curiosity. Why is the convention to build strings joined with commas (e.g. in SolrQuery.addValueToParam) rather than to use the array option? All these params are Map<String, String[]>, so why cram multiples into the first slot with commas?
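In other words, a sketch of the fix implied above (the variable names follow the earlier snippet): the sort clauses go into a single comma-joined string, not one array element per clause.

    // Works: one comma-joined sort spec in a single array slot.
    args.put(CommonParams.SORT, new String[] { "score desc,myfieldname asc" });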
Re: Solr caching memory consumption Problem
On 3/31/2012 4:30 AM, Suneel wrote: [...]

I saw your later message about virtual memory and the directoryFactory - most of the time it is best to go with the default (solr.StandardDirectoryFactory), which you can do by specifying it explicitly or by leaving that configuration out.

When you talk about caching, are you talking about Solr's caches or OS/process memory and disk cache? If you are talking about the caches that you can configure in solrconfig.xml (filterCache, queryResultCache, and documentCache), you should not be trying to cache large portions of your index there. I have over 11 million documents in each of my index shards (68 million for the whole index), and my numbers for those three caches are 64, 512, and 16384, with autowarm counts of 4 and 32, since the documentCache doesn't directly support warming.

If you are talking about how much memory Windows says the Java process is taking up, take a look at the replies you have already gotten on your Virtual Memory message. As Erick and Michael told you, if you are using the latest version (3.5) with the standard directoryFactory config, most of the memory that you are seeing there is because the OS is memory-mapping your entire on-disk index, taking advantage of the OS disk cache to speed up disk access without actually allocating the memory involved. This is a good thing, even though the process numbers look bad. JConsole or another Java memory tool can show you the true picture.

With 60 million records, even if those records are small, your Solr index will probably grow to several gigabytes. For the best performance, your server must have enough memory so that the entire index can fit into RAM, after discounting memory usage for the OS itself and the Java process that contains Solr. If you can get MOST of the index into RAM, performance will likely still be acceptable. Your message implies that 6GB worries you very much, so I am guessing that your server has somewhere in the range of 4GB to 8GB of RAM, but your index is very much larger than this. You don't actually say whether you lose performance. Do you, or are you just worried about the memory usage? If Solr's query times start increasing, that is usually a good indicator that it is not healthy. Thanks, Shawn
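For reference, cache settings of the size Shawn quotes would look roughly like this in solrconfig.xml (a sketch: only the numbers come from the message above; the classes shown are the stock defaults):

    <!-- Small filter/query caches, larger document cache; autowarm 4 and 32.
         documentCache cannot be autowarmed, so its count stays 0. -->
    <filterCache      class="solr.FastLRUCache" size="64"    initialSize="64"    autowarmCount="4"/>
    <queryResultCache class="solr.LRUCache"     size="512"   initialSize="512"   autowarmCount="32"/>
    <documentCache    class="solr.LRUCache"     size="16384" initialSize="16384" autowarmCount="0"/>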
Re: Problems with indexing of huge textfiles (drupal/tika/solr)
And probably 10,000 tokens (words). See maxFieldLength in solrconfig.xml. Best Erick

On Mon, Apr 2, 2012 at 8:57 AM, Sandro Feuillet sandro.feuil...@zehnplus.ch wrote: [...]
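Concretely, the cap Erick mentions is raised in solrconfig.xml (assuming Solr 3.x, where the setting lives under <indexDefaults>; the value below just means "effectively unlimited"):

    <!-- Default is 10000 tokens per field; raise it to index large extracted texts. -->
    <maxFieldLength>2147483647</maxFieldLength>

Remember to restart Solr and re-index after changing it, as Erick notes in the Tika thread above.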
How to determine memory consumption per core
Hi, is it possible to determine the memory consumption (heap space) per core in Solr trunk (4.0-SNAPSHOT)? I just unloaded a core and saw the difference in memory usage, but it would be nice to have a smoother way of getting this information without core downtime. It would also be interesting to know which caches are the biggest ones, to know which one should/might be reduced. Thanks, cheers, Martin
RE: ExtractingRequestHandler
Solr Cell is great for proof-of-concept, but for heavy-duty applications you're offloading all the processing onto the Solr server, which can be a problem.

Good point! Thank you
Thanks All, that worked (both via SOLRJ and the admin UI)
The query in question should be: {!join from=join_id to=id}attributes_AUTHORS.4:4
RE: Distributed grouping issue
In the case of group=false: numFound=26 In the case of group=true: <int name="matches">34000</int> As a note, the grouped number changes when I hit refresh; it seems to display the count from a single shard (the top match also changes). I haven't tried this in other versions of Solr. All documents of a group exist on a single shard; there are no cross-shard groups. Thanks, Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 3:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

[...]
Re: Open deleted index file failing jboss shutdown with Too many open files Error
Here is my solrconfig.xml. I am using Lucene NRT with soft commit: we update the index every 5 seconds, soft commit every 1 second, and hard commit every 15 minutes.

solrconfig.xml:

<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>1</maxFieldLength>
  <ramBufferSizeMB>4096</ramBufferSizeMB>
  <maxThreadStates>10</maxThreadStates>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <double name="forceMergeDeletesPctAllowed">0.0</double>
    <double name="reclaimDeletesWeight">10.0</double>
  </mergePolicy>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="keepOptimizedOnly">false</str>
    <str name="maxCommitsToKeep">0</str>
  </deletionPolicy>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <maxPendingDeletes>1000</maxPendingDeletes>
  <autoCommit>
    <maxTime>900000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${inventory.solr.softcommit.duration:1000}</maxTime>
  </autoSoftCommit>
</updateHandler>

On Sun, Apr 1, 2012 at 7:47 PM, Gopal Patwa gopalpa...@gmail.com wrote: [...]
Re: Open deleted index file failing jboss shutdown with Too many open files Error
Hmm, unless the ulimits are low, or the default mergeFactor was changed, or you have many indexes open in a single JVM, or you keep too many IndexReaders open, you should not run out of file descriptors, even in an NRT or frequent-commit use case. Frequent commit/reopen should be perfectly fine, as long as you close the old readers...

Mike McCandless
http://blog.mikemccandless.com

On Mon, Apr 2, 2012 at 8:35 AM, Erick Erickson erickerick...@gmail.com wrote:

How often are you committing index updates? This kind of thing can happen if you commit too often. Consider setting commitWithin to something like, say, 5 minutes, or doing the equivalent with the autoCommit parameters in solrconfig.xml. If that isn't relevant, you need to provide some more details about what you're doing and how you're using Solr.

Best
Erick

On Sun, Apr 1, 2012 at 10:47 PM, Gopal Patwa gopalpa...@gmail.com wrote:

I am using a Solr 4.0 nightly build with NRT, and I often get a "Too many open files" error during auto commit. I have searched this forum, and what I found is that this is related to the OS ulimit setting; please see my ulimit settings below. I am not sure what ulimit setting I should have for open files. ulimit -n unlimited? Even if I set it to a higher number, that will just delay the issue until the new open-file limit is reached.

What I have seen is that Solr keeps deleted index files open in the java process, which prevents our application server (JBoss) from shutting down gracefully. I have seen that this issue was recently resolved in Lucene; is that true? https://issues.apache.org/jira/browse/LUCENE-3855

I have 3 cores with index sizes core1 - 70GB, core2 - 50GB, and core3 - 15GB, each with a single shard. We update the index every 5 seconds, soft commit every 1 second, and hard commit every 15 minutes.

Environment: JBoss 4.2, JDK 1.6 64-bit, CentOS, JVM heap size = 24GB

ulimit:
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 401408
max locked memory       (kbytes, -l) 1024
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 401408
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

ERROR:
2012-04-01 20:08:35,323 [] priority=ERROR app_name= thread=pool-10-thread-1 location=CommitTracker line=93 auto commit error...:
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:197)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /opt/mci/data/srwp01mci001/inventory/index/_4q1y_0.tip (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:449)
    at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:288)
    at org.apache.lucene.codecs.BlockTreeTermsWriter.<init>(BlockTreeTermsWriter.java:161)
    at org.apache.lucene.codecs.lucene40.Lucene40PostingsFormat.fieldsConsumer(Lucene40PostingsFormat.java:66)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:118)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:322)
    at ...
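For illustration, a minimal SolrJ sketch of the commitWithin approach Erick suggests above. The URL, core name, and the 5-minute window are assumptions made for the example, not values from this thread, and HttpSolrServer is the 4.x client class (3.x would use CommonsHttpSolrServer):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical core URL for illustration only
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/core1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");

        // commitWithin = 5 minutes: Solr schedules at most one commit per
        // window instead of the client forcing a commit on every update,
        // which keeps segment/file-descriptor churn down.
        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setCommitWithin(5 * 60 * 1000);
        req.process(server);
    }
}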
Re: Merging results from two queries
Karthick,

The solution that I use for this problem is to perform query1 and query2 together and boost results matching query1. Solr then takes care of all the deduplication (not necessarily merging) automatically. Would this work for your situation?

I stole this idea from this slide deck: "Make sure all relevant documents match... Make sure the best matching documents score highest..."
http://www.lucidimagination.com/files/relevancy-ranking-meetup-presentation-14-dec-10.pptx (page 19)

On Mon, Apr 2, 2012 at 7:28 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote:

Hi all,
I am finding a need to merge the results of multiple queries to accomplish functionality similar to this:
1. Make query 1.
2. If the number of results returned by query1 is less than a certain threshold, make query 2.
Extending this idea, I want to be able to create a query chain, i.e., provide functionality where you could specify n queries and n-1 thresholds in a single URL, then query in order from 1 to n until one of them produces results that exceed the threshold.
PS: These n queries and n thresholds are passed on a single URL, and each of them could use a different request handler and therefore take a different set of parameters.
Any suggestions/thoughts/pointers as to where to begin looking will be of great help!
Thanks,
Karthick
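A rough SolrJ sketch of this combined-query-plus-boost idea; the field names, terms, and boost value are made up for illustration, not taken from the thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostedMergeExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // One request covers both queries: documents matching the preferred
        // query1 clause get a large boost, so they sort ahead of query2-only
        // matches, and each document appears only once in the result set.
        SolrQuery q = new SolrQuery("(title:\"heart attack\")^10 OR (title:\"chest pain\")");
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}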
SolrJ updating indexed documents?
I am working on a component for indexing documents from a database that contains medical records. The information is organized across several tables, and I am supposed to index records for varying sizes of sets of patients for others to do IR experiments with. Each patient record has one or more main documents associated with it, and each main document has zero or more addenda associated with it. (The main documents and addenda are treated alike for the most part, except for a parent-record field that is null for main documents and holds the number of a main document for addenda. Addenda cannot have addenda.) Also, each main document has one or more diagnosis records. I am trying to figure out the best-performing way to select all of the records for each patient, including the main documents, addenda, and diagnoses.

I tried indexing sets of these records using DataImportHandler and nested Entity blocks, in a way similar to the Full Import example on the http://wiki.apache.org/solr/DataImportHandler page, with a select for all patients and main records in a data set and nested selects that get all of the addenda and all of the diagnoses for each patient. It didn't run very fast, though, and a database resource person who looked into it with me said that issuing a million SQL queries for addenda and a million queries for diagnoses, one each for the million patient documents in a typical set of 10,000 patients, was very inefficient, and that I should look for a different way of getting the data.

I switched to using SolrJ, and I am trying to figure out which of two ways to use to index this data. One would be to use one large SQL statement to get all of the data for a patient set. The results would contain duplication, due to the way the tables are joined together, that I would need to sort out in the Java code, but that is doable. The other way would be to:

1. Get all of the main document data with one SQL query, create index documents with the data they contain, and store them in the index.
2. Issue another SQL query that gets all of the addenda for all of the patients in the data set, along with an id number for each one that tells which main document an addendum belongs with; retrieve the main documents from the index, add the addenda fields to the documents, and put them back in the index.
3. Do the same with the diagnosis data.

It would be great to be able to keep the main document data retrieved from the database in a hash table, update each of those objects with addenda and diagnoses, and write completely filled-out documents to the index once. But I don't have enough memory available to do this for the patient sets I am working with now, and they want this indexing process to scale up to patient sets that are ten times as large, and eventually much larger than that.

Essentially, for the second approach I am wondering whether a Lucene index can be made to serve as a hash table for storing intermediate results, and whether SolrJ has an API for retrieving individual index documents so they can be updated. Basically it would shift from iterating over SQL queries to iterating over Lucene index updates. If this way of doing things is also likely to be slow, or the SolrJ API doesn't provide a way to do this, or there are other problems with it, I can go with selecting all of the data in one large query and dealing with the duplication.

Thanks,
Mike
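For what it's worth, a rough SolrJ sketch of the second approach: fetch a stored document by id, copy its fields, append the new data, and re-add it. The core URL and field names are hypothetical, and this only works if every field is stored, since in Solr 3.x/4.0 re-adding a document with the same unique key replaces the whole old document:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class IndexAsScratchpad {
    // Retrieve the previously indexed main document, copy its stored
    // fields into a new input document, append the addendum field, and
    // re-add, which replaces the old version in the index.
    static void attachAddendum(SolrServer server, String mainDocId, String addendumText)
            throws Exception {
        SolrDocument existing = server.query(new SolrQuery("id:" + mainDocId))
                                      .getResults().get(0);
        SolrInputDocument updated = new SolrInputDocument();
        for (String field : existing.getFieldNames()) {
            updated.addField(field, existing.getFieldValues(field));
        }
        updated.addField("addendum_text", addendumText); // hypothetical field name
        server.add(updated);
    }
}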
Re: viewing the terms indexed for a specific document
A few more details on this thread: when I try the Analysis tab from the admin console, I see that the synonym is kicking in; it matches the text in the document that I am expecting to see as part of the results. However, the actual search is not returning that document.

I also used the TermsComponent and tried to see how many docs match the synonym term, and I don't see the term at all. So I am not sure how to check whether this is working or not.

Thanks,
Karthik

On Mon, Apr 2, 2012 at 3:41 PM, karthik kmoha...@gmail.com wrote:

Hi,
I am trying to view what terms are getting indexed for a specific field in a specific document. How can I view this information? I tried the Luke handler, but it's not showing me what I am looking for. I am using Solr 3.1.0.

I am using index-time synonym expansion and saw that one of my synonyms was not working. In general, synonyms are working, since there are many other cases where they work. So to debug this issue, I wanted to see whether the synonym for the word is stored within the field for a given document inside the index. Luke showed me the actual string from the document, but not the synonym. I also tested Luke on a different document that does get returned when using a synonym, and I don't see the synonym term in the str name="value" or str name="internal" elements of the Luke handler output.

Any pointers on how to view the actual indexed terms would be helpful.

Thanks,
Karthik
Re: Distributed grouping issue
All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when groupCount and some other features need to be accurate. For the matches count this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema, and each core has 3 documents. When grouping, the matches element returns 6. I'm running on trunk, updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn
RE: Distributed grouping issue
Okay, I've played with this a bit more and found something interesting: when the groups returned do not include results from a core, that core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when groupCount and some other features need to be accurate. For the matches count this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema, and each core has 3 documents. When grouping, the matches element returns 6. I'm running on trunk, updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn
Re: pattern error in PatternReplaceCharFilterFactory
: It seems to be an unrecognisable pattern, this is from the log, last
: paragraph says unknown character block name. The java version is
: 1.6.0_31:

Did you read the rest of my reply, about testing whether java recognizes your block name independent of Solr? That error is coming directly from the java regex engine...

: Caused by: java.util.regex.PatternSyntaxException: Unknown character block
: name {Latin-1_Supplement} near index 23
: \p{InLatin-1_Supplement}
:                        ^
: at java.util.regex.Pattern.error(Pattern.java:1713)
: at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)

Why are you using an _ at all? Isn't \p{InLatin-1 Supplement} (or \p{InLatin-1Supplement}) what you mean? Either of those works for me, and they match the javadocs for which block names are supported in the JVM...

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#ubc

"The block names supported by Pattern are the valid block names accepted and defined by UnicodeBlock.forName."

http://docs.oracle.com/javase/6/docs/api/java/lang/Character.UnicodeBlock.html#forName%28java.lang.String%29

"This method accepts block names in the following forms:
1. Canonical block names as defined by the Unicode Standard. For example, the standard defines a Basic Latin block. Therefore, this method accepts Basic Latin as a valid block name. The documentation of each UnicodeBlock provides the canonical name.
2. Canonical block names with all spaces removed. For example, BasicLatin is a valid block name for the Basic Latin block.
..."

-Hoss
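A small stand-alone sketch of the "test it independent of Solr" suggestion above. The list of accepted spellings follows the Pattern/UnicodeBlock.forName javadocs and Hoss's note that the first two compile; run it on your own JVM to confirm:

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BlockNameCheck {
    public static void main(String[] args) {
        String[] candidates = {
            "\\p{InLatin-1 Supplement}",  // canonical block name
            "\\p{InLatin-1Supplement}",   // canonical name with spaces removed
            "\\p{InLatin-1_Supplement}"   // underscore form from the error above
        };
        for (String regex : candidates) {
            try {
                Pattern.compile(regex);
                System.out.println("OK:     " + regex);
            } catch (PatternSyntaxException e) {
                System.out.println("FAILED: " + regex + " (" + e.getDescription() + ")");
            }
        }
    }
}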
Re: Merging results from two queries
Part of it depends on what you mean by threshold. If it's just the number of matches, then fine. But if you're talking about score here, be very, very careful. Scores are not an absolute measure of anything; they only tell you that for _this_ query, the docs should be ordered this way. So I'd advise against any query chain based on scores as the threshold, if that's what you mean by threshold.

Best
Erick

On Mon, Apr 2, 2012 at 10:28 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote:

Hi all,
I am finding a need to merge the results of multiple queries to accomplish functionality similar to this:
1. Make query 1.
2. If the number of results returned by query1 is less than a certain threshold, make query 2.
Extending this idea, I want to be able to create a query chain, i.e., provide functionality where you could specify n queries and n-1 thresholds in a single URL, then query in order from 1 to n until one of them produces results that exceed the threshold.
PS: These n queries and n thresholds are passed on a single URL, and each of them could use a different request handler and therefore take a different set of parameters.
Any suggestions/thoughts/pointers as to where to begin looking will be of great help!
Thanks,
Karthick
Re: viewing the terms indexed for a specific document
If you add explainOther=[some id] (see http://wiki.apache.org/solr/SolrRelevancyFAQ) you might get some hints. You can use the TermsComponent to see whether the synonyms are getting into the index, but you'll need a very restricted input set (like one doc) for that to be helpful for a specific document.

Ahhh, try the stand-alone Luke program; it allows a lower-level exploration of the index, see: http://code.google.com/p/luke/ The LukeRequestHandler is based on Luke, but Luke itself is more flexible.

When are you putting synonyms in: index time, query time, or both? Showing your schema.xml fragment for the field in question would help diagnose the problem, as would showing the results of attaching debugQuery=on to the URL.

Best
Erick

On Mon, Apr 2, 2012 at 4:26 PM, karthik kmoha...@gmail.com wrote:

A few more details on this thread: when I try the Analysis tab from the admin console, I see that the synonym is kicking in; it matches the text in the document that I am expecting to see as part of the results. However, the actual search is not returning that document.

I also used the TermsComponent and tried to see how many docs match the synonym term, and I don't see the term at all. So I am not sure how to check whether this is working or not.

Thanks,
Karthik

On Mon, Apr 2, 2012 at 3:41 PM, karthik kmoha...@gmail.com wrote:

Hi,
I am trying to view what terms are getting indexed for a specific field in a specific document. How can I view this information? I tried the Luke handler, but it's not showing me what I am looking for. I am using Solr 3.1.0.

I am using index-time synonym expansion and saw that one of my synonyms was not working. In general, synonyms are working, since there are many other cases where they work. So to debug this issue, I wanted to see whether the synonym for the word is stored within the field for a given document inside the index. Luke showed me the actual string from the document, but not the synonym. I also tested Luke on a different document that does get returned when using a synonym, and I don't see the synonym term in the str name="value" or str name="internal" elements of the Luke handler output.

Any pointers on how to view the actual indexed terms would be helpful.

Thanks,
Karthik
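As a reference point while debugging, a sketch of what an index-time synonym field type typically looks like in schema.xml. The type name and analyzer chain here are illustrative, not the poster's actual configuration:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true" writes every synonym of a matched term into the
         index, which is what the TermsComponent check should reveal -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

One gotcha consistent with the symptoms described above: if a synonym entry was added to synonyms.txt after the documents were indexed, the Analysis tab will show the expansion but the index itself won't contain it until the documents are re-indexed.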
Re: Tags and Folksonomies
: Suppose I have content which has title and description. Users can tag content
: and search content based on tag, title and description. Tag has more
: weightage.
:
: Any inputs on how indexing and retrieval will work given there is content
: and tags using Solr? Has anyone implemented search based on collaborative
: tagging?

The simple approach would be to have your 3 fields and search them with weighted boosting, giving more importance to the tag field.

Where things get more complicated is when you want docA to score higher for the query "boat" than docB, because 100 users have tagged docA with boat but only 5 users have tagged docB with boat.

The canonical way to deal with this would be using payloads to boost the weight of a term. The DelimitedPayloadTokenFilterFactory can help with this at index time, but off the top of my head I don't think any of the existing Solr QParsers will build the necessary PayloadTermQuery, so you might have to roll your own. There are a few Jira issues with patches that you might be able to re-use or get inspired by...

https://issues.apache.org/jira/browse/SOLR-1485

-Hoss
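A sketch of the index-time half of the payload approach, assuming documents feed the tag field tokens in a token|weight form; the field type name and delimiter are illustrative, and the query-time PayloadTermQuery side would still need the custom QParser work noted above:

<fieldType name="payload_tags" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- a tag indexed as "boat|100.0" becomes the term "boat" carrying a
         float payload of 100.0 (e.g. the number of users who applied it) -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="float"/>
  </analyzer>
</fieldType>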
Re: Merging results from two queries
@Erick By threshold, all I mean is the count of the documents returned; I am not going to play with scores.

So if I want to commit my code to svn, what's the best way to go about it? I know I have to discuss my design here, which would take at least a couple of days. But are there special instructions I need to follow in order to stay in a direction from which I could commit my code?

@John Yes, that's definitely a solution, but I don't want to make two different http requests. I want to make one request in which all that I mentioned happens.

On Mon, Apr 2, 2012 at 7:28 PM, Erick Erickson erickerick...@gmail.com wrote:

Part of it depends on what you mean by threshold. If it's just the number of matches, then fine. But if you're talking about score here, be very, very careful. Scores are not an absolute measure of anything; they only tell you that for _this_ query, the docs should be ordered this way. So I'd advise against any query chain based on scores as the threshold, if that's what you mean by threshold.

Best
Erick

On Mon, Apr 2, 2012 at 10:28 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote:

Hi all,
I am finding a need to merge the results of multiple queries to accomplish functionality similar to this:
1. Make query 1.
2. If the number of results returned by query1 is less than a certain threshold, make query 2.
Extending this idea, I want to be able to create a query chain, i.e., provide functionality where you could specify n queries and n-1 thresholds in a single URL, then query in order from 1 to n until one of them produces results that exceed the threshold.
PS: These n queries and n thresholds are passed on a single URL, and each of them could use a different request handler and therefore take a different set of parameters.
Any suggestions/thoughts/pointers as to where to begin looking will be of great help!
Thanks,
Karthick
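For reference, a client-side SolrJ sketch of the count-based query chain being discussed. Note this still issues one HTTP request per attempted query, so a server-side SearchComponent would be needed to meet the single-request requirement above; this only illustrates the chaining logic:

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrDocumentList;

public class QueryChain {
    // Run each query in order until one returns at least `threshold` hits;
    // return the results of the last query attempted.
    static SolrDocumentList runChain(SolrServer server, List<SolrQuery> queries,
                                     long threshold) throws SolrServerException {
        SolrDocumentList last = null;
        for (SolrQuery q : queries) {
            last = server.query(q).getResults();
            if (last.getNumFound() >= threshold) {
                break; // enough matches; stop the chain early
            }
        }
        return last;
    }
}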
Re: Distributed grouping issue
I tried to reproduce this, but the matches element always returns 4 in my case (with both rows=1 and rows=2). In your case the 2 documents on each core do belong to the same group, right?

I did find something else: if I use rows=0 then an error occurs. I think we need to investigate this further. Can you open an issue in Jira? I'm a bit busy today; we can then look into this further in the coming days.

Martijn

On 2 April 2012 23:00, Young, Cody cody.yo...@move.com wrote:

Okay, I've played with this a bit more and found something interesting: when the groups returned do not include results from a core, that core is excluded from the count. (I have 1 group, 2 documents per core.)

Example:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=1

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">2</int>

Then, just by changing to rows=2:

http://localhost:8983/solr/core0/select/?q=*:*&shards=localhost:8983/solr/core0,localhost:8983/solr/core1&group=true&group.field=group_field&group.limit=10&rows=2

<lst name="grouped">
  <lst name="group_field">
    <int name="matches">4</int>

Let me know if you have any luck reproducing.

Thanks,
Cody

-----Original Message-----
From: martijn.is.h...@gmail.com [mailto:martijn.is.h...@gmail.com] On Behalf Of Martijn v Groningen
Sent: Monday, April 02, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: Re: Distributed grouping issue

All documents of a group exist on a single shard; there are no cross-shard groups. You only have to partition documents by group when groupCount and some other features need to be accurate. For the matches count this is not necessary: the matches are summed up while merging the shard responses.

I can't reproduce the error you are describing on a small local setup I have here. I have two Solr cores with a simple schema, and each core has 3 documents. When grouping, the matches element returns 6. I'm running on trunk, updated 30 minutes ago.

Can you try to isolate the problem by testing with a small subset of your data?

Martijn

--
With kind regards,

Martijn van Groningen