Re: TokenStream contract violation: close() call missing error in 4.9.0
What tokenizer are you using? I think, but I'm not entirely sure, that this would require a bug in a tokenizer.

On Tue, Jun 9, 2015 at 10:21 AM, Ryan, Michael F. (LNG-DAY) michael.r...@lexisnexis.com wrote:

I'm using Solr 4.9.0. I'm trying to figure out what would cause an error like this to occur in a rare, non-deterministic manner:

java.lang.IllegalStateException: TokenStream contract violation: close() call missing
    at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:307)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:183)

Are there any known bugs that would cause this, or unusual conditions? I'm thinking of crazy things like a corrupted index or a hardware issue. I don't directly use TokenStream, so I'm wondering if there is something that could indirectly cause this (i.e., me doing something wrong that causes Lucene itself not to close the TokenStream). I can provide more details later. Right now I'm just grasping at straws, hoping someone has encountered this. -Michael
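For anyone landing here later: the contract the exception refers to is the standard Lucene consumption sequence, sketched below against the 4.x analysis API (this is not code from the thread, just an illustration). A consumer that skips end()/close() on an earlier stream leaves the Tokenizer in the state that makes the next setReader() throw.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamContract {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_9);
        // try-with-resources guarantees close(); without it, a later
        // tokenStream() call that reuses this analyzer's components throws
        // "TokenStream contract violation: close() call missing"
        try (TokenStream ts = analyzer.tokenStream("field", "some text")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                 // mandatory before incrementToken()
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();                   // mandatory after the loop
        }
    }
}
```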
Re: Korean script conversion
Why do you think that this is a good idea? Hanja are used for special purposes; they are not trivially convertible to Hangul due to ambiguity, and it's not at all clear that a typical search user wants to treat them as equivalent.

On Sun, Mar 29, 2015 at 1:52 AM, Eyal Naamati eyal.naam...@exlibrisgroup.com wrote:

Hi, We are starting to index records in Korean. Korean text can be written in two scripts: Han characters (Chinese) and Hangul characters (Korean). We are looking for some Solr filter or another built-in Solr component that converts between Han and Hangul characters (transliteration). I know there is the ICUTransformFilterFactory that can convert between Japanese or Chinese scripts, for example:

<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>

for Japanese script conversions. So far I couldn't find anything readymade for Korean scripts, but perhaps someone knows of one? Thanks!

Eyal Naamati
Alma Developer
Tel: +972-2-6499313 Mobile: +972-547915255
eyal.naam...@exlibrisgroup.com
www.exlibrisgroup.com
shards.qt in solrconfig.xml
A query I posted yesterday amounted to me forgetting that I have to set shards.qt when I use a URL other than plain old '/select' with SolrCloud. Is there any way to configure a query handler to automate this, so that all queries addressed to '/RNI' get that added in?
Re: shards.qt in solrconfig.xml
I apparently am feeling dense; the following does not work.

<requestHandler name="/RNI" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="shards.qt">/RNI</str>
  </lst>
  <arr name="components">
    <str>name-indexing-query</str>
    <str>name-indexing-rescore</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>

On Thu, Feb 26, 2015 at 11:33 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

I was hoping that Benson was hinting at adding a shards.qt.auto=true parameter so that Solr would magically use the path from the incoming request - and that this would be the default, since that's what most people would expect. Or, maybe just add a commented-out custom handler that has the shards.qt parameter as suggested, to re-emphasize to people that if they want to use a custom handler in distributed mode, then they will most likely need this parameter.

-- Jack Krupansky

On Thu, Feb 26, 2015 at 11:28 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Hello, Given http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3c711daae5-c366-4349-b644-8e29e80e2...@gmail.com%3E you can add shards.qt into the handler defaults/invariants.

On Thu, Feb 26, 2015 at 5:40 PM, Benson Margulies bimargul...@gmail.com wrote:

A query I posted yesterday amounted to me forgetting that I have to set shards.qt when I use a URL other than plain old '/select' with SolrCloud. Is there any way to configure a query handler to automate this, so that all queries addressed to '/RNI' get that added in?

-- Sincerely yours, Mikhail Khludnev, Principal Engineer, Grid Dynamics
http://www.griddynamics.com
Re: 8 Shards of Cloud with 4.10.3.
On Wed, Feb 25, 2015 at 8:04 AM, Shawn Heisey apa...@elyograg.org wrote:

On 2/25/2015 5:50 AM, Benson Margulies wrote:

So, found the following line in the guide:

java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

using a completely clean, new solr_home. In my own bootstrap dir, I have my own solrconfig.xml and schema.xml, and I modified to have: -DnumShards=8 -DmaxShardsPerNode=8. When I went to start loading data into this, I failed:

Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No registered leader was found after waiting for 4000ms, collection: rni slice: shard4
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
    at com.basistech.rni.index.internal.SolrCloudEvaluationNameIndex.init(SolrCloudEvaluationNameIndex.java:53)

with corresponding log traffic in the solr log. The cloud page in the Solr admin app shows the IP address in green. It's a bit hard to read in general; it's all squished up to the top.

The way I would do it would be to start Solr *only* with the zkHost parameter. If you're going to use embedded zookeeper, I guess you would use zkRun instead. Once I had Solr running in cloud mode, I would upload the config to zookeeper using zkcli, and create the collection using the Collections API, including things like numShards and maxShardsPerNode on that CREATE call, not as startup properties. Then I would completely reindex my data into the new collection. It's a whole lot cleaner than trying to convert non-cloud to cloud and split shards.

Thanks, Shawn

Shawn, I _am_ starting from clean. However, I didn't find a recipe for what you suggest as a process, and (following Hoss' suggestion) I found the recipe above with the bootstrap_confdir scheme. I am mostly confused as to how I supply my solrconfig.xml and schema.xml when I follow the process you are suggesting. I know I'm verging on vampirism here, but if you could possibly find the time to turn your paragraph into either a pointer to a recipe or the command lines in a bit more detail, I'd be exceedingly grateful.

Thanks, benson
Re: 8 Shards of Cloud with 4.10.3.
So, found the following line in the guide:

java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

using a completely clean, new solr_home. In my own bootstrap dir, I have my own solrconfig.xml and schema.xml, and I modified to have: -DnumShards=8 -DmaxShardsPerNode=8. When I went to start loading data into this, I failed:

Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No registered leader was found after waiting for 4000ms, collection: rni slice: shard4
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
    at com.basistech.rni.index.internal.SolrCloudEvaluationNameIndex.init(SolrCloudEvaluationNameIndex.java:53)

with corresponding log traffic in the solr log. The cloud page in the Solr admin app shows the IP address in green. It's a bit hard to read in general; it's all squished up to the top.

On Tue, Feb 24, 2015 at 4:33 PM, Benson Margulies bimargul...@gmail.com wrote:

On Tue, Feb 24, 2015 at 4:27 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Unfortunately, this is all 5.1 and instructs me to run the 'start from
: scratch' process.

a) checkout the left nav of any ref guide webpage, which has a link to Older Versions of this Guide (PDF)

b) i'm not entirely sure i understand what you're asking, but i'm guessing you mean...
* you have a fully functional individual instance of Solr, with a single core
* you only want to run that one single instance of the Solr process
* you want that single Solr process to be a SolrCloud of one node, but replace your single core with a collection that is divided into 8 shards.
* presumably: you don't care about replication since you are only trying to run one node.

what you want to look into (in the 4.10 ref guide) is how to bootstrap a SolrCloud instance from a non-SolrCloud node -- ie: startup zk, tell solr to take the configs from your single core and upload them to zk as a configset, and register that single core as a collection. That should give you a single instance of solrcloud, with a single collection, consisting of one shard (your original core). Then you should be able to use the SPLITSHARD command to split your single shard into 2 shards, and then split them again, etc... (i don't think you can split directly to 8 sub-shards with a single command)

FWIW: unless you no longer have access to the original data, it would almost certainly be a lot easier to just start with a clean install of Solr in cloud mode, then create a collection with 8 shards, then re-index your data.

OK, now I'm good to go. Thanks.

-Hoss
http://www.lucidworks.com/
Re: 8 Shards of Cloud with 4.10.3.
A little more data. Note that the cloud status shows the black bubble for a leader. See http://i.imgur.com/k2MhGPM.png.

org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms, collection: rni slice: shard4
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:568)
    at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:551)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:1358)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:1226)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)

On Wed, Feb 25, 2015 at 9:44 AM, Benson Margulies bimargul...@gmail.com wrote:

On Wed, Feb 25, 2015 at 8:04 AM, Shawn Heisey apa...@elyograg.org wrote:

On 2/25/2015 5:50 AM, Benson Margulies wrote:

So, found the following line in the guide:

java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar

using a completely clean, new solr_home. In my own bootstrap dir, I have my own solrconfig.xml and schema.xml, and I modified to have: -DnumShards=8 -DmaxShardsPerNode=8. When I went to start loading data into this, I failed:

Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: No registered leader was found after waiting for 4000ms, collection: rni slice: shard4
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
    at com.basistech.rni.index.internal.SolrCloudEvaluationNameIndex.init(SolrCloudEvaluationNameIndex.java:53)

with corresponding log traffic in the solr log. The cloud page in the Solr admin app shows the IP address in green. It's a bit hard to read in general; it's all squished up to the top.

The way I would do it would be to start Solr *only* with the zkHost parameter. If you're going to use embedded zookeeper, I guess you would use zkRun instead. Once I had Solr running in cloud mode, I would upload the config to zookeeper using zkcli, and create the collection using the Collections API, including things like numShards and maxShardsPerNode on that CREATE call, not as startup properties. Then I would completely reindex my data into the new collection. It's a whole lot cleaner than trying to convert non-cloud to cloud and split shards.

Shawn, I _am_ starting from clean. However, I didn't find a recipe for what you suggest as a process, and (following Hoss' suggestion) I found the recipe above with the bootstrap_confdir scheme.
I am mostly confused as to how I supply my solrconfig.xml and schema.xml when I follow the process you are suggesting. I know I'm verging on vampirism here, but if you could possibly find the time to turn your paragraph into either a pointer to a recipe or the command lines in a bit more detail, I'd be exceedingly grateful.

Thanks, benson

Thanks, Shawn
Re: 8 Shards of Cloud with 4.10.3.
It's the zkcli options on my mind. zkcli's usage shows me 'bootstrap', 'upconfig', and uploading a solr.xml. When I use upconfig, it might work, but it sure is noisy:

benson@ip-10-111-1-103:/data/solr+rni$ 554331 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14bc16c5e660003, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)

On Wed, Feb 25, 2015 at 10:52 AM, Shawn Heisey apa...@elyograg.org wrote:

On 2/25/2015 8:35 AM, Benson Margulies wrote: Do I need a zkcli bootstrap or do I start with upconfig? What port does zkRun put zookeeper on?

I personally would not use bootstrap options. They are only meant to be used once, when converting from non-cloud, but many people who use them do NOT use them only once -- they include them in their startup scripts and use them on every startup. The whole thing becomes extremely confusing. I would just use zkcli and the Collections API, so nothing ever happens that you don't explicitly request. I believe that the port for embedded zookeeper (zkRun) is the jetty listen port plus 1000, so 9983 if jetty.port is 8983 or not set.

Thanks, Shawn
Re: 8 Shards of Cloud with 4.10.3.
Do I need a zkcli bootstrap or do I start with upconfig? What port does zkRun put zookeeper on?

On Feb 25, 2015 10:15 AM, Shawn Heisey apa...@elyograg.org wrote:

On 2/25/2015 7:44 AM, Benson Margulies wrote: Shawn, I _am_ starting from clean. However, I didn't find a recipe for what you suggest as a process, and (following Hoss' suggestion) I found the recipe above with the bootstrap_confdir scheme. I am mostly confused as to how I supply my solrconfig.xml and schema.xml when I follow the process you are suggesting. I know I'm verging on vampirism here, but if you could possibly find the time to turn your paragraph into either a pointer to a recipe or the command lines in a bit more detail, I'd be exceedingly grateful.

I'm willing to help in any way that I can.

Normally in the conf directory for a non-cloud core you have solrconfig.xml and schema.xml, plus any other configs referenced by those files, like synonyms.txt, dih-config.xml, etc. In cloud terms, the directory containing these files is a confdir. It's best to keep the on-disk copy of your configs completely outside of the solr home so there's no confusion about what configurations are active. On-disk cores for solrcloud do not need or use a conf directory.

The cloud-scripts/zkcli.sh (or zkcli.bat) script has an upconfig command with -confdir and -confname options. When doing upconfig, the zkHost value goes on the -z option to zkcli, and you only need to list one of your zookeeper hosts, although it is perfectly happy if you list them all. You would point -confdir at a directory containing the config files mentioned earlier, and -confname is the name that the config has in zookeeper, which you would then use on the collection.configName parameter for the Collections API call.
Once the config is uploaded, here's an example call to that API for creating a collection:

http://server:port/solr/admin/collections?action=CREATE&name=test&numShards=8&replicationFactor=1&collection.configName=testcfg&maxShardsPerNode=8

If this is not enough detail, please let me know which part you need help with.

Thanks, Shawn
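Pulling the two steps together, the sequence looks roughly like this (host, port, paths, and names are placeholders; in a 4.10 install zkcli lives under example/scripts/cloud-scripts):

```shell
# 1. Upload the confdir (solrconfig.xml, schema.xml, friends) to ZooKeeper.
#    One zk host on -z is enough, though listing all of them also works.
example/scripts/cloud-scripts/zkcli.sh -z localhost:9983 \
  -cmd upconfig -confdir /path/to/myconf -confname testcfg

# 2. Create the collection against the uploaded config.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test&numShards=8&replicationFactor=1&collection.configName=testcfg&maxShardsPerNode=8"
```

Note the quotes around the curl URL; an unquoted & would background the command in the shell.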
Re: 8 Shards of Cloud with 4.10.3.
Bingo! Here's the recipe for the record ($gcopts holds the pile of GC options). First, set up shop:

DIR=$PWD
cd ../solr-4.10.3/example
java -Xmx200g $gcopts -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Djetty.port=8983 \
  -Dsolr.solr.home=/data/solr+rni/cloud_solr_home -Dsolr.install.dir=/data/solr-4.10.3 \
  -Duser.timezone=UTC -Djava.net.preferIPv4Stack=true -DzkRun -jar start.jar

and then:

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=rni&numShards=8&replicationFactor=1&collection.configName=rni&maxShardsPerNode=8'

On Wed, Feb 25, 2015 at 11:03 AM, Benson Margulies bimargul...@gmail.com wrote:

It's the zkcli options on my mind. zkcli's usage shows me 'bootstrap', 'upconfig', and uploading a solr.xml. When I use upconfig, it might work, but it sure is noisy:

benson@ip-10-111-1-103:/data/solr+rni$ 554331 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14bc16c5e660003, likely client has closed socket
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
    at java.lang.Thread.run(Thread.java:745)

On Wed, Feb 25, 2015 at 10:52 AM, Shawn Heisey apa...@elyograg.org wrote:

On 2/25/2015 8:35 AM, Benson Margulies wrote: Do I need a zkcli bootstrap or do I start with upconfig? What port does zkRun put zookeeper on?

I personally would not use bootstrap options. They are only meant to be used once, when converting from non-cloud, but many people who use them do NOT use them only once -- they include them in their startup scripts and use them on every startup. The whole thing becomes extremely confusing. I would just use zkcli and the Collections API, so nothing ever happens that you don't explicitly request.
I believe that the port for embedded zookeeper (zkRun) is the jetty listen port plus 1000, so 9983 if jetty.port is 8983 or not set. Thanks, Shawn
Customized search handler components and cloud
We have a pair of customized search components which we used successfully with SolrCloud some releases back (4.x). In 4.10.3, I am trying to find the point of departure in debugging why we get no results back when querying them with a sharded index. If I query the regular /select, all is swell. Obviously, there's a debugger in my future, but I wonder if this rings any bells for anyone. Here's what we add to solrconfig.xml:

<searchComponent name="name-indexing-query" class="com.basistech.rni.solr.NameIndexingQueryComponent"/>
<searchComponent name="name-indexing-rescore" class="com.basistech.rni.solr.NameIndexingRescoreComponent"/>

<requestHandler name="/RNI" class="solr.SearchHandler" default="false">
  <arr name="first-components">
    <str>name-indexing-query</str>
    <str>name-indexing-rescore</str>
  </arr>
</requestHandler>
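One thing worth checking before reaching for the debugger: in a sharded collection the aggregating node drives components through the staged distributed code path, so logic that lives only in process() runs on the individual shards but never on the aggregator, and nothing merges the per-shard results. A sketch of the hooks involved (Solr 4.x SearchComponent API; the class name matches the config above, the method bodies are illustrative, not the actual component):

```java
import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class NameIndexingQueryComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Runs everywhere, for both the top-level and per-shard requests.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // Runs on each shard, for that shard's slice of the work.
    }

    @Override
    public int distributedProcess(ResponseBuilder rb) throws IOException {
        // Runs only on the aggregator, once per stage; the base class is a no-op.
        return ResponseBuilder.STAGE_DONE;
    }

    @Override
    public void finishStage(ResponseBuilder rb) {
        // Aggregator-side hook for merging the per-shard responses.
    }

    @Override
    public String getDescription() { return "name-indexing query component"; }

    @Override
    public String getSource() { return ""; }
}
```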
Re: 8 Shards of Cloud with 4.10.3.
On Tue, Feb 24, 2015 at 4:27 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Unfortunately, this is all 5.1 and instructs me to run the 'start from
: scratch' process.

a) checkout the left nav of any ref guide webpage, which has a link to Older Versions of this Guide (PDF)

b) i'm not entirely sure i understand what you're asking, but i'm guessing you mean...

* you have a fully functional individual instance of Solr, with a single core
* you only want to run that one single instance of the Solr process
* you want that single Solr process to be a SolrCloud of one node, but replace your single core with a collection that is divided into 8 shards.
* presumably: you don't care about replication since you are only trying to run one node.

what you want to look into (in the 4.10 ref guide) is how to bootstrap a SolrCloud instance from a non-SolrCloud node -- ie: startup zk, tell solr to take the configs from your single core and upload them to zk as a configset, and register that single core as a collection. That should give you a single instance of solrcloud, with a single collection, consisting of one shard (your original core). Then you should be able to use the SPLITSHARD command to split your single shard into 2 shards, and then split them again, etc... (i don't think you can split directly to 8 sub-shards with a single command)

FWIW: unless you no longer have access to the original data, it would almost certainly be a lot easier to just start with a clean install of Solr in cloud mode, then create a collection with 8 shards, then re-index your data.

OK, now I'm good to go. Thanks.

-Hoss
http://www.lucidworks.com/
Re: 8 Shards of Cloud with 4.10.3.
On Tue, Feb 24, 2015 at 1:30 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:

Benson: Are you trying to run independent invocations of Solr for every node? Otherwise, you'd just want to create an 8-shard collection with maxShardsPerNode set to 8 (or more, I guess).

Michael, I don't want to run multiple invocations. I just want to exploit hardware cores with shards. Can you point me at doc for the process you are referencing here? I confess to some ongoing confusion between cores and collections. --benson

Michael Della Bitta
Senior Software Engineer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street, New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions | w: appinions.com

On Tue, Feb 24, 2015 at 1:27 PM, Benson Margulies bimargul...@gmail.com wrote:

With so much of the site shifted to 5.0, I'm having a bit of trouble finding what I need, and so I'm hoping that someone can give me a push in the right direction. On a big multi-core machine, I want to set up a configuration with 8 (or perhaps more) nodes treated as shards. I have some very particular solrconfig.xml and schema.xml that I need to use. Could some kind person point me at a relatively step-by-step layout? This is all on Linux; I'm happy to explicitly run Zookeeper.
Re: 8 Shards of Cloud with 4.10.3.
On Tue, Feb 24, 2015 at 3:32 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: https://cwiki.apache.org/confluence/display/solr/SolrCloud Unfortunately, this is all 5.1 and instructs me to run the 'start from scratch' process. I wish that I could take my existing one-core no-cloud config and convert it into a cloud, 8-shard config.
8 Shards of Cloud with 4.10.3.
With so much of the site shifted to 5.0, I'm having a bit of trouble finding what I need, and so I'm hoping that someone can give me a push in the right direction. On a big multi-core machine, I want to set up a configuration with 8 (or perhaps more) nodes treated as shards. I have some very particular solrconfig.xml and schema.xml that I need to use. Could some kind person point me at a relatively step-by-step layout? This is all on Linux, I'm happy to explicitly run Zookeeper.
Having a spot of trouble setting up /browse
So, I had set up a solr core modelled on the 'multicore' example in 4.10.3, which has no /browse. Upon request, I went to set up /browse. I copied in a minimal version. When I go there, I just get some XML back:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params"/>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0"/>
</response>

What else does /browse depend upon?
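For comparison, /browse in the stock 4.x example is not self-contained: it is a SearchHandler whose defaults route output through the VelocityResponseWriter, which in turn needs the velocity contrib jars on the lib path and the conf/velocity template directory. Roughly like this (trimmed from the example solrconfig.xml; the lib paths depend on your layout):

```xml
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar"/>
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar"/>

<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="defType">edismax</str>
  </lst>
</requestHandler>
```

Without the velocity writer and templates in place, the handler falls back to the plain XML response, which matches the symptom above.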
codec factory versus postings format versus documentation
I think perhaps there is a minor doc drought, or perhaps I'm just having an SEO bad-hair day. I'm trying to understand the relationship of codecFactory and postingsFormat.

Experiment 1: I just want to use my own codec. So, I make a CodecFactory, declare it in solrconfig.xml, and stand back? If so, why does codecFactory take a name attribute?

Experiment 2: I want something per field. I can have a postingsFormat per field by using the SchemaCodecFactory and then naming ... some class in postingsFormat=. A postings format class? A Codec class?

I will improve the documentation when I have this all straight.
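For experiment 2, my understanding (worth verifying against the release in hand) is that solrconfig.xml declares solr.SchemaCodecFactory, and the fieldType's postingsFormat attribute names a PostingsFormat by its SPI name, not a Codec and not a fully qualified class name:

```xml
<!-- solrconfig.xml -->
<codecFactory class="solr.SchemaCodecFactory"/>

<!-- schema.xml: "Memory" is the SPI name of MemoryPostingsFormat -->
<fieldType name="string_memory" class="solr.StrField" postingsFormat="Memory"/>
```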
Re: Complaint of multiple /updates but solrconfig.xml has one
OK, I see, I forgot to include the core name in the URL.

On Mon, Feb 9, 2015 at 8:27 PM, Benson Margulies ben...@basistech.com wrote:

I see https://issues.apache.org/jira/browse/SOLR-6302 but I don't see what I am supposed to do about it.

On Mon, Feb 9, 2015 at 8:19 PM, Benson Margulies ben...@basistech.com wrote:

4.10.3: Customized solrconfig.xml. My log shows:

2/9/2015, 8:14:44 PM WARN RequestHandlers Multiple requestHandler registered to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler

But there is only one:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>

And all attempts to post with the simple post tool yield:

SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/update?commit=true

The admin UI is alive and kicking. When I look at the solrconfig.xml file from there I see only one handler on /update.
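For the archives, once the core name is in place the post-tool invocation looks something like this ('collection1' stands in for the actual core name):

```shell
java -Durl=http://localhost:8983/solr/collection1/update \
  -jar post.jar doc.xml
```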
Re: Complaint of multiple /updates but solrconfig.xml has one
I see https://issues.apache.org/jira/browse/SOLR-6302 but I don't see what I am supposed to do about it.

On Mon, Feb 9, 2015 at 8:19 PM, Benson Margulies ben...@basistech.com wrote:

4.10.3: Customized solrconfig.xml. My log shows:

2/9/2015, 8:14:44 PM WARN RequestHandlers Multiple requestHandler registered to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler

But there is only one:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>

And all attempts to post with the simple post tool yield:

SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/update?commit=true

The admin UI is alive and kicking. When I look at the solrconfig.xml file from there I see only one handler on /update.
Complaint of multiple /updates but solrconfig.xml has one
4.10.3: Customized solrconfig.xml. My log shows:

2/9/2015, 8:14:44 PM WARN RequestHandlers Multiple requestHandler registered to the same name: /update ignoring: org.apache.solr.handler.UpdateRequestHandler

But there is only one:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>

And all attempts to post with the simple post tool yield:

SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/update?commit=true

The admin UI is alive and kicking. When I look at the solrconfig.xml file from there I see only one handler on /update.
log location when using bin/solr start
Running bin/solr start with a command like:

/data/solr-4.10.3/bin/solr start -s $PWD/solr_home -a -Djava.library.path=$libdir -Dbt.root=$bt_root $@

I note that the logs are ending up in the Solr install dir's example/logs. Can I move them?
Re: Is there any sentence tokenizers in sold 4.9.0?
Basis Technology's toolset includes sentence boundary detectors. Please contact me for more details.

On Fri, Sep 12, 2014 at 1:15 AM, Sandeep B A belgavi.sand...@gmail.com wrote:

Hi All, Sorry for the delayed response. I was out of office for the last few days and was not able to reply. Thanks for the information. We have a use case where one sentence is the unit token, with which we need to do normalization and semantic analysis. We need to finalize the type of normalizer and analyzer, but I was trying to see whether Solr has any inbuilt libraries, so that no cross-language integration would be required. Again, I will get back on whether something works or not. @susheel, thanks, I will try to see if that works. Thanks, Sandeep.

On Sep 8, 2014 12:54 PM, Sandeep B A belgavi.sand...@gmail.com wrote:

Hi Susheel, Thanks for the information. I have crawled a few websites and all I need is a sentence tokenizer for the data I have collected. These websites are English only. Well, I don't have experience in writing custom sentence tokenizers for Solr. Is there any tutorial link which tells how to do it? Is it possible to integrate NLTK with Solr? If yes, how? Because I found sentence tokenizers for English in NLTK. Thanks, Sandeep

On Sep 5, 2014 8:10 PM, Sandeep B A belgavi.sand...@gmail.com wrote: Sorry for the typo: it is solr 4.9.0 instead of sold 4.9.0.

On Sep 5, 2014 7:48 PM, Sandeep B A belgavi.sand...@gmail.com wrote:

Hi, I was looking for a default sentence tokenizer in Solr but could not find one. Has anyone used one, or integrated a tokenizer from another language (Python, for example) into Solr? Please let me know. Thanks and regards, Sandeep
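If a rough approximation inside Solr is acceptable, a PatternTokenizerFactory can emit one token per "sentence" by splitting on sentence-final punctuation. A naive sketch (the field type name and pattern are made up; it will mis-split on abbreviations, decimals, and quotes, which is exactly why dedicated sentence breakers exist):

```xml
<fieldType name="text_sentences" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace that follows ., !, or ?; each token is one "sentence" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="(?&lt;=[.!?])\s+"/>
  </analyzer>
</fieldType>
```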
Re: Business Name spell check
Trying to shoehorn business name resolution or correction purely into Solr tokenization and spell checking is not, in my opinion, a viable approach. It seems to me that you need a query parser that does something very different from pure tokenization, and you might also need a more complex approach to matching names. Full disclosure: I work for a company that builds one of those. You could talk to us, or you could at least look at the problem from the point of view of our approach: take the business names, index them in some way that allows for fuzzy matching (which is _not_ just treating them as ordinary tokenized text), then take the queries, and map them to fuzzy matching. The whole business is comparable to the geo support in Solr: a special data type that is treated with domain-specific techniques.
Re: Solr Japanese support
Your problem has nothing to do with Japanese. Perhaps a content-type for CSV would work better?

On Sat, Mar 15, 2014 at 12:50 PM, Bala Iyer grb...@yahoo.com wrote:

Hi, I am new to Solr Japanese support. I added support for Japanese in schema.xml. How can I insert Japanese text into that field, either via a Solr client (java / php / ruby) or via curl?

schema.xml:

<field name="username" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>
<field name="jtxt" type="text_ja" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>

<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
    <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- Removes tokens with certain part-of-speech tags -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
    <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Removes common tokens typically not useful for search, but that have a negative effect on ranking -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/>
    <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
    <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
    <!-- Lower-cases romaji characters -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

my insert.csv file:

id,username,timestamp,content,jtxt
9,x,2013-12-26T10:14:26Z,Hello ,マイ ドキュメント

I am trying to insert through curl and it gives me an error:

curl "http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true" -H "Content-Type: text/plain; charset=utf-8" --data-binary @insert.csv

ERROR:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">23</int></lst>
  <lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
</response>

I know I should not use Content-Type text/plain. Thanks
Mixing lucene scoring and other scoring
Some months ago, I talked to some people at LR about this, but I can't find my notes. Imagine a function of some fields that produces a score between 0 and 1. Imagine that you want to combine this score with relevance over some more or less complex ordinary query. What are the options, given the arbitrary nature of Lucene scores?
A bit lost in the land of schemaless Solr
Say that I have 10 fieldTypes for 10 languages. Is there a way to associate a naming convention from field names to field types so that I can avoid bothering with all those dynamic fields?
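For reference, dynamic fields are Solr's built-in mechanism for tying a naming convention to a field type. A sketch of what the schema.xml entries might look like (the suffixes and type names here are assumptions, not from the original post):

```xml
<!-- Any field whose name matches the glob is assigned the corresponding
     fieldType, so documents can use names like title_txt_ja with no
     per-field declarations. -->
<dynamicField name="*_txt_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_txt_fr" type="text_fr" indexed="true" stored="true"/>
<dynamicField name="*_txt_ja" type="text_ja" indexed="true" stored="true"/>
<!-- ... one such line per remaining language type ... -->
```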
(lack) of error for missing library?
<!-- an exact 'path' can be used instead of a 'dir' to specify a specific jar file. This will cause a serious error to be logged if it can't be loaded. --> is the comment, but when I put a completely missing path in there -- no error. Should I file a JIRA?
Re: Multi Lingual Analyzer
MT is not nearly good enough to allow approach 1 to work. On Mon, Jan 20, 2014 at 9:25 AM, Erick Erickson erickerick...@gmail.com wrote: It Depends (tm). Approach (2) will give you better, more specific search results. (1) is simpler to implement and might be good enough... On Mon, Jan 20, 2014 at 5:21 AM, David Philip davidphilipshe...@gmail.com wrote: Hi, I have a query on multi-lingual analyzers. Which of the approaches below is best? 1. To develop a translator that translates any language to English and then use the standard English analyzer to analyse, using the translator both at index time and at search time? 2. To develop a language-specific analyzer and use it by creating a specific field only for that language? We have client data coming in different languages, Kannada and Telugu, with others to come later. This data is basically text written by customers in those languages. The requirement is to develop analyzers for these particular languages. Thanks - David
Re: Tracking down the input that hits an analysis chain bug
I think that https://issues.apache.org/jira/browse/SOLR-5623 should be ready to go. Would someone please commit from the PR? If there's a preference, I can attach a patch as well. On Fri, Jan 10, 2014 at 1:37 PM, Benson Margulies bimargul...@gmail.com wrote: Thanks, that's the recipe that I need. On Fri, Jan 10, 2014 at 11:40 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is there a neighborhood of existing tests I should be visiting here? You'll need a custom schema that refers to your new MockFailOnCertainTokensFilterFactory, so i would create a completely new test class somewhere in ...solr.update (you're testing that an update fails with a clean error) -Hoss http://www.lucidworks.com/
Analyzers versus Tokenizers/TokenFilters
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters never mentions an Analyzer class. http://wiki.apache.org/solr/SolrPlugins talks about subclasses of SolrAnalyzer as ways of delivering an entire analysis chain and still 'minding the gap'. Anyone care to offer a comparison of the viewpoints?
Re: Analyzers versus Tokenizers/TokenFilters
Ahmet, So, this is an interesting difference between Lucene (and ES) and Solr. In Lucene, the idea seems to be that you package up a reusable analysis chain as an analyzer. Saying 'use analyzer X' is less complex than saying 'use tokenizer T and filters F1, F2, ...'. thanks, benson On Wed, Jan 15, 2014 at 5:09 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Benson, Using a Lucene analyzer in schema.xml should be a last resort, for very specific reasons: if you have an existing analyzer, etc. Ahmet On Wednesday, January 15, 2014 11:52 PM, Benson Margulies ben...@basistech.com wrote: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters never mentions an Analyzer class. http://wiki.apache.org/solr/SolrPlugins talks about subclasses of SolrAnalyzer as ways of delivering an entire analysis chain and still 'minding the gap'. Anyone care to offer a comparison of the viewpoints?
Re: Tracking down the input that hits an analysis chain bug
OK, patch forthcoming. On Fri, Jan 10, 2014 at 11:23 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : The problem manifests as this sort of thing: : : Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log : SEVERE: java.lang.IllegalArgumentException: startOffset must be : non-negative, and endOffset must be >= startOffset, : startOffset=-1811581632,endOffset=-1811581632 Is there a stack trace in the log to go along with that? there should be. My suspicion is that since analysis errors like these are RuntimeExceptions, they may not be getting caught and re-thrown with as much context as they should -- so by the time they get logged (or returned to the client) there isn't any info about the problematic field value, let alone the uniqueKey. If we had a test case that reproduces (ie: with a mock tokenfilter that always throws a RuntimeException when a token matches fail_now or something) we could have some tests that assert indexing a doc with that token results in a useful error -- which should help ensure that useful error also gets logged (although i don't think we really have any easy way of asserting specific log messages at the moment) -Hoss http://www.lucidworks.com/
Re: Tracking down the input that hits an analysis chain bug
Is there a neighborhood of existing tests I should be visiting here? On Fri, Jan 10, 2014 at 11:27 AM, Benson Margulies bimargul...@gmail.com wrote: OK, patch forthcoming. On Fri, Jan 10, 2014 at 11:23 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : The problem manifests as this sort of thing: : : Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log : SEVERE: java.lang.IllegalArgumentException: startOffset must be : non-negative, and endOffset must be >= startOffset, : startOffset=-1811581632,endOffset=-1811581632 Is there a stack trace in the log to go along with that? there should be. My suspicion is that since analysis errors like these are RuntimeExceptions, they may not be getting caught and re-thrown with as much context as they should -- so by the time they get logged (or returned to the client) there isn't any info about the problematic field value, let alone the uniqueKey. If we had a test case that reproduces (ie: with a mock tokenfilter that always throws a RuntimeException when a token matches fail_now or something) we could have some tests that assert indexing a doc with that token results in a useful error -- which should help ensure that useful error also gets logged (although i don't think we really have any easy way of asserting specific log messages at the moment) -Hoss http://www.lucidworks.com/
Re: Tracking down the input that hits an analysis chain bug
Thanks, that's the recipe that I need. On Fri, Jan 10, 2014 at 11:40 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is there a neighborhood of existing tests I should be visiting here? You'll need a custom schema that refers to your new MockFailOnCertainTokensFilterFactory, so i would create a completely new test class somewhere in ...solr.update (you're testing that an update fails with a clean error) -Hoss http://www.lucidworks.com/
Re: Tracking down the input that hits an analysis chain bug
I rather assumed that there was some log4j-ish config to be set that would do this for me. Lacking one, I guess I'll end up there. On Fri, Jan 3, 2014 at 8:23 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: Have you considered using a custom UpdateProcessor to catch the exception and provide more context in the logs? -Mike On 01/03/2014 03:33 PM, Benson Margulies wrote: Robert, Yes, if the problem was not data-dependent, indeed I wouldn't need to index anything. However, I've run a small mountain of data through our tokenizer on my machine, and never seen the error, but my customer gets these errors in the middle of a giant spew of data. As it happens, I _was_ missing that call to clearAttributes(), (and the usual implementation of end()), but I found and fixed that problem precisely by creating a random data test case using checkRandomData(). Unfortunately, fixing that didn't make the customer's errors go away. So I'm left needing to help them identify the data that provokes this, because I've so far failed to come up with any. --benson On Fri, Jan 3, 2014 at 2:16 PM, Robert Muir rcm...@gmail.com wrote: This exception comes from OffsetAttributeImpl (e.g. you dont need to index anything to reproduce it). Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? This could explain it, if something like a StopFilter is also present in the chain: basically the offsets overflow. the test stuff in BaseTokenStreamTestCase should be able to detect this as well... On Fri, Jan 3, 2014 at 1:56 PM, Benson Margulies ben...@basistech.com wrote: Using Solr Cloud with 4.3.1. We've got a problem with a tokenizer that manifests as calling OffsetAtt.setOffsets() with invalid inputs. OK, so, we want to figure out what input provokes our code into getting into this pickle. The problem happens on SolrCloud nodes. 
The problem manifests as this sort of thing: Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log SEVERE: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-1811581632,endOffset=-1811581632 How could we get a document ID so that we can tell which document was being processed?
Tracking down the input that hits an analysis chain bug
Using Solr Cloud with 4.3.1. We've got a problem with a tokenizer that manifests as calling OffsetAtt.setOffsets() with invalid inputs. OK, so, we want to figure out what input provokes our code into getting into this pickle. The problem happens on SolrCloud nodes. The problem manifests as this sort of thing: Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log SEVERE: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-1811581632,endOffset=-1811581632 How could we get a document ID so that we can tell which document was being processed?
Re: Tracking down the input that hits an analysis chain bug
Robert, Yes, if the problem was not data-dependent, indeed I wouldn't need to index anything. However, I've run a small mountain of data through our tokenizer on my machine, and never seen the error, but my customer gets these errors in the middle of a giant spew of data. As it happens, I _was_ missing that call to clearAttributes(), (and the usual implementation of end()), but I found and fixed that problem precisely by creating a random data test case using checkRandomData(). Unfortunately, fixing that didn't make the customer's errors go away. So I'm left needing to help them identify the data that provokes this, because I've so far failed to come up with any. --benson On Fri, Jan 3, 2014 at 2:16 PM, Robert Muir rcm...@gmail.com wrote: This exception comes from OffsetAttributeImpl (e.g. you dont need to index anything to reproduce it). Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? This could explain it, if something like a StopFilter is also present in the chain: basically the offsets overflow. the test stuff in BaseTokenStreamTestCase should be able to detect this as well... On Fri, Jan 3, 2014 at 1:56 PM, Benson Margulies ben...@basistech.com wrote: Using Solr Cloud with 4.3.1. We've got a problem with a tokenizer that manifests as calling OffsetAtt.setOffsets() with invalid inputs. OK, so, we want to figure out what input provokes our code into getting into this pickle. The problem happens on SolrCloud nodes. The problem manifests as this sort of thing: Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log SEVERE: java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, startOffset=-1811581632,endOffset=-1811581632 How could we get a document ID so that we can tell which document was being processed?
TokenizerFactory from 4.2.0 to 4.3.0
TokenizerFactory changed, incompatibly with subclasses, from 4.2.0 to 4.3.0. Subclasses must now implement a different overload of create, and may not implement the old one. Has anyone got any devious strategies other than multiple copies of code to deal with this when supporting multiple versions of Solr?
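One devious option is to probe reflectively for which `create` overload the installed version declares and dispatch accordingly. A self-contained sketch of the probe idea, using stand-in classes rather than the real Lucene types (the `Object`/`Readable` parameter types below are placeholders for what would be AttributeFactory and Reader in real code):

```java
public class OverloadProbe {
    // Stand-in for a 4.3-style factory: create(attributeFactory, input).
    public static class NewStyleFactory {
        public String create(Object attributeFactory, Readable input) { return "new"; }
    }

    // Stand-in for a 4.2-style factory: create(input) only.
    public static class OldStyleFactory {
        public String create(Readable input) { return "old"; }
    }

    /** Returns true if the class declares the two-argument (newer) overload. */
    static boolean hasNewCreate(Class<?> factoryClass) {
        try {
            factoryClass.getMethod("create", Object.class, Readable.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNewCreate(NewStyleFactory.class)); // true
        System.out.println(hasNewCreate(OldStyleFactory.class)); // false
    }
}
```

The probe runs once at startup; the rest of the code then calls through a small adapter rather than binding statically to either signature.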
Re: Solr Patent
I am not a lawyer. The Apache Software Foundation cannot 'protect Solr developers.' Patent infringement is a claim made against someone who derived economic benefit from an invention, not someone who writes code. The patent clause in the Apache License requires people who contribute code to grant certain licenses. It does not, and cannot, prevent someone else from asserting that a user of Apache Solr is infringing on some patent owned by someone who has never contributed to the project.
SOLR-4872 and LUCENE-2145 (or, how to clean up a Tokenizer)
Could I have some help on the combination of these two? Right now, it appears that I'm stuck with a finalizer to chase after native resources in a Tokenizer. Am I missing something?
How can a Tokenizer be CoreAware?
I am currently testing some things with Solr 4.0.0. I tried to make a tokenizer CoreAware, and was rewarded with: Caused by: org.apache.solr.common.SolrException: Invalid 'Aware' object: com.basistech.rlp.solr.RLPTokenizerFactory@19336006 -- org.apache.solr.util.plugin.SolrCoreAware must be an instance of: [org.apache.solr.request.SolrRequestHandler] [org.apache.solr.response.QueryResponseWriter] [org.apache.solr.handler.component.SearchComponent] [org.apache.solr.update.processor.UpdateRequestProcessorFactory] [org.apache.solr.handler.component.ShardHandlerFactory] I need this to allow cleanup of some cached items in the tokenizer. Questions: 1: will a newer version allow me to do this directly? 2: is there some other approach that anyone would recommend? I could, for example, make a fake object in the list above to act as a singleton with a static accessor, but that seems pretty ugly.
Seeming bug in ConcurrentUpdateSolrServer
The comment here is clearly wrong, since there is no division by two. I think that the code is wrong, because this results in not starting runners when it should start runners. Am I misanalyzing?

if (runners.isEmpty()
    || (queue.remainingCapacity() < queue.size() // queue is half full and we can add more runners
        && runners.size() < threadCount)) {
Re: Seeming bug in ConcurrentUpdateSolrServer
Ah. So now I have to find some other explanation of why it never creates more than one thread, even when I make a very deep queue and specify 6 threads. On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 29, 2013 at 11:29 PM, Benson Margulies bimargul...@gmail.com wrote: The comment here is clearly wrong, since there is no division by two. I think that the code is wrong, because this results in not starting runners when it should start runners. Am I misanalyzing? if (runners.isEmpty() || (queue.remainingCapacity() < queue.size() // queue is half full and we can add more runners && runners.size() < threadCount)) { queue.remainingCapacity() returns capacity - queue.size() so the comment is correct. -- Regards, Shalin Shekhar Mangar.
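To make the capacity arithmetic concrete, here is a standalone restatement of the gating condition with the comparison operators written out. This is a sketch of the logic being discussed, not the actual SolrJ source, and the constants are arbitrary:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class RunnerCondition {
    /**
     * Start a new runner when there are none, or when the queue is more
     * than half full (remainingCapacity() == capacity - size(), so
     * remainingCapacity() < size() means past the halfway mark) and we
     * are still under the thread limit.
     */
    static boolean shouldStartRunner(int runnerCount, int threadCount,
                                     ArrayBlockingQueue<?> queue) {
        return runnerCount == 0
                || (queue.remainingCapacity() < queue.size()
                    && runnerCount < threadCount);
    }

    public static void main(String[] args) {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);
        for (int i = 0; i < 6; i++) {
            queue.add(i); // 6 of 10 slots used: remainingCapacity()=4 < size()=6
        }
        // One runner already going, limit of 6: past half full, so start another.
        System.out.println(shouldStartRunner(1, 6, queue)); // prints "true"
    }
}
```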
Re: Seeming bug in ConcurrentUpdateSolrServer
I now understand the algorithm, but I don't understand why it is the way it is. Consider one of these objects configured with a handful of threads and a pretty big queue. When the first request comes in, the object creates one runner. It then won't create a second runner until the queue reaches 1/2-full. If the idea is that we want to pile up 'a lot' (1/2-of-a-queue) of work before sending any of it, why start that first runner? On Wed, May 29, 2013 at 2:45 PM, Benson Margulies bimargul...@gmail.com wrote: Ah. So now I have to find some other explanation of why it never creates more than one thread, even when I make a very deep queue and specify 6 threads. On Wed, May 29, 2013 at 2:25 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, May 29, 2013 at 11:29 PM, Benson Margulies bimargul...@gmail.com wrote: The comment here is clearly wrong, since there is no division by two. I think that the code is wrong, because this results in not starting runners when it should start runners. Am I misanalyzing? if (runners.isEmpty() || (queue.remainingCapacity() < queue.size() // queue is half full and we can add more runners && runners.size() < threadCount)) { queue.remainingCapacity() returns capacity - queue.size() so the comment is correct. -- Regards, Shalin Shekhar Mangar.
Not so concurrent concurrency
I can't quite apply SolrMeter to my problem, so I did something of my own. The brains of the operation are the function here. This feeds a ConcurrentUpdateSolrServer about 95 documents, each about 10mb, and 'threads' is six. Yet Solr just barely uses more than one core.

private long doIteration(File[] filesToRead) throws IOException, SolrServerException {
    ConcurrentUpdateSolrServer concurrentServer =
        new ConcurrentUpdateSolrServer(launcher.getSolrServer().getBaseURL(), 1000, threads);
    UpdateRequest updateRequest = new UpdateRequest(updateUrl);
    updateRequest.setCommitWithin(1);
    Stopwatch stopwatch = new Stopwatch();
    List<File> allFiles = Arrays.asList(filesToRead);
    Iterator<File> fileIterator = allFiles.iterator();
    while (fileIterator.hasNext()) {
        List<File> thisBatch = Lists.newArrayList();
        int batchByteCount = 0;
        while (batchByteCount < BATCH_LIMIT && fileIterator.hasNext()) {
            File thisFile = fileIterator.next();
            thisBatch.add(thisFile);
            batchByteCount += thisFile.length();
        }
        LOG.info(String.format("update %s files", thisBatch.size()));
        updateRequest.setDocIterator(new StreamingDocumentIterator(thisBatch));
        stopwatch.start();
        concurrentServer.request(updateRequest);
        concurrentServer.blockUntilFinished();
        stopwatch.stop();
    }
Benchmarking Solr
I'd like to run a repeatable test of having Solr ingest a corpus of docs on disk, to measure the speed of some alternative things plugged in. Anyone have some advice to share? One approach would be a quick SolrJ program that pushed the entire stack as one giant collection with a commit at the end.
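A minimal sketch of the harness side of that idea, batching files by total size and timing each push. Everything Solr-specific is left as a pluggable sendBatch hook, which is a placeholder of my own here, not a SolrJ API; in practice it would wrap an UpdateRequest against the server:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class IngestBench {
    static final long BATCH_LIMIT = 10L * 1024 * 1024; // ~10MB per batch; arbitrary

    /**
     * Splits files into batches of roughly BATCH_LIMIT total bytes and
     * accumulates the wall-clock time spent inside the supplied sender
     * (which would do the actual Solr push). Returns total nanoseconds.
     */
    static long run(List<File> files, Consumer<List<File>> sendBatch) {
        long totalNanos = 0;
        List<File> batch = new ArrayList<>();
        long batchBytes = 0;
        for (File f : files) {
            batch.add(f);
            batchBytes += f.length();
            if (batchBytes >= BATCH_LIMIT) {
                long start = System.nanoTime();
                sendBatch.accept(batch);
                totalNanos += System.nanoTime() - start;
                batch = new ArrayList<>();
                batchBytes = 0;
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            long start = System.nanoTime();
            sendBatch.accept(batch);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos;
    }
}
```

Timing only the send keeps file-system walking out of the measurement, which matters when comparing alternative analysis chains rather than I/O.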
Re: solr.xml or its successor in the wiki
I suppose you saw my JIRA suggesting that solr.xml might have the same repertoire of 'lib' elements as solrconfig.xml, instead of just a single 'str'. On Mon, May 20, 2013 at 11:16 AM, Erick Erickson erickerick...@gmail.com wrote: What's supposed to happen (not guaranteeing it is completely correct, mind you) is that the presence of a cores tag defines which checks are performed. Errors are thrown on old-style constructs when no cores tag is present and vice-versa. Best Erick On Sun, May 19, 2013 at 7:20 PM, Benson Margulies bimargul...@gmail.com wrote: One point of confusion: Is the compatibility code I hit trying to prohibit the 'str' form when it sees old-fangled cores? Or when the current running version is pre-5.0? I hope it's the former. On Sun, May 19, 2013 at 6:47 PM, Shawn Heisey s...@elyograg.org wrote: On 5/19/2013 4:38 PM, Benson Margulies wrote: Shawn, thanks. need any more jiras on this? I don't think so, but if you grab the 4.3 branch or branch_4x and find any bugs, let us know. Thanks, Shawn
solr.xml or its successor in the wiki
http://wiki.apache.org/solr/ConfiguringSolr does not point to any information on solr.xml. Given https://issues.apache.org/jira/browse/SOLR-4791, I'm a bit confused, and I need to set up a sharedLib directory for 4.3.0. I would do some writing or linking if I had some raw material ...
Re: solr.xml or its successor in the wiki
I found http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond, but it doesn't mention the successor to sharedLib. On Sun, May 19, 2013 at 12:02 PM, Benson Margulies bimargul...@gmail.com wrote: http://wiki.apache.org/solr/ConfiguringSolr does not point to any information on solr.xml. Given https://issues.apache.org/jira/browse/SOLR-4791, I'm a bit confused, and I need to set up a sharedLib directory for 4.3.0. I would do some writing or linking if I had some raw material ...
Re: solr.xml or its successor in the wiki
OK, I found the successor. On Sun, May 19, 2013 at 12:40 PM, Benson Margulies bimargul...@gmail.com wrote: I found http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond, but it doesn't mention the successor to sharedLib. On Sun, May 19, 2013 at 12:02 PM, Benson Margulies bimargul...@gmail.com wrote: http://wiki.apache.org/solr/ConfiguringSolr does not point to any information on solr.xml. Given https://issues.apache.org/jira/browse/SOLR-4791, I'm a bit confused, and I need to set up a sharedLib directory for 4.3.0. I would do some writing or linking if I had some raw material ...
Re: solr.xml or its successor in the wiki
Starting with the shipped solr.xml, I added a new-style str child to configure a shared lib, and I was rewarded with:

Caused by: org.apache.solr.common.SolrException: Should not have found solr/str[@name='sharedLib'] solr.xml may be a mix of old and new style formats.
    at org.apache.solr.core.ConfigSolrXml.failIfFound(ConfigSolrXml.java:169)
    at org.apache.solr.core.ConfigSolrXml.<init>(ConfigSolrXml.java:150)
    at org.apache.solr.core.ConfigSolrXml.<init>(ConfigSolrXml.java:94)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:387)
    ... 42 more

Is this a bug? I seem to now be caught on a fork between 4791 and this. On Sun, May 19, 2013 at 12:52 PM, Benson Margulies bimargul...@gmail.com wrote: OK, I found the successor. On Sun, May 19, 2013 at 12:40 PM, Benson Margulies bimargul...@gmail.com wrote: I found http://wiki.apache.org/solr/Solr.xml%204.3%20and%20beyond, but it doesn't mention the successor to sharedLib. On Sun, May 19, 2013 at 12:02 PM, Benson Margulies bimargul...@gmail.com wrote: http://wiki.apache.org/solr/ConfiguringSolr does not point to any information on solr.xml. Given https://issues.apache.org/jira/browse/SOLR-4791, I'm a bit confused, and I need to set up a sharedLib directory for 4.3.0. I would do some writing or linking if I had some raw material ...
Re: solr.xml or its successor in the wiki
Shawn, thanks. need any more jiras on this? On May 19, 2013, at 6:37 PM, Shawn Heisey s...@elyograg.org wrote: On 5/19/2013 11:27 AM, Benson Margulies wrote: Starting with the shipped solr.xml, I added a new-style str child to configure a shared lib, and i was rewarded with: Caused by: org.apache.solr.common.SolrException: Should not have found solr/str[@name='sharedLib'] solr.xml may be a mix of old and new style formats. at org.apache.solr.core.ConfigSolrXml.failIfFound(ConfigSolrXml.java:169) at org.apache.solr.core.ConfigSolrXml.<init>(ConfigSolrXml.java:150) at org.apache.solr.core.ConfigSolrXml.<init>(ConfigSolrXml.java:94) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:387) ... 42 more There are serious problems with the new solr.xml format in 4.3. Due to major changes in the code between 4.3 and 4.4, the problems will not be fixed in 4.3.1. You'll need to wait for 4.4 before attempting to use it. The new format will be used in the example in 4.4. I have updated the ConfiguringSolr page with some additional info, and reorganized it. I believe the 4.3 and beyond page should be changed to 4.4 and beyond. The sharedLib attribute is broken in 4.3.0, fixed in 4.3.1 with SOLR-4791, which should be out very soon. A workaround is to put your jars in ${solr.solr.home}/lib which does not require configuration. After 4.3.1 comes out (or if you use a dev version), if you want to use sharedLib in the old-style solr.xml file, it will not be a str tag, it is an attribute on the solr tag. The sharedLib values are relative to solr.solr.home: <solr persistent="true" sharedLib="libextra"> <cores adminPath="/admin/cores" Thanks, Shawn
Re: solr.xml or its successor in the wiki
One point of confusion: Is the compatibility code I hit trying to prohibit the 'str' form when it sees old-fangled cores? Or when the current running version pre-5.0? I hope it's the former. On Sun, May 19, 2013 at 6:47 PM, Shawn Heisey s...@elyograg.org wrote: On 5/19/2013 4:38 PM, Benson Margulies wrote: Shawn, thanks. need any more jiras on this? I don't think so, but if you grab the 4.3 branch or branch_4x and find any bugs, let us know. Thanks, Shawn
wiki versus downloads versus archives
http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a place where it is not, and I can't find a link on the front page to the archive for old releases.
Re: wiki versus downloads versus archives
Thanks. On Thu, May 16, 2013 at 4:28 PM, Shawn Heisey s...@elyograg.org wrote: On 5/16/2013 2:21 PM, Benson Margulies wrote: http://wiki.apache.org/solr/Solr3.1 claims that Solr3.1 is available in a place where it is not, and I can't find a link on the front page to the archive for old releases. Download links fixed on the wiki pages for 3.1 and 3.2. Thanks, Shawn
A request handler that manipulated the index
I am thinking about trying to structure a problem as a Solr plugin. The nature of the plugin is that it would need to read and write the lucene index to do its work. It could not be cleanly split into URP 'over here' and a Search Component 'over there'. Are there invariants of Solr that would preclude this, like assumptions in the implementation of the cache?
Solr1.4 and threads ....
We've got a tokenizer which is quite explicitly coded on the assumption that it will only be called from one thread at a time. After all, what would it mean for two threads to make interleaved calls to the hasNext() function? Yet, a customer of ours with a gigantic instance of Solr 1.4 reports incidents in which we throw an exception that indicates (we think) that two different threads made interleaved calls. Does this suggest anything to anyone? Other than that we've misanalyzed the logic in the tokenizer and there's a way to make it burp on one thread?
Re: Why would solr norms come up different from Lucene norms?
On Sat, May 5, 2012 at 7:59 PM, Lance Norskog goks...@gmail.com wrote: Which Similarity class do you use for the Lucene code? Solr has a custom one. I am embarrassed to report that I also have a custom similarity that I didn't know about, and once I configured that into Solr all was well. On Fri, May 4, 2012 at 6:30 AM, Benson Margulies bimargul...@gmail.com wrote: So, I've got some code that stores the same documents in a Lucene 3.5.0 index and a Solr 3.5.0 instance. It's only five documents. For a particular field, the Solr norm is always 0.625, while the Lucene norm is .5. I've watched the code in NormsWriterPerField in both cases. In Solr we've got .577, in naked Lucene it's .5. I tried to check for boosts, and I don't see any non-1.0 document or field boosts. The Solr field is: field name=bt_rni_NameHRK_encodedName type=text_ws indexed=true stored=true multiValued=false / -- Lance Norskog goks...@gmail.com
Why would solr norms come up different from Lucene norms?
So, I've got some code that stores the same documents in a Lucene 3.5.0 index and a Solr 3.5.0 instance. It's only five documents. For a particular field, the Solr norm is always 0.625, while the Lucene norm is .5. I've watched the code in NormsWriterPerField in both cases. In Solr we've got .577, in naked Lucene it's .5. I tried to check for boosts, and I don't see any non-1.0 document or field boosts. The Solr field is: field name=bt_rni_NameHRK_encodedName type=text_ws indexed=true stored=true multiValued=false /
Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is Log4j. e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something?
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
Yes, I'm the author of that JIRA. On Tue, May 1, 2012 at 8:45 PM, Ryan McKinley ryan...@gmail.com wrote: check a release since r1332752 If things still look problematic, post a comment on: https://issues.apache.org/jira/browse/SOLR-3426 this should now have a less verbose message with an older SLF4j and with Log4j On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote: I have similar issue using log4j for logging with trunk build, the CoreConatainer class print big stack trace on our jboss 4.2.2 startup, I am using sjfj 1.5.2 10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder; at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101) On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.comwrote: On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote: There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it. On May 1, 2012, at 11:14 AM, Benson Margulies wrote: CoreContainer.java, in the method 'load', finds itself calling loader.NewInstance with an 'fname' of Log4j of the slf4j backend is 'Log4j'. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? 
e.g.: 2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher org.apache.solr.common.SolrException: Error loading class 'Log4j' What is it actually looking for? Have I misplaced something? - Mark Miller lucidimagination.com
Re: Unsubscribe does not appear to be working
There is no such thing as a 'solr forum' or a 'solr forum account.' If you are subscribed to this list, an email to the unsubscribe address will unsubscribe you. If some intermediary or third party is forwarding email from this list to you, no one here can help you. On Fri, Apr 27, 2012 at 12:09 PM, Kevin Bootz kbo...@caci.com wrote: I have tried the unsubscribe process but believe it to be broken as I've gone as far as deleting my solr forum account and yet continue to receive emails. Is there a moderator that can remove my email from the list please? Thanks
Re: Query parsing VS marshalling/unmarshalling
2012/4/24 Mindaugas Žakšauskas min...@gmail.com: Hi, I maintain a distributed system which Solr is part of. The data which is kept in Solr is permissioned, and permissions are currently implemented by taking the original user query and adding certain bits to it which would make it return less data in the search results. Now I am at the point where I need to go over this functionality and try to improve it. Changing this to send two separate queries (q=...&fq=...) would be the first logical thing to do, however I was thinking of an extra improvement. Instead of generating a filter query, converting it into a String, and sending it over HTTP just to parse it in Solr again - would it not be better to take the generated Lucene fq query, serialize it using Java serialization, convert it to, say, Base64, and then send and deserialize it on the Solr end? Has anyone tried doing any performance comparisons on this topic? I'm about to try out a contribution for serializing queries in Javascript using Jackson. I've previously done this by serializing my own data structure and putting the JSON into a custom query parameter. I am particularly concerned about this because in extreme cases my filter queries can be very large (1000s of characters long) and we already had to do tweaks as the size of GET requests would exceed default limits. And yes, we could move to POST but I would like to minimize both the amount of data that is sent over and the time taken to parse large queries. Thanks in advance. m.
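For what it's worth, the round trip Mindaugas describes is easy to sketch in plain Java. This is a hedged illustration, not Solr code: `QueryWireSketch` and its methods are invented names, a `String` stands in for the query object, and whether a real Lucene `Query` in your version is actually `Serializable` is something to verify before relying on this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Java-serialize a query object, Base64 it for transport, then decode and
// deserialize on the receiving side.
public class QueryWireSketch {
    public static String encode(Serializable query) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(query);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    public static Object decode(String wire) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(wire);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }
}
```

One caveat worth measuring before assuming a win: Base64-encoded Java serialization is usually considerably larger than the query string it replaces, so it may save parse time while making the GET-size problem worse.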
Is there such a thing as FQ on a subquery?
I found myself wanting to write ... OR _query_:"{!lucene fq=\"a:b\"}c:d" And then I started looking at query trees in the debugger, and found myself thinking that there's no possible representation for this -- a subquery with a filter, since the filters are part of the RequestBuilder, not part of the query. Am I missing something?
Questions about the query function
I've been pestering you all with a series of questions about disassembling and partially rescoring queries. Every helpful response (thanks) has led me to further reading, and this leads to more questions. If I haven't before, I'll apologize now for the high level of ignorance at which I'm starting. This morning I'm wading in the pool of subqueries. An example from the wiki: q=product(popularity, query({!dismax qf=text v='solr rocks'})) What's the use/net effect of having a Q that amounts to a number? The final number becomes the score, yes? Why doesn't this example have to use _val_? Is there an assumed defType of func? Reading QueryValueSource.java, I'm wondering about the speed of something like this, or, more to the point, something like a calculation on two queries. Does this iteratively run the subquery for each document otherwise under consideration? I see some code that might be an optimization here. I'm also wondering, is there a way to express that I only want to see results that meet some threshold value for a subquery? Does _val_ (or the local param syntax) manufacture a field that appears in returned documents? If so, is its name _val_? In which case, can there be more than one?
Re: Questions about the query function
On Sun, Apr 15, 2012 at 9:03 AM, Erik Hatcher erik.hatc...@gmail.com wrote: Why doesn't this example have to use _val_? Is there an assumed defType of fund? Yeah, that wiki page is misleading there, as it is implying a non-specified defType=func. Wanna fix up the wiki to make this clear? Yup. _val_ would work too, or of course using that function as a parameter to (e)dismay's bf, or dismay's boost params. Erik On Apr 15, 2012, at 08:43 , Benson Margulies wrote: I've been pestering you all with a series of questions about disassembling and partially rescoring queries. Every helpful response (thanks) has led me to further reading, and this leads to more questions. If I haven't before, I'll apologize now for the high level of ignorance at which I'm starting. This morning I'm wading in the pool of subqueries. An example from the wiki: q=product(popularity, query({!dismax qf=text v='solr rocks'})) What's the use/net effect of having a Q that amounts to a number? The final number becomes the score, yes? Why doesn't this example have to use _val_? Is there an assumed defType of func? Reading QueryValueSource.java, I'm wondering about the speed of something like this, or, more to the point, something like a calculation on two queries. Does this iteratively run the subquery for each document otherwise under consideration? I see some code that might be an optimization here. I'm also wondering, is there a way to express that I only want to see results that meet some threshold value for a subquery? Does _val_ (or the local param syntax) manufacture a field that appears in returned documents? If so, is its name _val_? In which case, can there be more than one?
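A side-by-side of the spellings discussed here may help. These are illustrative request parameters only (not tested against any particular Solr version), and the edismax `boost` form is a multiplicative boost rather than an exact rewrite of the product above:

```
# whole q parsed as a function query via defType
q=product(popularity, query({!dismax qf=text v='solr rocks'}))&defType=func

# the same function embedded in a lucene-syntax query via _val_
q=_val_:"product(popularity, query({!dismax qf=text v='solr rocks'}))"

# the popularity field as an edismax multiplicative boost
q=solr rocks&defType=edismax&qf=text&boost=popularity
```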
Re: Questions about the query function
Since I ended up with 'fund' instead of 'func' we're even. I made the edit. I'd make some more if you answered more of my questions :-) On Sun, Apr 15, 2012 at 9:42 AM, Erik Hatcher erik.hatc...@gmail.com wrote: _val_ would work too, or of course using that function as a parameter to (e)dismay's bf, or dismay's boost params. oops damn you autocorrect. I've been fighting this one since upgrading to Lion and will turn it off. s/dismay/dismax/! :) Erik
It's hard to google on _val_
So, I've been experimenting to learn how the _val_ participates in scores. It seems to me that http://wiki.apache.org/solr/FunctionQuery should explain the *effect* of including an _val_ term in an ordinary query, starting with a constant. http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_change_the_score_of_a_document_based_on_the_.2Avalue.2A_of_a_field_.28say.2C_.22popularity.22.29 poses exactly my question, but does not explain the math. It just says, 'they get a boost'. I tried some experiments. Positive values of _val_ did lead to positive increments in the score, but clearly not by simple addition. Presumably, the brains of the operation are http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html. However, it seems to me that it would be kind to us dumb animals if the Solr pages gave a 'for idiots' summary of the net effect. Left to my own devices, I'll eventually work my way through this, but if someone hands me a shortcut, I'll cheerfully play tech writer here and there.
Re: It's hard to google on _val_
On Sun, Apr 15, 2012 at 12:14 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Sun, Apr 15, 2012 at 11:34 AM, Benson Margulies bimargul...@gmail.com wrote: So, I've been experimenting to learn how the _val_ participates in scores. It seems to me that http://wiki.apache.org/solr/FunctionQuery should explain the *effect* of including an _val_ term in an ordinary query, starting with a constant. It's simply added to the score as any other clause in a boolean query would be. Positive values of _val_ did lead to positive increments in the score, but clearly not by simple addition. That's just because Lucene normalizes scores. By default, this is really just multiplying scores by a magic constant (that by default is the inverse of the square root of the sum of squared weights) and doesn't change relative orderings of docs. If you add debugQuery=true and look at the scoring explanations, you'll see that queryNorm component. If you want to go down the rabbit hole on trunk, see IndexSearcher.createNormalizedWeight() I think I should be able to add some text to the wiki that would help fellow Alices merely by looking at the debugQuery result. Thanks. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
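A tiny arithmetic sketch of Yonik's point (the class name is invented; the formula is the pre-BM25 default Similarity's queryNorm, 1/sqrt(sum of squared clause weights)): every document's raw score is multiplied by the same constant, so relative orderings survive even though a positive _val_ clause doesn't raise the final score by its raw value.

```java
// Arithmetic illustration of Lucene's query normalization, not Lucene code.
public class QueryNormSketch {
    // Default queryNorm: 1 / sqrt(sum of squared clause weights).
    public static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // The final score is the raw clause-sum multiplied by that one constant,
    // which is why adding a constant clause shifts scores non-additively.
    public static double normalized(double rawScore, double sumOfSquaredWeights) {
        return rawScore * queryNorm(sumOfSquaredWeights);
    }
}
```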
Re: Can I discover what part of a score is attributable to a subquery?
yes please On Apr 14, 2012, at 2:40 AM, Paul Libbrecht p...@hoplahup.net wrote: Benson, In mid 2009, I had such a question answered with a nifty score bitwise manipulation, and a little precision loss. For each result I could pick the language of a multilingual match. If interested, I can dig. Paul -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I can provide 'why on earth would anyone ...' if someone wants to know.
Re: Can I discover what part of a score is attributable to a subquery?
On Sat, Apr 14, 2012 at 12:37 PM, Paul Libbrecht p...@hoplahup.net wrote: Benson, it was in the Lucene world in May 2010: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201005.mbox/%3c469705.48901...@web29016.mail.ird.yahoo.com%3E Mark Harwood pointed me to a FlagQuery which was exactly what I needed. His contribution seems not to have been taken up; it worked for me in Lucene 2.4.1. We used this to create an auto-completion popup which selected the right language by flagging the sub-query that was most matched. Paul, it seems to me that the criticism in the JIRA (do you really want this calculation for every single document that matches?) applies to me. In our stuff, we run a query, and we look at the top 200 items, rearranging their order based on a name similarity metric that is too expensive to run in bulk. If the overall query is 'just us', we can discard the Lucene scores and reorder based on our own. If our query is combined with other terms, then we need to subtract out the contribution of our part of the initial query. However, sending in a second query with (I suppose) ids=id1,id2,... and just our query, to retrieve the scores, should be pretty speedy for a mere 200 items. Maybe I'm missing some even easier way, given a DocList and a query, to obtain scores for those docs for that query? paul On 14 Apr 2012, at 15:34, Benson Margulies wrote: yes please On Apr 14, 2012, at 2:40 AM, Paul Libbrecht p...@hoplahup.net wrote: Benson, In mid 2009, I had such a question answered with a nifty score bitwise manipulation, and a little precision loss. For each result I could pick the language of a multilingual match. If interested, I can dig. Paul -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I can provide 'why on earth would anyone ...'
if someone wants to know.
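The arithmetic Benson describes (fetch the subquery's scores for the ~200 candidate ids with a second query, subtract that contribution, substitute the custom similarity, re-sort) can be sketched like this. Everything here is an invented illustration, not Solr API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RescoreSketch {
    // Hypothetical holder for one of the top-N candidates.
    public static class Doc {
        public final String id;
        public final double totalScore;  // score from the combined query
        public final double subScore;    // our subquery's share, fetched separately
        public final double similarity;  // expensive 0..1 metric, computed for top N only
        public Doc(String id, double totalScore, double subScore, double similarity) {
            this.id = id;
            this.totalScore = totalScore;
            this.subScore = subScore;
            this.similarity = similarity;
        }
        // Remove the subquery's contribution; substitute our own similarity.
        public double rescored() {
            return (totalScore - subScore) + similarity;
        }
    }

    // Re-sort the candidates by the adjusted score, best first.
    public static List<Doc> rerank(List<Doc> topN) {
        List<Doc> out = new ArrayList<>(topN);
        out.sort(Comparator.comparingDouble((Doc d) -> d.rescored()).reversed());
        return out;
    }
}
```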
Realtime /get versus SearchHandler
A discussion over on the dev list led me to expect that the by-id field retrievals in a SolrCloud query would come through the get handler. In fact, I've seen them turn up in my search component in the search handler that is configured with my custom QT. (I have a 'prepare' method that sets ShardParams.QT to my QT to get my processing involved in the first of the two queries.) Did I overthink this?
Re: Can I discover what part of a score is attributable to a subquery?
On Fri, Apr 13, 2012 at 6:43 PM, John Chee johnc...@mylife.com wrote: On Fri, Apr 13, 2012 at 2:40 PM, Benson Margulies bimargul...@gmail.com wrote: Given a query including a subquery, is there any way for me to learn that subquery's contribution to the overall document score? I need this number to be available in a SearchComponent that runs after QueryComponent. I can provide 'why on earth would anyone ...' if someone wants to know. Have you tried debugQuery=true? http://wiki.apache.org/solr/CommonQueryParameters#debugQuery The 'explain' field of the result explains the scoring of each document.
Re: Can I discover what part of a score is attributable to a subquery?
On Fri, Apr 13, 2012 at 7:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Given a query including a subquery, is there any way for me to learn : that subquery's contribution to the overall document score? You have to just execute the subquery itself ... doc collection and score calculation doesn't keep track of the subscores. you could do this using functions in the fl but since you mentioned wanting this in a SearchComponent just pass the subquery to SolrIndexSearcher using a DocSet filter of the current page (ie: make your own DocSet based on the current DocList) I get it. Some fairly intricate dancing then can ensue with SolrCloud. Thanks. -Hoss
Re: I've broken delete in SolrCloud and I'm a bit clueless as to how
On Thu, Apr 12, 2012 at 11:56 AM, Mark Miller markrmil...@gmail.com wrote: Please see the documentation: http://wiki.apache.org/solr/SolrCloud#Required_Config Did I fail to find this in google or did I just goad you into a writing job? I'm inclined to write a JIRA asking for _version_ to be configurable just like the uniqueKey in the schema. schema.xml: You must have a _version_ field defined:

<field name="_version_" type="long" indexed="true" stored="true"/>

On Apr 11, 2012, at 9:10 AM, Benson Margulies wrote: I didn't have a _version_ field, since nothing in the schema says that it's required! On Wed, Apr 11, 2012 at 6:35 AM, Darren Govoni dar...@ontrenet.com wrote: Hard to say why it's not working for you. Start with a fresh Solr and work forward from there or back out your configs and plugins until it works again. On Tue, 2012-04-10 at 17:15 -0400, Benson Margulies wrote: In my cloud configuration, if I push

<delete><query>*:*</query></delete>

followed by:

<commit/>

I get no errors, the log looks happy enough, but the documents remain in the index, visible to /query. Here's what seems my relevant bit of solrconfig.xml. My URP only implements processAdd.

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>

- Mark Miller lucidimagination.com
Re: I've broken delete in SolrCloud and I'm a bit clueless as to how
I'm probably confused, but it seems to me that the case I hit does not meet any of Yonik's criteria. I have no replicas. I'm running SolrCloud in the simple mode where each doc ends up in exactly one place. I think that it's just a bug that the code refuses to do the local deletion when there's no version info. However, if I am confused, it sure seems like a candidate for the 'at least throw instead of failing silently' policy.
Re: I've broken delete in SolrCloud and I'm a bit clueless as to how
On Thu, Apr 12, 2012 at 2:14 PM, Mark Miller markrmil...@gmail.com wrote: google must not have found it - I put that in a month or so ago I believe - at least weeks. As you can see, there is still a bit to fill in, but it covers the high level. I'd like to add example snippets for the rest soon. Mark, is it all true? I don't have an update log or a replication handler, and neither does the default, and it all works fine in the simple case from the top of that wiki page.
Re: I've broken delete in SolrCloud and I'm a bit clueless as to how
See https://issues.apache.org/jira/browse/SOLR-3347. I can replace the solrconfig.xml with the vanilla solrconfig.xml and the problem remains. On Wed, Apr 11, 2012 at 6:35 AM, Darren Govoni dar...@ontrenet.com wrote: Hard to say why it's not working for you. Start with a fresh Solr and work forward from there or back out your configs and plugins until it works again. On Tue, 2012-04-10 at 17:15 -0400, Benson Margulies wrote: In my cloud configuration, if I push

<delete><query>*:*</query></delete>

followed by:

<commit/>

I get no errors, the log looks happy enough, but the documents remain in the index, visible to /query. Here's what seems my relevant bit of solrconfig.xml. My URP only implements processAdd.

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>
Re: I've broken delete in SolrCloud and I'm a bit clueless as to how
I didn't have a _version_ field, since nothing in the schema says that it's required! On Wed, Apr 11, 2012 at 6:35 AM, Darren Govoni dar...@ontrenet.com wrote: Hard to say why it's not working for you. Start with a fresh Solr and work forward from there or back out your configs and plugins until it works again. On Tue, 2012-04-10 at 17:15 -0400, Benson Margulies wrote: In my cloud configuration, if I push

<delete><query>*:*</query></delete>

followed by:

<commit/>

I get no errors, the log looks happy enough, but the documents remain in the index, visible to /query. Here's what seems my relevant bit of solrconfig.xml. My URP only implements processAdd.

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>
Re: Default qt on SolrCloud
On Wed, Apr 11, 2012 at 11:19 AM, Erick Erickson erickerick...@gmail.com wrote: What does your query request handler look like? By adding qt=standard you're specifying the standard request handler, whereas your ...solr/query?q=*:* format goes at the request handler you named query which presumably you've defined in solrconfig.xml... What does debugQuery=on show? It turned out that I had left an extra(neous) declaration for /query with my custom RT, and when I removed it all was well. thanks, benson Best Erick On Tue, Apr 10, 2012 at 12:31 PM, Benson Margulies bimargul...@gmail.com wrote: After I load documents into my cloud instance, a URL like: http://localhost:PORT/solr/query?q=*:* finds nothing. http://localhost:PORT/solr/query?q=*:*&qt=standard finds everything. My custom request handlers have 'default=false'. What have I done?
Re: SolrCloud versus a SearchComponent that rescores
On Mon, Apr 9, 2012 at 9:36 PM, Mark Miller markrmil...@gmail.com wrote: Yeah, that's how it works - it ends up hitting the select request handler (this might be overridable with shards.qt) All the params are passed along, so in general, it will act the same as the top level req handler - but it can then remove the shards param so you don't have an infinite recursion of distrib requests (say, in the case you put shards in the request handler in solrconfig). I think you have to investigate shards.qt Or look at adding those components to the std select handler as well. Thanks. Sent from my iPhone On Apr 9, 2012, at 9:26 PM, Benson Margulies bimargul...@gmail.com wrote: Um, maybe I've hit a quirk? In my solrconfig.xml, my special SearchComponents are installed only for a specific QT. So, it looks to me as if that QT is not propagated into the request out to the shards, and so they run the ordinary request handler without my components in it. Is this intended behavior I have to tweak via a distribution-aware component, or perhaps a bug, or does it make no sense at all and I need to look for some mistake of mine?
Re: SolrCloud versus a SearchComponent that rescores
Another thought: currently I'm using qt=ME to indicate this process. I could, in theory, use some ME=true and make my components check for it to avoid this process, but it seems kind of peculiar from an end-user standpoint.
Re: SolrCloud versus a SearchComponent that rescores
I've updated the doc with my findings. Thanks for the pointer.
URP's versus Cloud
How are URP's managed with respect to cloud deployment? Given some solrconfig.xml like the below, do I expect it to be in the chain on the leader, the shards, or both?

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>
Re: URP's versus Cloud
On Tue, Apr 10, 2012 at 1:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: In this case on each node, order matters. If you, for example, define a standard SignatureUpdateProcessorFactory before the DistributedUpdateProcessorFactory you will end up with multiple values for the signature field. That seems to imply that 'before' processors run both on the leader and on the shards. Where do the afters run? Just on the leader or just on the shards? On Tue, 10 Apr 2012 12:43:36 -0400, Benson Margulies bimargul...@gmail.com wrote: How are URP's managed with respect to cloud deployment? Given some solrconfig.xml like the below, do I expect it to be in the chain on the leader, the shards, or both?

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
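To make the ordering concrete, here is a sketch of the kind of chain Markus is describing (illustrative only; parameter values are made up). The reading in this thread is that processors listed before solr.DistributedUpdateProcessorFactory run on every node an update passes through, while those after it run as part of each node's local indexing step - hence the duplicated signature values when the signature processor comes first:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- placed after the distributed processor so, per the caution above,
       it computes the signature once per local index rather than once
       per hop -->
  <processor class="solr.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">name,content</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```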
Default qt on SolrCloud
After I load documents into my cloud instance, a URL like: http://localhost:PORT/solr/query?q=*:* finds nothing. http://localhost:PORT/solr/query?q=*:*&qt=standard finds everything. My custom request handlers have 'default=false'. What have I done?
I've broken delete in SolrCloud and I'm a bit clueless as to how
In my cloud configuration, if I push

<delete><query>*:*</query></delete>

followed by:

<commit/>

I get no errors, the log looks happy enough, but the documents remain in the index, visible to /query. Here's what seems my relevant bit of solrconfig.xml. My URP only implements processAdd.

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>
Cloud-aware request processing?
I'm working on a prototype of a scheme that uses SolrCloud to, in effect, distribute a computation by running it inside of a request processor. If there are N shards and M operations, I want each node to perform M/N operations. That, of course, implies that I know N. Is that fact available anyplace inside Solr, or do I need to just configure it?
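The M/N split itself is just modular partitioning once each node knows N (the shard count) and its own index k; the open question in the message is only where N comes from. A sketch with invented names:

```java
import java.util.ArrayList;
import java.util.List;

// Given N shards and M operations, shard k takes every operation whose
// index is congruent to k modulo N, so each shard handles roughly M/N.
public class ShardPartition {
    public static List<Integer> opsForShard(int totalOps, int numShards, int shardIndex) {
        List<Integer> mine = new ArrayList<>();
        for (int op = 0; op < totalOps; op++) {
            if (op % numShards == shardIndex) {
                mine.add(op);
            }
        }
        return mine;
    }
}
```

The partitions are disjoint and cover all M operations, so no coordination between shards is needed beyond agreeing on N and each node's index.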
'No JSP support' error in embedded Jetty for solrCloud as of apache-solr-4.0-2012-04-02_11-54-55
Starting the leader with: java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=rnicloud -DzkRun -DnumShards=3 -Djetty.port=9167 -jar start.jar and browsing to http://localhost:9167/solr/rnicloud/admin/zookeeper.jsp I get: HTTP ERROR 500 Problem accessing /solr/rnicloud/admin/zookeeper.jsp. Reason: JSP support not configured Powered by Jetty://
Re: Cloud-aware request processing?
Jan Høydahl, My problem is intimately connected to Solr. It is not a batch job for Hadoop; it is a distributed real-time query scheme. I hate to add yet another complex framework if a Solr RP can do the job simply. For this problem, I can transform a Solr query into a subset query on each shard, and then let the SolrCloud mechanism take it from there. I am well aware of the 'zoo' of alternatives, and I will be evaluating them if I can't get what I want from Solr. On Mon, Apr 9, 2012 at 9:34 AM, Jan Høydahl jan@cominvent.com wrote: Hi, Instead of using Solr, you may want to have a look at Hadoop or another framework for distributed computation, see e.g. http://java.dzone.com/articles/comparison-gridcloud-computing -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. apr. 2012, at 13:41, Benson Margulies wrote: I'm working on a prototype of a scheme that uses SolrCloud to, in effect, distribute a computation by running it inside of a request processor. If there are N shards and M operations, I want each node to perform M/N operations. That, of course, implies that I know N. Is that fact available anyplace inside Solr, or do I need to just configure it?
Is http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster up to date?
I specify -Dcollection.configName=rnicloud, but the admin gui tells me that I have a collection named 'collection1'. And, as reported in a prior email, the admin UI URL in there seems wrong.
Re: Re: Cloud-aware request processing?
On Mon, Apr 9, 2012 at 9:50 AM, Darren Govoni ontre...@ontrenet.com wrote: ...it is a distributed real-time query scheme... SolrCloud does this already. It treats all the shards like one big index, and you can query it normally to get subset results from each shard. Why do you have to re-write the query for each shard? Seems unnecessary. For reasons described in a previous email that I won't repeat here. --- Original Message --- On 4/9/2012 08:45 AM Benson Margulies wrote: Jan Høydahl, My problem is intimately connected to Solr. It is not a batch job for Hadoop; it is a distributed real-time query scheme. I hate to add yet another complex framework if a Solr RP can do the job simply. For this problem, I can transform a Solr query into a subset query on each shard, and then let the SolrCloud mechanism take it from there. I am well aware of the 'zoo' of alternatives, and I will be evaluating them if I can't get what I want from Solr. On Mon, Apr 9, 2012 at 9:34 AM, Jan Høydahl jan@cominvent.com wrote: Hi, Instead of using Solr, you may want to have a look at Hadoop or another framework for distributed computation, see e.g. http://java.dzone.com/articles/comparison-gridcloud-computing -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 9. apr. 2012, at 13:41, Benson Margulies wrote: I'm working on a prototype of a scheme that uses SolrCloud to, in effect, distribute a computation by running it inside of a request processor. If there are N shards and M operations, I want each node to perform M/N operations. That, of course, implies that I know N. Is that fact available anyplace inside Solr, or do I need to just configure it?
Stumped on using a custom update request processor with SolrCloud
If you would be so kind as to look at https://issues.apache.org/jira/browse/SOLR-3342, you will see that I tried to use a working configuration for a URP of mine with SolrCloud, and received in return an NPE. Somehow or another, by default, the XmlUpdateRequestHandler ends up using (I think) the PeerSync class to establish the indexibleId. When I add in my URP, I am somehow turning this off, and I'm currently stumped as to how to turn it back on. If you don't care to read the JIRA, my relevant configuration is right here. Is there something else I need in the 'defaults' list, or some other processor I need to put in my chain?

<updateRequestProcessorChain name="RNI">
  <!-- some day, add parameters when we have some -->
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- activate RNI processing by adding the RNI URP to the chain for xml updates -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">RNI</str>
  </lst>
</requestHandler>
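For comparison: the custom chain quoted in this message differs from the working configuration that appears elsewhere in these threads in exactly one processor - it lacks solr.DistributedUpdateProcessorFactory. The chain that cooperates with SolrCloud's distributed update handling looks like this:

```xml
<updateRequestProcessorChain name="RNI">
  <processor class="com.basistech.rni.solr.NameIndexingUpdateRequestProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- the distributed processor must precede the run processor -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```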
SolrCloud versus a SearchComponent that rescores
Those of you insomniacs who have read my messages here over the last few weeks might recall that I've been working on a request handler that wraps the SearchHandler to rewrite queries and then reorder results. (I haven't quite worked out how to apply Grant's alternative suggestions without losing the performance advantages I was looking for in the first place.) Today, I realized that the RequestHandler approach, as opposed to search components, wasn't going to be viable. I was growing too much dependency on internal Solr quirks. So I refactored it into a pair of SearchComponents -- one to go first and rewrite the query, and one to go after query and rescore. And it works just fine - until I configure it into a SolrCloud cluster. At which point it started coming up with very wrong answers. I think that the reason is that I don't have an implementation of the distributedProcess method, or, more generally, that I don't understand the protocol on a SearchComponent when distributed processing is happening. Has anyone written anything yet about these considerations? I can put multiple processes in the debugging and see who gets called with what, but I was hoping for some sort of short cut.
Re: SolrCloud versus a SearchComponent that rescores
That page seems to be saying that the 'distributed' APIs take place on the leader, and the ordinary prepare/process APIs out at the leaves. I'll set out to prove or disprove that tomorrow. On Mon, Apr 9, 2012 at 8:17 PM, Mark Miller markrmil...@gmail.com wrote: On Apr 9, 2012, at 7:34 PM, Benson Margulies wrote: Those of you insomniacs who have read my messages here over the last few weeks might recall that I've been working on a request handler that wraps the SearchHandler to rewrite queries and then reorder results. (I haven't quite worked out how to apply Grant's alternative suggestions without losing the performance advantages I was looking for in the first place.) Today, I realized that the RequestHandler approach, as opposed to search components, wasn't going to be viable. I was growing too much dependency on internal Solr quirks. So I refactored it into a pair of SearchComponents -- one to go first and rewrite the query, and one to go after query and rescore. And it works just fine - until I configure it into a SolrCloud cluster. At which point it started coming up with very wrong answers. I think that the reason is that I don't have an implementation of the distributedProcess method, or, more generally, that I don't understand the protocol on a SearchComponent when distributed processing is happening. Has anyone written anything yet about these considerations? I can put multiple processes in the debugging and see who gets called with what, but I was hoping for some sort of short cut. Grant started something on this once: http://wiki.apache.org/solr/WritingDistributedSearchComponents It's only a start though. Unfortunately, to this point, adventurous souls have had to debug and study their way to understanding the distrib process solo, mostly. Perhaps we can encourage anyone that has written a distributed component to help jump in on that wiki page. Any takers? - Mark Miller lucidimagination.com
Re: SolrCloud versus a SearchComponent that rescores
Um, maybe I've hit a quirk? In my solrconfig.xml, my special SearchComponents are installed only for a specific QT. So, it looks to me as if that QT is not propagated into the request out to the shards, and so they run the ordinary request handler without my components in it. Is this intended behavior I have to tweak via a distribution-aware component, or perhaps a bug, or does it make no sense at all and I need to look for some mistake of mine?
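If the shard sub-requests really are falling back to the default handler, the usual knob for this is the shards.qt parameter, which tells the coordinating node which handler the per-shard sub-requests should hit. A sketch, assuming the custom handler is registered under the hypothetical name /rescore:

```xml
<requestHandler name="/rescore" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- route the per-shard sub-requests back through this same handler,
         so the custom components also run out at the shards -->
    <str name="shards.qt">/rescore</str>
  </lst>
  <!-- first-components / last-components registration as before -->
</requestHandler>
```

Baking shards.qt into the handler's defaults means clients don't have to remember to pass it on every distributed request.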
A curious request about a curious request handler
I've made a RequestHandler class that acts as follows: 1. At its initialization, it creates a StandardRequestHandler and hangs onto it. 2. When a query comes to it (I configure it to a custom qt value), it: a. creates a new query based on the query that arrived b. creates a LocalSolrQueryRequest for the current core and a param set containing the derived query c. runs this request through the SearchHandler d. uses a searcher to retrieve all the docs e. rescores/reorders them using our code f. attaches the result of this process to the response. The rescoring code creates a DocSlice containing the usual items, and that becomes the response. By and large, this works, but I've a few points of mystification at hand, and I'd be most grateful for some illumination. 1. Is there any reason to pass FL to the inner query? StandardRequestHandler.handleRequest never seems to fill fields, since that's a response writer's job anyway. 2. My 'rescoring' operates on the assumption that the entire relevancy is determined by the output of our code. If we wanted to combine our ranking (which produces similarity numbers between 0 and 1) with ordinary scores, any advice on scaling? (We'd want to subtract out the contribution of the initial search on our funny fields, and then combine in our rescoring value instead). 3. In 3.5.0, the admin GUI never shows me scores other than 0 when I fire this up. SolrJ works just fine, scores come back, but with the admin GUI, if I put my special QT in the query type and include score in the field list, the value displayed is always 0. I'd be grateful for any clue as to what I might have gummed up.
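On question 2, one common answer is to make the two signals commensurate before combining them: min-max scale the raw Lucene scores into [0,1] so they live on the same scale as the similarity, then blend linearly. A sketch in plain Java -- the blend weight and the choice of min-max scaling are assumptions, not anything prescribed by Solr:

```java
// Sketch of one scaling strategy: normalize raw Lucene scores into [0,1]
// via min-max over the result set, then take a weighted linear combination
// with a similarity that is already in [0,1]. The weight w (0 = lucene
// only, 1 = similarity only) is a tuning assumption.
public class ScoreBlend {
    static float[] blend(float[] luceneScores, float[] similarity, float w) {
        float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
        for (float s : luceneScores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        // Guard against a degenerate result set where all scores are equal.
        float range = (max > min) ? (max - min) : 1f;
        float[] out = new float[luceneScores.length];
        for (int i = 0; i < luceneScores.length; i++) {
            float scaled = (luceneScores[i] - min) / range; // now in [0,1]
            out[i] = (1 - w) * scaled + w * similarity[i];  // linear blend
        }
        return out;
    }
}
```

Min-max scaling is per-result-set, so blended scores are only comparable within one query's results -- which is fine for reordering, but not for comparing scores across queries. Subtracting out the original contribution of the pseudo-fields, as suggested above, would happen before this scaling step.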
Re: A curious request about a curious request handler
On Tue, Apr 3, 2012 at 12:27 PM, Grant Ingersoll gsing...@apache.org wrote: On Apr 3, 2012, at 9:43 AM, Benson Margulies wrote: I've made a RequestHandler class that acts as follows: 1. At its initialization, it creates a StandardRequestHandler and hangs onto it. 2. When a query comes to it (I configure it to a custom qt value), it: a. creates a new query based on the query that arrived b. creates a LocalSolrQueryRequest for the current core and a param set containing the derived query c. runs this request through the SearchHandler d. uses a searcher to retrieve all the docs e. rescores/reorders them using our code f. attaches the result of this process to the response. The rescoring code creates a DocSlice containing the usual items, and that becomes response. Couldn't you just implement a Function (that calls your code) and use sort by function and/or use that value as part of the broader match? Lot less moving parts, etc. I don't know. Feel free to point me at doc at any point, but here's the questions that spring to mind: Starting with something in 'q' like: bt_rni_name:Mortimer Q Snerd bt_rni_Name_Language:eng code of mine eats those two fields (in some sense, pseudo-fields), and spits out many other fields that we actually want to query on. Then, when the results come back, a whole slew of other fields are used to calculate the 'real' score. Do Functions do that? -Grant
Re: A curious request about a curious request handler
Grant, let me see if I can expand this, as it were: {!benson f1:v1 f2:v2 f3:v3} (or do I mean {!query defType='benson' ...}?) I see how that could expand to be anything else I like. However, the Function side has me a little more puzzled. The information from the fields inside my {! ... } gets turned into an object, and that object goes into the code that scores a document based on the values of a small slew of other fields, and it's too costly to reconstruct that object for each result. I'm thinking that this still calls for a request handler just to hold the state, but perhaps I'm missing something? Thanks for the help, benson
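For what it's worth, the {!benson ...} syntax implies a custom QParserPlugin rather than a request handler. A sketch of the registration, with a hypothetical class name:

```xml
<!-- hypothetical: makes {!benson f1:v1 ...} resolve to this parser -->
<queryParser name="benson" class="com.example.BensonQParserPlugin"/>
```

The plugin's createParser call receives the local params and the SolrQueryRequest, so the expensive scoring object could plausibly be built once there and stashed in the request context for later stages to reuse, instead of being reconstructed per result -- which might address the state-holding concern without a wrapping request handler.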