SolrCloud: Collection API question and problem with core loading
Hi there,

I run 2 Solr instances (Tomcat 7, Solr 4.3.0, one shard), one external ZooKeeper instance, and a large number of cores. I use the Collections API to create each new core dynamically after the configuration for the core is uploaded to ZooKeeper, and that all works fine.

Because there are so many cores, it takes a very long time to load them all at startup. I would like the server to start quickly and load cores on demand. When a core is created via the Collections API it is created with the default parameter loadOnStartup=true (this can be seen in solr.xml).

Question: is there a way to specify this parameter in the Collections API so it can be set to 'false'?

Problem: if I manually set loadOnStartup=false for a core, I got the exception below when I used CloudSolrServer to query that core:

Error: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request

It seems to me that CloudSolrServer will not trigger the core to be loaded. Is it possible to get the core loaded using CloudSolrServer?

Regards,
Patrick
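For reference, the on-demand loading behaviour discussed above is normally configured per core in the legacy solr.xml. A minimal sketch, assuming the Solr 4.x "lots of cores" attributes; the core name, instanceDir and cache size here are made up for illustration:

```xml
<!-- solr.xml (Solr 4.x legacy format): a core that is not loaded at
     startup. loadOnStartup="false" defers loading until the core is
     first requested; transient="true" additionally allows Solr to
     unload it again, keeping at most transientCacheSize transient
     cores open at once. -->
<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="128">
    <core name="core1" instanceDir="core1"
          loadOnStartup="false" transient="true"/>
  </cores>
</solr>
```

Note this only describes what the local core container does on a direct request; whether a query routed through CloudSolrServer triggers that load is exactly the open question above.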
RE: OPENNLP problems
Hi Lance,

I updated the source from 4.x and applied the latest patch, LUCENE-2899-x.patch, uploaded on 6th June, but still had the same problem.

Regards,
Patrick

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-

I found the problem with multiple documents. The problem was that the API for the life cycle of a Tokenizer changed, and I only noticed part of the change. You can now upload multiple documents in one post, and the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter default is to remove the given payloads, and it needs keepPayloads=true to retain them.

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899

On 05/28/2013 10:08 PM, Patrick Mi wrote:

Hi there,

Checked out branch_4x and applied the latest patch, LUCENE-2899-current.patch; however, I ran into 2 problems.

Followed the wiki page instructions and set up a field with this type, aiming to keep nouns and verbs and do a facet on the field:

==
<fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
==

Struggled to get that going until I put the extra parameter keepPayloads="true" in, as below:

<filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>

Question: am I doing the right thing? Is this a mistake on the wiki?

Second problem: I posted the document XML files to Solr one by one and the result was what I expected:

==
<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
</add>
==

However, if I put multiple documents into the same XML file and post it in one go, only the first document gets processed (only 'check' and 'hotel' were showing in the facet result):

==
<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="text_opennlp_nvf">removes the payloads</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="text_opennlp_nvf">retains only nouns and verbs</field>
  </doc>
</add>
==

Same problem when I updated the data using CSV upload.

Is that a bug or something I did wrong? Thanks in advance!

Regards,
Patrick
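Since single-document posts worked as expected in the thread above, one workaround until the fixed patch is applied is to send each document as its own &lt;add&gt; request instead of batching them. A minimal sketch in Python; the Solr URL, core name and field names are assumptions, and only the XML-building helper is exercised offline:

```python
from xml.sax.saxutils import escape

def build_add_xml(doc):
    """Build a single-document <add> request body for Solr's XML
    update handler, escaping field names and values."""
    fields = "".join(
        '<field name="%s">%s</field>' % (escape(name), escape(value))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields

def post_one_by_one(docs, url="http://localhost:8983/solr/collection1/update"):
    """Send each document in its own POST so the analyzer sees one
    document per request (sidesteps the multi-document bug above)."""
    import urllib.request
    for doc in docs:
        req = urllib.request.Request(
            url,
            data=build_add_xml(doc).encode("utf-8"),
            headers={"Content-Type": "text/xml"},
        )
        urllib.request.urlopen(req).read()
```

This trades indexing throughput for correctness, so it is only a stopgap while the tokenizer life-cycle fix is pending.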
RE: OPENNLP current patch compiling problem for 4.x branch
Thanks Steve, that worked for branch_4x.

-----Original Message-----
From: Steve Rowe [mailto:sar...@gmail.com]
Sent: Friday, 24 May 2013 3:19 a.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP current patch compiling problem for 4.x branch

Hi Patrick,

I think you should check out and apply the patch to branch_4x, rather than the lucene_solr_4_3_0 tag:

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x

Steve

On May 23, 2013, at 2:08 AM, Patrick Mi patrick...@touchpointgroup.com wrote:

Hi,

I checked out from here: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and downloaded the latest patch, LUCENE-2899-current.patch.

Applied the patch OK, but when I did 'ant compile' I got the following error:

==
[javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
[javac]     super(Version.LUCENE_44, input);
[javac]                   ^
[javac]   symbol:   variable LUCENE_44
[javac]   location: class Version
[javac] 1 error
==

Compiled it on trunk without problem. Is this patch supposed to work for 4.x?

Regards,
Patrick
OPENNLP problems
Hi there,

Checked out branch_4x and applied the latest patch, LUCENE-2899-current.patch; however, I ran into 2 problems.

Followed the wiki page instructions and set up a field with this type, aiming to keep nouns and verbs and do a facet on the field:

==
<fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
==

Struggled to get that going until I put the extra parameter keepPayloads="true" in, as below:

<filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>

Question: am I doing the right thing? Is this a mistake on the wiki?

Second problem: I posted the document XML files to Solr one by one and the result was what I expected:

==
<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
</add>
==

However, if I put multiple documents into the same XML file and post it in one go, only the first document gets processed (only 'check' and 'hotel' were showing in the facet result):

==
<add>
  <doc>
    <field name="id">1</field>
    <field name="text_opennlp_nvf">check in the hotel</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="text_opennlp_nvf">removes the payloads</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="text_opennlp_nvf">retains only nouns and verbs</field>
  </doc>
</add>
==

Same problem when I updated the data using CSV upload.

Is that a bug or something I did wrong? Thanks in advance!

Regards,
Patrick
OPENNLP current patch compiling problem for 4.x branch
Hi,

I checked out from here: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and downloaded the latest patch, LUCENE-2899-current.patch.

Applied the patch OK, but when I did 'ant compile' I got the following error:

==
[javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
[javac]     super(Version.LUCENE_44, input);
[javac]                   ^
[javac]   symbol:   variable LUCENE_44
[javac]   location: class Version
[javac] 1 error
==

Compiled it on trunk without problem. Is this patch supposed to work for 4.x?

Regards,
Patrick
RE: SolrCloud with Zookeeper ensemble : fail to restart master server
After a number of tests I found that running embedded ZooKeeper isn't a good idea, especially running only one ZooKeeper instance. When the Solr instance with ZooKeeper embedded gets rebooted, it gets confused about who should be the leader, and therefore it will not start while the others (the followers) are still running. I now use a standalone ZooKeeper instance and that works well.

Thanks Erick for pointing me in the right direction, much appreciated!

Regards,
Patrick

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, 20 March 2013 2:57 a.m.
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with Zookeeper ensemble : fail to restart master server

First, bootstrap_conf and numShards should only be specified the _first_ time you start up your leader. bootstrap_conf's purpose is to push the configuration files to ZooKeeper. numShards is a one-time-only parameter that you shouldn't specify more than once; it is ignored afterwards, I think. Once the conf files are up in ZooKeeper they don't need to be pushed again until they change, and you can use the command-line tools to do that.

Terminology: we're trying to get away from master/slave and use leader/replica in SolrCloud mode to distinguish it from the old replication process, so just checking to be sure that you probably really mean leader/replica, right?

Watch your admin/SolrCloud link as you bring machines up and down. That page will show you the state of each of your machines. Normally there's no trouble bringing the leader up and down, _except_ it sounds like you have your ZooKeeper running embedded. A quorum of ZK nodes (in this case one) needs to be running for SolrCloud to operate. Still, that shouldn't prevent your machine running ZK from coming back up.

So I'm a bit puzzled, but let's straighten out the startup stuff and watch the Solr log on your leader when you bring it up; that should generate some more questions.
Best,
Erick

On Mon, Mar 18, 2013 at 11:12 PM, Patrick Mi patrick...@touchpointgroup.com wrote:

Hi there,

I have experienced some problems starting the master server: Solr 4.2 under Tomcat 7 on CentOS 6.

Configuration: 3 Solr instances running on different machines, one shard, 3 cores, 2 replicas, using the ZooKeeper that comes with Solr.

The master server A has the following run options: -Dbootstrap_conf=true -DzkRun -DnumShards=1. The slave servers B and C have: -DzkHost=masterServerIP:2181

Add/update/delete etc. all work well after I start up the master and slave servers in order. When master A is up, stopping and starting slaves B and C is OK. But when slaves B and C are running I couldn't restart master A; only after I shut down B and C can I start master A.

Is this a feature, a bug, or something I haven't configured properly?

Thanks in advance for your help.

Regards,
Patrick
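For anyone following along, the standalone ZooKeeper setup that resolved this is driven by a zoo.cfg on each ZooKeeper node. A minimal sketch for a three-node ensemble; the hostnames and paths are made up for illustration:

```
# zoo.cfg: identical on all three nodes. Each node also needs a
# 'myid' file in dataDir containing its own id (1, 2 or 3).
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper/data
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

With three nodes, any two form a quorum, so a single Solr or ZooKeeper machine can be rebooted without hitting the restart deadlock described above; Solr is then pointed at the ensemble with -DzkHost listing all three client addresses instead of -DzkRun.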
SolrCloud with Zookeeper ensemble : fail to restart master server
Hi there,

I have experienced some problems starting the master server: Solr 4.2 under Tomcat 7 on CentOS 6.

Configuration: 3 Solr instances running on different machines, one shard, 3 cores, 2 replicas, using the ZooKeeper that comes with Solr.

The master server A has the following run options: -Dbootstrap_conf=true -DzkRun -DnumShards=1. The slave servers B and C have: -DzkHost=masterServerIP:2181

Add/update/delete etc. all work well after I start up the master and slave servers in order. When master A is up, stopping and starting slaves B and C is OK. But when slaves B and C are running I couldn't restart master A; only after I shut down B and C can I start master A.

Is this a feature, a bug, or something I haven't configured properly?

Thanks in advance for your help.

Regards,
Patrick
RE: DataDirectory: relative path doesn't work
Thanks for fixing the wiki page http://wiki.apache.org/solr/SolrConfigXml; now it says this: 'If this directory is not absolute, then it is relative to the directory you're in when you start SOLR.'

It would be nice if you dropped me a line here after you make a change to the document...

-----Original Message-----
From: Patrick Mi [mailto:patrick...@touchpointgroup.com]
Sent: Tuesday, 26 February 2013 5:49 p.m.
To: solr-user@lucene.apache.org
Subject: DataDirectory: relative path doesn't work

I am running Solr 4.0 / Tomcat 7 on CentOS 6.

According to this page, http://wiki.apache.org/solr/SolrConfigXml, if dataDir is not absolute then it is relative to the instanceDir of the SolrCore. However, the index directory is always created under the directory where I start Tomcat (startup.sh) rather than under the instanceDir of the SolrCore.

Am I doing something wrong in the configuration?

Regards,
Patrick
DataDirectory: relative path doesn't work
I am running Solr 4.0 / Tomcat 7 on CentOS 6.

According to this page, http://wiki.apache.org/solr/SolrConfigXml, if dataDir is not absolute then it is relative to the instanceDir of the SolrCore. However, the index directory is always created under the directory where I start Tomcat (startup.sh) rather than under the instanceDir of the SolrCore.

Am I doing something wrong in the configuration?

Regards,
Patrick
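One way to sidestep the ambiguity entirely is to give dataDir an absolute path in solrconfig.xml, using Solr's property-substitution syntax so it can still be overridden per deployment. A minimal sketch; the /var/solr/data/core1 path is made up for illustration:

```xml
<!-- solrconfig.xml: absolute data directory. The value of the
     solr.data.dir system property wins if set (e.g. via
     -Dsolr.data.dir=... at startup); otherwise the path after the
     colon is used as the default. -->
<dataDir>${solr.data.dir:/var/solr/data/core1}</dataDir>
```

With an absolute path, the index location no longer depends on the working directory Tomcat happens to be started from.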