SolrCloud: Collection API question and problem with core loading

2013-07-15 Thread Patrick Mi
Hi there,

I run 2 Solr instances (Tomcat 7, Solr 4.3.0, one shard) and one external
ZooKeeper instance, and I have lots of cores.

I use the Collections API to create new cores dynamically after the
configuration for the core is uploaded to ZooKeeper, and it all works
fine.
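
For readers following along, here is a minimal sketch of what such a Collections API CREATE request might look like. It only builds the URL and does not send anything. The host, port, collection name and config name are hypothetical, and replicationFactor=2 is an assumption based on the two Solr instances mentioned above.

```python
from urllib.parse import urlencode


def collection_create_url(base_url, name, config_name,
                          num_shards=1, replication_factor=2):
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Name of the config set previously uploaded to ZooKeeper.
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and names, for illustration only.
url = collection_create_url("http://localhost:8080/solr",
                            "customer42", "customer42_conf")
print(url)
```

Note that the sketch deliberately does not pass a loadOnStartup parameter, since whether the Collections API accepts one is exactly the open question in this message.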

Because there are so many cores, it takes a very long time to load them at
startup. I would like to start the server quickly and load the cores on demand.

When a core is created via the Collections API, it is created with the default
parameter loadOnStartup=true (this can be seen in solr.xml).
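
As an illustration of changing that attribute by hand, the sketch below flips loadOnStartup on every core entry of a legacy-style solr.xml. The XML content, core names and the helper name set_load_on_startup are all hypothetical, and a real solr.xml will carry more attributes than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy-style solr.xml; core names and paths are made up.
SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core_a" instanceDir="core_a" loadOnStartup="true"/>
    <core name="core_b" instanceDir="core_b" loadOnStartup="true"/>
  </cores>
</solr>"""


def set_load_on_startup(xml_text, value):
    """Return xml_text with loadOnStartup set on every <core> element."""
    root = ET.fromstring(xml_text)
    for core in root.iter("core"):
        core.set("loadOnStartup", "true" if value else "false")
    return ET.tostring(root, encoding="unicode")


updated = set_load_on_startup(SOLR_XML, False)
print(updated)
```

This only covers the manual edit; it says nothing about whether a client will trigger loading of such a core later.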

Question: is there a way to set this parameter to 'false' through the
Collections API?

Problem: if I manually set loadOnStartup=false for the core, I get the exception
below when I use CloudSolrServer to query the core:
Error: org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request

It seems to me that CloudSolrServer does not trigger the core to be loaded.

Is it possible to get the core loaded using CloudSolrServer?

Regards,
Patrick




RE: OPENNLP problems

2013-06-09 Thread Patrick Mi
Hi Lance,

I updated the source from 4.x and applied the latest patch, LUCENE-2899-x.patch,
uploaded on 6 June, but I still have the same problem.


Regards,
Patrick

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-
I found the problem with multiple documents. The problem was that the 
API for the life cycle of a Tokenizer changed, and I only noticed part 
of the change. You can now upload multiple documents in one post, and 
the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter 
default is to remove the given payloads, and needs keepPayloads=true 
to retain them.
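
The keep-versus-remove behaviour described above can be sketched in a few lines. This is not the Lucene implementation, only a toy model of the semantics; the function name filter_payloads and the hand-assigned part-of-speech tags are assumptions made for illustration.

```python
def filter_payloads(tagged_tokens, payload_list, keep_payloads=False):
    """Toy model: drop tokens whose tag is listed, or keep only those."""
    wanted = set(payload_list)
    if keep_payloads:
        return [(tok, tag) for tok, tag in tagged_tokens if tag in wanted]
    return [(tok, tag) for tok, tag in tagged_tokens if tag not in wanted]


PAYLOADS = "NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW".split(",")

# Hand-assigned tags for illustration, not real OpenNLP output.
tagged = [("check", "VB"), ("in", "IN"), ("the", "DT"), ("hotel", "NN")]

kept = [tok for tok, _ in filter_payloads(tagged, PAYLOADS, keep_payloads=True)]
removed = [tok for tok, _ in filter_payloads(tagged, PAYLOADS)]
print(kept)     # ['check', 'hotel']
print(removed)  # ['in', 'the']
```

With the default, the nouns and verbs are the tokens that get thrown away, which is the opposite of what the wiki example intends.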

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899







RE: OPENNLP current patch compiling problem for 4.x branch

2013-05-28 Thread Patrick Mi
Thanks Steve, that worked for branch_4x 

-----Original Message-----
From: Steve Rowe [mailto:sar...@gmail.com] 
Sent: Friday, 24 May 2013 3:19 a.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP current patch compiling problem for 4.x branch

Hi Patrick,

I think you should check out and apply the patch to branch_4x, rather than
the lucene_solr_4_3_0 tag:

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x

Steve

 



OPENNLP problems

2013-05-28 Thread Patrick Mi
Hi there,

I checked out branch_4x and applied the latest patch,
LUCENE-2899-current.patch, but I ran into 2 problems.

I followed the wiki page instructions and set up a field with this type, aiming
to keep nouns and verbs and to facet on the field:
==
<fieldType name="text_opennlp_nvf" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory"
            posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.FilterPayloadsFilterFactory"
            payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
==

I struggled to get that going until I added the extra parameter
keepPayloads="true", as below:
<filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true"
        payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>

Question: am I doing the right thing? Is this a mistake on the wiki?

Second problem:

I posted the document XML files to Solr one by one, and the result was what I
expected.

<add>
<doc>
  <field name="id">1</field>
  <field name="text_opennlp_nvf">check in the hotel</field>
</doc>
</add>

However, if I put multiple documents into the same XML file and post it in
one go, only the first document gets processed (only 'check' and 'hotel'
showed up in the facet result).

<add>
<doc>
  <field name="id">1</field>
  <field name="text_opennlp_nvf">check in the hotel</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="text_opennlp_nvf">removes the payloads</field>
</doc>
<doc>
  <field name="id">3</field>
  <field name="text_opennlp_nvf">retains only nouns and verbs</field>
</doc>
</add>
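
To rule out a malformed file as the cause, the multi-document payload can also be generated rather than typed by hand. The following is a minimal sketch that builds the same three-document update message; the helper name build_add_xml is hypothetical, and nothing here posts to Solr or changes how the tokenizer behaves.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr XML update message with one <doc> per dict in docs."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = value
    return ET.tostring(add, encoding="unicode")


docs = [
    {"id": "1", "text_opennlp_nvf": "check in the hotel"},
    {"id": "2", "text_opennlp_nvf": "removes the payloads"},
    {"id": "3", "text_opennlp_nvf": "retains only nouns and verbs"},
]
payload = build_add_xml(docs)
print(payload)
```

If a payload generated this way still yields facets for the first document only, the input file itself is unlikely to be at fault.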

I get the same problem when I update the data using CSV upload.

Is that a bug, or did I do something wrong?

Thanks in advance!

Regards,
Patrick




OPENNLP current patch compiling problem for 4.x branch

2013-05-23 Thread Patrick Mi
Hi,

I checked out from here
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and
downloaded the latest patch LUCENE-2899-current.patch.

The patch applied OK, but when I ran 'ant compile' I got the following error:


==
[javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
[javac] super(Version.LUCENE_44, input);
[javac]  ^
[javac]   symbol:   variable LUCENE_44
[javac]   location: class Version
[javac] 1 error
==

It compiled on trunk without problems.

Is this patch supposed to work for 4.X?
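
One way to spot this kind of mismatch before compiling is to scan the patch for Version constants newer than the checkout knows about. The sketch below does that with a regular expression. The helper names are hypothetical, and the assumption that LUCENE_43 is the newest constant in the lucene_solr_4_3_0 tag should be checked against the Version class in the actual checkout.

```python
import re


def lucene_version_constants(text):
    """Return the numeric suffixes of all Version.LUCENE_NN references."""
    return {int(m) for m in re.findall(r"Version\.LUCENE_(\d+)", text)}


def unsupported_constants(text, newest_known):
    """List referenced constants newer than what the checkout defines."""
    return sorted(v for v in lucene_version_constants(text) if v > newest_known)


# The offending line from the compiler output; 43 is an assumed upper bound.
line = "super(Version.LUCENE_44, input);"
print(unsupported_constants(line, newest_known=43))  # [44]
```

In practice the text argument would be the contents of the patch file rather than a single line.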

Regards,
Patrick 



RE: SolrCloud with Zookeeper ensemble : fail to restart master server

2013-04-16 Thread Patrick Mi
After a number of tests I found that running the embedded ZooKeeper isn't a
good idea, especially with only one ZooKeeper instance. When the Solr instance
with the embedded ZooKeeper is rebooted, it gets confused about who should be
the leader, so it will not start while the others (followers) are still
running. I now use a standalone ZooKeeper instance and that works well.

Thanks, Erick, for pointing me in the right direction; much appreciated!

Regards,
Patrick

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, 20 March 2013 2:57 a.m.
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud with Zookeeper ensemble : fail to restart master
server

First, bootstrap_conf and numShards should only be specified the
_first_ time you start up your leader. bootstrap_conf's purpose is to push
the configuration files to ZooKeeper. numShards is a one-time-only
parameter that you shouldn't specify more than once; it is ignored
afterwards, I think. Once the conf files are up in ZooKeeper, they
don't need to be pushed again until they change, and you can use the
command-line tools to do that.
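
The first-start versus later-start distinction can be sketched as a small helper that assembles the JVM options. The flags themselves are the ones quoted in this thread; the function name solr_jvm_opts and its parameters are hypothetical, and how the options reach Tomcat (for example through JAVA_OPTS) is left out.

```python
def solr_jvm_opts(first_start, embedded_zk=False, zk_host=None, num_shards=1):
    """Assemble Solr system properties for a first or a subsequent start."""
    opts = []
    if embedded_zk:
        opts.append("-DzkRun")
    if zk_host:
        opts.append(f"-DzkHost={zk_host}")
    if first_start:
        # Only needed once: push the config to ZooKeeper and fix shard count.
        opts.append("-Dbootstrap_conf=true")
        opts.append(f"-DnumShards={num_shards}")
    return opts


first = " ".join(solr_jvm_opts(first_start=True, embedded_zk=True))
later = " ".join(solr_jvm_opts(first_start=False, embedded_zk=True))
print(first)  # -DzkRun -Dbootstrap_conf=true -DnumShards=1
print(later)  # -DzkRun
```

A node pointing at an external ZooKeeper would pass zk_host instead of embedded_zk.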

Terminology: we're trying to get away from master/slave and use
leader/replica in SolrCloud mode to distinguish it from the old replication
process, so just checking to be sure that you probably really mean
leader/replica, right?

 Watch your admin/SolrCloud link as you bring machines up and down. That
page will show you the state of each of your machines. Normally there's no
trouble bringing the leader up and down, _except_ it sounds like you have
your zookeeper running embedded. A quorum of ZK nodes (in this case one)
needs to be running for SolrCloud to operate. Still, that shouldn't prevent
your machine running ZK from coming back up.
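
For reference, the quorum arithmetic works out as below. This is a small sketch of the majority rule ZooKeeper uses, with hypothetical function names; with a single ZK node the quorum is that one node, so there is no tolerance for it being down.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, nodes_up):
    """True if enough ZooKeeper nodes are up to form a majority."""
    return nodes_up >= quorum_size(ensemble_size)


for n in (1, 3, 5):
    tolerated = n - quorum_size(n)
    print(f"ensemble={n} quorum={quorum_size(n)} tolerated_failures={tolerated}")
```

This is why a three-node ensemble is the usual minimum when the ZooKeeper side needs to survive a restart of one machine.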

So I'm a bit puzzled, but let's straighten out the startup stuff and watch
your Solr log on your leader when you bring it up. That should generate
some more questions.

Best
Erick







SolrCloud with Zookeeper ensemble : fail to restart master server

2013-03-19 Thread Patrick Mi
Hi there,

I have experienced some problems starting the master server.

Solr 4.2 under Tomcat 7 on CentOS 6.

Configuration:
3 Solr instances running on different machines, one shard, 3 cores, 2
replicas, using the ZooKeeper that comes with Solr.

The master server A has the following run options: -Dbootstrap_conf=true
-DzkRun -DnumShards=1
The slave servers B and C have: -DzkHost=masterServerIP:2181

It works well for add/update/delete etc. after I start up the master and slave
servers in order.

When master A is up, stopping and starting slaves B and C works fine.

When slaves B and C are running, I can't restart master A. Only after I
shut down B and C can I start master A.

Is this a feature, a bug, or something I haven't configured properly?

Thanks in advance for your help.

Regards,
Patrick



RE: DataDirectory: relative path doesn't work

2013-03-11 Thread Patrick Mi
Thanks for fixing the wiki page http://wiki.apache.org/solr/SolrConfigXml.
It now says this:
'If this directory is not absolute, then it is relative to the directory
you're in when you start SOLR.'

It would be nice if you dropped me a line here after you make the change to
the document ...




DataDirectory: relative path doesn't work

2013-02-25 Thread Patrick Mi
I am running Solr 4.0/Tomcat 7 on CentOS 6.

According to this page http://wiki.apache.org/solr/SolrConfigXml if
dataDir is not absolute, then it is relative to the instanceDir of the
SolrCore.

However, the index directory is always created under the directory where I
start Tomcat (startup.sh) rather than under the instanceDir of the SolrCore.
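
The difference between the two interpretations can be shown with plain path arithmetic. In this sketch the directory names are hypothetical and resolve is an illustrative helper, not anything Solr provides; it simply resolves a relative dataDir against whichever base directory is in effect.

```python
import posixpath


def resolve(data_dir, base_dir):
    """Resolve data_dir against base_dir unless it is already absolute."""
    if posixpath.isabs(data_dir):
        return data_dir
    return posixpath.normpath(posixpath.join(base_dir, data_dir))


instance_dir = "/opt/solr/collection1"  # hypothetical instanceDir
start_dir = "/opt/tomcat/bin"           # hypothetical dir where startup.sh runs

print(resolve("data", instance_dir))         # /opt/solr/collection1/data
print(resolve("data", start_dir))            # /opt/tomcat/bin/data
print(resolve("/var/solr/data", start_dir))  # /var/solr/data
```

The second line matches the behaviour observed here, and the third shows why an absolute dataDir sidesteps the question entirely.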

Am I doing something wrong in the configuration?

Regards,
Patrick