Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-18 Thread vybe3142
I'm going to try the approach described here and see what happens

http://lucene.472066.n3.nabble.com/Fastest-way-to-use-solrj-td502659.html

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3838250.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is there a way for SOLR / SOLRJ to index files directly bypassing HTTP streaming?

2012-03-18 Thread vybe3142
Thanks much. I plan to try this tomorrow.

Can someone describe how to use remote streaming programmatically with
SolrJ? For example, see the basic clients described here:
http://androidyou.blogspot.com/2010/05/client-integration-with-solr-by-using.html
and observe that the data is transferred in the HTTP message body (which I want
to avoid).
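A minimal SolrJ sketch of remote streaming (an untested sketch: it assumes remote streaming is enabled via enableRemoteStreaming="true" on the requestParsers element in solrconfig.xml, and the URL and file path below are placeholders). Instead of attaching the file as a content stream, only its server-side path travels as the stream.file parameter, so Solr opens the file itself and the bytes never cross the HTTP request body:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class RemoteStreamingExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point at your Solr core.
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ContentStreamUpdateRequest req =
                new ContentStreamUpdateRequest("/update/extract");
        // Note: no addFile()/addContentStream() call. Only the path is sent
        // over HTTP; Solr reads the file from its own filesystem.
        req.setParam("stream.file", "/data/docs/report.pdf"); // placeholder path
        req.setParam("literal.id", "doc-report-1");           // placeholder id
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(req);
    }
}
```

This only helps when the file is visible on the Solr server's filesystem (or via stream.url); otherwise the bytes must travel in the request body.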


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3838238.html
Sent from the Solr - User mailing list archive at Nabble.com.


Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2012-03-18 Thread 怪侠
Hi, all.
I want to update a file's index. The following is my code:
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest(
        "/update/extract");
up.addFile(file);
up.setParam("uprefix", "attr_");
up.setParam("fmap.content", "attr_content");
up.setParam("literal.id", file.getPath());
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, false);
solr.request(up);
  
 and I always get the error:
 java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data 
in not in 'javabin' format
  
 and the error on the Solr server is:
 Error processing "legacy" update command:com.ctc.wstx.exc.WstxIOException: 
Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1).
  
 Could anyone tell me how to solve it? 
  
 Thanks very much.
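For reference: this client-side exception usually means the server answered with XML or HTML (often an error page) while SolrJ tried to decode javabin, so the real problem is server-side. The "Invalid UTF-8 middle byte" error on the server suggests the raw file bytes were parsed as an XML update, i.e. the request may not be reaching the ExtractingRequestHandler; it is worth checking that /update/extract is defined in solrconfig.xml. A hedged sketch (placeholder URL) of switching the client to the XML response parser so the server's actual error message becomes readable instead of tripping the javabin version check:

```java
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlParserWorkaround {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point at the core that throws the javabin error.
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Parse responses as XML instead of javabin, so a server-side error
        // response no longer breaks the javabin codec on the client.
        solr.setParser(new XMLResponseParser());
    }
}
```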

Does Solr provide the highlight token position in the field?

2012-03-18 Thread neosky
Can the highlighter provide the exact position of the query match?
For instance:
MSAQLRKPTA*RVCES*CGRAEHWDDDLEAWQIARTDGTKQVGSPHCLHEWDINGNFNPVAMDD
I want to know the position of "R" in the highlighted token.
I want to do a secondary query based on that position. Thanks!
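One possible route (a sketch, not tested here; the field name is illustrative): index the field with term vectors plus positions and offsets, then query the TermVectorComponent, which can return per-term position/offset data that a secondary query could consume.

```xml
<!-- schema.xml: keep positions/offsets for the sequence field -->
<field name="sequence" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

The data can then be requested with, e.g., /select?q=sequence:RVCES&tv=true&tv.positions=true&tv.offsets=true, assuming the TermVectorComponent is registered in solrconfig.xml.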

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-the-Solr-provide-hightlight-token-position-in-the-field-tp3837895p3837895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any way to get reference to original request object from within Solr component?

2012-03-18 Thread SUJIT PAL
Thanks Russell, that's a good idea; I think this would work too. I will try 
this and update the thread with details.

-sujit

On Mar 18, 2012, at 7:11 AM, Russell Black wrote:

> One way to do this is to register a servlet filter that places the current 
> request in a global static ThreadLocal variable, thereby making it available 
> to your Solr component.  It's kind of a hack but would work. 
> 
> Sent from my phone
> 
> On Mar 17, 2012, at 6:53 PM, "SUJIT PAL"  wrote:
> 
>> Thanks Pravesh,
>> 
>> Yes, converting the myparam to a single (comma-separated) field is probably 
>> the best approach, but as I mentioned, this is probably a bit too late for 
>> this to be practical in my case... 
>> 
>> The myparam parameters are facet filter queries, and so far order did not 
>> matter, since the filters were just AND-ed together and applied to the 
>> result set, and facets were being returned in count order. But now the 
>> requirement is to "bubble up" the selected facets so the most recently 
>> selected one is on top. This was uncovered during user-acceptance 
>> testing (since the client shows only the top N facets, the currently 
>> selected facet disappears once it's no longer within the top N facets).
>> 
>> Asking the client to switch to a single comma-separated field is an option, 
>> but it's the last option at this point, so I was wondering if it was possible 
>> to switch to some other data structure, or at least get a handle to the 
>> original HTTP servlet request from within the component so I could grab the 
>> parameters from there.
>> 
>> I noticed that the /select call does preserve the order of the parameters, 
>> but that is probably because it's being executed by SolrServlet, which gets 
>> its parameters from the HttpServletRequest.
>> 
>> I guess I will have to just run the request through a debugger and see where 
>> exactly the parameter order gets messed up...I'll update this thread if I 
>> find out.
>> 
>> Meanwhile, if any of you have simpler alternatives, would really appreciate 
>> knowing...
>> 
>> Thanks,
>> -sujit
>> 
>> On Mar 17, 2012, at 12:01 AM, pravesh wrote:
>> 
>>> Hi Sujit,
>>> 
>>> The HTTP parameter ordering is handled above the Solr level; I don't think
>>> it can be controlled at the Solr level.
>>> You can append all required values into a single HTTP param and then split
>>> it at your component level.
>>> 
>>> Regds
>>> Pravesh
>>> 
>>> --
>>> View this message in context: 
>>> http://lucene.472066.n3.nabble.com/Any-way-to-get-reference-to-original-request-object-from-within-Solr-component-tp3833703p3834082.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
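The servlet-filter-plus-ThreadLocal idea mentioned in this thread can be sketched as below (a hedged sketch: the class and method names are invented, and the servlet request type is replaced by a plain Object so the holder stays self-contained; a real filter would call RequestHolder.set(request) before chain.doFilter() and RequestHolder.clear() in a finally block):

```java
// Invented helper class: a static ThreadLocal that exposes the current
// request to code (e.g. a Solr component) running on the same thread.
public class RequestHolder {
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<Object>();

    public static void set(Object request) { CURRENT.set(request); }

    public static Object get() { return CURRENT.get(); }

    // Always clear in a finally block: servlet containers pool threads,
    // so a stale request would otherwise leak into the next request.
    public static void clear() { CURRENT.remove(); }
}
```

The component can then read the request (and its ordered parameters) via RequestHolder.get() without any change to the Solr request APIs, at the cost of coupling the component to the servlet layer.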



Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Matthew Parker
That idea was short-lived. I excluded the document, but the cluster isn't
syncing even after shutting everything down and restarting.

On Sun, Mar 18, 2012 at 2:58 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> I had tried importing data from Manifold, and one document threw a Tika
> Exception.
>
> If I shut everything down and restart SOLR cloud, the system sync'd on
> startup.
>
> Could extraction errors be the issue?
>
>
> On Sun, Mar 18, 2012 at 2:50 PM, Matthew Parker <
> mpar...@apogeeintegration.com> wrote:
>
>> I have nodes running on ports: 8081-8084
>>
>> A couple of the other SOLR cloud nodes were complaining about not being
>> able to talk with 8081, which is the first node brought up in the cluster.
>>
>> The startup process is:
>>
>> 1. start 3 zookeeper nodes
>>
>> 2. wait until complete
>>
>> 3. start first solr node.
>>
>> 4. wait until complete
>>
>> 5. start remaining 3 solr nodes.
>>
>> I wiped the zookeeper and solr nodes data directories to start fresh.
>>
>> Another question: Would a Tika Exception cause the nodes not to
>> replicate? I can see the documents being committed on the first solr node,
>> but nothing replicates to the other 3.
>>
>>
>>
>>
>> On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller wrote:
>>
>>> From every node in your cluster you can hit http://MACHINE1:8084/solr in
>>> your browser and get a response?
>>>
>>> On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
>>>
>>> > My cloud instance finally tried to sync. It looks like it's having
>>> connection issues, but I can bring the SOLR instance up in the browser so
>>> I'm not sure why it cannot connect to it. I got the following condensed log
>>> output:
>>> >
>>> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>>> > I/O exception (java.net.ConnectException) caught when processing
>>> request: Connection refused: connect
>>> >
>>> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>>> > I/O exception (java.net.ConnectException) caught when processing
>>> request: Connection refused: connect
>>> >
>>> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>>> > I/O exception (java.net.ConnectException) caught when processing
>>> request: Connection refused: connect
>>> >
>>> > Retrying request
>>> >
>>> > shard update error StdNode:
>>> http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException:
>>> http://MACHINE1:8084/solr
>>> >at
>>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
>>> 483)
>>> > ..
>>> > ..
>>> > ..
>>> >  Caused by: java.net.ConnectException: Connection refused: connect
>>> >at java.net.DualStackPlainSocketImpl.connect0(Native Method)
>>> > ..
>>> > ..
>>> > ..
>>> >
>>> > try and ask http://MACHINE1:8084/solr to recover
>>> >
>>> > Could not tell a replica to recover
>>> >
>>> > org.apache.solr.client.solrj.SolrServerException:
>>> http://MACHINE1:8084/solr
>>> >   at
>>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
>>> >   ...
>>> >   ...
>>> >   ...
>>> > Caused by: java.net.ConnectException: Connection refused: connect
>>> >at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
>>> >..
>>> >..
>>> >..
>>> >
>>> > On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller 
>>> wrote:
>>> > Nodes talk to ZooKeeper as well as to each other. You can see the
>>> addresses they are trying to use to communicate with each other in the
>>> 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
>>> the detected default may not be an address that other nodes can reach. As a
>>> limited example: for some reason my mac cannot talk to my linux box with
>>> its default detected host address of halfmetal:8983/solr - but the mac can
>>> reach my linux box if I use halfmetal.Local - so I have to override the
>>> published address of my linux box using the host attribute if I want to
>>> setup a cluster between my macbook and linux box.
>>> >
>>> > Each node talks to ZooKeeper to learn about the other nodes,
>>> including their addresses. Recovery is then done node to node using the
>>> appropriate addresses.
>>> >
>>> >
>>> > - Mark Miller
>>> > lucidimagination.com
>>> >
>>> > On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
>>> >
>>> > > I'm still having issues replicating in my work environment. Can
>>> anyone
>>> > > explain how the replication mechanism works? Is it communicating
>>> across
>>> > > ports or through zookeeper to manage the process?
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
>>> > > mpar...@apogeeintegration.com> wrote:
>>> > >
>>> > >> All,
>>> > >>
>>> > >> I recreated the cluster on my machine at home (Windows 7, Java
>>> 1.6.0.23,
>>> > >> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through
>>> Manifold
>>> > >> using its crawler, and it looks like it's replicating fine once the
>>> > >> documents are

Re: Boosting terms

2012-03-18 Thread Ahmet Arslan

> Is there any possibility to boost terms during indexing? Searching with
> Google, I found information that there is no such feature in Solr (we can
> only boost fields). Is that true?

Yes, only field and document boosting exist.

You might find this article interesting. 

http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
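Following the payloads article above, a hedged sketch of the usual setup (the field type name and delimiter are illustrative): attach a numeric payload to each token at index time with DelimitedPayloadTokenFilterFactory; scoring against those payloads then requires custom query/similarity code on the Lucene side, as the article describes.

```xml
<!-- schema.xml: tokens are indexed as "term|weight", e.g. "solr|2.5 search|1.0" -->
<fieldType name="payloads" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            encoder="float" delimiter="|"/>
  </analyzer>
</fieldType>
```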




Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Matthew Parker
I had tried importing data from Manifold, and one document threw a Tika
Exception.

If I shut everything down and restart SOLR cloud, the system sync'd on
startup.

Could extraction errors be the issue?


On Sun, Mar 18, 2012 at 2:50 PM, Matthew Parker <
mpar...@apogeeintegration.com> wrote:

> I have nodes running on ports: 8081-8084
>
> A couple of the other SOLR cloud nodes were complaining about not being able
> to talk with 8081, which is the first node brought up in the cluster.
>
> The startup process is:
>
> 1. start 3 zookeeper nodes
>
> 2. wait until complete
>
> 3. start first solr node.
>
> 4. wait until complete
>
> 5. start remaining 3 solr nodes.
>
> I wiped the zookeeper and solr nodes data directories to start fresh.
>
> Another question: Would a Tika Exception cause the nodes not to replicate?
> I can see the documents being committed on the first solr node, but nothing
> replicates to the other 3.
>
>
>
>
> On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller wrote:
>
>> From every node in your cluster you can hit http://MACHINE1:8084/solr in
>> your browser and get a response?
>>
>> On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
>>
>> > My cloud instance finally tried to sync. It looks like it's having
>> connection issues, but I can bring the SOLR instance up in the browser so
>> I'm not sure why it cannot connect to it. I got the following condensed log
>> output:
>> >
>> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > I/O exception (java.net.ConnectException) caught when processing
>> request: Connection refused: connect
>> >
>> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > I/O exception (java.net.ConnectException) caught when processing
>> request: Connection refused: connect
>> >
>> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
>> > I/O exception (java.net.ConnectException) caught when processing
>> request: Connection refused: connect
>> >
>> > Retrying request
>> >
>> > shard update error StdNode:
>> http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException:
>> http://MACHINE1:8084/solr
>> >at
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
>> 483)
>> > ..
>> > ..
>> > ..
>> >  Caused by: java.net.ConnectException: Connection refused: connect
>> >at java.net.DualStackPlainSocketImpl.connect0(Native Method)
>> > ..
>> > ..
>> > ..
>> >
>> > try and ask http://MACHINE1:8084/solr to recover
>> >
>> > Could not tell a replica to recover
>> >
>> > org.apache.solr.client.solrj.SolrServerException:
>> http://MACHINE1:8084/solr
>> >   at
>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
>> >   ...
>> >   ...
>> >   ...
>> > Caused by: java.net.ConnectException: Connection refused: connect
>> >at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
>> >..
>> >..
>> >..
>> >
>> > On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller 
>> wrote:
>> > Nodes talk to ZooKeeper as well as to each other. You can see the
>> addresses they are trying to use to communicate with each other in the
>> 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
>> the detected default may not be an address that other nodes can reach. As a
>> limited example: for some reason my mac cannot talk to my linux box with
>> its default detected host address of halfmetal:8983/solr - but the mac can
>> reach my linux box if I use halfmetal.Local - so I have to override the
>> published address of my linux box using the host attribute if I want to
>> setup a cluster between my macbook and linux box.
>> >
>> > Each node talks to ZooKeeper to learn about the other nodes, including
>> their addresses. Recovery is then done node to node using the appropriate
>> addresses.
>> >
>> >
>> > - Mark Miller
>> > lucidimagination.com
>> >
>> > On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
>> >
>> > > I'm still having issues replicating in my work environment. Can anyone
>> > > explain how the replication mechanism works? Is it communicating
>> across
>> > > ports or through zookeeper to manage the process?
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
>> > > mpar...@apogeeintegration.com> wrote:
>> > >
>> > >> All,
>> > >>
>> > >> I recreated the cluster on my machine at home (Windows 7, Java
>> 1.6.0.23,
>> > >> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through
>> Manifold
>> > >> using its crawler, and it looks like it's replicating fine once the
>> > >> documents are committed.
>> > >>
>> > >> This must be related to my environment somehow. Thanks for your help.
>> > >>
>> > >> Regards,
>> > >>
>> > >> Matt
>> > >>
>> > >> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson <
>> erickerick...@gmail.com>wrote:
>> > >>
>> > >>> Matt:
>> > >>>
>> > >>> Just for paranoia's sake, when I was playing around with this (the
>> > >>> _version

Re: which mergePolicy

2012-03-18 Thread Tirthankar Chatterjee
Hi,
Do you see any issues with the default one?


On Mar 18, 2012, at 6:10 AM, Messpero wrote:

> Hi everyone,
> 
> I have a big index (~100 GB, ~55 documents) with 200 fields per
> document. I search with large queries, which is why I had to raise
> maxBooleanClauses to 8192.
> I use a queryResultCache with a size of 20, because a search takes over
> 30 sec without the cache. I insert and search documents at the same
> time on the same core/index.
> 
> Which mergePolicy should I use?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/which-mergePolicy-tp3836250p3836250.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Matthew Parker
I have nodes running on ports: 8081-8084

A couple of the other SOLR cloud nodes were complaining about not being able
to talk with 8081, which is the first node brought up in the cluster.

The startup process is:

1. start 3 zookeeper nodes

2. wait until complete

3. start first solr node.

4. wait until complete

5. start remaining 3 solr nodes.

I wiped the zookeeper and solr nodes data directories to start fresh.

Another question: Would a Tika Exception cause the nodes not to replicate?
I can see the documents being committed on the first solr node, but nothing
replicates to the other 3.




On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller  wrote:

> From every node in your cluster you can hit http://MACHINE1:8084/solr in
> your browser and get a response?
>
> On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
>
> > My cloud instance finally tried to sync. It looks like it's having
> connection issues, but I can bring the SOLR instance up in the browser so
> I'm not sure why it cannot connect to it. I got the following condensed log
> output:
> >
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > I/O exception (java.net.ConnectException) caught when processing
> request: Connection refused: connect
> >
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > I/O exception (java.net.ConnectException) caught when processing
> request: Connection refused: connect
> >
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > I/O exception (java.net.ConnectException) caught when processing
> request: Connection refused: connect
> >
> > Retrying request
> >
> > shard update error StdNode:
> http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException:
> http://MACHINE1:8084/solr
> >at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
> 483)
> > ..
> > ..
> > ..
> >  Caused by: java.net.ConnectException: Connection refused: connect
> >at java.net.DualStackPlainSocketImpl.connect0(Native Method)
> > ..
> > ..
> > ..
> >
> > try and ask http://MACHINE1:8084/solr to recover
> >
> > Could not tell a replica to recover
> >
> > org.apache.solr.client.solrj.SolrServerException:
> http://MACHINE1:8084/solr
> >   at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
> >   ...
> >   ...
> >   ...
> > Caused by: java.net.ConnectException: Connection refused: connect
> >at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
> >..
> >..
> >..
> >
> > On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller 
> wrote:
> > Nodes talk to ZooKeeper as well as to each other. You can see the
> addresses they are trying to use to communicate with each other in the
> 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
> the detected default may not be an address that other nodes can reach. As a
> limited example: for some reason my mac cannot talk to my linux box with
> its default detected host address of halfmetal:8983/solr - but the mac can
> reach my linux box if I use halfmetal.Local - so I have to override the
> published address of my linux box using the host attribute if I want to
> setup a cluster between my macbook and linux box.
> >
> > Each node talks to ZooKeeper to learn about the other nodes, including
> their addresses. Recovery is then done node to node using the appropriate
> addresses.
> >
> >
> > - Mark Miller
> > lucidimagination.com
> >
> > On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
> >
> > > I'm still having issues replicating in my work environment. Can anyone
> > > explain how the replication mechanism works? Is it communicating across
> > > ports or through zookeeper to manage the process?
> > >
> > >
> > >
> > >
> > > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
> > > mpar...@apogeeintegration.com> wrote:
> > >
> > >> All,
> > >>
> > >> I recreated the cluster on my machine at home (Windows 7, Java
> 1.6.0.23,
> > >> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through
> Manifold
> > >> using its crawler, and it looks like it's replicating fine once the
> > >> documents are committed.
> > >>
> > >> This must be related to my environment somehow. Thanks for your help.
> > >>
> > >> Regards,
> > >>
> > >> Matt
> > >>
> > >> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson <
> erickerick...@gmail.com>wrote:
> > >>
> > >>> Matt:
> > >>>
> > >>> Just for paranoia's sake, when I was playing around with this (the
> > >>> _version_ thing was one of my problems too) I removed the entire data
> > >>> directory as well as the zoo_data directory between experiments (and
> > >>> recreated just the data dir). This included various index.2012
> > >>> files and the tlog directory on the theory that *maybe* there was
> some
> > >>> confusion happening on startup with an already-wonky index.
> > >>>
> > >>> If you have the energy and tried that it might be helpful
> informati

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Darren Govoni
I think he's asking if all the nodes (same machine or not) return a
response. Presumably you have different ports for each node since they
are on the same machine.

On Sun, 2012-03-18 at 14:44 -0400, Matthew Parker wrote:
> The cluster is running on one machine.
> 
> On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller  wrote:
> 
> > From every node in your cluster you can hit http://MACHINE1:8084/solr in
> > your browser and get a response?
> >
> > On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
> >
> > > My cloud instance finally tried to sync. It looks like it's having
> > connection issues, but I can bring the SOLR instance up in the browser so
> > I'm not sure why it cannot connect to it. I got the following condensed log
> > output:
> > >
> > > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > > I/O exception (java.net.ConnectException) caught when processing
> > request: Connection refused: connect
> > >
> > > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > > I/O exception (java.net.ConnectException) caught when processing
> > request: Connection refused: connect
> > >
> > > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > > I/O exception (java.net.ConnectException) caught when processing
> > request: Connection refused: connect
> > >
> > > Retrying request
> > >
> > > shard update error StdNode:
> > http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException:
> > http://MACHINE1:8084/solr
> > >at
> > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
> > 483)
> > > ..
> > > ..
> > > ..
> > >  Caused by: java.net.ConnectException: Connection refused: connect
> > >at java.net.DualStackPlainSocketImpl.connect0(Native Method)
> > > ..
> > > ..
> > > ..
> > >
> > > try and ask http://MACHINE1:8084/solr to recover
> > >
> > > Could not tell a replica to recover
> > >
> > > org.apache.solr.client.solrj.SolrServerException:
> > http://MACHINE1:8084/solr
> > >   at
> > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
> > >   ...
> > >   ...
> > >   ...
> > > Caused by: java.net.ConnectException: Connection refused: connect
> > >at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
> > >..
> > >..
> > >..
> > >
> > > On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller 
> > wrote:
> > > Nodes talk to ZooKeeper as well as to each other. You can see the
> > addresses they are trying to use to communicate with each other in the
> > 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
> > the detected default may not be an address that other nodes can reach. As a
> > limited example: for some reason my mac cannot talk to my linux box with
> > its default detected host address of halfmetal:8983/solr - but the mac can
> > reach my linux box if I use halfmetal.Local - so I have to override the
> > published address of my linux box using the host attribute if I want to
> > setup a cluster between my macbook and linux box.
> > >
> > > Each node talks to ZooKeeper to learn about the other nodes, including
> > their addresses. Recovery is then done node to node using the appropriate
> > addresses.
> > >
> > >
> > > - Mark Miller
> > > lucidimagination.com
> > >
> > > On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
> > >
> > > > I'm still having issues replicating in my work environment. Can anyone
> > > > explain how the replication mechanism works? Is it communicating across
> > > > ports or through zookeeper to manage the process?
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
> > > > mpar...@apogeeintegration.com> wrote:
> > > >
> > > >> All,
> > > >>
> > > >> I recreated the cluster on my machine at home (Windows 7, Java
> > 1.6.0.23,
> > > >> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through
> > Manifold
> > > >> using its crawler, and it looks like it's replicating fine once the
> > > >> documents are committed.
> > > >>
> > > >> This must be related to my environment somehow. Thanks for your help.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Matt
> > > >>
> > > >> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson <
> > erickerick...@gmail.com>wrote:
> > > >>
> > > >>> Matt:
> > > >>>
> > > >>> Just for paranoia's sake, when I was playing around with this (the
> > > >>> _version_ thing was one of my problems too) I removed the entire data
> > > >>> directory as well as the zoo_data directory between experiments (and
> > > >>> recreated just the data dir). This included various index.2012
> > > >>> files and the tlog directory on the theory that *maybe* there was
> > some
> > > >>> confusion happening on startup with an already-wonky index.
> > > >>>
> > > >>> If you have the energy and tried that it might be helpful
> > information,
> > > >>> but it may also be a total red-herring
> > > >>>
> > > >>> FWIW
> > >

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Matthew Parker
The cluster is running on one machine.

On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller  wrote:

> From every node in your cluster you can hit http://MACHINE1:8084/solr in
> your browser and get a response?
>
> On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:
>
> > My cloud instance finally tried to sync. It looks like it's having
> connection issues, but I can bring the SOLR instance up in the browser so
> I'm not sure why it cannot connect to it. I got the following condensed log
> output:
> >
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > I/O exception (java.net.ConnectException) caught when processing
> request: Connection refused: connect
> >
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > I/O exception (java.net.ConnectException) caught when processing
> request: Connection refused: connect
> >
> > org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> > I/O exception (java.net.ConnectException) caught when processing
> request: Connection refused: connect
> >
> > Retrying request
> >
> > shard update error StdNode:
> http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException:
> http://MACHINE1:8084/solr
> >at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
> 483)
> > ..
> > ..
> > ..
> >  Caused by: java.net.ConnectException: Connection refused: connect
> >at java.net.DualStackPlainSocketImpl.connect0(Native Method)
> > ..
> > ..
> > ..
> >
> > try and ask http://MACHINE1:8084/solr to recover
> >
> > Could not tell a replica to recover
> >
> > org.apache.solr.client.solrj.SolrServerException:
> http://MACHINE1:8084/solr
> >   at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
> >   ...
> >   ...
> >   ...
> > Caused by: java.net.ConnectException: Connection refused: connect
> >at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
> >..
> >..
> >..
> >
> > On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller 
> wrote:
> > Nodes talk to ZooKeeper as well as to each other. You can see the
> addresses they are trying to use to communicate with each other in the
> 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
> the detected default may not be an address that other nodes can reach. As a
> limited example: for some reason my mac cannot talk to my linux box with
> its default detected host address of halfmetal:8983/solr - but the mac can
> reach my linux box if I use halfmetal.Local - so I have to override the
> published address of my linux box using the host attribute if I want to
> setup a cluster between my macbook and linux box.
> >
> > Each node talks to ZooKeeper to learn about the other nodes, including
> their addresses. Recovery is then done node to node using the appropriate
> addresses.
> >
> >
> > - Mark Miller
> > lucidimagination.com
> >
> > On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
> >
> > > I'm still having issues replicating in my work environment. Can anyone
> > > explain how the replication mechanism works? Is it communicating across
> > > ports or through zookeeper to manage the process?
> > >
> > >
> > >
> > >
> > > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
> > > mpar...@apogeeintegration.com> wrote:
> > >
> > >> All,
> > >>
> > >> I recreated the cluster on my machine at home (Windows 7, Java
> 1.6.0.23,
> > >> apache-solr-4.0-2012-02-29_09-07-30), sent some documents through
> Manifold
> > >> using its crawler, and it looks like it's replicating fine once the
> > >> documents are committed.
> > >>
> > >> This must be related to my environment somehow. Thanks for your help.
> > >>
> > >> Regards,
> > >>
> > >> Matt
> > >>
> > >> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson <
> erickerick...@gmail.com>wrote:
> > >>
> > >>> Matt:
> > >>>
> > >>> Just for paranoia's sake, when I was playing around with this (the
> > >>> _version_ thing was one of my problems too) I removed the entire data
> > >>> directory as well as the zoo_data directory between experiments (and
> > >>> recreated just the data dir). This included various index.2012
> > >>> files and the tlog directory on the theory that *maybe* there was
> some
> > >>> confusion happening on startup with an already-wonky index.
> > >>>
> > >>> If you have the energy and tried that it might be helpful
> information,
> > >>> but it may also be a total red-herring
> > >>>
> > >>> FWIW
> > >>> Erick
> > >>>
> > >>> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller 
> > >>> wrote:
> > > I'm assuming the Windows configuration looked correct?
> > 
> >  Yeah, so far I cannot spot any smoking gun... I'm confounded at the
> > >>> moment. I'll re-read through everything once more...
> > 
> >  - Mark
> > >>>
> > >>
> > >>
> > --
> > This e-mail and any files transmitted with it may be proprietary.  Please
> > note that any views or opinions presented in this e-mail are solely those of
> > the author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Mark Miller
From every node in your cluster you can hit http://MACHINE1:8084/solr in your 
browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

> My cloud instance finally tried to sync. It looks like it's having connection 
> issues, but I can bring the SOLR instance up in the browser so I'm not sure 
> why it cannot connect to it. I got the following condensed log output:
> 
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> I/O exception (java.net.ConnectException) caught when processing request: 
> Connection refused: connect
> 
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> I/O exception (java.net.ConnectException) caught when processing request: 
> Connection refused: connect
> 
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> I/O exception (java.net.ConnectException) caught when processing request: 
> Connection refused: connect
> 
> Retrying request
> 
> shard update error StdNode: 
> http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException: 
> http://MACHINE1:8084/solr
>at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
>  483)
> ..  
> ..
> ..
>  Caused by: java.net.ConnectException: Connection refused: connect
>at java.net.DualStackPlainSocketImpl.connect0(Native Method)
> ..
> ..
> ..
> 
> try and ask http://MACHINE1:8084/solr to recover
> 
> Could not tell a replica to recover
> 
> org.apache.solr.client.solrj.SolrServerException: http://MACHINE1:8084/solr
>   at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
>   ...
>   ...
>   ...
> Caused by: java.net.ConnectException: Connection refused: connect
>at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
>..
>..
>..
> 
> On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller  wrote:
> Nodes talk to ZooKeeper as well as to each other. You can see the addresses 
> they are trying to use to communicate with each other in the 'cloud' view of 
> the Solr Admin UI. Sometimes you have to override these, as the detected 
> default may not be an address that other nodes can reach. As a limited 
> example: for some reason my mac cannot talk to my linux box with its default 
> detected host address of halfmetal:8983/solr - but the mac can reach my linux 
> box if I use halfmetal.Local - so I have to override the published address of 
> my linux box using the host attribute if I want to set up a cluster between my 
> macbook and linux box.
> 
> Each node talks to ZooKeeper to learn about the other nodes, including their 
> addresses. Recovery is then done node to node using the appropriate addresses.
> 
> 
> - Mark Miller
> lucidimagination.com
> 
> On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
> 
> > I'm still having issues replicating in my work environment. Can anyone
> > explain how the replication mechanism works? Is it communicating across
> > ports or through ZooKeeper to manage the process?
> >
> >
> >
> >
> > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
> > mpar...@apogeeintegration.com> wrote:
> >
> >> All,
> >>
> >> I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
> >> apache-solr-4.0-2012-02-29_09-07-30) , sent some document through Manifold
> >> using its crawler, and it looks like it's replicating fine once the
> >> documents are committed.
> >>
> >> This must be related to my environment somehow. Thanks for your help.
> >>
> >> Regards,
> >>
> >> Matt
> >>
> >> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson 
> >> wrote:
> >>
> >>> Matt:
> >>>
> >>> Just for paranoia's sake, when I was playing around with this (the
> >>> _version_ thing was one of my problems too) I removed the entire data
> >>> directory as well as the zoo_data directory between experiments (and
> >>> recreated just the data dir). This included various index.2012
> >>> files and the tlog directory on the theory that *maybe* there was some
> >>> confusion happening on startup with an already-wonky index.
> >>>
> >>> If you have the energy and tried that it might be helpful information,
> >>> but it may also be a total red-herring
> >>>
> >>> FWIW
> >>> Erick
> >>>
> >>> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller 
> >>> wrote:
> > I'm assuming the Windows configuration looked correct?
> 
>  Yeah, so far I cannot spot any smoking gun...I'm confounded at the
> >>> moment. I'll re-read through everything once more...
> 
>  - Mark
> >>>
> >>
> >>
> 
> 
> 

- Mark Miller
lucidimagination.com













Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Matthew Parker
This might explain another thing I'm seeing. If I take a node down,
clusterstate.json still shows it as active. Also if I'm running 4 nodes,
take one down and assign it a new port, clusterstate.json will show 5 nodes
running.
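
As an aside: the published-address override Mark describes below lives in solr.xml on each node. A hedged sketch for a 4.x-era node (the host/hostPort/hostContext attributes are assumptions based on the 4.x example solr.xml; the hostname is a placeholder):

```xml
<!-- solr.xml (Solr 4.x): publish an explicit address so the other nodes
     can reach this one via ZooKeeper's cluster state. The attribute names
     are assumptions based on the 4.x example config. -->
<solr persistent="true">
  <cores adminPath="/admin/cores"
         host="mybox.example.com"
         hostPort="8983"
         hostContext="solr"/>
</solr>
```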

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller  wrote:

> Nodes talk to ZooKeeper as well as to each other. You can see the
> addresses they are trying to use to communicate with each other in the
> 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
> the detected default may not be an address that other nodes can reach. As a
> limited example: for some reason my mac cannot talk to my linux box with
> its default detected host address of halfmetal:8983/solr - but the mac can
> reach my linux box if I use halfmetal.Local - so I have to override the
> published address of my linux box using the host attribute if I want to
> set up a cluster between my macbook and linux box.
>
> Each node talks to ZooKeeper to learn about the other nodes, including
> their addresses. Recovery is then done node to node using the appropriate
> addresses.
>
>
> - Mark Miller
> lucidimagination.com
>
> On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:
>
> > I'm still having issues replicating in my work environment. Can anyone
> > explain how the replication mechanism works? Is it communicating across
> > ports or through ZooKeeper to manage the process?
> >
> >
> >
> >
> > On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker <
> > mpar...@apogeeintegration.com> wrote:
> >
> >> All,
> >>
> >> I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
> >> apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
> Manifold
> >> using its crawler, and it looks like it's replicating fine once the
> >> documents are committed.
> >>
> >> This must be related to my environment somehow. Thanks for your help.
> >>
> >> Regards,
> >>
> >> Matt
> >>
> >> On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson wrote:
> >>
> >>> Matt:
> >>>
> >>> Just for paranoia's sake, when I was playing around with this (the
> >>> _version_ thing was one of my problems too) I removed the entire data
> >>> directory as well as the zoo_data directory between experiments (and
> >>> recreated just the data dir). This included various index.2012
> >>> files and the tlog directory on the theory that *maybe* there was some
> >>> confusion happening on startup with an already-wonky index.
> >>>
> >>> If you have the energy and tried that it might be helpful information,
> >>> but it may also be a total red-herring
> >>>
> >>> FWIW
> >>> Erick
> >>>
> >>> On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller 
> >>> wrote:
> > I'm assuming the Windows configuration looked correct?
> 
>  Yeah, so far I cannot spot any smoking gun...I'm confounded at the
> >>> moment. I'll re-read through everything once more...
> 
>  - Mark
> >>>
> >>
> >>
>
>
>



which mergePolicy

2012-03-18 Thread Messpero
Hi everyone,

I have a big index (~100 GB, ~55 documents) with 200 fields per
document. I search with large queries, which is why I had to raise
maxBooleanClauses to 8192. I use a queryResultCache of size 20, because
without the cache a search takes over 30 sec. I insert and search
documents at the same time, on the same core/index.

Which mergePolicy should I use?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/which-mergePolicy-tp3836250p3836250.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: mailto: scheme aware tokenizer

2012-03-18 Thread Steven A Rowe
Hi Kai,

I have created an issue for this: 
https://issues.apache.org/jira/browse/LUCENE-3880

Thanks for reporting!

Steve

-Original Message-
From: Kai Gülzau [mailto:kguel...@novomind.com] 
Sent: Friday, March 16, 2012 9:59 AM
To: solr-user@lucene.apache.org
Subject: mailto: scheme aware tokenizer

Is there any analyzer out there which handles the mailto: scheme?

UAX29URLEmailTokenizer seems to split at the wrong place:

mailto:t...@example.org ->
mailto:test
example.org

As a workaround I use

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="mailto:" replacement="mailto: "/>

Regards,

Kai Gülzau

novomind AG
__

Bramfelder Straße 121 • 22305 Hamburg

phone +49 (0)40 808071138 • fax +49 (0)40 808071-100 email 
kguel...@novomind.com • http://www.novomind.com

Vorstand : Peter Samuelsen (Vors.) • Stefan Grieben • Thomas Köhler
Aufsichtsratsvorsitzender: Werner Preuschhof
Gesellschaftssitz: Hamburg • HR B93508 Amtsgericht Hamburg


Re: Any way to get reference to original request object from within Solr component?

2012-03-18 Thread Russell Black
One way to do this is to register a servlet filter that places the current 
request in a global static ThreadLocal variable, thereby making it available to 
your Solr component.  It's kind of a hack but would work. 
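
A minimal sketch of that holder pattern (all names here are illustrative, not from any Solr or servlet API; in a real deployment a javax.servlet.Filter would call set() before chain.doFilter() and clear() in a finally block):

```java
// Sketch of the ThreadLocal "current request" holder described above.
// RequestHolder and its methods are illustrative names. The filter would
// call set(request) on the way in, and clear() in a finally block on the
// way out; the Solr component then calls get() on the same thread.
public class RequestHolder {
    private static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    public static void set(Object request) { CURRENT.set(request); }

    public static Object get() { return CURRENT.get(); }

    // Always clear in a finally block: servlet containers pool threads,
    // so a stale request would otherwise leak into the next request
    // served by the same thread.
    public static void clear() { CURRENT.remove(); }

    public static void main(String[] args) throws Exception {
        RequestHolder.set("request-A");
        System.out.println(RequestHolder.get());           // request-A
        // Each thread sees only its own slot:
        Thread t = new Thread(() -> System.out.println(RequestHolder.get()));
        t.start();
        t.join();                                          // prints null
        RequestHolder.clear();
        System.out.println(RequestHolder.get());           // null
    }
}
```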

Sent from my phone

On Mar 17, 2012, at 6:53 PM, "SUJIT PAL"  wrote:

> Thanks Pravesh,
> 
> Yes, converting the myparam to a single (comma-separated) field is probably 
> the best approach, but as I mentioned, this is probably a bit too late for 
> this to be practical in my case... 
> 
> The myparam parameters are facet filter queries, and so far order did not 
> matter, since the filters were just AND-ed together and applied to the result 
> set, and facets were being returned in count order. But now the requirement is 
> to "bubble up" the selected facets so that the most recently selected one is 
> on top. This was uncovered during user-acceptance testing (since the client 
> shows only the top N facets, the currently selected facet would disappear once 
> it was no longer within the top N facets).
> 
> Asking the client to switch to a single comma-separated field is an option, 
> but its the last option at this point, so I was wondering if it was possible 
> to switch to some other data structure, or at least get a handle to the 
> original HTTP servlet request from within the component so I could grab the 
> parameters from there.
> 
> I noticed that the /select call does preserve the order of the parameters, 
> but that is because it's probably being executed by SolrServlet, which gets 
> its parameters from the HttpServletRequest.
> 
> I guess I will have to just run the request through a debugger and see where 
> exactly the parameter order gets messed up...I'll update this thread if I 
> find out.
> 
> Meanwhile, if any of you have simpler alternatives, would really appreciate 
> knowing...
> 
> Thanks,
> -sujit
> 
> On Mar 17, 2012, at 12:01 AM, pravesh wrote:
> 
>> Hi Sujit,
>> 
>> The HTTP parameter ordering is handled above the SOLR level. I don't think
>> it can be controlled at the SOLR level.
>> You can append all required values into a single HTTP param and then split
>> it at your component level.
>> 
>> Regds
>> Pravesh
>> 
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Any-way-to-get-reference-to-original-request-object-from-within-Solr-component-tp3833703p3834082.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
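
Pravesh's single-parameter workaround quoted above can be sketched as follows; the parameter name myparam comes from Sujit's message, while the class name and the comma-separated format are assumptions:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the workaround: since multi-value parameter order may not
// survive to the component, the client packs the ordered values into one
// comma-separated parameter and the component splits it, preserving the
// client's order. "myparam" is from the thread; the rest is illustrative.
public class OrderedParamDemo {
    static List<String> parseOrdered(String raw) {
        if (raw == null || raw.isEmpty()) return List.of();
        return Arrays.asList(raw.split(","));
    }

    public static void main(String[] args) {
        // e.g. ?myparam=color:red,brand:acme,size:large
        String myparam = "color:red,brand:acme,size:large";
        List<String> filters = parseOrdered(myparam);
        // The most recently selected facet filter is simply the last element:
        System.out.println(filters.get(filters.size() - 1)); // size:large
    }
}
```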


Re: Too many connections in CLOSE_WAIT state on master solr server

2012-03-18 Thread samarth s
Hi Ranveer,

You can try '-Dhttp.maxConnections' out; it may resolve the issue.
But I figured the root cause may lie with some queries made to Solr
that are too heavy to have decent turnaround times. As a result the
client may close the connection abruptly, resulting in half-closed
connections. You can also try adding a search timeout to Solr queries:
https://issues.apache.org/jira/browse/SOLR-502
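
For context, http.maxConnections is a standard Java networking property: it caps the number of idle keep-alive connections that HttpURLConnection retains per destination (default 5). A minimal sketch; the value 50 is arbitrary:

```java
// http.maxConnections is a standard java.net property. It must be set on
// the *client* JVM (e.g. the replication slave), ideally at startup via
// -Dhttp.maxConnections=50, before any connections are opened.
public class KeepAliveConfig {
    public static void main(String[] args) {
        // Programmatic equivalent of -Dhttp.maxConnections=50:
        System.setProperty("http.maxConnections", "50");
        System.out.println(System.getProperty("http.maxConnections")); // 50
    }
}
```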

On Tue, Jan 10, 2012 at 8:06 AM, Ranveer  wrote:
> Hi,
>
> I am facing same problem. Did  -Dhttp.maxConnections resolve the problem ?
>
> Please let us know!
>
> regards
> Ranveer
>
>
>
> On Thursday 15 December 2011 11:30 AM, samarth s wrote:
>>
>> Thanks Erick and Mikhail. I'll try this out.
>>
>> On Wed, Dec 14, 2011 at 7:11 PM, Erick Erickson
>>  wrote:
>>>
>>> I'm guessing (and it's just a guess) that what's happening is that
>>> the container is queueing up your requests while waiting
>>> for the other connections to close, so Mikhail's suggestion
>>> seems like a good idea.
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Dec 14, 2011 at 12:28 AM, samarth s
>>>   wrote:

 The updates to the master are user-driven and need to be
 visible quickly. Hence, the high frequency of replication. It may be
 that too many replication requests are being handled at a time, but
 why should that result in half-closed connections?

 On Wed, Dec 14, 2011 at 2:47 AM, Erick Erickson
  wrote:
>
> Replicating 40 cores every 20 seconds is just *asking* for trouble.
> How often do your cores change on the master? How big are
> they? Is there any chance you just have too many cores replicating
> at once?
>
> Best
> Erick
>
> On Tue, Dec 13, 2011 at 3:52 PM, Mikhail Khludnev
>   wrote:
>>
>> You can try to reuse your connections (prevent them from closing) by
>> specifying
>>  -Dhttp.maxConnections=N
>> in the JVM startup params. At the client JVM! The number should be chosen
>> considering the number of connections you'd like to keep alive.
>>
>> Let me know if it works for you.
>>
>> On Tue, Dec 13, 2011 at 2:57 PM, samarth s wrote:
>>
>>> Hi,
>>>
>>> I am using solr replication and am experiencing a lot of connections
>>> in the state CLOSE_WAIT at the master solr server. These disappear
>>> after a while, but till then the master solr stops responding.
>>>
>>> There are about 130 open connections on the master server with the
>>> client as the slave m/c and all are in the state CLOSE_WAIT. Also,
>>> the
>>> client port specified on the master solr server netstat results is
>>> not
>>> visible in the netstat results on the client (slave solr) m/c.
>>>
>>> Following is my environment:
>>> - 40 cores in the master solr on m/c 1
>>> - 40 cores in the slave solr on m/c 2
>>> - The replication poll interval is 20 seconds.
>>> - Replication part in solrconfig.xml in the slave solr:
>>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>   <lst name="slave">
>>>     <str name="masterUrl">$mastercorename/replication</str>
>>>     <str name="pollInterval">00:00:20</str>
>>>     5000
>>>     1
>>>   </lst>
>>> </requestHandler>
>>>
>>> Thanks for any pointers.
>>>
>>> --
>>> Regards,
>>> Samarth
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Developer
>> Grid Dynamics
>> tel. 1-415-738-8644
>> Skype: mkhludnev
>> 
>>  



 --
 Regards,
 Samarth
>>
>>
>>
>



-- 
Regards,
Samarth