Re: Run Solr 5.3.0 as a Service on Windows using NSSM

2015-10-06 Thread Zheng Lin Edwin Yeo
Hi Adrian,

I've waited for more than 5 minutes, and most of the time when I refresh it
says that the page cannot be found. Once or twice the main Admin page
loaded, but none of the cores were loaded.

I have 20 cores which I'm loading. The cores are of various sizes: the
largest is 38GB, others range from 10GB to 15GB, and some are less than 1GB.

My overall core size is about 200GB.

Regards,
Edwin


On 7 October 2015 at 12:11, Adrian Liew  wrote:

> Hi Edwin,
>
> I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up Solr with
> a base standalone installation.
>
> You may have to give Solr some time to bootstrap things and wait for the
> page to reload. Are you still seeing the page after 1 minute or so?
>
> What are your core sizes? And how many cores are you trying to load?
>
> Best regards,
> Adrian
>
> -Original Message-
> From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> Sent: Wednesday, October 7, 2015 11:46 AM
> To: solr-user@lucene.apache.org
> Subject: Run Solr 5.3.0 as a Service on Windows using NSSM
>
> Hi,
>
> I tried to follow this to start my Solr as a service using NSSM.
> http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
>
> Everything is fine when I start the services under Component Services.
> However, when I tried to point to the Solr Admin page, it says that the
> page cannot be found.
>
> I have tried the same thing in Solr 5.1, and it was able to work. Not sure
> why it couldn't work for Solr 5.2 and Solr 5.3.
>
> Is there any changes required to what is listed on the website?
>
> Regards,
> Edwin
>


Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-06 Thread Siddhartha Singh Sandhu
Nice. Will port it onto an SSD.


I have a few questions about optimize. Is the search index fully searchable
after a commit?

How much time does one have to wait in case of a hard commit for the index
to be available?

I have an index of 180G. Do I need to hit optimize on this chunk? This is a
single core. Say I cannot get into a cloud environment because of cost, but
this is a fairly large Amazon machine where I have given SOLR 12G of memory.

In the context of my index, if I added say 20G more data per month to it, how
much time would it be before it is fully available for search?

And when should I hit the optimize button?

Thanks

Sid.


On Tue, Oct 6, 2015 at 6:55 AM, Toke Eskildsen 
wrote:

> On Mon, 2015-10-05 at 17:26 -0400, Siddhartha Singh Sandhu wrote:
> > Following up on that: Would having an SSD make considerable difference in
> > speed?
>
> Yes, but only to a point.
>
> The UK Web Archive has done some tests on optimizing indexes on both
> spinning drives and SSDs:
> https://github.com/ukwa/shine/tree/master/python/test-logs
>
> With spinning drives, their machines maxed out on IOWait. With SSD, the
> machine maxed out on CPU. That might sound great, but the problem is
> that optimizing on a single shard is single threaded (at least for Solr
> 4.10.x), so if there is only a single shard on the machine, only 1 CPU
> is running at full tilt. There is always a bottleneck.
>
> What might help is that the SSD (probably) does not get bogged down by
> the process, so it should be much better at handling other requests
> while the optimization is running.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Solr cross core join special condition

2015-10-06 Thread Ali Nazemian
I was wondering how I can meet this query requirement in Solr 5.2.1:

I have two different Solr cores, referred to as "core1" and "core2". core1
has some fields such as field1, field2 and field3, and core2 has some other
fields such as field1, field4 and field5. I am looking for a Solr query which
can return all of the documents with field1, field2, field3, field4 and
field5, while applying some condition on core1.

For example:
core1:
-field1:123
-field2:"foo"
-field3:"bar"

core2:
-field1:123
-field4:"hello"
-field5:"world"

returning result:
field1:123
field2:"foo"
field3:"bar"
field4:"hello"
field4:"world"

Thank you very much.

Best regards.

-- 
A.Nazemian


Re: Solr cross core join special condition

2015-10-06 Thread Mikhail Khludnev
Hello,

Why do you need the sibling core fields? Do you facet, or do you just want to
enrich the result page with them?

On Tue, Oct 6, 2015 at 6:04 PM, Ali Nazemian  wrote:

> I was wondering how can I overcome this query requirement in Solr 5.2.1:
>
> I have two different Solr cores refer as "core1" and "core2". core1  has
> some fields such as field1, field2 and field3 and core2 has some other
> fields such as field1, field4 and field5. I am looking for Solr query which
> can return all of the documents requiring field1, field2, field3, field4
> and field5 with considering some condition on core1.
>
> For example:
> core1:
> -field1:123
> -field2:"foo"
> -field3:"bar"
>
> core2:
> -field1:123
> -field4:"hello"
> -field5:"world"
>
> returning result:
> field1:123
> field2:"foo"
> field3:"bar"
> field4:"hello"
> field4:"world"
>
> Thank you very much.
>
> Best regards.
>
> --
> A.Nazemian
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Shawn Heisey
On 10/6/2015 7:58 AM, Steve wrote:
> I’ve been unable to get solrcloud to distribute data across 4 solr nodes
> with the “route.name=implicit”  feature of the collections API.
> 
> The nodes are live, and the graphs are green.  All the data (the “Films”
> example data) shows up on one node, the node that received the CREATE
> command.

A better name for the implicit router is "manual."  The implicit router
doesn't actually route.  It assumes that you know what you are doing and
have sent the request to the shard where you want it to be indexed.

You want the compositeId router.
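
For example, a collection created along these lines (names and counts are
just placeholders) will hash documents across its shards automatically:

/solr/admin/collections?action=CREATE&name=films&numShards=4&replicationFactor=1&router.name=compositeId&collection.configName=myconf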

Even though the name "implicit" makes sense in the context of Solr
*code*, it is a confusing name when it comes to user expectations.
You're not the first one to be confused by this, which is why I opened
this issue:

https://issues.apache.org/jira/browse/SOLR-6630

Thanks,
Shawn



Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

2015-10-06 Thread Shawn Heisey
On 10/6/2015 3:38 AM, Adrian Liew wrote:
> Thanks for the reply. Looks like this has been resolved by manually starting 
> the Zookeeper services on each server promptly so that the tickTime value 
> does not timeout too quickly to heartbeat other peers. Hence, I increased the 
> tickTime value to about 5 minutes to give some time for a node hosting 
> Zookeeper to restart and autostart its service. This case seems fixed but I 
> will double check again once more to be sure. I am using nssm 
> (non-sucking-service-manager) to autostart Zookeeper. I will need to retest 
> this once again using nssm to make sure zookeeper services are up and running.

That sounds like a very bad idea.  A typical tickTime is two *seconds*.
 Zookeeper is designed around certain things happening very quickly.

I don't think you can increase that to five *minutes* (multiplying it by
150) without the strong possibility of something going very wrong and
processes hanging for minutes at a time waiting for a timeout that
should happen very quickly.

I am reasonably certain that tickTime is used for zookeeper operation in
several ways, so I believe that this much of an increase will cause
fundamental problems with zookeeper's normal operation.  I admit that I
have not looked at the code, so I could be wrong ... but based on the
following information from the Zookeeper docs, I don't think I am wrong:

 tickTime

the length of a single tick, which is the basic time unit used by
ZooKeeper, as measured in milliseconds. It is used to regulate
heartbeats, and timeouts. For example, the minimum session timeout will
be two ticks.
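
For comparison, a fairly typical zoo.cfg stays in this neighborhood (only a
sketch, not tuned for any particular deployment):

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=10.0.0.4:2888:3888
  server.2=10.0.0.5:2888:3888
  server.3=10.0.0.6:2888:3888

If a node needs more time to come up and sync, initLimit and syncLimit are
the settings to raise; they are expressed in ticks, so they can be increased
without stretching the tick itself.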

Thanks,
Shawn



Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Bill Dueber
Just to add...my informal tests show that batching has way more effect
than solrj vs json.

I haven't looked at CUSC in a while; last time I looked it was impossible to
do anything smart about error handling, so check that out before you get
too deeply into it. We use a strategy of sending a batch of json documents,
and if it returns an error, sending each record one at a time until we find
the bad one and can log something useful.
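
We do it with JSON over HTTP, but the same idea in SolrJ would look roughly
like this (just a sketch; commit policy and real logging left out):

  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
    // Try the whole batch first; if Solr rejects it, fall back to adding
    // the documents one at a time so the offending record can be identified.
    static void indexBatch(SolrClient client, List<SolrInputDocument> batch) {
      try {
        client.add(batch);
      } catch (Exception batchError) {
        for (SolrInputDocument doc : batch) {
          try {
            client.add(doc);
          } catch (Exception docError) {
            System.err.println("Rejected doc " + doc.getFieldValue("id")
                + ": " + docError.getMessage());
          }
        }
      }
    }
  }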



On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Thanks Erick,
> you confirmed my impressions!
> Thank you very much for the insights, an other opinion is welcome :)
>
> Cheers
>
> 2015-10-05 14:55 GMT+01:00 Erick Erickson :
>
> > SolrJ tends to be faster for several reasons, not the least of which
> > is that it sends packets to Solr in a more efficient binary format.
> >
> > Batching is critical. I did some rough tests using SolrJ and sending
> > docs one at a time gave a throughput of < 400 docs/second.
> > Sending 10 gave 2,300 or so. Sending 100 at a time gave
> > over 5,300 docs/second. Curiously, 1,000 at a time gave only
> > marginal improvement over 100. This was with a single thread.
> > YMMV of course.
> >
> > CloudSolrClient is definitely the better way to go with SolrCloud,
> > it routes the docs to the correct leader instead of having the
> > node you send the docs to do the routing.
> >
> > Best,
> > Erick
> >
> > On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> >  wrote:
> > > I was doing some studies and analysis, just wondering in your opinion
> > which
> > > one is the best approach to use to index in Solr to reach the best
> > > throughput possible.
> > > I know that a lot of factor are affecting Indexing time, so let's only
> > > focus in the feeding approach.
> > > Let's isolate different scenarios :
> > >
> > > *Single Solr Infrastructure*
> > >
> > > 1) Xml/Json batch request to /update IndexHandler (xml/json)
> > >
> > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > > I was thinking this to be the fastest approach for a multi threaded
> > > indexing application.
> > > Posting batch of docs if possible per request.
> > >
> > > *Solr Cloud*
> > >
> > > 1) Xml/Json batch request to /update IndexHandler(xml/json)
> > >
> > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > >
> > > 3) CloudSolrClient ( javabin)
> > > it seems the best approach accordingly to this improvements [1]
> > >
> > > What are your opinions ?
> > >
> > > A bonus observation should be for using some Map/Reduce big data
> indexer,
> > > but let's assume we don't have a big cluster of cpus, but the average
> > > Indexer server.
> > >
> > >
> > > [1]
> > >
> >
> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> > >
> > >
> > > Cheers
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Alessandro Benedetti
Hmm, one broken document in a batch should not break the entire batch,
right (whatever approach is used)?
Are you referring to the fact that you want to programmatically re-index
the broken docs?

It would be interesting to return the ids of the broken docs along with the
Solr update response!

Cheers


On 6 October 2015 at 15:30, Bill Dueber  wrote:

> Just to add...my informal tests show that batching has way more effect
> than solrj vs json.
>
> I haven't look at CUSC in a while, last time I looked it was impossible to
> do anything smart about error handling, so check that out before you get
> too deeply into it. We use a strategy of sending a batch of json documents,
> and if it returns an error sending each record one at a time until we find
> the bad one and can log something useful.
>
>
>
> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Thanks Erick,
> > you confirmed my impressions!
> > Thank you very much for the insights, an other opinion is welcome :)
> >
> > Cheers
> >
> > 2015-10-05 14:55 GMT+01:00 Erick Erickson :
> >
> > > SolrJ tends to be faster for several reasons, not the least of which
> > > is that it sends packets to Solr in a more efficient binary format.
> > >
> > > Batching is critical. I did some rough tests using SolrJ and sending
> > > docs one at a time gave a throughput of < 400 docs/second.
> > > Sending 10 gave 2,300 or so. Sending 100 at a time gave
> > > over 5,300 docs/second. Curiously, 1,000 at a time gave only
> > > marginal improvement over 100. This was with a single thread.
> > > YMMV of course.
> > >
> > > CloudSolrClient is definitely the better way to go with SolrCloud,
> > > it routes the docs to the correct leader instead of having the
> > > node you send the docs to do the routing.
> > >
> > > Best,
> > > Erick
> > >
> > > On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> > >  wrote:
> > > > I was doing some studies and analysis, just wondering in your opinion
> > > which
> > > > one is the best approach to use to index in Solr to reach the best
> > > > throughput possible.
> > > > I know that a lot of factor are affecting Indexing time, so let's
> only
> > > > focus in the feeding approach.
> > > > Let's isolate different scenarios :
> > > >
> > > > *Single Solr Infrastructure*
> > > >
> > > > 1) Xml/Json batch request to /update IndexHandler (xml/json)
> > > >
> > > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > > > I was thinking this to be the fastest approach for a multi threaded
> > > > indexing application.
> > > > Posting batch of docs if possible per request.
> > > >
> > > > *Solr Cloud*
> > > >
> > > > 1) Xml/Json batch request to /update IndexHandler(xml/json)
> > > >
> > > > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > > >
> > > > 3) CloudSolrClient ( javabin)
> > > > it seems the best approach accordingly to this improvements [1]
> > > >
> > > > What are your opinions ?
> > > >
> > > > A bonus observation should be for using some Map/Reduce big data
> > indexer,
> > > > but let's assume we don't have a big cluster of cpus, but the average
> > > > Indexer server.
> > > >
> > > >
> > > > [1]
> > > >
> > >
> >
> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> > > >
> > > >
> > > > Cheers
> > > >
> > > >
> > > > --
> > > > --
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr cross core join special condition

2015-10-06 Thread Ali Nazemian
Dear Mikhail,
Hi,
I want to enrich the result.
Regards
On Oct 6, 2015 7:07 PM, "Mikhail Khludnev" 
wrote:

> Hello,
>
> Why do you need sibling core fields? do you facet? or just want to enrich
> result page with them?
>
> On Tue, Oct 6, 2015 at 6:04 PM, Ali Nazemian 
> wrote:
>
> > I was wondering how can I overcome this query requirement in Solr 5.2.1:
> >
> > I have two different Solr cores refer as "core1" and "core2". core1  has
> > some fields such as field1, field2 and field3 and core2 has some other
> > fields such as field1, field4 and field5. I am looking for Solr query
> which
> > can return all of the documents requiring field1, field2, field3, field4
> > and field5 with considering some condition on core1.
> >
> > For example:
> > core1:
> > -field1:123
> > -field2:"foo"
> > -field3:"bar"
> >
> > core2:
> > -field1:123
> > -field4:"hello"
> > -field5:"world"
> >
> > returning result:
> > field1:123
> > field2:"foo"
> > field3:"bar"
> > field4:"hello"
> > field4:"world"
> >
> > Thank you very much.
> >
> > Best regards.
> >
> > --
> > A.Nazemian
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Filter first-components result in solr.SearchHandler

2015-10-06 Thread Erik Hatcher
Seems like Solr’s QueryElevationComponent is what would suit your needs here.

Or, perhaps, adding something like this to your request:  bq={!terms 
f=id}3,5,6,8,9
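
In a full request that might look something like this (assuming the edismax
parser, since bq is a dismax/edismax parameter):

/select?q=your+keyword&defType=edismax&bq={!terms f=id}3,5,6,8,9

Note that bq only boosts documents that already match q; if the five ids must
appear even when they don't match the keyword, the elevation component is the
better fit.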


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Oct 6, 2015, at 7:45 AM, aniljayanti  wrote:
> 
> Hi Erik,
> 
> thanks for your response, let me explain briefly.
> 
> i wanted to make 5 employee id's as a priority id's. so every time when i am
> searching with specific keyword, then i want to append these 5 employee id's
> as first 5 results to the search results.
> 
> example : 
> 
> let's take 3,5,6,8,9 are priority employee id's.
> when i am searching with specific keyword then got 4 docs (employee id's are
> 1,2,4,7) as results. 
> then i want to display the final result as below.
> 
> final result : 3,5,6,8,9,1,2,4,7
> 
> Please suggest me.
> 
> Thanks.
> 
> AnilJayanti
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Filter-first-components-result-in-solr-SearchHandler-tp4232892p4232926.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr cross core join special condition

2015-10-06 Thread Mikhail Khludnev
On Wed, Oct 7, 2015 at 7:05 AM, Ali Nazemian  wrote:

> it
> seems there is not any way to do that right now and it should be developed
> somehow. Am I right?
>

yep


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





hi

2015-10-06 Thread John
please unsubscribe me


Run Solr 5.3.0 as a Service on Windows using NSSM

2015-10-06 Thread Zheng Lin Edwin Yeo
Hi,

I tried to follow this to start my Solr as a service using NSSM.
http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
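
The guide essentially registers bin\solr.cmd as a Windows service. Roughly,
the commands I used were along these lines (the service name, install path
and port are just placeholders for my setup):

  nssm install solr5 "C:\solr-5.3.0\bin\solr.cmd" "start -f -p 8983"
  nssm set solr5 AppDirectory "C:\solr-5.3.0"
  nssm start solr5

The -f flag keeps Solr in the foreground so NSSM can manage the process.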

Everything is fine when I start the services under Component Services.
However, when I tried to point to the Solr Admin page, it says that the
page cannot be found.

I have tried the same thing in Solr 5.1, and it was able to work. Not sure
why it couldn't work for Solr 5.2 and Solr 5.3.

Is there any changes required to what is listed on the website?

Regards,
Edwin


RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

2015-10-06 Thread Adrian Liew
Hi Shawn,

Thanks for the reply. Understood your comments and will revert back to the 
defaults. However, I raised this issue because I realized that Zookeeper 
becomes impatient if it cannot heartbeat its other peers in time. So for 
example, if 1 ZK server out of 3 goes down, that server will stop pinging the 
other servers and complain about timeout issues when zkCli connects to its 
service.

Will revert back with an update.

Regards,
Adrian

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, October 6, 2015 10:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

On 10/6/2015 3:38 AM, Adrian Liew wrote:
> Thanks for the reply. Looks like this has been resolved by manually starting 
> the Zookeeper services on each server promptly so that the tickTime value 
> does not timeout too quickly to heartbeat other peers. Hence, I increased the 
> tickTime value to about 5 minutes to give some time for a node hosting 
> Zookeeper to restart and autostart its service. This case seems fixed but I 
> will double check again once more to be sure. I am using nssm 
> (non-sucking-service-manager) to autostart Zookeeper. I will need to retest 
> this once again using nssm to make sure zookeeper services are up and running.

That sounds like a very bad idea.  A typical tickTime is two *seconds*.
 Zookeeper is designed around certain things happening very quickly.

I don't think you can increase that to five *minutes* (multiplying it by
150) without the strong possibility of something going very wrong and processes 
hanging for minutes at a time waiting for a timeout that should happen very 
quickly.

I am reasonably certain that tickTime is used for zookeeper operation in 
several ways, so I believe that this much of an increase will cause fundamental 
problems with zookeeper's normal operation.  I admit that I have not looked at 
the code, so I could be wrong ... but based on the following information from 
the Zookeeper docs, I don't think I am wrong:

 tickTime

the length of a single tick, which is the basic time unit used by 
ZooKeeper, as measured in milliseconds. It is used to regulate heartbeats, and 
timeouts. For example, the minimum session timeout will be two ticks.

Thanks,
Shawn



Re: Solr cross core join special condition

2015-10-06 Thread Ali Nazemian
Yeah, but the child document transformer is used for nested documents inside
a single core, while I am looking for joining results across multiple cores.
So it seems there is not any way to do that right now, and it would have to
be developed somehow. Am I right?
Regards.
On Oct 6, 2015 9:53 PM, "Mikhail Khludnev" 
wrote:

> thus, something like [child]
>
> https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
> can be developed.
>
> On Tue, Oct 6, 2015 at 6:45 PM, Ali Nazemian 
> wrote:
>
> > Dear Mikhail,
> > Hi,
> > I want to enrich the result.
> > Regards
> > On Oct 6, 2015 7:07 PM, "Mikhail Khludnev" 
> > wrote:
> >
> > > Hello,
> > >
> > > Why do you need sibling core fields? do you facet? or just want to
> enrich
> > > result page with them?
> > >
> > > On Tue, Oct 6, 2015 at 6:04 PM, Ali Nazemian 
> > > wrote:
> > >
> > > > I was wondering how can I overcome this query requirement in Solr
> > 5.2.1:
> > > >
> > > > I have two different Solr cores refer as "core1" and "core2". core1
> > has
> > > > some fields such as field1, field2 and field3 and core2 has some
> other
> > > > fields such as field1, field4 and field5. I am looking for Solr query
> > > which
> > > > can return all of the documents requiring field1, field2, field3,
> > field4
> > > > and field5 with considering some condition on core1.
> > > >
> > > > For example:
> > > > core1:
> > > > -field1:123
> > > > -field2:"foo"
> > > > -field3:"bar"
> > > >
> > > > core2:
> > > > -field1:123
> > > > -field4:"hello"
> > > > -field5:"world"
> > > >
> > > > returning result:
> > > > field1:123
> > > > field2:"foo"
> > > > field3:"bar"
> > > > field4:"hello"
> > > > field4:"world"
> > > >
> > > > Thank you very much.
> > > >
> > > > Best regards.
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > 
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


RE: Run Solr 5.3.0 as a Service on Windows using NSSM

2015-10-06 Thread Adrian Liew
Hi Edwin,

I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up Solr with a 
base standalone installation. 

You may have to give Solr some time to bootstrap things and wait for the page 
to reload. Are you still seeing the page after 1 minute or so? 

What are your core sizes? And how many cores are you trying to load?

Best regards,
Adrian

-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] 
Sent: Wednesday, October 7, 2015 11:46 AM
To: solr-user@lucene.apache.org
Subject: Run Solr 5.3.0 as a Service on Windows using NSSM

Hi,

I tried to follow this to start my Solr as a service using NSSM.
http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/

Everything is fine when I start the services under Component Services.
However, when I tried to point to the Solr Admin page, it says that the page 
cannot be found.

I have tried the same thing in Solr 5.1, and it was able to work. Not sure why 
it couldn't work for Solr 5.2 and Solr 5.3.

Is there any changes required to what is listed on the website?

Regards,
Edwin


If zookeeper is down, SolrCloud nodes will not start correctly, even if zookeeper is started later

2015-10-06 Thread Adrian Liew
Changing subject header.

I am encountering an issue in Solr 5.3.0 whereby I am getting haywire leader 
election using SolrCloud. I am using NSSM 2.24 to start up my Solr services, 
with ZooKeeper set as a service dependency. 
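
(For reference, the dependency is set roughly like this, assuming service
names of "solr5" and "zookeeper":

  nssm set solr5 DependOnService zookeeper

Windows will then start the Solr service only after the ZooKeeper service has
started, although "started" does not necessarily mean ZooKeeper is ready to
serve requests.)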

For example, if I have three servers at 10.0.0.4, 10.0.0.5 and 10.0.0.6, both 
10.0.0.4 and 10.0.0.5 show up as leaders in the Solr Admin Panel.

I then decided to manually stop all services, and ensure all Zookeeper services 
are booted up first prior to starting Solr services on all machines. Then I 
refreshed my Solr Admin Panel to observe the correct leaders and followers and 
test node recovery. Everything turned out fine.

Hence, the issue is that upon startup of the three machines, the startup of ZK 
and Solr is out of sequence, which causes SolrCloud to behave unexpectedly. 
Note that there is a Jira ticket for Solr 4.9 and above that includes an 
improvement related to this issue 
(https://issues.apache.org/jira/browse/SOLR-5129).

Can someone please advise.

Best regards,
Adrian 

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Tuesday, October 6, 2015 7:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Zookeeper HA with 3x ZK with Solr server nodes

When you have a ZK ensemble, a quorum of active nodes is necessary for the 
entire ensemble to work (elect leaders, manage the cluster topology, etc.).

The quorum is 50% of the nodes + 1.
If you have an ensemble of 3 nodes, the quorum is 3/2 + 1 = 2 nodes.
With an ensemble of 3 nodes, you can lose 1 and the ZK ensemble will continue 
to work.

If you have an ensemble of 5 nodes, the quorum is 5/2 + 1 = 3 nodes. With an 
ensemble of 5 nodes, you can lose 2 and the ZK ensemble will continue to work.
And so on.

Cheers

2015-10-06 10:55 GMT+01:00 Adrian Liew :

> Hi there,
>
>
>
> I have 3 Solr server Azure VM nodes participating in SolrCloud with ZK 
> installed on each of these nodes (to avoid a single point of failure 
> with ZK for leader election). Each Solr server is hosted in a Windows 
> Server
> 2012 R2 environment. I was told by my peer that if one zookeeper 
> service fails, the entire quorum fails. Hence if a quorum fails, does 
> that mean it will not be able to elect the leader from the remaining 2 
> alive Solr servers,  even if ZK services are installed in each node?
>
>
>
> I am yet to this out as this defeats the purpose of having a ZK 
> installed on each server. I am afraid if one node fails, a leader 
> cannot be elected with the remaining two available nodes. Correct me if I am 
> wrong.
>
>
>
> Regards,
>
> Adrian
>
>
>


--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr cross core join special condition

2015-10-06 Thread Mikhail Khludnev
thus, something like [child]
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents
can be developed.

On Tue, Oct 6, 2015 at 6:45 PM, Ali Nazemian  wrote:

> Dear Mikhail,
> Hi,
> I want to enrich the result.
> Regards
> On Oct 6, 2015 7:07 PM, "Mikhail Khludnev" 
> wrote:
>
> > Hello,
> >
> > Why do you need sibling core fields? do you facet? or just want to enrich
> > result page with them?
> >
> > On Tue, Oct 6, 2015 at 6:04 PM, Ali Nazemian 
> > wrote:
> >
> > > I was wondering how can I overcome this query requirement in Solr
> 5.2.1:
> > >
> > > I have two different Solr cores refer as "core1" and "core2". core1
> has
> > > some fields such as field1, field2 and field3 and core2 has some other
> > > fields such as field1, field4 and field5. I am looking for Solr query
> > which
> > > can return all of the documents requiring field1, field2, field3,
> field4
> > > and field5 with considering some condition on core1.
> > >
> > > For example:
> > > core1:
> > > -field1:123
> > > -field2:"foo"
> > > -field3:"bar"
> > >
> > > core2:
> > > -field1:123
> > > -field4:"hello"
> > > -field5:"world"
> > >
> > > returning result:
> > > field1:123
> > > field2:"foo"
> > > field3:"bar"
> > > field4:"hello"
> > > field4:"world"
> > >
> > > Thank you very much.
> > >
> > > Best regards.
> > >
> > > --
> > > A.Nazemian
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > 
> > 
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-06 Thread Shawn Heisey
On 10/6/2015 8:18 AM, Siddhartha Singh Sandhu wrote:
> A have a few questions about optimize. Is the search index fully searchable
> after a commit?

If openSearcher is true on the commit, then changes to the index
(additions, replacements, deletions) will be visible when the commit
completes.
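
For reference, this behavior is controlled by the commit settings in
solrconfig.xml.  The values below are only an illustration, not a
recommendation:

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>

With openSearcher set to false on the hard commit, visibility comes from the
soft commits (or from explicit commits that open a searcher) instead.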

> How much time does one have to wait in case of a hard commit for the index
> to be available?

This is impossible to answer.  It will take as long as it takes, and the
time will depend on many factors, so it is nearly impossible to
predict.  The only way to know is to try it ... and the number you get
on one test may be very different than what you actually see once the
system is in production.

> I have an index of 180G. Do I need to hit the optimize on this chunk. This
> is a single core. Say I cannot get in a cloud env because of cost but this
> is a fairly large
> amazon machine where I have given SOLR 12G of memory.

Whatever RAM is left over after you give 12GB to Java for Solr will be
used automatically by the operating system to cache index data on the
disk.  Solr is completely reliant on that caching for good performance. 
A perfectly ideal system for that index and heap size would have 192GB
of RAM, which is enough to entirely cache the index.  I personally
wouldn't expect good performance with less than 96GB.  Some systems with
a 180GB index and a 12GB heap might be OK with 64GB total memory, while
others with the same size index will require more.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

If the index is on SSD, then RAM is *slightly* less important, and
performance usually goes up with SSD ... but an SSD cannot completely
replace RAM, because RAM is much faster.  With SSD, you can get away
with less RAM than you can on a spinning disk system, but depending on a
bunch of factors, it may not be a LOT less RAM.

https://wiki.apache.org/solr/SolrPerformanceProblems

Optimizing the index is almost never necessary with recent versions.  In
almost all cases optimizing will get you a performance increase, but it
comes at a huge cost in terms of resource utilization to DO the
optimize.  While the optimize is happening performance will likely be
worse, possibly a LOT worse.  Newer versions of Solr (Lucene) have
closed the gap on performance with non-optimized indexes, so it doesn't
gain you as much in performance as it did in earlier versions.

Thanks,
Shawn



Query to count matching terms and disable 'coord' multiplication

2015-10-06 Thread Tim Hearn
Hello everyone,

I have two questions

1) Is there a way to query Solr to rank results based purely on the number
of terms in the query which are contained in the document?
Example:
doc1: 'foo bar poo car foo'
q1: 'foo, car, two, start'
score(doc1, q1) = 2 (since foo and car both occur in doc1 - never mind
that foo occurs twice)

This is also the numerator of the coord factor

2) Is there a way to disable the 'coord' and 'query norm' multiplication of
query results altogether?
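
One thing I considered for (1) is a function query along these lines (only a
sketch, assuming a field named "text"; I am not sure it is the right
approach):

q={!func}sum(if(termfreq(text,'foo'),1,0),if(termfreq(text,'car'),1,0),if(termfreq(text,'two'),1,0),if(termfreq(text,'start'),1,0))

combined with an fq to restrict which documents come back, since a function
query by itself matches everything.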


Re: Pivot facets

2015-10-06 Thread Chris Hostetter

It's not entirely clear what your queries/data look like, or what results 
you are expecting to get back; please consider asking your question again 
with more details...
https://wiki.apache.org/solr/UsingMailingLists

...in the mean time the best guess i can make is that perhaps you aren't 
familiar with the "facet.missing" request param?  try adding 
facet.missing=true to your request and see if that gives you what you are 
looking for...

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.missingParameter
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Pivot%28DecisionTree%29Faceting
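
For example (field names here are only illustrative):

/select?q=*:*&rows=0&facet=true&facet.pivot=cat,inStock&facet.missing=true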



-Hoss
http://www.lucidworks.com/


efficient sort by title (multi word field)

2015-10-06 Thread Gili Nachum
Hi, wanted to make sure I'm implementing sort in an efficient way...

I need to allow users to sort by documents' title field. A title can
contain 1-20 words.
Title examples: "new project meeting minutes - Oct 2015 - new chance on the
horizon" or "how to create a wonderful presentation".

I'm already indexing title as a TextField, and I'm not comfortable with
indexing it again as the extra StrField I'd need, plus the extra FieldCache
memory.
I can probably avoid the FieldCache by using docValues to reduce memory usage.

*But... is there a more efficient way to provide this sort? Perhaps
something that takes advantage of the fact that the title field is a chain of
words separated by whitespace?*
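
For concreteness, the straightforward version I am trying to avoid (or at
least slim down) would be something like this, with the sort done on the
title_sort field:

  <field name="title_sort" type="string" indexed="false" stored="false"
         docValues="true"/>
  <copyField source="title" dest="title_sort" maxChars="100"/>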

My index is 100's of millions of documents over 8 shards.

Thanks.


RE: Solr 5.2.1 and spatial polygon searches

2015-10-06 Thread Lee Duhl
We were able to resolve this issue by installing the JTS library on the server 
and updating the Solr schema.xml so that the 
"solr.SpatialRecursivePrefixTreeFieldType" field type uses the 
"JtsSpatialContextFactory"

Thank You
Lee V. Duhl
Realcomp II Ltd.
Phone: (248) 699-9133
www.realcomp.com
www.moveinmichigan.com

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Tuesday, October 06, 2015 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.2.1 and spatial polygon searches

Hi lee, shot in the dark, have you tried using the *WKT *syntax with range 
spatial approach*?*

for example :
q=geoloc:["0 18" TO "18 100”] .

I am using it in 5.3

Cheers


On 6 October 2015 at 14:22, Lee Duhl  wrote:

> The following query runs fine on Solr 4.x, but errors with a "Couldn't 
> parse shape " error message in Solr 5.2.1
> geoloc:"INTERSECTS(POLYGON((-83.38434219360353
> 42.51412013568205,-83.3474349975586 
> 42.51196902987156,-83.3561897277832
> 42.495390378152244,
> -83.4001350402832 42.496149801777875,-83.38434219360353
> 42.51412013568205)))"
>
> Solr 4 required the Spatial4J library to be installed in order for the 
> above query to run.
>
> Can Spatial4J be installed on Solr 5.2.1 or is there another library 
> that needs to be installed for these types of queries to work?
>
> Note: The above query is a simple "rectangle" polygon and is used for 
> only this example. Bbox queries are not applicable as most of our 
> queries generally use more complex polygons.
>
> Thank You
> Lee V. Duhl
> Realcomp II Ltd.
> Phone: (248) 699-9133
> www.realcomp.com
> www.moveinmichigan.com
>
>


--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Gili Nachum
CloudSolrServer: beyond sending documents to the right leader shard, it also
does this in *parallel* (for a batch), employing its own thread pool, with a
connection per shard.
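
A minimal SolrJ sketch of that usage (class and option names as I recall them
for 5.x, so double-check against your version):

  import java.util.List;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class CloudIndexer {
    public static void index(List<SolrInputDocument> batch) throws Exception {
      // Connect via ZooKeeper so requests are routed straight to shard leaders.
      CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
      client.setDefaultCollection("my_collection");
      client.setParallelUpdates(true); // send per-shard sub-batches concurrently
      client.add(batch);
      client.commit();
      client.close();
    }
  }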

On Tue, Oct 6, 2015 at 8:15 PM, Walter Underwood 
wrote:

> This is at Chegg. One of our indexes is textbooks. These are expensive and
> don’t change very often. It is better to keep yesterday’s index than to
> drop a few important books.
>
> We have occasionally had an error that happens with every book, like a new
> field that is not in the Solr schema. If we ignored errors with that, we’d
> have an empty index: delete all, add all (failing), commit.
>
> With the fail fast and rollback, we can catch problems before they mess up
> the index.
>
> Also, to pinpoint isolated problems, if there is an error in the batch, it
> re-submits that batch one at a time, so we get an accurate report of which
> document was rejected. I wrote that same thing back at Netflix, before
> SolrJ.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 6, 2015, at 9:49 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> >
> > Hi Walter,
> > can you explain better your use case ?
> > You index a batch of e-commerce products ( Solr documents) if one fails,
> > you want to stop and invalidate the entire batch ( using the almost never
> > used solr rollback, or manual deletion ?)
> > And then log the exception indexing size.
> > To then re-index the whole batch od docs ?
> >
> > In this scenario, the ConcurrentUpdateSolrClient will not be ideal?
> > Only curiosity.
> >
> > Cheers
> >
> > On 6 October 2015 at 17:29, Walter Underwood 
> wrote:
> >
> >> It depends on the document. In a e-commerce search, you might want to
> fail
> >> immediately and be notified. That is what we do, fail, rollback, and
> notify.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >>> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <
> >> benedetti.ale...@gmail.com> wrote:
> >>>
> >>> mm one broken document in a batch should not break the entire
> batch ,
> >>> right ( whatever approach used) ?
> >>> Are you referring to the fact that you want to programmatically
> re-index
> >>> the broken docs ?
> >>>
> >>> Would be interesting to return the id of the broken docs along with the
> >>> solr update response!
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On 6 October 2015 at 15:30, Bill Dueber  wrote:
> >>>
>  Just to add...my informal tests show that batching has way more
> >> effect
>  than solrj vs json.
> 
>  I haven't look at CUSC in a while, last time I looked it was
> impossible
> >> to
>  do anything smart about error handling, so check that out before you
> get
>  too deeply into it. We use a strategy of sending a batch of json
> >> documents,
>  and if it returns an error sending each record one at a time until we
> >> find
>  the bad one and can log something useful.
> 
> 
> 
>  On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
>  benedetti.ale...@gmail.com> wrote:
> 
> > Thanks Erick,
> > you confirmed my impressions!
> > Thank you very much for the insights, an other opinion is welcome :)
> >
> > Cheers
> >
> > 2015-10-05 14:55 GMT+01:00 Erick Erickson :
> >
> >> SolrJ tends to be faster for several reasons, not the least of which
> >> is that it sends packets to Solr in a more efficient binary format.
> >>
> >> Batching is critical. I did some rough tests using SolrJ and sending
> >> docs one at a time gave a throughput of < 400 docs/second.
> >> Sending 10 gave 2,300 or so. Sending 100 at a time gave
> >> over 5,300 docs/second. Curiously, 1,000 at a time gave only
> >> marginal improvement over 100. This was with a single thread.
> >> YMMV of course.
> >>
> >> CloudSolrClient is definitely the better way to go with SolrCloud,
> >> it routes the docs to the correct leader instead of having the
> >> node you send the docs to do the routing.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> >>  wrote:
> >>> I was doing some studies and analysis, just wondering in your
> opinion
> >> which
> >>> one is the best approach to use to index in Solr to reach the best
> >>> throughput possible.
> >>> I know that a lot of factor are affecting Indexing time, so let's
>  only
> >>> focus in the feeding approach.
> >>> Let's isolate different scenarios :
> >>>
> >>> *Single Solr Infrastructure*
> >>>
> >>> 1) Xml/Json batch request to /update IndexHandler (xml/json)
> >>>
> >>> 2) SolrJ ConcurrentUpdateSolrClient ( 

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Shawn Heisey
On 10/6/2015 10:02 AM, Steve wrote:
> Thanks Shawn, that fixed it !
>
> The documentation int the Collections API says  "The value can be ...
> *implicit*, which uses an internal default hash".

Thank you for pointing out this error in the documentation.  I did not
know it was there.  I have updated the online Reference Guide so it is
correct.  Hopefully this will help clear up any confusion!

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateaCollection

Thanks,
Shawn



Solr 5.2.1 - ReplicationHandler - No route to a host that is long gone

2015-10-06 Thread Eric Torti
Hey guys!

We have a deploy of SolrCloud 5.2.1 that is composed of 5 to 8 amazon linux
ec2 c3.2xlarge instances. Our main core is composed of 4M docs (6GB) and we
serve an average of 70 req/s per machine.

We are using zookeeper 3.4.6 to provide cluster synchronization. The thing
is, we are noticing some weird "No route to host" exceptions in our logs. It
seems that the ReplicationHandler is trying to contact another server
that used to be the cluster leader but is long gone now.

This behaviour is triggered when accessing this specific core's
"Dashboard".

http://my-server/solr/admin/collections?action=clusterstatus tells me this
former leader is down. So zookeeper knows about it. Any ideas on why the
ReplicationHandler is still trying to contact it? I'll attach the
stacktrace just to illustrate the situation.

Any help will be greatly appreciated.

Thanks!

Best,

Eric

'''

2015-Oct-06 18:18:02,446 [qtp1690716179-12764]
org.apache.solr.handler.ReplicationHandler

  WARN  Exception while invoking 'details' method for replication on master

org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://10.10.10.10:8983/solr/my-core

at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)

at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)

at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)

at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)

at org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1563)

at
org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:821)

at
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:305)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)

at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)

at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)

at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)

at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)

at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)

at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)

at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)

at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)

at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)

at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)

at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)

at org.eclipse.jetty.server.Server.handle(Server.java:497)

at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)

at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)

at
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)

at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)

at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.net.NoRouteToHostException: No route to host

at java.net.PlainSocketImpl.socketConnect(Native Method)

at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)

at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)

at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

at java.net.Socket.connect(Socket.java:589)

at
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:117)

at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)

at
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)

at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)

at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)

at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)

at

Filter first-components result in solr.SearchHandler

2015-10-06 Thread aniljayanti
Hi All,

I am working on Solr 5.2.1. I wrote my own component, configured as a
first-component, to get employee ids. I am trying to pass these ids to the
normal solr.SearchHandler to filter on the employee ids.

relevant request handler in solrconfig.xml file:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">100</int>
    <str name="df">text</str>
  </lst>
  <arr name="first-components">
    <str>custom-priority</str>
  </arr>
</requestHandler>

How can I pass the employee ids in the qf param correctly in the query so that
Solr can use them while searching?

Suggestions are appreciated..

 thanks in advance

AnilJayanti




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-first-components-result-in-solr-SearchHandler-tp4232892.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Recovery Thread Blocked

2015-10-06 Thread Rallavagu
Mark - currently 5.3 is being evaluated for upgrade purposes and 
hopefully we will get there soon. Meanwhile, the following exception is noted 
in the logs during updates:


ERROR org.apache.solr.update.CommitTracker  – auto commit 
error...:java.lang.IllegalStateException: this writer hit an 
OutOfMemoryError; cannot commit
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)

at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)

at java.lang.Thread.run(Thread.java:682)

Considering the fact that the machine is configured with 48G (24G for the 
JVM, which will be reduced in future), I am wondering how it would still go 
out of memory. For memory-mapped index files, the remaining 24G, or whatever 
is available of it, should be usable. Looking at the lsof output, the 
memory-mapped files were around 10G.


Thanks.


On 10/5/15 5:41 PM, Mark Miller wrote:

I'd make two guess:

Looks like you are using Jrocket? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace of
SolrCloud, you are dealing with something fairly ancient and so it will be
harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:


Any takers on this? Any kinda clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1) to
see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
// to be air tight we must also check after lock
if (cc.isShutDown()) {
  log.warn("Skipping recovery because Solr is shutdown");
  return;
}
log.info("Running recovery - first canceling any ongoing

recovery");

cancelRecovery();

while (recoveryRunning) {
  try {
recoveryLock.wait(1000);
  } catch (InterruptedException e) {

  }
  // check again for those that were waiting
  if (cc.isShutDown()) {
log.warn("Skipping recovery because Solr is shutdown");
return;
  }
  if (closed) return;
}

Subsequently, the thread will get into cancelRecovery method as below,

public void cancelRecovery() {
  synchronized (recoveryLock) {
if (recoveryStrat != null && recoveryRunning) {
  recoveryStrat.close();
  while (true) {
try {
  recoveryStrat.join();
} catch (InterruptedException e) {
  // not interruptible - keep waiting
  continue;
}
break;
  }

  recoveryRunning = false;
  recoveryLock.notifyAll();
}
  }
}

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this? Thanks.


On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
  -- Waiting for notification on:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
  at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
  at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
  at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
  at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
  at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
  at


RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a



  at
jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
Method)
  at java/lang/Object.wait(J)V(Native Method)
  at java/lang/Thread.join(Thread.java:1206)
  ^-- Lock released while waiting:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
  at java/lang/Thread.join(Thread.java:1259)
  at



EdgeNGramFilterFactory question

2015-10-06 Thread vit
I have Solr 4.2

1) Is it possible to somehow use EdgeNGramFilterFactory while ignoring
whitespace in the n-grams?

2) Is it possible to use EdgeNGramFilterFactory in combination with stemming?
Say, applying this to get "look for close hotel" instead of "looking for
closest hotels".



--
View this message in context: 
http://lucene.472066.n3.nabble.com/EdgeNGramFilterFactory-question-tp4233034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-06 Thread Siddhartha Singh Sandhu
Thank you for helping out.

Further inquiry: I am committing records to my Solr implementation and they
are not showing up in my search. I am searching on the default id.
Is this related to the fact that I don't have enough memory, so my SOLR is
taking a long time to actually make the indexed documents available?

I also looked at the Solr log when I sent in my curl commit with my
record (which I cannot see in the SOLR instance even after sending it
repeatedly), but it didn't throw an error.

I got this as my response on insertion of that record:

{"responseHeader":{"status":0,"QTime":57}}

Thank you.

Sid.

On Tue, Oct 6, 2015 at 3:21 PM, Shawn Heisey  wrote:

> On 10/6/2015 8:18 AM, Siddhartha Singh Sandhu wrote:
> > A have a few questions about optimize. Is the search index fully
> searchable
> > after a commit?
>
> If openSearcher is true on the commit, then changes to the index
> (additions, replacements, deletions) will be visible when the commit
> completes.
>
> > How much time does one have to wait in case of a hard commit for the
> index
> > to be available?
>
> This is impossible to answer.  It will take as long as it takes, and the
> time will depend on many factors, so it is nearly impossible to
> predict.  The only way to know is to try it ... and the number you get
> on one test may be very different than what you actually see once the
> system is in production.
>
> > I have an index of 180G. Do I need to hit the optimize on this chunk.
> This
> > is a single core. Say I cannot get in a cloud env because of cost but
> this
> > is a fairly large
> > amazon machine where I have given SOLR 12G of memory.
>
> Whatever RAM is left over after you give 12GB to Java for Solr will be
> used automatically by the operating system to cache index data on the
> disk.  Solr is completely reliant on that caching for good performance.
> A perfectly ideal system for that index and heap size would have 192GB
> of RAM, which is enough to entirely cache the index.  I personally
> wouldn't expect good performance with less than 96GB.  Some systems with
> a 180GB index and a 12GB heap might be OK with 64GBtotal memory, while
> others with the same size index will require more.
>
>
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> If the index is on SSD, then RAM is *slightly* less important, and
> performance usually goes up with SSD ... but an SSD cannot completely
> replace RAM, because RAM is much faster.  With SSD, you can get away
> with less RAM than you can on a spinning disk system, but depending on a
> bunch of factors, it may not be a LOT less RAM.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Optimizing the index is almost never necessary with recent versions.  In
> almost all cases optimizing will get you a performance increase, but it
> comes at a huge cost in terms of resource utilization to DO the
> optimize.  While the optimize is happening performance will likely be
> worse, possibly a LOT worse.  Newer versions of Solr (Lucene) have
> closed the gap on performance with non-optimized indexes, so it doesn't
> gain you as much in performance as it did in earlier versions.
>
> Thanks,
> Shawn
>
>


Re: Recovery Thread Blocked

2015-10-06 Thread Mark Miller
That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening with
the heap.
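
For a HotSpot JVM that is usually something along these lines (JRockit has
its own flags, so adjust for your JVM):

  -verbose:gc -Xloggc:/var/log/solr/gc.log -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps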

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:

> Mark - currently 5.3 is being evaluated for upgrade purposes and
> hopefully get there sooner. Meanwhile, following exception is noted from
> logs during updates
>
> ERROR org.apache.solr.update.CommitTracker  – auto commit
> error...:java.lang.IllegalStateException: this writer hit an
> OutOfMemoryError; cannot commit
>  at
>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
>  at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
>  at
>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
>  at
> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
>  at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
>  at
>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
>  at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>  at java.lang.Thread.run(Thread.java:682)
>
> Considering the fact that the machine is configured with 48G (24G for
> JVM which will be reduced in future) wondering how would it still go out
> of memory. For memory mapped index files the remaining 24G or what is
> available off of it should be available. Looking at the lsof output the
> memory mapped files were around 10G.
>
> Thanks.
>
>
> On 10/5/15 5:41 PM, Mark Miller wrote:
> > I'd make two guess:
> >
> > Looks like you are using Jrocket? I don't think that is common or well
> > tested at this point.
> >
> > There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> of
> > SolrCloud, you are dealing with something fairly ancient and so it will
> be
> > harder to find help with older issues most likely.
> >
> > - Mark
> >
> > On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:
> >
> >> Any takers on this? Any kinda clue would help. Thanks.
> >>
> >> On 10/4/15 10:14 AM, Rallavagu wrote:
> >>> As there were no responses so far, I assume that this is not a very
> >>> common issue that folks come across. So, I went into source (4.6.1) to
> >>> see if I can figure out what could be the cause.
> >>>
> >>>
> >>> The thread that is locking is in this block of code
> >>>
> >>> synchronized (recoveryLock) {
> >>> // to be air tight we must also check after lock
> >>> if (cc.isShutDown()) {
> >>>   log.warn("Skipping recovery because Solr is shutdown");
> >>>   return;
> >>> }
> >>> log.info("Running recovery - first canceling any ongoing
> >> recovery");
> >>> cancelRecovery();
> >>>
> >>> while (recoveryRunning) {
> >>>   try {
> >>> recoveryLock.wait(1000);
> >>>   } catch (InterruptedException e) {
> >>>
> >>>   }
> >>>   // check again for those that were waiting
> >>>   if (cc.isShutDown()) {
> >>> log.warn("Skipping recovery because Solr is shutdown");
> >>> return;
> >>>   }
> >>>   if (closed) return;
> >>> }
> >>>
> >>> Subsequently, the thread will get into cancelRecovery method as below,
> >>>
> >>> public void cancelRecovery() {
> >>>   synchronized (recoveryLock) {
> >>> if (recoveryStrat != null && recoveryRunning) {
> >>>   recoveryStrat.close();
> >>>   while (true) {
> >>> try {
> >>>   recoveryStrat.join();
> >>> } catch (InterruptedException e) {
> >>>   // not interruptible - keep waiting
> >>>   continue;
> >>> }
> >>> break;
> >>>   }
> >>>
> >>>   recoveryRunning = false;
> >>>   recoveryLock.notifyAll();
> >>> }
> >>>   }
> >>> }
> >>>
> >>> As per the stack trace "recoveryStrat.join()" is where things are
> >>> holding up.
> >>>
> >>> I wonder why/how cancelRecovery would take time so around 870 threads
> >>> would be waiting on. Is it possible that ZK is not responding or
> >>> something else like Operating System resources could cause this?
> Thanks.
> >>>
> >>>
> >>> On 10/2/15 4:17 PM, Rallavagu wrote:
>  Here is the stack trace of the thread that is holding the lock.
> 
> 
>  "Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
>  native_blocked, daemon
>    -- Waiting for notification on:
>  org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
>    

Re: Recovery Thread Blocked

2015-10-06 Thread Rallavagu
GC logging shows normal. The "OutOfMemoryError" appears to be pertaining 
to a thread but not to JVM.


On 10/6/15 1:07 PM, Mark Miller wrote:

That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening with
the heap.

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:


Mark - currently 5.3 is being evaluated for upgrade purposes and
hopefully get there sooner. Meanwhile, following exception is noted from
logs during updates

ERROR org.apache.solr.update.CommitTracker  – auto commit
error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
  at

org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
  at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
  at

org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
  at
org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
  at

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
  at

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
  at

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
  at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
  at java.lang.Thread.run(Thread.java:682)

Considering the fact that the machine is configured with 48G (24G for
JVM which will be reduced in future) wondering how would it still go out
of memory. For memory mapped index files the remaining 24G or what is
available off of it should be available. Looking at the lsof output the
memory mapped files were around 10G.

Thanks.


On 10/5/15 5:41 PM, Mark Miller wrote:

I'd make two guess:

Looks like you are using Jrocket? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace

of

SolrCloud, you are dealing with something fairly ancient and so it will

be

harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:


Any takers on this? Any kinda clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1) to
see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
 // to be air tight we must also check after lock
 if (cc.isShutDown()) {
   log.warn("Skipping recovery because Solr is shutdown");
   return;
 }
 log.info("Running recovery - first canceling any ongoing

recovery");

 cancelRecovery();

 while (recoveryRunning) {
   try {
 recoveryLock.wait(1000);
   } catch (InterruptedException e) {

   }
   // check again for those that were waiting
   if (cc.isShutDown()) {
 log.warn("Skipping recovery because Solr is shutdown");
 return;
   }
   if (closed) return;
 }

Subsequently, the thread will get into cancelRecovery method as below,

public void cancelRecovery() {
   synchronized (recoveryLock) {
 if (recoveryStrat != null && recoveryRunning) {
   recoveryStrat.close();
   while (true) {
 try {
   recoveryStrat.join();
 } catch (InterruptedException e) {
   // not interruptible - keep waiting
   continue;
 }
 break;
   }

   recoveryRunning = false;
   recoveryLock.notifyAll();
 }
   }
 }

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this?

Thanks.



On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
   -- Waiting for notification on:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
   at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
   at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
   at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
   at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
   at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
 

Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Chris Hostetter

: The documentation int the Collections API says  "The value can be ...
: *implicit*, which uses an internal default hash".
: I think most people would assume the "hash" would be used to route the
: data.
: Meanwhile the description of CompositID in the "Document Routing" section
: only discusses how modify your document IDs, which I did not want to do.

Hmmm... I'm guessing you are looking at PDF copy of the ref guide?  

Pretty sure that was a mistake that's already been fixed.  At the moment 
the Collections API CREATE command says...

https://cwiki.apache.org/confluence/display/solr/Collections+API

"The 'implicit' router does not automatically route documents to different 
shards.  Whichever shard you indicate on the indexing request (or within 
each document) will be used as the destination for those documents"



And the details on document routing say...

https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud#ShardsandIndexingDatainSolrCloud-DocumentRouting

If you created the collection and defined the "implicit" router at the 
time of creation, you can additionally define a router.field parameter to 
use a field from each document to identify a shard where the document 
belongs. If the field specified is missing in the document, however, the 
document will be rejected. You could also use the _route_ parameter to 
name a specific shard.


...which i believe is all accurate.
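(For illustration only, since this comes up often: with the implicit router the
shard list is given at creation time, and each update names its target shard.
The collection, shard, and field names below are made up, not from this thread.

  /admin/collections?action=CREATE&name=mycoll&router.name=implicit
      &shards=shardA,shardB&replicationFactor=1
      &collection.configName=myconf&router.field=region

With router.field=region every document must carry a "region" field whose value
is the destination shard name (e.g. region=shardA); without router.field you can
pass _route_=shardA on the update request instead.)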



-Hoss
http://www.lucidworks.com/


Re: Facet queries blow out the filterCache

2015-10-06 Thread Chris Hostetter
: So, no SolrCloud, default example config, about as basic as you get. I
: didn’t even bother indexing any docs. Then I issued this query:
: 
: http://localhost:8983/solr/techproducts/select?q=name:foo=1=true
: =popularity=0=-1

: This still causes an insert into the filterCache.

the faceting component is a type of operation that indicates in the 
QueryCommand that it needs to GET_DOCSET for the set of all documents 
matching the query (independent of pagination) -- the point of this DocSet 
is so the faceting logic can then compute the intersection of the set of 
all matching documents with the set of documents matching each facet 
constraint.  the cached DocSet will be re-used both within the context 
of the current request, and in future facet requests over the 
same query+filters.

: The only real difference I’m noticing vs my solrcloud collection is that
: repeating the query increments cache lookups and hits. It’s still odd
: though, because issuing new distinct queries causes a reported insert, but
: not a lookup, so the cache hit ratio is always exactly 1.

i'm not following what you are saying at all ... can you give some 
concrete examples (ie: "starting with an empty cache i do this request, 
then i see these cache stats, then i do this identical/different query and 
then the cache stats look like this...")



-Hoss
http://www.lucidworks.com/

MailEntityProcessor

2015-10-06 Thread Gaurav Gupta
Hello Guys,

I am trying MailEntityProcessor in Solr 5

Below is my configuration :

  [data-config.xml snippet stripped by the mailing list archive]


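(Since the XML did not survive the archive, here is a rough sketch of what a
MailEntityProcessor data-config usually looks like; all attribute values below
are placeholders rather than the poster's actual settings:

  <dataConfig>
    <document>
      <entity name="mails"
              processor="MailEntityProcessor"
              user="someone@example.com"
              password="secret"
              host="imap.example.com"
              protocol="imaps"
              fetchMailsSince="2015-01-01 00:00:00"/>
    </document>
  </dataConfig>
)
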
Issue I am facing :

1. transformers not working i.e. *template*
2. Looks like delta-import not supported ?
3. If I am doing full-import then its considering dataimport.properties
i.e. doing delta import

Please suggest !!

KR,
Gaurav


RE: Implementing AbstractFullDistribZkTestBase

2015-10-06 Thread Markus Jelsma
A crap, i didn't spot the httpS! I have added SSL supression, so far the tests 
roll fine.

Thanks!
Markus
 
-Original message-
> From:Mark Miller 
> Sent: Tuesday 6th October 2015 2:27
> To: solr-user@lucene.apache.org
> Subject: Re: Implementing AbstractFullDistribZkTestBase
> 
> Not sure what that means :)
> 
> SOLR-5776 would not happen all the time, but too frequently. It also
> wouldn't matter the power of CPU, cores or RAM :)
> 
> Do you see fails without https is what you want to check.
> 
> - mark
> 
> On Mon, Oct 5, 2015 at 2:16 PM Markus Jelsma 
> wrote:
> 
> > Hi - no, i don't think so, it doesn't happen all the time, but too
> > frequently. The machine running the tests has a high powered CPU, plenty of
> > cores and RAM.
> >
> > Markus
> >
> >
> >
> > -Original message-
> > > From:Mark Miller 
> > > Sent: Monday 5th October 2015 19:52
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Implementing AbstractFullDistribZkTestBase
> > >
> > > If it's always when using https as in your examples, perhaps it's
> > SOLR-5776.
> > >
> > > - mark
> > >
> > > On Mon, Oct 5, 2015 at 10:36 AM Markus Jelsma <
> > markus.jel...@openindex.io>
> > > wrote:
> > >
> > > > Hmmm, i tried that just now but i sometimes get tons of Connection
> > reset
> > > > errors. The tests then end with "There are still nodes recoverying -
> > waited
> > > > for 30 seconds".
> > > >
> > > > [RecoveryThread-collection1] ERROR
> > org.apache.solr.cloud.RecoveryStrategy
> > > > - Error while trying to
> > recover.:java.util.concurrent.ExecutionException:
> > > > org.apache.solr.client.solrj.SolrServerException: IOException occured
> > when
> > > > talking to server at: https://127.0.0.1:49146
> > > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > > at
> > > >
> > org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:598)
> > > > at
> > > >
> > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:361)
> > > > at
> > > > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:227)
> > > > Caused by: org.apache.solr.client.solrj.SolrServerException:
> > IOException
> > > > occured when talking to server at: https://127.0.0.1:49146
> > > > at
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)
> > > > at
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:270)
> > > > at
> > > >
> > org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:266)
> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > at
> > > >
> > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:210)
> > > > at
> > > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > > at
> > > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > > at java.lang.Thread.run(Thread.java:745)
> > > > Caused by: java.net.SocketException: Connection reset
> > > > at java.net.SocketInputStream.read(SocketInputStream.java:209)
> > > > at java.net.SocketInputStream.read(SocketInputStream.java:141)
> > > > at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
> > > > at sun.security.ssl.InputRecord.read(InputRecord.java:503)
> > > > at
> > > > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
> > > > at
> > > >
> > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
> > > > at
> > > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
> > > > at
> > > > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
> > > > at
> > > >
> > org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543)
> > > > at
> > > >
> > org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409)
> > > > at
> > > >
> > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
> > > > at
> > > >
> > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
> > > > at
> > > >
> > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
> > > > at
> > > >
> > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
> > > > at
> > > >
> > org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
> > > > at
> > > >
> > 

Re: Numeric Sorting with 0 and NULL Values

2015-10-06 Thread Todd Long
Chris Hostetter-3 wrote
> ...i mention this as being a workarround for floats/doubles because the 
> functions are evaluated as doubles (no "casting" or "forced integer 
> context" type support at the moment), so with integer/float fields there 
> would be some loss of precision.

Excellent, thank you for the reply.

My initial thought was going with the extra un-indexed/un-stored field... I
wasn't aware of the "docValues" attribute to be used in that case for
sorting (I assume this is more for performance). Thank you for the default
value explanation.

I definitely like the workaround as a reindex-free option. I'm curious as to
where the loss of precision would be when using "-(Double.MAX_VALUE)" as you
mentioned? Also, any specific reason why you chose that over
Double.MIN_VALUE (sorry, just making sure I'm not missing something)? I
would think an int or long field would simply cast down from the double
min/max value... at least that is what I gathered from poking around the
"def()" function code. Of course, the decimal would be lost with the int and
long but I would still come away with the min value of -2147483648 and
-9223372036854775808, respectively.
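
(A concrete illustration of that sort-time default, with a made-up field name:

  sort=def(popularity,-9999999) asc

def(field,fallback) substitutes the fallback for documents that have no value in
the field, so missing values can be pushed to one end of the ordering without a
reindex; the fallback just needs to sit below, or above, any real value.)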



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Numeric-Sorting-with-0-and-NULL-Values-tp4232654p4233117.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Recovery Thread Blocked

2015-10-06 Thread Mark Miller
If it's a thread and you have plenty of RAM and the heap is fine, have you
checked raising OS thread limits?
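
(For reference, and not specific to this stack trace: when the OutOfMemoryError
text is "unable to create new native thread", the JVM failed to create a thread
rather than running out of heap; on Linux the limits to check are usually the
max user processes ulimit, e.g. "ulimit -u", and kernel.threads-max.)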

- Mark

On Tue, Oct 6, 2015 at 4:54 PM Rallavagu  wrote:

> GC logging shows normal. The "OutOfMemoryError" appears to be pertaining
> to a thread but not to JVM.
>
> On 10/6/15 1:07 PM, Mark Miller wrote:
> > That amount of RAM can easily be eaten up depending on your sorting,
> > faceting, data.
> >
> > Do you have gc logging enabled? That should describe what is happening
> with
> > the heap.
> >
> > - Mark
> >
> > On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:
> >
> >> Mark - currently 5.3 is being evaluated for upgrade purposes and
> >> hopefully get there sooner. Meanwhile, following exception is noted from
> >> logs during updates
> >>
> >> ERROR org.apache.solr.update.CommitTracker  – auto commit
> >> error...:java.lang.IllegalStateException: this writer hit an
> >> OutOfMemoryError; cannot commit
> >>   at
> >>
> >>
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
> >>   at
> >>
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
> >>   at
> >>
> >>
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
> >>   at
> >> org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >>   at
> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
> >>   at
> >>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> >>   at
> >>
> >>
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> >>   at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
> >>   at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> >>   at java.lang.Thread.run(Thread.java:682)
> >>
> >> Considering the fact that the machine is configured with 48G (24G for
> >> JVM which will be reduced in future) wondering how would it still go out
> >> of memory. For memory mapped index files the remaining 24G or what is
> >> available off of it should be available. Looking at the lsof output the
> >> memory mapped files were around 10G.
> >>
> >> Thanks.
> >>
> >>
> >> On 10/5/15 5:41 PM, Mark Miller wrote:
> >>> I'd make two guess:
> >>>
> >>> Looks like you are using Jrocket? I don't think that is common or well
> >>> tested at this point.
> >>>
> >>> There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace
> >> of
> >>> SolrCloud, you are dealing with something fairly ancient and so it will
> >> be
> >>> harder to find help with older issues most likely.
> >>>
> >>> - Mark
> >>>
> >>> On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:
> >>>
>  Any takers on this? Any kinda clue would help. Thanks.
> 
>  On 10/4/15 10:14 AM, Rallavagu wrote:
> > As there were no responses so far, I assume that this is not a very
> > common issue that folks come across. So, I went into source (4.6.1)
> to
> > see if I can figure out what could be the cause.
> >
> >
> > The thread that is locking is in this block of code
> >
> > synchronized (recoveryLock) {
> >  // to be air tight we must also check after lock
> >  if (cc.isShutDown()) {
> >log.warn("Skipping recovery because Solr is shutdown");
> >return;
> >  }
> >  log.info("Running recovery - first canceling any ongoing
>  recovery");
> >  cancelRecovery();
> >
> >  while (recoveryRunning) {
> >try {
> >  recoveryLock.wait(1000);
> >} catch (InterruptedException e) {
> >
> >}
> >// check again for those that were waiting
> >if (cc.isShutDown()) {
> >  log.warn("Skipping recovery because Solr is shutdown");
> >  return;
> >}
> >if (closed) return;
> >  }
> >
> > Subsequently, the thread will get into cancelRecovery method as
> below,
> >
> > public void cancelRecovery() {
> >synchronized (recoveryLock) {
> >  if (recoveryStrat != null && recoveryRunning) {
> >recoveryStrat.close();
> >while (true) {
> >  try {
> >recoveryStrat.join();
> >  } catch (InterruptedException e) {
> >// not interruptible - keep waiting
> >continue;
> >  }
> >  break;
> >}
> >
> >recoveryRunning = false;
> >recoveryLock.notifyAll();
> >  }
> >}
> 

Re: Recovery Thread Blocked

2015-10-06 Thread Rallavagu

It is a Java thread though. Does that require raising OS-level thread limits?

On 10/6/15 6:21 PM, Mark Miller wrote:

If it's a thread and you have plenty of RAM and the heap is fine, have you
checked raising OS thread limits?

- Mark

On Tue, Oct 6, 2015 at 4:54 PM Rallavagu  wrote:


GC logging shows normal. The "OutOfMemoryError" appears to be pertaining
to a thread but not to JVM.

On 10/6/15 1:07 PM, Mark Miller wrote:

That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening

with

the heap.

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu  wrote:


Mark - currently 5.3 is being evaluated for upgrade purposes and
hopefully get there sooner. Meanwhile, following exception is noted from
logs during updates

ERROR org.apache.solr.update.CommitTracker  – auto commit
error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
   at



org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)

   at


org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)

   at



org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)

   at
org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
   at



java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)

   at



java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)

   at



java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)

   at



java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)

   at java.lang.Thread.run(Thread.java:682)

Considering the fact that the machine is configured with 48G (24G for
JVM which will be reduced in future) wondering how would it still go out
of memory. For memory mapped index files the remaining 24G or what is
available off of it should be available. Looking at the lsof output the
memory mapped files were around 10G.

Thanks.


On 10/5/15 5:41 PM, Mark Miller wrote:

I'd make two guess:

Looks like you are using Jrocket? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace

of

SolrCloud, you are dealing with something fairly ancient and so it will

be

harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu  wrote:


Any takers on this? Any kinda clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1)

to

see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
  // to be air tight we must also check after lock
  if (cc.isShutDown()) {
log.warn("Skipping recovery because Solr is shutdown");
return;
  }
  log.info("Running recovery - first canceling any ongoing

recovery");

  cancelRecovery();

  while (recoveryRunning) {
try {
  recoveryLock.wait(1000);
} catch (InterruptedException e) {

}
// check again for those that were waiting
if (cc.isShutDown()) {
  log.warn("Skipping recovery because Solr is shutdown");
  return;
}
if (closed) return;
  }

Subsequently, the thread will get into cancelRecovery method as

below,


public void cancelRecovery() {
synchronized (recoveryLock) {
  if (recoveryStrat != null && recoveryRunning) {
recoveryStrat.close();
while (true) {
  try {
recoveryStrat.join();
  } catch (InterruptedException e) {
// not interruptible - keep waiting
continue;
  }
  break;
}

recoveryRunning = false;
recoveryLock.notifyAll();
  }
}
  }

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this?

Thanks.



On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
-- Waiting for notification on:

RE: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

2015-10-06 Thread Adrian Liew
Hi Edwin,

Thanks for the reply. This looks to have been resolved by starting the ZooKeeper
service on each server promptly, so that the session does not time out before the
node can heartbeat its peers. I also increased the tickTime value to about 5 minutes
to give a node hosting ZooKeeper some time to restart and autostart its service.
This case seems fixed, but I will double-check once more to be sure. I am using
nssm (non-sucking-service-manager) to autostart ZooKeeper, and I will retest with
nssm to make sure the ZooKeeper services come up reliably.
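
(For reference: tickTime is expressed in milliseconds, and initLimit/syncLimit are
multiples of tickTime, so a roughly 5-minute tickTime would be tickTime=300000; a
more common approach is to keep tickTime near the default 2000 and raise
initLimit/syncLimit instead, so that normal failure detection stays fast.)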

Regards,
Adrian

Best regards,

Adrian Liew |  Consultant Application Developer
Avanade Malaysia Sdn. Bhd..| Consulting Services
Direct: +(603) 2382 5668
+6010-2288030


-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] 
Sent: Monday, October 5, 2015 10:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Cannot connect to a zookeeper 3.4.6 instance via zkCli.cmd

Hi Adrian,

It's unlikely to be the firewall settings if it is failing intermittently.
More of a network issues.

The error says it's a connection time out, and since you say it happens only 
intermittently, I'm suspecting it could be network issues.
Have you check if the connection to the various servers are always up?

Regards,
Edwin


On 3 October 2015 at 00:22, Erick Erickson  wrote:

> Hmmm, there are usually a couple of ports that each ZK instance needs, 
> is it possible that you've got more than one process using one of 
> those ports?
>
> By default (I think), zookeeper uses "peer port + 1000" for its leader 
> election process, see:
> https://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html
> the "Running Replicated Zookeeper" section.
>
> I'm not quite clear whether the above ZK2 port and ZK3 port are just 
> meant to indicate a single Zookeeper instance on a node or not so I 
> thought I'd check.
>
> Firewalls should always fail, not intermittently so I'm puzzled about 
> that
>
> Best,
> Erick
>
> On Fri, Oct 2, 2015 at 1:33 AM, Adrian Liew 
> wrote:
> > Hi Edwin,
> >
> > I have followed the standards recommended by the Zookeeper article. 
> > It
> seems to be working.
> >
> > Incidentally, I am facing intermittent issues whereby I am unable to
> connect to Zookeeper service via Solr's zkCli.bat command, even after 
> having setting automatic startup of my ZooKeeper service. I have 
> basically configured (non-sucking-service-manager) nssm to auto start 
> Solr with a dependency of Zookeeper to ensure both services are 
> running on startup for each Solr VM.
> >
> > Here is an example what I tried to run to connect to the ZK service:
> >
> > E:\solr-5.3.0\server\scripts\cloud-scripts>zkcli.bat -z 
> > 10.0.0.6:2183
> -cmd list
> > Exception in thread "main" org.apache.solr.common.SolrException:
> java.util.concu
> > rrent.TimeoutException: Could not connect to ZooKeeper 10.0.0.6:2183
> within 3000
> > 0 ms
> > at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:18
> > 1)
> > at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:11
> > 5)
> > at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:10
> > 5)
> > at org.apache.solr.cloud.ZkCLI.main(ZkCLI.java:181)
> > Caused by: java.util.concurrent.TimeoutException: Could not connect 
> > to
> ZooKeeper
> >  10.0.0.6:2183 within 3 ms
> > at
> org.apache.solr.common.cloud.ConnectionManager.waitForConnected(Conne
> > ctionManager.java:208)
> > at
> org.apache.solr.common.cloud.SolrZkClient.(SolrZkClient.java:17
> > 3)
> > ... 3 more
> >
> >
> > Further to this I inspected the output shown in console window by
> zkServer.cmd:
> >
> > 2015-10-02 08:24:09,305 [myid:3] - WARN
> [WorkerSender[myid=3]:QuorumCnxManager@
> > 382] - Cannot open channel to 2 at election address /10.0.0.5:3888
> > java.net.SocketTimeoutException: connect timed out
> > at java.net.DualStackPlainSocketImpl.waitForConnect(Native
> Method)
> > at java.net.DualStackPlainSocketImpl.socketConnect(Unknown
> Source)
> > at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
> > at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown
> Source)
> > at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
> > at java.net.PlainSocketImpl.connect(Unknown Source)
> > at java.net.SocksSocketImpl.connect(Unknown Source)
> > at java.net.Socket.connect(Unknown Source)
> > at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(Quorum
> > CnxManager.java:368)
> > at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxM
> > anager.java:341)
> > at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worke
> > rSender.process(FastLeaderElection.java:449)
> > at
> 

Re: Filter first-components result in solr.SearchHandler

2015-10-06 Thread Erik Hatcher
Could you also provide an example of the type of request you want the client to 
make?

Note that `qf` is a (e)dismax query parser parameter, in case that’s 
conflicting for you.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 




> On Oct 6, 2015, at 3:08 AM, aniljayanti  wrote:
> 
> Hi All,
> 
> I am workng on solr 5.2.1. I wrote my own component to get employee id's
> from first-component. I am trying to pass these id's to normal
> solr.SearchHandler ( class="solr.SearchHandler">) to filter the employee id's.
> 
> relevant request handler in solrconfig.xml file : 
> 
>  class="org.apache.solr.handler.component.ext.imiCustomPriority"/>
> 
>  
>
>  explicit
>  100
>  text
>
> 
>
> custom-priority
> 
>  
> 
> How I can pass employee id's in qf param correctly in query so that solr can
> use this while searching ? 
> 
> Suggestions are appreciated..
> 
> thanks in advance
> 
> AnilJayanti
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Filter-first-components-result-in-solr-SearchHandler-tp4232892.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Zookeeper HA with 3x ZK with Solr server nodes

2015-10-06 Thread Adrian Liew
Hi there,



I have 3 Solr server Azure VM nodes participating in SolrCloud with ZK 
installed on each of these nodes (to avoid a single point of failure with ZK 
for leader election). Each Solr server is hosted in a Windows Server 2012 R2 
environment. I was told by my peer that if one zookeeper service fails, the 
entire quorum fails. Hence if a quorum fails, does that mean it will not be 
able to elect the leader from the remaining 2 alive Solr servers,  even if ZK 
services are installed in each node?



I have yet to test this out, as it would defeat the purpose of having ZK installed on 
each server. I am afraid that if one node fails, a leader cannot be elected with the 
remaining two available nodes. Correct me if I am wrong.



Regards,

Adrian




Re: Facet queries blow out the filterCache

2015-10-06 Thread Jeff Wartes

I dug far enough yesterday to find the GET_DOCSET, but not far enough to
find why. Thanks, a little context is really helpful sometimes.


So, starting with an empty filterCache...

http://localhost:8983/solr/techproducts/select?q=name:foo=1=true
=popularity

New values: lookups: 0, hits: 0, inserts: 1, size: 1

So for the reasons you explained, "inserts" is incremented for this new
search

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true
=popularity

New values: lookups: 0, hits: 0, inserts: 2, size: 2


Another new search, another new insert. No "lookups" though, so how does
it know name:boo wasn’t cached?

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true
=popularity
New values: lookups: 1, hits: 1, inserts: 2, size: 2


But it clearly does know - when I repeat the search, I get both a lookup
and a hit. (and no insert) So is this just
a bug in the stats reporting, perhaps?


When I first started looking at this, it was in a solrcloud cluster, and
one interesting thing about that cluster is that it was configured with
the queryResultCache turned off, so let’s repeat the above experiment
without the queryResultCache. (I’m just commenting it out in the
techproducts config for this run.)


Starting with an empty filterCache...

http://localhost:8983/solr/techproducts/select?q=name:foo=1=true
=popularity
New values: lookups: 0, hits: 0, inserts: 1, size: 1

Same as before...

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true
=popularity
New values: lookups: 0, hits: 0, inserts: 2, size: 2

Same as before...

http://localhost:8983/solr/techproducts/select?q=name:boo=1=true
=popularity
New values: lookups: 0, hits: 0, inserts: 3, size: 2

No cache hit! We get an insert instead, but it’s already in there, so the
size doesn’t change. So disabling the queryResultCache apparently causes
facet queries to be unable to use the filterCache?




I’m increasingly thinking that different use cases need different
filterCaches, rather than try to bundle every explicit or unexpected
use-case under one cache with one size and one regenerator.
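
(Side note for anyone reproducing this: the lookup/hit/insert counters above can
also be pulled without the admin UI via the MBeans handler, for example

  http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&cat=CACHE&wt=json

which returns the same per-cache stats blocks, including the filterCache one, that
the Plugins / Stats page shows.)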






On 10/6/15, 2:45 PM, "Chris Hostetter"  wrote:

>: So, no SolrCloud, default example config, about as basic as you get. I
>: didn’t even bother indexing any docs. Then I issued this query:
>: 
>: 
>http://localhost:8983/solr/techproducts/select?q=name:foo=1=tru
>e
>: =popularity=0=-1
>
>: This still causes an insert into the filterCache.
>
>the faceting component is a type of operation that indicates in the
>QueryCommand that it needs to GET_DOCSET for the set of all documents
>matching the query (independent of pagination) -- the point of this
>DocSet 
>is so the faceting logic can then compute the intersection of the set of
>all matching documents with the set of documents matching each facet
>constraint.  the cached DocSet will be re-used both within the context
>of the current request, and in future facet requests over the
>same query+filters.
>
>: The only real difference I’m noticing vs my solrcloud collection is that
>: repeating the query increments cache lookups and hits. It’s still odd
>: though, because issuing new distinct queries causes a reported insert,
>but
>: not a lookup, so the cache hit ratio is always exactly 1.
>
>i'm not following what you are saying at all ... can you give some
>concrete examples (ie: "starting with an empty cache i do this request,
>then i see these cache stats, then i do this identical/different query
>and 
>then the cache stats look like this...")
>
>
>
>-Hoss
>http://www.lucidworks.com/



Re: FieldCache?

2015-10-06 Thread Alessandro Benedetti
Let me add some precision here.
When dealing with faceting, there are currently 2 main approaches:

1) *Enum Algorithm* - best for low cardinality value fields, it is based on
retrieving the term enum for all the terms in the index, and then
intersecting the related posting list with the query result set

2) *Un-Inverting Algorithms* - Best for high cardinality value fields, it
is based on uninverting the index, checking the value for each query result
document field, and counting the occurrences

Within the 2nd approach :

2a) *Doc Values * - It is better for dynamic indexes, built at indexing
time and stored on the disk ( setting the specific attribute for the field)
OR calculated at runtime thanks to the UnInvertedReader that will uninvert
to an in-memory structure that looks like DocValues

2b) *Uninverted field* - better for an index that changes less frequently. After the
FieldCache removal, only per-segment field caches are available, which means you
should still be able to use this approach via the fcs algorithm.

This should be the current situation,
I will take a look into details and let you know if I understood something
wrong.
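
As a small illustration (the field name is made up): DocValues is just a schema
attribute set at index time, and the algorithm can be picked per request with
facet.method:

  <field name="category" type="string" indexed="true" stored="false" docValues="true"/>

  ...&facet=true&facet.field=category&facet.method=fc    (or enum, or fcs)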

Cheers



2015-10-06 5:03 GMT+01:00 William Bell :

> So the FieldCache was removed from Solr 5.
>
> What is the implication of this? Should we move all facets to DocValues
> when we have high cardinality (lots of values) ? Are we adding it back?
>
> Other ideas to improve performance?
>
> From Mike M:
>
> FieldCache is gone (moved to a dedicated UninvertingReader in the
> miscmodule).
> This means when you intend to sort on a field, you should index that field
> using doc values, which is much faster and less heap consuming than
> FieldCache.
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: FieldCache?

2015-10-06 Thread Alessandro Benedetti
For completeness this is the related issue :

https://issues.apache.org/jira/browse/SOLR-8096

Cheers

2015-10-06 11:21 GMT+01:00 Alessandro Benedetti 
:

> We should make some precision here,
> When dealing with faceting , there are currently 2 main approaches :
>
> 1) *Enum Algorithm* - best for low cardinality value fields, it is based
> on retrieving the term enum for all the terms in the index, and then
> intersecting the related posting list with the query result set
>
> 2) *Un-Inverting Algorithms* - Best for high cardinality value fields, it
> is based on uninverting the index, checking the value for each query result
> document field, and counting the occurrences
>
> Within the 2nd approach :
>
> 2a) *Doc Values * - It is better for dynamic indexes, built at indexing
> time and stored on the disk ( setting the specific attribute for the field)
> OR calculated at runtime thanks to the UnInvertedReader that will uninvert
> to an in-memory structure that looks like DocValues
>
> 2b) *Uninverted field* - for index that changes less frequently . After
> the removal related :
> Only per segment field caches are available, which means you should be
> able to use it using the fcs algorithm.
>
> This should be the current situation,
> I will take a look into details and let you know if I understood something
> wrong.
>
> Cheers
>
>
>
> 2015-10-06 5:03 GMT+01:00 William Bell :
>
>> So the FieldCache was removed from Solr 5.
>>
>> What is the implication of this? Should we move all facets to DocValues
>> when we have high cardinality (lots of values) ? Are we adding it back?
>>
>> Other ideas to improve performance?
>>
>> From Mike M:
>>
>> FieldCache is gone (moved to a dedicated UninvertingReader in the
>> miscmodule).
>> This means when you intend to sort on a field, you should index that field
>> using doc values, which is much faster and less heap consuming than
>> FieldCache.
>>
>> --
>> Bill Bell
>> billnb...@gmail.com
>> cell 720-256-8076
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-06 Thread Toke Eskildsen
On Mon, 2015-10-05 at 17:26 -0400, Siddhartha Singh Sandhu wrote:
> Following up on that: Would having an SSD make considerable difference in
> speed?

Yes, but only to a point.

The UK Web Archive has done some tests on optimizing indexes on both
spinning drives and SSDs: 
https://github.com/ukwa/shine/tree/master/python/test-logs

With spinning drives, their machines maxed out on IOWait. With SSD, the
machine maxed out on CPU. That might sound great, but the problem is
that optimizing on a single shard is single threaded (at least for Solr
4.10.x), so if there is only a single shard on the machine, only 1 CPU
is running at full tilt. There is always a bottleneck.

What might help is that the SSD (probably) does not get bogged down by
the process, so it should be much better at handling other requests
while the optimization is running.

- Toke Eskildsen, State and University Library, Denmark




Re:

2015-10-06 Thread Alessandro Benedetti
>From Jetty documentation :

acceptQueueSize: The size of the pending connection backlog. The exact
interpretation is JVM and operating system specific and you can ignore it.
Higher values allow more connections to wait pending an acceptor thread.
Because the exact interpretation is deployment dependent, it is best to
keep this value as the default unless there is a specific connection issue
for a specific OS that you need to address.

Why you decided to set it to 5000 ?

Cheers
2015-10-06 6:34 GMT+01:00 William Bell :

> What should this be set to?
>
> Do you set it with -Dsolr.jetty.https.acceptQueueSize=5000 ?
>
>  name="solr.jetty.https.acceptQueueSize" default="0"/>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Steve
I’ve been unable to get solrcloud to distribute data across 4 solr nodes
with the “route.name=implicit”  feature of the collections API.

The nodes are live, and the graphs are green.  All the data (the “Films”
example data) shows up on one node, the node that received the CREATE
command.





My CREATE command is:

curl
http://host-192-168-0-60.openstacklocal:8081/solr/admin/collections?action=CREATE=CollectionFilms=2=implicit=shard-1,shard-2,shard-3,shard-4=2=configAlpha



solr version 5.3.1

zookeeper version 3.4.6

indexing with:

   cd /opt/solr/example/films;

/opt/solr/bin/post -c CollectionFilms -port 8081  films.json





Thanks,

strick


Solr 5.2.1 and spatial polygon searches

2015-10-06 Thread Lee Duhl
The following query runs fine on Solr 4.x, but errors with a "Couldn't parse 
shape " error message in Solr 5.2.1
geoloc:"INTERSECTS(POLYGON((-83.38434219360353 
42.51412013568205,-83.3474349975586 42.51196902987156,-83.3561897277832 
42.495390378152244,
-83.4001350402832 42.496149801777875,-83.38434219360353 42.51412013568205)))"

Solr 4 required the Spatial4J library to be installed in order for the above 
query to run.

Can Spatial4J be installed on Solr 5.2.1 or is there another library that needs 
to be installed for these types of queries to work?

Note: The above query is a simple "rectangle" polygon and is used for only this 
example. Bbox queries are not applicable as most of our queries generally use 
more complex polygons.

Thank You
Lee V. Duhl
Realcomp II Ltd.
Phone: (248) 699-9133
www.realcomp.com
www.moveinmichigan.com



Re: Solr 5.2.1 and spatial polygon searches

2015-10-06 Thread Alessandro Benedetti
Hi lee, shot in the dark, have you tried using the *WKT *syntax with range
spatial approach*?*

for example :
q=geoloc:["0 18" TO "18 100”] .

I am using it in 5.3

Cheers


On 6 October 2015 at 14:22, Lee Duhl  wrote:

> The following query runs fine on Solr 4.x, but errors with a "Couldn't
> parse shape " error message in Solr 5.2.1
> geoloc:"INTERSECTS(POLYGON((-83.38434219360353
> 42.51412013568205,-83.3474349975586 42.51196902987156,-83.3561897277832
> 42.495390378152244,
> -83.4001350402832 42.496149801777875,-83.38434219360353
> 42.51412013568205)))"
>
> Solr 4 required the Spatial4J library to be installed in order for the
> above query to run.
>
> Can Spatial4J be installed on Solr 5.2.1 or is there another library that
> needs to be installed for these types of queries to work?
>
> Note: The above query is a simple "rectangle" polygon and is used for only
> this example. Bbox queries are not applicable as most of our queries
> generally use more complex polygons.
>
> Thank You
> Lee V. Duhl
> Realcomp II Ltd.
> Phone: (248) 699-9133
> www.realcomp.com
> www.moveinmichigan.com
>
>


-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Zookeeper HA with 3x ZK with Solr server nodes

2015-10-06 Thread Alessandro Benedetti
When you have a ZK Ensemble a quorum of active nodes is necessary to have
the entire Ensemble to work ( elect leaders, manage the cluster topology
etc etc) .

A quorum is a strict majority of the configured ensemble: floor(N/2) + 1 nodes.
If you have an ensemble of 3 nodes, the quorum is 3/2 + 1 = 2 nodes.
With an ensemble of 3 nodes, you can lose 1 and the ZK ensemble will
continue to work.

If you have an ensemble of 5 nodes, the quorum is 5/2 +1 = 3 nodes
With an ensemble of 5 nodes, you can lose 2 and the ZK ensemble will
continue to work.
etc.

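For reference, a 3-node ensemble is defined by listing all three servers in each
node's zoo.cfg, roughly like this (hostnames and paths are placeholders):

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=solr-node-1:2888:3888
  server.2=solr-node-2:2888:3888
  server.3=solr-node-3:2888:3888

Each node also needs a dataDir/myid file containing its own id (1, 2 or 3). With
this setup any single node can go down and the remaining two still form a
majority, so leader election keeps working.
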
Cheers

2015-10-06 10:55 GMT+01:00 Adrian Liew :

> Hi there,
>
>
>
> I have 3 Solr server Azure VM nodes participating in SolrCloud with ZK
> installed on each of these nodes (to avoid a single point of failure with
> ZK for leader election). Each Solr server is hosted in a Windows Server
> 2012 R2 environment. I was told by my peer that if one zookeeper service
> fails, the entire quorum fails. Hence if a quorum fails, does that mean it
> will not be able to elect the leader from the remaining 2 alive Solr
> servers,  even if ZK services are installed in each node?
>
>
>
> I am yet to this out as this defeats the purpose of having a ZK installed
> on each server. I am afraid if one node fails, a leader cannot be elected
> with the remaining two available nodes. Correct me if I am wrong.
>
>
>
> Regards,
>
> Adrian
>
>
>


-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: ??

2015-10-06 Thread Alessandro Benedetti
I would suggest writing a proper mail to this mailing list to get better
answers ...
Even the mail subject is a mystery ??? ...
The first thing I could suggest is to take a look at the related YouTube
presentation:

https://www.youtube.com/watch?v=8JADOLMazs4

Right now I cannot take a look at the video myself.

Cheers



2015-10-06 5:32 GMT+01:00 William Bell :

>
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
>
> See ArrayBlockingQueue.
>
> What would this help with?
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Filter first-components result in solr.SearchHandler

2015-10-06 Thread aniljayanti
Hi Erik,

thanks for your response, let me explain briefly.

I want to treat 5 employee ids as priority ids, so every time I search with a
specific keyword I want those 5 employee ids prepended as the first 5 results
of the search.

example : 

let's say 3,5,6,8,9 are the priority employee ids.
When I search with a specific keyword I get 4 docs back (employee ids 1,2,4,7).
Then I want to display the final result as below.

final result : 3,5,6,8,9,1,2,4,7

Please suggest me.

Thanks.

AnilJayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-first-components-result-in-solr-SearchHandler-tp4232892p4232926.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr 5.2.1 and spatial polygon searches

2015-10-06 Thread Lee Duhl
Alessandro

Thanks for the reply.

I'm not familiar with WKT syntax however the query sample you supplied below 
errors on both my 4.x and 5.2.1 servers with the following errors:
 "error": {
"msg": "org.apache.solr.search.SyntaxError: Cannot parse 'geoloc:[\"0 18\" 
TO \"18 100”] ': Encountered \"  \"100\\u201d \"\" at line 1, 
column 22.\nWas expecting one of:\n\"]\" ...\n\"}\" ...\n",
"code": 400
}

Looking further into my Solr4 setup it appears that JTS was also required for 
"Polygon" support: https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4.

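(For reference, in Solr 5.x polygon support still comes from putting the JTS jar
on the webapp classpath, e.g. under server/solr-webapp/webapp/WEB-INF/lib, and
declaring a JTS-aware spatial field type. Roughly, with the type and field names
here being placeholders:

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             geo="true" distErrPct="0.025" maxDistErr="0.001"/>
  <field name="geoloc" type="location_rpt" indexed="true" stored="true"/>

Without JTS and that spatialContextFactory setting, the WKT POLYGON syntax fails
with the "Couldn't parse shape" error.)
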
Thank You
Lee V. Duhl
Realcomp II Ltd.
Phone: (248) 699-9133
www.realcomp.com
www.moveinmichigan.com

-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Tuesday, October 06, 2015 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.2.1 and spatial polygon searches

Hi lee, shot in the dark, have you tried using the *WKT *syntax with range 
spatial approach*?*

for example :
q=geoloc:["0 18" TO "18 100”] .

I am using it in 5.3

Cheers


On 6 October 2015 at 14:22, Lee Duhl  wrote:

> The following query runs fine on Solr 4.x, but errors with a "Couldn't 
> parse shape " error message in Solr 5.2.1
> geoloc:"INTERSECTS(POLYGON((-83.38434219360353
> 42.51412013568205,-83.3474349975586 
> 42.51196902987156,-83.3561897277832
> 42.495390378152244,
> -83.4001350402832 42.496149801777875,-83.38434219360353
> 42.51412013568205)))"
>
> Solr 4 required the Spatial4J library to be installed in order for the 
> above query to run.
>
> Can Spatial4J be installed on Solr 5.2.1 or is there another library 
> that needs to be installed for these types of queries to work?
>
> Note: The above query is a simple "rectangle" polygon and is used for 
> only this example. Bbox queries are not applicable as most of our 
> queries generally use more complex polygons.
>
> Thank You
> Lee V. Duhl
> Realcomp II Ltd.
> Phone: (248) 699-9133
> www.realcomp.com
> www.moveinmichigan.com
>
>


--
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Walter Underwood
This is at Chegg. One of our indexes is textbooks. These are expensive and 
don’t change very often. It is better to keep yesterday’s index than to drop a 
few important books.

We have occasionally had an error that happens with every book, like a new 
field that is not in the Solr schema. If we ignored errors with that, we’d have 
an empty index: delete all, add all (failing), commit.

With the fail fast and rollback, we can catch problems before they mess up the 
index.

Also, to pinpoint isolated problems, if there is an error in the batch, it 
re-submits that batch one at a time, so we get an accurate report of which 
document was rejected. I wrote that same thing back at Netflix, before SolrJ.
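
For readers who want the shape of that batch-then-retry-individually logic, here
is a minimal SolrJ 5.x sketch; the class, collection and field names are
invented, this is not the actual code referred to above.

  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchIndexer {
    private final SolrClient client =
        new HttpSolrClient("http://localhost:8983/solr/books");

    // Send the whole batch in one request; if Solr rejects it, retry the
    // documents one at a time so the single bad document can be identified
    // and logged instead of losing (or blindly dropping) the whole batch.
    public void indexBatch(List<SolrInputDocument> batch) {
      try {
        client.add(batch);
      } catch (Exception batchFailure) {
        for (SolrInputDocument doc : batch) {
          try {
            client.add(doc);
          } catch (Exception docFailure) {
            System.err.println("Rejected doc " + doc.getFieldValue("id")
                + ": " + docFailure.getMessage());
          }
        }
      }
    }
  }

Commits (or a rollback on fatal errors, as described above) would be handled by
the surrounding job, not inside this method.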

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 6, 2015, at 9:49 AM, Alessandro Benedetti  
> wrote:
> 
> Hi Walter,
> can you explain better your use case ?
> You index a batch of e-commerce products ( Solr documents) if one fails,
> you want to stop and invalidate the entire batch ( using the almost never
> used solr rollback, or manual deletion ?)
> And then log the exception indexing size.
> To then re-index the whole batch od docs ?
> 
> In this scenario, the ConcurrentUpdateSolrClient will not be ideal?
> Only curiosity.
> 
> Cheers
> 
> On 6 October 2015 at 17:29, Walter Underwood  wrote:
> 
>> It depends on the document. In a e-commerce search, you might want to fail
>> immediately and be notified. That is what we do, fail, rollback, and notify.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <
>> benedetti.ale...@gmail.com> wrote:
>>> 
>>> mm one broken document in a batch should not break the entire batch ,
>>> right ( whatever approach used) ?
>>> Are you referring to the fact that you want to programmatically re-index
>>> the broken docs ?
>>> 
>>> Would be interesting to return the id of the broken docs along with the
>>> solr update response!
>>> 
>>> Cheers
>>> 
>>> 
>>> On 6 October 2015 at 15:30, Bill Dueber  wrote:
>>> 
 Just to add...my informal tests show that batching has way more
>> effect
 than solrj vs json.
 
 I haven't look at CUSC in a while, last time I looked it was impossible
>> to
 do anything smart about error handling, so check that out before you get
 too deeply into it. We use a strategy of sending a batch of json
>> documents,
 and if it returns an error sending each record one at a time until we
>> find
 the bad one and can log something useful.
 
 
 
 On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
 benedetti.ale...@gmail.com> wrote:
 
> Thanks Erick,
> you confirmed my impressions!
> Thank you very much for the insights, an other opinion is welcome :)
> 
> Cheers
> 
> 2015-10-05 14:55 GMT+01:00 Erick Erickson :
> 
>> SolrJ tends to be faster for several reasons, not the least of which
>> is that it sends packets to Solr in a more efficient binary format.
>> 
>> Batching is critical. I did some rough tests using SolrJ and sending
>> docs one at a time gave a throughput of < 400 docs/second.
>> Sending 10 gave 2,300 or so. Sending 100 at a time gave
>> over 5,300 docs/second. Curiously, 1,000 at a time gave only
>> marginal improvement over 100. This was with a single thread.
>> YMMV of course.
>> 
>> CloudSolrClient is definitely the better way to go with SolrCloud,
>> it routes the docs to the correct leader instead of having the
>> node you send the docs to do the routing.
>> 
>> Best,
>> Erick
>> 
>> On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
>>  wrote:
>>> I was doing some studies and analysis, just wondering in your opinion
>> which
>>> one is the best approach to use to index in Solr to reach the best
>>> throughput possible.
>>> I know that a lot of factor are affecting Indexing time, so let's
 only
>>> focus in the feeding approach.
>>> Let's isolate different scenarios :
>>> 
>>> *Single Solr Infrastructure*
>>> 
>>> 1) Xml/Json batch request to /update IndexHandler (xml/json)
>>> 
>>> 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
>>> I was thinking this to be the fastest approach for a multi threaded
>>> indexing application.
>>> Posting batch of docs if possible per request.
>>> 
>>> *Solr Cloud*
>>> 
>>> 1) Xml/Json batch request to /update IndexHandler(xml/json)
>>> 
>>> 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
>>> 
>>> 3) CloudSolrClient ( javabin)
>>> it seems the best approach accordingly to this improvements [1]
>>> 
>>> What are your opinions ?
>>> 
>>> A bonus 

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Alessandro Benedetti
Hi Walter,
can you explain your use case in a bit more detail?
You index a batch of e-commerce products (Solr documents); if one fails,
you want to stop and invalidate the entire batch (using the almost never
used Solr rollback, or manual deletion?),
then log the exception and the indexing size,
and then re-index the whole batch of docs?

In this scenario, the ConcurrentUpdateSolrClient would not be ideal, right?
Just curious.

Cheers

On 6 October 2015 at 17:29, Walter Underwood  wrote:

> It depends on the document. In an e-commerce search, you might want to fail
> immediately and be notified. That is what we do: fail, rollback, and notify.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
> >
> > mm one broken document in a batch should not break the entire batch ,
> > right ( whatever approach used) ?
> > Are you referring to the fact that you want to programmatically re-index
> > the broken docs ?
> >
> > Would be interesting to return the id of the broken docs along with the
> > solr update response!
> >
> > Cheers
> >
> >
> > On 6 October 2015 at 15:30, Bill Dueber  wrote:
> >
> >> Just to add...my informal tests show that batching has way more
> effect
> >> than solrj vs json.
> >>
> >> I haven't look at CUSC in a while, last time I looked it was impossible
> to
> >> do anything smart about error handling, so check that out before you get
> >> too deeply into it. We use a strategy of sending a batch of json
> documents,
> >> and if it returns an error sending each record one at a time until we
> find
> >> the bad one and can log something useful.
> >>
> >>
> >>
> >> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
> >> benedetti.ale...@gmail.com> wrote:
> >>
> >>> Thanks Erick,
> >>> you confirmed my impressions!
> >>> Thank you very much for the insights, an other opinion is welcome :)
> >>>
> >>> Cheers
> >>>
> >>> 2015-10-05 14:55 GMT+01:00 Erick Erickson :
> >>>
>  SolrJ tends to be faster for several reasons, not the least of which
>  is that it sends packets to Solr in a more efficient binary format.
> 
>  Batching is critical. I did some rough tests using SolrJ and sending
>  docs one at a time gave a throughput of < 400 docs/second.
>  Sending 10 gave 2,300 or so. Sending 100 at a time gave
>  over 5,300 docs/second. Curiously, 1,000 at a time gave only
>  marginal improvement over 100. This was with a single thread.
>  YMMV of course.
> 
>  CloudSolrClient is definitely the better way to go with SolrCloud,
>  it routes the docs to the correct leader instead of having the
>  node you send the docs to do the routing.
> 
>  Best,
>  Erick
> 
>  On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
>   wrote:
> > I was doing some studies and analysis, just wondering in your opinion
>  which
> > one is the best approach to use to index in Solr to reach the best
> > throughput possible.
> > I know that a lot of factor are affecting Indexing time, so let's
> >> only
> > focus in the feeding approach.
> > Let's isolate different scenarios :
> >
> > *Single Solr Infrastructure*
> >
> > 1) Xml/Json batch request to /update IndexHandler (xml/json)
> >
> > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> > I was thinking this to be the fastest approach for a multi threaded
> > indexing application.
> > Posting batch of docs if possible per request.
> >
> > *Solr Cloud*
> >
> > 1) Xml/Json batch request to /update IndexHandler(xml/json)
> >
> > 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> >
> > 3) CloudSolrClient ( javabin)
> > it seems the best approach accordingly to this improvements [1]
> >
> > What are your opinions ?
> >
> > A bonus observation should be for using some Map/Reduce big data
> >>> indexer,
> > but let's assume we don't have a big cluster of cpus, but the average
> > Indexer server.
> >
> >
> > [1]
> >
> 
> >>>
> >>
> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >
> >
> > Cheers
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> 
> >>>
> >>>
> >>>
> >>> --
> >>> --
> >>>
> >>> Benedetti Alessandro
> >>> Visiting card - http://about.me/alessandro_benedetti
> >>> Blog - http://alexbenedetti.blogspot.co.uk
> >>>
> >>> "Tyger, tyger burning bright

Re: Best Indexing Approaches - To max the throughput

2015-10-06 Thread Walter Underwood
It depends on the document. In an e-commerce search, you might want to fail
immediately and be notified. That is what we do: fail, rollback, and notify.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 6, 2015, at 7:58 AM, Alessandro Benedetti  
> wrote:
> 
> mm one broken document in a batch should not break the entire batch ,
> right ( whatever approach used) ?
> Are you referring to the fact that you want to programmatically re-index
> the broken docs ?
> 
> Would be interesting to return the id of the broken docs along with the
> solr update response!
> 
> Cheers
> 
> 
> On 6 October 2015 at 15:30, Bill Dueber  wrote:
> 
>> Just to add...my informal tests show that batching has way more effect
>> than solrj vs json.
>> 
>> I haven't look at CUSC in a while, last time I looked it was impossible to
>> do anything smart about error handling, so check that out before you get
>> too deeply into it. We use a strategy of sending a batch of json documents,
>> and if it returns an error sending each record one at a time until we find
>> the bad one and can log something useful.
>> 
>> 
>> 
>> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
>> benedetti.ale...@gmail.com> wrote:
>> 
>>> Thanks Erick,
>>> you confirmed my impressions!
>>> Thank you very much for the insights, an other opinion is welcome :)
>>> 
>>> Cheers
>>> 
>>> 2015-10-05 14:55 GMT+01:00 Erick Erickson :
>>> 
 SolrJ tends to be faster for several reasons, not the least of which
 is that it sends packets to Solr in a more efficient binary format.
 
 Batching is critical. I did some rough tests using SolrJ and sending
 docs one at a time gave a throughput of < 400 docs/second.
 Sending 10 gave 2,300 or so. Sending 100 at a time gave
 over 5,300 docs/second. Curiously, 1,000 at a time gave only
 marginal improvement over 100. This was with a single thread.
 YMMV of course.
 
 CloudSolrClient is definitely the better way to go with SolrCloud,
 it routes the docs to the correct leader instead of having the
 node you send the docs to do the routing.
 
 Best,
 Erick
 
 On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
  wrote:
> I was doing some studies and analysis, just wondering in your opinion
 which
> one is the best approach to use to index in Solr to reach the best
> throughput possible.
> I know that a lot of factor are affecting Indexing time, so let's
>> only
> focus in the feeding approach.
> Let's isolate different scenarios :
> 
> *Single Solr Infrastructure*
> 
> 1) Xml/Json batch request to /update IndexHandler (xml/json)
> 
> 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> I was thinking this to be the fastest approach for a multi threaded
> indexing application.
> Posting batch of docs if possible per request.
> 
> *Solr Cloud*
> 
> 1) Xml/Json batch request to /update IndexHandler(xml/json)
> 
> 2) SolrJ ConcurrentUpdateSolrClient ( javabin)
> 
> 3) CloudSolrClient ( javabin)
> it seems the best approach accordingly to this improvements [1]
> 
> What are your opinions ?
> 
> A bonus observation should be for using some Map/Reduce big data
>>> indexer,
> but let's assume we don't have a big cluster of cpus, but the average
> Indexer server.
> 
> 
> [1]
> 
 
>>> 
>> https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> 
> 
> Cheers
> 
> 
> --
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England
 
>>> 
>>> 
>>> 
>>> --
>>> --
>>> 
>>> Benedetti Alessandro
>>> Visiting card - http://about.me/alessandro_benedetti
>>> Blog - http://alexbenedetti.blogspot.co.uk
>>> 
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>> 
>>> William Blake - Songs of Experience -1794 England
>>> 
>> 
>> 
>> 
>> --
>> Bill Dueber
>> Library Systems Programmer
>> University of Michigan Library
>> 
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England



Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2015-10-06 Thread Steve
Thanks Shawn, that fixed it!

The documentation in the Collections API says "The value can be ...
*implicit*, which uses an internal default hash".
I think most people would assume the "hash" would be used to route the
data.
Meanwhile the description of compositeId in the "Document Routing" section
only discusses how to modify your document IDs, which I did not want to do.
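
For anyone else who hits this, a CREATE call along these lines (collection
name, shard count, and config name are only examples) lets the compositeId
router hash the uniqueKey and spread documents across the shards:

  http://localhost:8983/solr/admin/collections?action=CREATE&name=films
      &numShards=4&replicationFactor=1&collection.configName=films
      &router.name=compositeId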

thanks again,
.strick



On Tue, Oct 6, 2015 at 8:15 AM, Shawn Heisey  wrote:

> On 10/6/2015 7:58 AM, Steve wrote:
> > I’ve been unable to get solrcloud to distribute data across 4 solr nodes
> > with the “route.name=implicit”  feature of the collections API.
> >
> > The nodes are live, and the graphs are green.  All the data (the “Films”
> > example data) shows up on one node, the node that received the CREATE
> > command.
>
> A better name for the implicit router is "manual."  The implicit router
> doesn't actually route.  It assumes that you know what you are doing and
> have sent the request to the shard where you want it to be indexed.
>
> You want the compositeId router.
>
> Even though the name "implicit" makes sense in the context of Solr
> *code*, it is a confusing name when it comes to user expectations.
> You're not the first one to be confused by this, which is why I opened
> this issue:
>
> https://issues.apache.org/jira/browse/SOLR-6630
>
> Thanks,
> Shawn
>
>