Re: Adding docs to solr - slows down searcher requests

2018-09-26 Thread ashoknix
Thank You Erick!

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr Search Special Characters

2018-09-26 Thread Rathor, Piyush (US - Philadelphia)
Hi All,



We are facing some issues with searches containing special characters. Can you
please help with how to query when the search term contains the following characters:

• “&”

   Example – Tata & Sons

• AND

   Example – Tata AND Sons

• (

   Example – People (Pvt) Ltd

• )

   Example – People (Pvt) Ltd
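One common client-side approach for the cases above is to backslash-escape Lucene's query-syntax characters before building the query; a literal AND cannot be escaped this way, so lowercase it or quote the whole phrase instead. This is a sketch, not an official Solr utility:

```python
import re

# Lucene/Solr query-syntax characters; '&&' and '||' are the boolean
# operators, but escaping a single '&' or '|' is harmless.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

def escape_term(term: str) -> str:
    """Backslash-escape query-syntax characters in a user-entered term."""
    return _SPECIAL.sub(r'\\\1', term)

print(escape_term("Tata & Sons"))       # Tata \& Sons
print(escape_term("People (Pvt) Ltd"))  # People \(Pvt\) Ltd
```

For "Tata AND Sons" as literal text, searching the quoted phrase "Tata AND Sons" (or lowercasing "and") avoids it being parsed as a boolean operator.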





Thanks & Regards

Piyush Rathor

Consultant

Deloitte Digital (Salesforce.com / Force.com)

Deloitte Consulting Pvt. Ltd.

Office: +1 (615) 209 4980

Mobile : +1 (302) 397 1491

prat...@deloitte.com | www.deloitte.com





Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Shawn Heisey

On 9/26/2018 2:39 PM, Terry Steichen wrote:

To the best of my knowledge, I'm not using SolrJ at all.  Just
Solr-out-of-the-box.  In this case, if I understand you below, it
"should indicate an error status"


I think you'd know if you were using SolrJ directly.  You'd have written 
the indexing program, or whoever DID write it would likely indicate that 
they used SolrJ to talk to Solr.  I was surprised to learn that 
SimplePostTool does NOT use SolrJ ... it uses the HTTP capability built 
into Java.



Let me try to clarify a bit - I'm just using bin/post to index the files
in a directory.  That indexing process produces a lengthy screen display
of files that were indexed.  (I realize this isn't production-quality,
but I'm not ready for production just yet, so that should be OK.)


If you check your index, are you missing files that bin/post said were 
indexed?  Have you looked in that kind of detail?
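A low-effort version of that check is to ask Solr how many documents it has (q=*:*&rows=0) and compare with the directory's file count; the core name and URL here are assumptions:

```python
import json

def indexed_count(select_response_text: str) -> int:
    """Pull numFound out of a /select?q=*:*&rows=0&wt=json response body."""
    return json.loads(select_response_text)["response"]["numFound"]

# In practice the body would come from e.g.
#   urlopen("http://localhost:8983/solr/CORE/select?q=*:*&rows=0&wt=json")
# where CORE is your core name (hypothetical here).
sample = '{"responseHeader":{"status":0},"response":{"numFound":4213,"start":0,"docs":[]}}'
files_on_disk = 4700  # hypothetical directory count, e.g. len(os.listdir(...))
print(files_on_disk - indexed_count(sample))  # 487
```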


The post tool should indicate that an error occurred, and if there was 
any text in the response about the error, it should be displayed.  I was 
looking at the 7.4 code branch.  I didn't see anything about which Solr 
version you're running.


I have not spent any real time using bin/post.  It was part of a class 
that I attended as part of Lucene Revolution in 2010, but I do not 
recall what the output was.  It was all pre-designed and tested so it 
was known to work before I received it.  No errors occurred when I ran 
the script included with the class materials.



But no errors are shown (even though there have to be because the totals
indexed is less than the directory totals).

Are you saying I can't use post (to verify correct indexing), but that I
have to write custom software to accomplish that?


If you want errors detected programmatically, you'll need to write the 
indexing program.  The simple post tool won't report errors to anything 
that calls it, it will just log them.
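The kind of check a custom indexing program can make, which bin/post does not surface, looks roughly like this; the JSON shape matches a typical Solr update response, but treat it as an assumption:

```python
import json

def check_update_response(http_status: int, body_text: str) -> None:
    """Raise on a failed Solr update instead of merely logging it,
    which is all the simple post tool does."""
    if http_status != 200:
        raise RuntimeError(f"update failed: HTTP {http_status}: {body_text}")
    header = json.loads(body_text).get("responseHeader", {})
    if header.get("status", -1) != 0:
        raise RuntimeError(f"Solr reported an error: {body_text}")

# A success response passes silently:
check_update_response(200, '{"responseHeader":{"status":0,"QTime":5}}')
```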



And that there's no solr variable I can define that will do a kind of
"verbose" to show that?


If Solr returned errors during the indexing, then they will show up in 
the solr.log file, or possibly one of the rotated versions of that 
logfile.  You can also see them in the admin UI Logging tab if Solr 
hasn't been restarted, but the logfile is generally a better way to find 
them.  If you're not seeing errors there, then maybe something went 
wrong with bin/post.


I notice in a later message you indicate that you're indexing PDF and 
DOC files.  When those kinds of files are sent with bin/post, they will 
normally end up in the Extracting Request Handler, also known as SolrCell.


It is highly recommended that the Extracting Request Handler never be 
used in production.  That software embeds Tika inside Solr.  Tika is 
known to explode spectacularly when it gets a file it doesn't know how 
to handle.  PDF files in particular seem to trigger this behavior, but 
other formats can cause it as well.  If Tika is running inside Solr when 
that happens, Solr will also explode, and then you no longer have a 
search engine on that machine.  A better option is to include Tika in an 
indexing program that you write, so if it explodes, Solr stays running.
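The isolation described above can be sketched as a loop where extraction (Tika, run client-side) is wrapped per file, so one poison document loses only that document; extract() and send() are placeholders for Tika and the Solr update call:

```python
def index_directory(paths, extract, send):
    """Run text extraction in this process; a crash in extract()
    loses one file instead of taking down the Solr server."""
    failures = []
    for path in paths:
        try:
            text = extract(path)   # e.g. Tika, running client-side
        except Exception as exc:
            failures.append((path, repr(exc)))
            continue
        send(path, text)           # e.g. POST the plain text to /update
    return failures

# Stub demo: the second "file" explodes, the rest still index.
def fake_extract(path):
    if path == "bad.pdf":
        raise ValueError("boom")
    return "text of " + path

sent = []
failed = index_directory(["a.pdf", "bad.pdf", "c.doc"], fake_extract,
                         lambda p, t: sent.append(p))
print(sent)    # ['a.pdf', 'c.doc']
print(failed)  # [('bad.pdf', "ValueError('boom')")]
```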


Thanks,
Shawn



Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Terry Steichen
Alex,

Please look at my embedded responses to your questions.

Terry


On 09/26/2018 04:57 PM, Alexandre Rafalovitch wrote:
> The challenge here is to figure out exactly what you are doing,
> because the original description could have been 10 different things.
>
> So:
> 1) You are using bin/post command (we just found this out)
No, I said that at the outset.  And repeated it.
> 2) You are indexing a bunch of files (what format? all same or different?)
I also said I was indexing a mixture of pdf and doc files
> 3) You are indexing them into a Schema supposedly ready for those
> files (which one?)
I'm using the managed-schema, the data-driven approach
> 4) You think some of them are not in Solr (how do you know that?
> how do you know that some are? why do you not know _which_ of the
> files are not indexed?)
I thought I made it very clear (twice) that I find that the list of
indexed files is 10% fewer than those in the directory holding the files
being indexed.  And I said that I don't know which are not getting
indexed because I am not getting error messages.
> 5) You are asking whether the error message should have told you if
> there is a problem with indexing (normally yes, but maybe there are
> some edge cases).
That's my question - why am I not getting error messages.  That's the
whole point of my query to the list.
>
> I've put the questions in brackets. I would focus on looking at
> questions in 4) first as they roughly bisect the problem. But other
> things are important too.
>
> I hope this helps,
> Alex.
>
>
> On 26 September 2018 at 16:39, Terry Steichen  wrote:
>> Shawn,
>>
>> To the best of my knowledge, I'm not using SolrJ at all.  Just
>> Solr-out-of-the-box.  In this case, if I understand you below, it
>> "should indicate an error status"
>>
>> But it doesn't.
>>
>> Let me try to clarify a bit - I'm just using bin/post to index the files
>> in a directory.  That indexing process produces a lengthy screen display
>> of files that were indexed.  (I realize this isn't production-quality,
>> but I'm not ready for production just yet, so that should be OK.)
>>
>> But no errors are shown (even though there have to be because the totals
>> indexed is less than the directory totals).
>>
>> Are you saying I can't use post (to verify correct indexing), but that I
>> have to write custom software to accomplish that?
>>
>> And that there's no solr variable I can define that will do a kind of
>> "verbose" to show that?
>>
>> And that such errors will not show up in any of solr's log files?
>>
>> Hard to believe (but what is, is, I guess).
>>
>> Terry
>>
>> On 09/26/2018 03:49 PM, Shawn Heisey wrote:
>>> On 9/26/2018 1:23 PM, Terry Steichen wrote:
 I'm pretty sure this was covered earlier.  But I can't find references
 to it.  The question is how to make indexing errors clear and obvious.
>>> If there's an indexing error and you're NOT using the concurrent
>>> client in SolrJ, the response that Solr returns should indicate an
>>> error status.  ConcurrentUpdateSolrClient gets those errors and
>>> swallows them so the calling program never knows they occurred.
>>>
 (I find that there are maybe 10% more files in a directory than end up
 in the index.  I presume they were indexing errors, but I have no idea
 which ones or what might have caused the error.)  As I recall, Solr's
 post tool doesn't give any errors when indexing.  I (vaguely) recall
 that there's a way (through the logs?) to overcome this and show the
 errors.  Or maybe it's that you have to do the indexing outside of Solr?
>>> The simple post tool is not really meant for production use.  It is a
>>> simple tool for interactive testing.
>>>
>>> I don't see anything in SimplePostTool for changing the program's exit
>>> status when an error is encountered during program operation.  If an
>>> error is encountered during the upload, a message would be logged to
>>> stderr, but you wouldn't be able to rely on the program's exit status
>>> to indicate an error.  To get that, you will need to write the
>>> indexing software.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>



Re: Rule-based replication or sharing

2018-09-26 Thread Noble Paul
Yes, it uses the autoscaling policies to achieve this. Please refer
to the documentation here:
https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html
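For the multiple-instances-per-host case, the policy framework can express "no two replicas of a shard on the same host". A sketch of such a cluster policy, with syntax as described in the 7.x autoscaling docs (verify against your version):

```json
{
  "set-cluster-policy": [
    {"replica": "<2", "shard": "#EACH", "host": "#EACH"}
  ]
}
```

POSTed to /api/cluster/autoscaling. Because the rule is keyed on host rather than node, three Solr instances running on one server count as a single host for placement purposes.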

On Thu, Sep 27, 2018, 02:11 Chuck Reynolds  wrote:

> Noble,
>
> Are you saying in the latest version of Solr that this would work with
> three instances of Solr running on each server?
>
> If so how?
>
> Thanks again for your help.
>
> On 9/26/18, 9:11 AM, "Noble Paul"  wrote:
>
> I'm not sure if it is pertinent to ask you to move to the latest Solr
> which has the policy based replica placement. Unfortunately, I don't
> have any other solution I can think of
>
> On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <
> creyno...@ancestry.com> wrote:
> >
> > Noble,
> >
> > So other than manually moving replicas of a shard, do you have a
> > suggestion of how one might accomplish multiple availability zones with
> > multiple instances of Solr running on each server?
> >
> > Thanks
> >
> > On 9/26/18, 12:56 AM, "Noble Paul"  wrote:
> >
> > The rules suggested by Steve are correct. I tested it locally and
> > I got the same errors. That probably means a bug exists.
> > All the new development efforts are invested in the new policy
> > feature:
> > https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
> >
> > The old one is going to be deprecated pretty soon. So, I'm not sure if
> > we should be investing our resources here.
> > On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <
> creyno...@ancestry.com> wrote:
> > >
> > > Shawn,
> > >
> > > Thanks for the info. We’ve been running this way for the past
> 4 years.
> > >
> > > We were running on very large hardware, 20 physical cores with
> 256 gigs of ram with 3 billion document and it was the only way we could
> take advantage of the hardware.
> > >
> > > Running 1 Solr instance per server never gave us the
> throughput we needed.
> > >
> > > So I somewhat disagree with your statement because our test
> proved otherwise.
> > >
> > > Thanks for the info.
> > >
> > > Sent from my iPhone
> > >
> > > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <
> apa...@elyograg.org> wrote:
> > > >
> > > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> > > >> Each server has three instances of Solr running on it so
> every instance on the server has to be in the same replica set.
> > > >
> > > > You should be running exactly one Solr instance per server.
> When evaluating rules for replica placement, SolrCloud will treat each
> instance as completely separate from all others, including others on the
> same machine.  It will not know that those three instances are on the same
> machine.  One Solr instance can handle MANY indexes.
> > > >
> > > > There is only ONE situation where it makes sense to run
> multiple instances per machine, and in my strong opinion, even that
> situation should not be handled with multiple instances. That situation is
> this:  When running one instance would require a REALLY large heap.
> Garbage collection pauses can become extreme in that situation, so some
> people will run multiple instances that each have a smaller heap, and
> divide their indexes between them. In my opinion, when you have enough
> index data on an instance that it requires a huge heap, instead of running
> two or more instances on one server, it's time to add more servers.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> >
> >
> >
> > --
> > -
> > Noble Paul
> >
> >
>
>
> --
> -
> Noble Paul
>
>
>


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
This is true.

I am thinking if Solr says 8 and up, it really is 8 and up; there is no
other reference I can find to not using G1 collection.

The Java support situation for old versions is really a mess right now.
Currently, if you want ongoing support patches without an Oracle support
contract, the only way to achieve that in October 2018 is Oracle 11.

If you are doing production or commercial work you have to use OpenJDK or
buy a license. Such a mess.

On Wed, Sep 26, 2018, 4:04 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:

> Jeff,
>
> On 9/26/18 11:35, Jeff Courtade wrote:
> > My concern with using g1 is solely based on finding this. Does
> > anyone have any information on this?
> >
> > https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_
> .2F_OpenJDK_Bugs
> >
> >  "Do not, under any circumstances, run Lucene with the G1 garbage
> > collector. Lucene's test suite fails with the G1 garbage collector
> > on a regular basis, including bugs that cause index corruption.
> > There is no person on this planet that seems to understand such
> > bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open
> > for over a year), so don't count on the situation changing soon.
> > This information is not out of date, and don't think that the next
> > oracle java release will fix the situation."
>
> That language is 3 years old and likely just hasn't been updated after
> it was no longer relevant. Also, it isn't attributed to anyone in
> particular (it's anonymous), so ... maybe it was one person's opinion
> and not a project-initiated warning.
>
> - -chris
>
> > On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood
> >  wrote:
> >
> >> We’ve been running G1 in prod for at least 18 months. Our biggest
> >> cluster is 48 machines, each with 36 CPUs, running 6.6.2. We also
> >> run it on our 4.10.4 master/slave cluster.
> >>
> >> wunder Walter Underwood wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Sep 26, 2018, at 7:37 AM, Jeff Courtade
> >>> 
> >> wrote:
> >>>
> >>> Thanks for that... I am just starting to look at this I was
> >>> unaware of the license debacle.
> >>>
> >>> Automated testing up to 10 is great.
> >>>
> >>> I am still curious about the GC1 being supported now...
> >>>
> >>> On Wed, Sep 26, 2018 at 10:25 AM Zisis T. 
> >>> wrote:
> >>>
>  Jeff Courtade wrote
> > Can we use GC1 garbage collection yet or do we still need
> > to use CMS?
> 
>  I believe you should be safe to go with G1. We've applied it
 in a
> >> Solr
>  6.6 cluster with 10 shards, 3 replicas per shard and an index
>  of about 500GB (1,5T counting all replicas) and it works
>  extremely well (throughput > 99%). The use-case includes
>  complex search queries and faceting. There is also this post
>  you can use as a starting point
> 
> 
> >> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-p
> roduction/
> 
> 
> 
> 
> 
> 
> >>
> - --
>  Sent from:
>  http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> >>> --
> >>>
> >>> Jeff Courtade M: 240.507.6116 <(240)%20507-6116>
> >>
> >> --
> >
> > Jeff Courtade M: 240.507.6116
> >
>


Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Alexandre Rafalovitch
The challenge here is to figure out exactly what you are doing,
because the original description could have been 10 different things.

So:
1) You are using bin/post command (we just found this out)
2) You are indexing a bunch of files (what format? all same or different?)
3) You are indexing them into a Schema supposedly ready for those
files (which one?)
4) You think some of them are not in Solr (how do you know that?
how do you know that some are? why do you not know _which_ of the
files are not indexed?)
5) You are asking whether the error message should have told you if
there is a problem with indexing (normally yes, but maybe there are
some edge cases).

I've put the questions in brackets. I would focus on looking at
questions in 4) first as they roughly bisect the problem. But other
things are important too.
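One way to attack question 4 directly is to diff the directory listing against the ids Solr holds; the id scheme is an assumption (bin/post typically uses the file path as the document id), so verify it against your own index first:

```python
def missing_files(on_disk, indexed_ids):
    """Files present in the directory but absent from the index.
    bin/post typically uses the file path as the document id, but
    verify that assumption against your own index (fl=id)."""
    return sorted(set(on_disk) - set(indexed_ids))

# indexed_ids would come from /select?q=*:*&fl=id&rows=... in practice.
print(missing_files(["/d/a.pdf", "/d/b.pdf", "/d/c.doc"],
                    ["/d/a.pdf", "/d/c.doc"]))  # ['/d/b.pdf']
```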

I hope this helps,
Alex.


On 26 September 2018 at 16:39, Terry Steichen  wrote:
> Shawn,
>
> To the best of my knowledge, I'm not using SolrJ at all.  Just
> Solr-out-of-the-box.  In this case, if I understand you below, it
> "should indicate an error status"
>
> But it doesn't.
>
> Let me try to clarify a bit - I'm just using bin/post to index the files
> in a directory.  That indexing process produces a lengthy screen display
> of files that were indexed.  (I realize this isn't production-quality,
> but I'm not ready for production just yet, so that should be OK.)
>
> But no errors are shown (even though there have to be because the totals
> indexed is less than the directory totals).
>
> Are you saying I can't use post (to verify correct indexing), but that I
> have to write custom software to accomplish that?
>
> And that there's no solr variable I can define that will do a kind of
> "verbose" to show that?
>
> And that such errors will not show up in any of solr's log files?
>
> Hard to believe (but what is, is, I guess).
>
> Terry
>
> On 09/26/2018 03:49 PM, Shawn Heisey wrote:
>> On 9/26/2018 1:23 PM, Terry Steichen wrote:
>>> I'm pretty sure this was covered earlier.  But I can't find references
>>> to it.  The question is how to make indexing errors clear and obvious.
>>
>> If there's an indexing error and you're NOT using the concurrent
>> client in SolrJ, the response that Solr returns should indicate an
>> error status.  ConcurrentUpdateSolrClient gets those errors and
>> swallows them so the calling program never knows they occurred.
>>
>>> (I find that there are maybe 10% more files in a directory than end up
>>> in the index.  I presume they were indexing errors, but I have no idea
>>> which ones or what might have caused the error.)  As I recall, Solr's
>>> post tool doesn't give any errors when indexing.  I (vaguely) recall
>>> that there's a way (through the logs?) to overcome this and show the
>>> errors.  Or maybe it's that you have to do the indexing outside of Solr?
>>
>> The simple post tool is not really meant for production use.  It is a
>> simple tool for interactive testing.
>>
>> I don't see anything in SimplePostTool for changing the program's exit
>> status when an error is encountered during program operation.  If an
>> error is encountered during the upload, a message would be logged to
>> stderr, but you wouldn't be able to rely on the program's exit status
>> to indicate an error.  To get that, you will need to write the
>> indexing software.
>>
>> Thanks,
>> Shawn
>>
>>
>


Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Terry Steichen
Shawn,

To the best of my knowledge, I'm not using SolrJ at all.  Just
Solr-out-of-the-box.  In this case, if I understand you below, it
"should indicate an error status" 

But it doesn't.

Let me try to clarify a bit - I'm just using bin/post to index the files
in a directory.  That indexing process produces a lengthy screen display
of files that were indexed.  (I realize this isn't production-quality,
but I'm not ready for production just yet, so that should be OK.)

But no errors are shown (even though there must be, because the total
indexed is less than the directory total).

Are you saying I can't use post (to verify correct indexing), but that I
have to write custom software to accomplish that? 

And that there's no solr variable I can define that will do a kind of
"verbose" to show that?

And that such errors will not show up in any of solr's log files?

Hard to believe (but what is, is, I guess).

Terry

On 09/26/2018 03:49 PM, Shawn Heisey wrote:
> On 9/26/2018 1:23 PM, Terry Steichen wrote:
>> I'm pretty sure this was covered earlier.  But I can't find references
>> to it.  The question is how to make indexing errors clear and obvious.
>
> If there's an indexing error and you're NOT using the concurrent
> client in SolrJ, the response that Solr returns should indicate an
> error status.  ConcurrentUpdateSolrClient gets those errors and
> swallows them so the calling program never knows they occurred.
>
>> (I find that there are maybe 10% more files in a directory than end up
>> in the index.  I presume they were indexing errors, but I have no idea
>> which ones or what might have caused the error.)  As I recall, Solr's
>> post tool doesn't give any errors when indexing.  I (vaguely) recall
>> that there's a way (through the logs?) to overcome this and show the
>> errors.  Or maybe it's that you have to do the indexing outside of Solr?
>
> The simple post tool is not really meant for production use.  It is a
> simple tool for interactive testing.
>
> I don't see anything in SimplePostTool for changing the program's exit
> status when an error is encountered during program operation.  If an
> error is encountered during the upload, a message would be logged to
> stderr, but you wouldn't be able to rely on the program's exit status
> to indicate an error.  To get that, you will need to write the
> indexing software.
>
> Thanks,
> Shawn
>
>



Re: Java version 11 for solr 7.5?

2018-09-26 Thread Christopher Schultz

Jeff,

On 9/26/18 11:35, Jeff Courtade wrote:
> My concern with using g1 is solely based on finding this. Does
> anyone have any information on this?
> 
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_
.2F_OpenJDK_Bugs
>
>  "Do not, under any circumstances, run Lucene with the G1 garbage
> collector. Lucene's test suite fails with the G1 garbage collector
> on a regular basis, including bugs that cause index corruption.
> There is no person on this planet that seems to understand such
> bugs (see https://bugs.openjdk.java.net/browse/JDK-8038348, open
> for over a year), so don't count on the situation changing soon.
> This information is not out of date, and don't think that the next
> oracle java release will fix the situation."

That language is 3 years old and likely just hasn't been updated after
it was no longer relevant. Also, it isn't attributed to anyone in
particular (it's anonymous), so ... maybe it was one person's opinion
and not a project-initiated warning.

-chris

> On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood
>  wrote:
> 
>> We’ve been running G1 in prod for at least 18 months. Our biggest
>> cluster is 48 machines, each with 36 CPUs, running 6.6.2. We also
>> run it on our 4.10.4 master/slave cluster.
>> 
>> wunder Walter Underwood wun...@wunderwood.org 
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Sep 26, 2018, at 7:37 AM, Jeff Courtade
>>> 
>> wrote:
>>> 
>>> Thanks for that... I am just starting to look at this I was
>>> unaware of the license debacle.
>>> 
>>> Automated testing up to 10 is great.
>>> 
>>> I am still curious about the GC1 being supported now...
>>> 
>>> On Wed, Sep 26, 2018 at 10:25 AM Zisis T. 
>>> wrote:
>>> 
 Jeff Courtade wrote
> Can we use GC1 garbage collection yet or do we still need
> to use CMS?
 
 I believe you should be safe to go with G1. We've applied it
 in a
>> Solr
 6.6 cluster with 10 shards, 3 replicas per shard and an index
 of about 500GB (1,5T counting all replicas) and it works
 extremely well (throughput > 99%). The use-case includes
 complex search queries and faceting. There is also this post
 you can use as a starting point
 
 
>> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-p
roduction/





>> 
- --
 Sent from:
 http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
 
>>> --
>>> 
>>> Jeff Courtade M: 240.507.6116 <(240)%20507-6116>
>> 
>> --
> 
> Jeff Courtade M: 240.507.6116
> 


Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Shawn Heisey

On 9/26/2018 1:23 PM, Terry Steichen wrote:

I'm pretty sure this was covered earlier.  But I can't find references
to it.  The question is how to make indexing errors clear and obvious.


If there's an indexing error and you're NOT using the concurrent client 
in SolrJ, the response that Solr returns should indicate an error 
status.  ConcurrentUpdateSolrClient gets those errors and swallows them 
so the calling program never knows they occurred.



(I find that there are maybe 10% more files in a directory than end up
in the index.  I presume they were indexing errors, but I have no idea
which ones or what might have caused the error.)  As I recall, Solr's
post tool doesn't give any errors when indexing.  I (vaguely) recall
that there's a way (through the logs?) to overcome this and show the
errors.  Or maybe it's that you have to do the indexing outside of Solr?


The simple post tool is not really meant for production use.  It is a 
simple tool for interactive testing.


I don't see anything in SimplePostTool for changing the program's exit 
status when an error is encountered during program operation.  If an 
error is encountered during the upload, a message would be logged to 
stderr, but you wouldn't be able to rely on the program's exit status to 
indicate an error.  To get that, you will need to write the indexing 
software.


Thanks,
Shawn



Making Solr Indexing Errors Visible

2018-09-26 Thread Terry Steichen
I'm pretty sure this was covered earlier.  But I can't find references
to it.  The question is how to make indexing errors clear and obvious. 
(I find that there are maybe 10% more files in a directory than end up
in the index.  I presume they were indexing errors, but I have no idea
which ones or what might have caused the error.)  As I recall, Solr's
post tool doesn't give any errors when indexing.  I (vaguely) recall
that there's a way (through the logs?) to overcome this and show the
errors.  Or maybe it's that you have to do the indexing outside of Solr?

Terry Steichen


Realtime get not always returning existing data

2018-09-26 Thread sgaron cse
Hey all,

We're trying to use SOLR for our document store and are facing some issues
with the Realtime Get api. Basically, we're doing an api call from multiple
endpoint to retrieve configuration data. The document that we are
retrieving does not change at all but sometimes the API returns a null
document ({doc:null}). I'd say 99.99% of the time we can retrieve the
document fine but once in a blue moon we get the null document. The problem
is that for us, if SOLR returns null, that means that the document does not
exist but because this is a document that should be there it causes all
sort of problems in our system.

The API I call is the following:
http://{server_ip}/solr/config/get?id={id}&wt=json&fl=_source_

As far as I understand reading the documentation, the Realtime Get API
should get me the document no matter what. Even if the document is not yet
committed to the index.

I see no errors whatsoever in the SOLR logs that could help me with this
problem. In fact, there are no errors at all.

As for our setup, because we're still in testing phase, we only have two
SOLR instances running on the same box in cloud mode with replication=1
which means that the core that we run the Realtime Get on is only present
in one of the two instances. Our script randomly chooses which instance it
queries, but as far as I understand, in cloud mode the API call
should be dispatched automatically to the right instance.

Am I missing anything here? Is it possible that there is a race condition
in the Realtime Get API that could return null data even if the document
exist?
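Until the root cause is found, a client-side guard can distinguish "transiently null" from "really absent" by retrying before trusting a null; fetch() is a placeholder for the /get call:

```python
import time

def get_with_retry(fetch, doc_id, attempts=3, delay=0.2):
    """Retry a realtime get that returned null before concluding the
    document is absent (guards against a suspected transient race)."""
    for attempt in range(attempts):
        doc = fetch(doc_id)
        if doc is not None:
            return doc
        if attempt < attempts - 1:
            time.sleep(delay)
    return None

# Stub demo: the first call returns null, the second returns the doc.
responses = iter([None, {"id": "cfg-1"}])
print(get_with_retry(lambda _id: next(responses), "cfg-1", delay=0))
# {'id': 'cfg-1'}
```

This does not fix the race, but it keeps a one-off null from cascading into "document does not exist" behavior downstream.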

Thanks,
Steve


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
Can you tell me where I could get insight into the testing cycles and
results?


On Wed, Sep 26, 2018, 1:03 PM Erick Erickson 
wrote:

> There are consistent failures under JDK 11 in the automated tests that
> Solr/Lucene runs that do not happen for other releases. I personally
> haven't tried diving into them to know whether they're test artifacts
> or not.
>
> JDK 9 and JKD 10 also have open issues, especially around Hadoop
> integration.
>
> I would recommend sticking with JDK 8 for the time being, that's the
> most tested/used version of Solr. Track
> https://issues.apache.org/jira/browse/SOLR-12809 for what progress is
> made with more recent Java versions.
>
> Best,
> Erick
> On Wed, Sep 26, 2018 at 8:44 AM Markus Jelsma
>  wrote:
> >
> > Indeed, but JDK-8038348 has been fixed very recently for Java 9 or
> higher.
> >
> > -Original message-
> > > From:Jeff Courtade 
> > > Sent: Wednesday 26th September 2018 17:36
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Java version 11 for solr 7.5?
> > >
> > > My concern with using g1 is solely based on finding this.
> > > Does anyone have any information on this?
> > >
> > >
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
> > >
> > > "Do not, under any circumstances, run Lucene with the G1 garbage
> collector.
> > > Lucene's test suite fails with the G1 garbage collector on a regular
> basis,
> > > including bugs that cause index corruption. There is no person on this
> > > planet that seems to understand such bugs (see
> > > https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a
> year), so
> > > don't count on the situation changing soon. This information is not
> out of
> > > date, and don't think that the next oracle java release will fix the
> > > situation."
> > >
> > >
> > > On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood <
> wun...@wunderwood.org>
> > > wrote:
> > >
> > > > We’ve been running G1 in prod for at least 18 months. Our biggest
> cluster
> > > > is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on
> our
> > > > 4.10.4 master/slave cluster.
> > > >
> > > > wunder
> > > > Walter Underwood
> > > > wun...@wunderwood.org
> > > > http://observer.wunderwood.org/  (my blog)
> > > >
> > > > > On Sep 26, 2018, at 7:37 AM, Jeff Courtade  >
> > > > wrote:
> > > > >
> > > > > Thanks for that...
> > > > > I am just starting to look at this I was unaware of the license
> debacle.
> > > > >
> > > > > Automated testing up to 10 is great.
> > > > >
> > > > > I am still curious about the GC1 being supported now...
> > > > >
> > > > > On Wed, Sep 26, 2018 at 10:25 AM Zisis T. 
> wrote:
> > > > >
> > > > >> Jeff Courtade wrote
> > > > >>> Can we use GC1 garbage collection yet or do we still need to use
> CMS?
> > > > >>
> > > > >> I believe you should be safe to go with G1. We've applied it in
> in a
> > > > Solr
> > > > >> 6.6 cluster with 10 shards, 3 replicas per shard and an index of
> about
> > > > >> 500GB
> > > > >> (1,5T counting all replicas) and it works extremely well
> (throughput >
> > > > >> 99%).
> > > > >> The use-case includes complex search queries and faceting.
> > > > >> There is also this post you can use as a starting point
> > > > >>
> > > > >>
> > > >
> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Sent from:
> http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> > > > >>
> > > > > --
> > > > >
> > > > > Jeff Courtade
> > > > > M: 240.507.6116
> > > >
> > > > --
> > >
> > > Jeff Courtade
> > > M: 240.507.6116
> > >
>


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
The CMS settings are very nearly what we use. After tons of load testing we
changed NewRatio to 2, and it cut the 10-second pauses way down for us.
Huge heap, though.

On Wed, Sep 26, 2018, 2:17 PM Shawn Heisey  wrote:

> On 9/26/2018 9:35 AM, Jeff Courtade wrote:
> > My concern with using g1 is solely based on finding this.
> > Does anyone have any information on this?
> >
> >
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
>
> I have never had a single problem with Solr running with the G1
> collector.  I'm only aware of one actual bug in Lucene that mentions G1
> ... and it is specific to the 32-bit version of Java.  It is strongly
> recommended for other reasons to only use a 64-bit Java.
>
> On the subject of the blog post mentioned by Zisis T... generally
> speaking, it is not a good idea to explicitly set the size of the
> various generations.  G1 will tune the sizes of each generation as it
> runs for best results.  By setting or limiting the size, that tuning
> cannot work with freedom, and you might be unhappy with the results.
>
> Here is a wiki page that contains my own experiments with garbage
> collection tuning:
>
> https://wiki.apache.org/solr/ShawnHeisey
>
> Thanks,
> Shawn
>
>


Re: Json object values in solr string field

2018-09-26 Thread Balanathagiri Ayyasamypalanivel
Hi,

Thanks for the reply. We are actually trying to optimize for a huge volume
of data.

For example, in our current system we have the layout below, so we can do facet
pivot or stats to get the sum of asset_td for each acct, but the data
grows a lot whenever more assets are added.

Id | Accts| assetid | asset_td
1| Acct1 | asset1 | 20
2| Acct1 | asset2 | 30
3| Acct2 | asset3 | 10
4| Acct3 | asset2 | 10

So we planned to change as

Id | Accts | asset_s
1  | Acct1 | [{"asset1": "20", "asset2":"30"}]
2  | Acct2 | [{"asset3": "10"}]
3  | Acct3 | [{"asset2": "10"}]

The only drawback here is that we have to parse the JSON to sum the
values. Is there any other way to handle this scenario?
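
A sketch of the client-side aggregation, assuming the flattened asset_s field holds the JSON array shown above (the field names come from the example, not from any Solr API):

```python
import json
from collections import defaultdict

def sum_assets(docs):
    """Sum the per-asset values stored as JSON strings in asset_s, per account."""
    totals = defaultdict(float)
    for doc in docs:
        # asset_s is a JSON array of {assetid: value} objects, stored as a string
        for obj in json.loads(doc["asset_s"]):
            for value in obj.values():
                totals[doc["Accts"]] += float(value)
    return dict(totals)

docs = [
    {"Accts": "Acct1", "asset_s": '[{"asset1": "20", "asset2": "30"}]'},
    {"Accts": "Acct2", "asset_s": '[{"asset3": "10"}]'},
    {"Accts": "Acct3", "asset_s": '[{"asset2": "10"}]'},
]
print(sum_assets(docs))  # {'Acct1': 50.0, 'Acct2': 10.0, 'Acct3': 10.0}
```

If the parsing cost is the main concern, another option is to keep the numeric asset_td field alongside the JSON string so Solr's stats/facet aggregations can still do the summing server-side.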

Regards,
Bala.

On Wed, Sep 26, 2018, 2:25 PM Shawn Heisey  wrote:

> On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:
> > Currently I am storing json object type of values in string field in
> solr.
> > Using this field, in the code I am parsing json objects and doing sum of
> > the values under it.
> >
> > In solr, do we have any option in doing it by default when using the json
> > object field values.
>
> Even if you have JSON-formatted strings in Solr, Solr doesn't know
> this.  It has no idea that the data is JSON, and won't be able to do
> anything special with the info contained there.
>
> Thanks,
> Shawn
>
>


Re: to cloud or not to cloud

2018-09-26 Thread Jeff Courtade
APX = approximately. Sorry!

On Wed, Sep 26, 2018, 2:09 PM Shawn Heisey  wrote:

> On 9/26/2018 9:45 AM, Jeff Courtade wrote:
> > We are considering a move to solr 7.x  my question is Must we use cloud?
> We
> > currently do not and all is well. It seems all work is done referencing
> > cloud implementations.
>
> You do not have to use cloud.
>
> For most people who are starting from scratch, I would suggest using
> SolrCloud.  Many many things are just a lot easier with cloud.
>
> For somebody who has an existing setup that's NOT running cloud, if they
> are happy with their setup, I see no reason to change it ... but those
> people should at least *investigate* SolrCloud, just to find out whether
> it might make their operations easier.
>
> > solr 4.3.0 master/slave
> > 14 servers RHEL 32 core 96 gb ram 7 shards one replica per shard
> > Total index is 333Gb around 47.5 GB per server.
> > APX 2million docs per shard
>
> Sharded indexes are a LOT easier in SolrCloud.  I have dealt with
> sharded indexes without cloud.  If SolrCloud had existed when I began
> that work, I would have definitely used it. That index might still be
> using master/slave, but it was not possible to set up replication
> between 1.4.1 and 3.2.0, so master/slave went out the window.
>
> I have no idea what APX is.
>
> Thanks,
> Shawn
>
>


Re: Json object values in solr string field

2018-09-26 Thread Shawn Heisey

On 9/26/2018 12:20 PM, Balanathagiri Ayyasamypalanivel wrote:

Currently I am storing json object type of values in string field in solr.
Using this field, in the code I am parsing json objects and doing sum of
the values under it.

In solr, do we have any option in doing it by default when using the json
object field values.


Even if you have JSON-formatted strings in Solr, Solr doesn't know 
this.  It has no idea that the data is JSON, and won't be able to do 
anything special with the info contained there.


Thanks,
Shawn



Json object values in solr string field

2018-09-26 Thread Balanathagiri Ayyasamypalanivel
Hi,
Currently I am storing json object type of values in string field in solr.
Using this field, in the code I am parsing json objects and doing sum of
the values under it.

In solr, do we have any option in doing it by default when using the json
object field values.

Regards,
Bala.


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Shawn Heisey

On 9/26/2018 9:35 AM, Jeff Courtade wrote:

My concern with using g1 is solely based on finding this.
Does anyone have any information on this?

https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs


I have never had a single problem with Solr running with the G1 
collector.  I'm only aware of one actual bug in Lucene that mentions G1 
... and it is specific to the 32-bit version of Java.  It is strongly 
recommended for other reasons to only use a 64-bit Java.


On the subject of the blog post mentioned by Zisis T... generally 
speaking, it is not a good idea to explicitly set the size of the 
various generations.  G1 will tune the sizes of each generation as it 
runs for best results.  By setting or limiting the size, that tuning 
cannot work with freedom, and you might be unhappy with the results.
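
As a concrete starting point, G1 can be enabled by overriding GC_TUNE in solr.in.sh. In line with the advice above, no generation sizes are fixed; the pause-time goal of 250 ms is an assumption to tune, not a recommendation:

```shell
# solr.in.sh: enable G1 but let it size the generations itself;
# the pause-time goal is the main knob to experiment with
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:MaxGCPauseMillis=250"
```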


Here is a wiki page that contains my own experiments with garbage 
collection tuning:


https://wiki.apache.org/solr/ShawnHeisey

Thanks,
Shawn



Re: to cloud or not to cloud

2018-09-26 Thread Shawn Heisey

On 9/26/2018 9:45 AM, Jeff Courtade wrote:

We are considering a move to solr 7.x  my question is Must we use cloud? We
currently do not and all is well. It seems all work is done referencing
cloud implementations.


You do not have to use cloud.

For most people who are starting from scratch, I would suggest using 
SolrCloud.  Many many things are just a lot easier with cloud.


For somebody who has an existing setup that's NOT running cloud, if they 
are happy with their setup, I see no reason to change it ... but those 
people should at least *investigate* SolrCloud, just to find out whether 
it might make their operations easier.



solr 4.3.0 master/slave
14 servers RHEL 32 core 96 gb ram 7 shards one replica per shard
Total index is 333Gb around 47.5 GB per server.
APX 2million docs per shard


Sharded indexes are a LOT easier in SolrCloud.  I have dealt with 
sharded indexes without cloud.  If SolrCloud had existed when I began 
that work, I would have definitely used it. That index might still be 
using master/slave, but it was not possible to set up replication 
between 1.4.1 and 3.2.0, so master/slave went out the window.


I have no idea what APX is.

Thanks,
Shawn



The way to update Managed Resources.

2018-09-26 Thread Yasufumi Mizoguchi
Hi,

I am trying to use ManagedSynonymGraphFilterFactory and want to add a
"tokenizerFactory" attribute to the managed
resources (_schema_analysis_synonyms_*.json under the conf directory).
To do this, is it OK to update the JSON file manually?
If not, is there any way to update managed resources other than the REST API?

Thanks,
Yasufumi


Re: Replication error in SOLR-6.5.1

2018-09-26 Thread Erick Erickson
bq. In all my solr servers I have 40% free space

Well, clearly that's not enough if you're getting this error: "No
space left on device"

Solr/Lucene need _at least_ as much free space as the indexes occupy.
In some circumstances it can require more. It sounds like you're
having an issue with full replications all happening at the same time
and effectively at least doubling your index space requirements.

I'd fix that problem first, the other messages likely will go away.
Either get bigger disks, make your indexes smaller or move some of the
replicas to a new machine.

Best,
Erick
On Tue, Sep 25, 2018 at 10:20 PM SOLR4189  wrote:
>
> Hi all,
>
> I use Solr 6.5.1. A couple of weeks ago I started to use the replication
> feature in cloud mode without overriding the default behavior of
> ReplicationHandler.
>
> After deploying the replication feature to production, almost every day I hit
> these errors:
> SolrException: Unable to download  completely. Downloaded x!=y
> OR
> SolrException: Unable to download  completely. (Downloaded x of y
> bytes) No space left on device
> OR
> Error deleting file: 
> NoSuchFileException: /opt/solr//data/index./
>
> All these errors occur when a replica is in recovery mode, sometimes after a
> physical machine failure or sometimes after a simple Solr restart. Today I
> have only one solution for it: after the 5th unsuccessful recovery, I
> remove the replica and add it anew.
>
> On all my Solr servers I have 40% free space; the hard/soft commit interval is 5 minutes.
>
>
> What's wrong here and what can be done to correct these errors?
> Due to free space or commitReserveDuration parameter or something else?
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Adding docs to solr - slows down searcher requests

2018-09-26 Thread Erick Erickson
5 zookeepers is overkill for 4 nodes. 3 should be more than adequate.
But that's a tangent.

Sure. Configs to tune:
1> indexing rate. If you're flooding the cluster with updates at a
very high rate, the CPU cycles needed to index the docs are going to
take away from query processing. So if you throttle your indexing you
can maybe make it better.

2> How often do you commit such that it opens a searcher? Either soft
commits or hard commits with openSearcher=true. See:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

3> How much autowarming are you doing? Excessive autowarm counts for
filterCache and querResult cache can be a problem, especially when
combined with frequent commits. Contrariwise, no autowarming at all
will also cause response spikes. I usually start with 10-20.

4> If you're using a 7x version of Solr, you can use TLOG and/or PULL
replicas. Those index on the leader then the updated index is pushed
to the follower.

5> If the client that sends docs is issuing a commit with every
update, that's an anti-pattern. Check that.
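
For <2>, a common starting point in solrconfig.xml looks like the following (the intervals are placeholders to tune for your latency needs, not recommendations):

```xml
<!-- solrconfig.xml: the hard commit flushes to disk without opening a
     searcher; the soft commit controls when new docs become visible -->
<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit every 60s -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>120000</maxTime>         <!-- new searcher at most every 2 min -->
</autoSoftCommit>
```

Longer soft-commit intervals mean fewer searcher reopens, and therefore less autowarming work competing with queries.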

Best,
Erick
On Wed, Sep 26, 2018 at 9:24 AM ashoknix  wrote:
>
> We have solr cloud  4 nodes with 5 zookeepers.
>
> Usually search request are super fast! But, when we add docs to leader solr
> - it starts pushing updates to other nodes - causing search request to
> respond back at snail speed :( :( :(
>
> We see tons of such logs for period of 2-3 mins and then once it is
> completed - again searcher is faster.
>
> 65497572 [http-apr-8980-exec-40] INFO
> org.apache.solr.update.processor.LogUpdateProcessor  – [bbc] webapp=/solr
> path=/update params=
> {distrib.from=http://ser6.rit.net:8980/solr/bbc/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
> {add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
> (1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
> 9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
> (1612655281814110208), DAA49178A5E74285 (1612655281830887424),
> 829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
> (1612655281859198976), BE0F7354DC30164C (1612655281869684736),
> 59C4A764BB50B13B (1612655281880170496)]} 0 9
>
> Can SOLR gurus - advise what is the best strategy to follow ? / or any
> configs to tune ? or any other methods ?
>
> Thanks a ton!
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Telegraf monitoring for Solr Cluster

2018-09-26 Thread Walter Underwood
I’m still learning Telegraf/InfluxDB, but I like it so far. Does anybody have 
experience adding simple URL-based probes? For example, I’d like to graph this 
for each collection.

curl -s "http://mycluster:8983/solr/mycollection/select?q=${query}&rows=0&wt=json" | jq 
-r .response.numFound

And this for each core. The script to get the nodes and cores is some scary 
bash code, I probably should have done this in Python.

curl -s \
  "http://$url_frag/${cores[i]}/replication?_=1535148863458&command=details&wt=json" \
  | jq -r '.details.indexSize' | cut -d ' ' -f 1

The CPU and disk monitoring was easy to set up.
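
One way to wire a probe like the first one into Telegraf is the exec input plugin, with the curl | jq one-liner wrapped in a small script (the script path and measurement name below are assumptions):

```toml
# telegraf.conf: run the probe on each collection interval and store the
# bare number that jq prints as an integer metric
[[inputs.exec]]
  commands = ["/usr/local/bin/solr_numfound.sh"]
  timeout = "10s"
  data_format = "value"
  data_type = "integer"
  name_override = "solr_numfound"
```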

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Java version 11 for solr 7.5?

2018-09-26 Thread Erick Erickson
There are consistent failures under JDK 11 in the automated tests that
Solr/Lucene runs that do not happen for other releases. I personally
haven't tried diving into them to know whether they're test artifacts
or not.

JDK 9 and JDK 10 also have open issues, especially around Hadoop integration.

I would recommend sticking with JDK 8 for the time being, that's the
most tested/used version of Solr. Track
https://issues.apache.org/jira/browse/SOLR-12809 for what progress is
made with more recent Java versions.

Best,
Erick
On Wed, Sep 26, 2018 at 8:44 AM Markus Jelsma
 wrote:
>
> Indeed, but JDK-8038348 has been fixed very recently for Java 9 or higher.
>
> -Original message-
> > From:Jeff Courtade 
> > Sent: Wednesday 26th September 2018 17:36
> > To: solr-user@lucene.apache.org
> > Subject: Re: Java version 11 for solr 7.5?
> >
> > My concern with using g1 is solely based on finding this.
> > Does anyone have any information on this?
> >
> > https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
> >
> > "Do not, under any circumstances, run Lucene with the G1 garbage collector.
> > Lucene's test suite fails with the G1 garbage collector on a regular basis,
> > including bugs that cause index corruption. There is no person on this
> > planet that seems to understand such bugs (see
> > https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so
> > don't count on the situation changing soon. This information is not out of
> > date, and don't think that the next oracle java release will fix the
> > situation."
> >
> >
> > On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood 
> > wrote:
> >
> > > We’ve been running G1 in prod for at least 18 months. Our biggest cluster
> > > is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on our
> > > 4.10.4 master/slave cluster.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > > > On Sep 26, 2018, at 7:37 AM, Jeff Courtade 
> > > wrote:
> > > >
> > > > Thanks for that...
> > > > I am just starting to look at this I was unaware of the license debacle.
> > > >
> > > > Automated testing up to 10 is great.
> > > >
> > > > I am still curious about the GC1 being supported now...
> > > >
> > > > On Wed, Sep 26, 2018 at 10:25 AM Zisis T.  wrote:
> > > >
> > > >> Jeff Courtade wrote
> > > >>> Can we use GC1 garbage collection yet or do we still need to use CMS?
> > > >>
> > > >> I believe you should be safe to go with G1. We've applied it in in a
> > > Solr
> > > >> 6.6 cluster with 10 shards, 3 replicas per shard and an index of about
> > > >> 500GB
> > > >> (1,5T counting all replicas) and it works extremely well (throughput >
> > > >> 99%).
> > > >> The use-case includes complex search queries and faceting.
> > > >> There is also this post you can use as a starting point
> > > >>
> > > >>
> > > http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> > > >>
> > > > --
> > > >
> > > > Jeff Courtade
> > > > M: 240.507.6116
> > >
> > > --
> >
> > Jeff Courtade
> > M: 240.507.6116
> >


Re: SOLR Index Time Running Optimization

2018-09-26 Thread Walter Underwood
How long does the query take when it is run directly, without Solr?

For our DIH queries, Solr was not the slow part. It took 90 minutes
directly or with DIH. With our big cluster, I’ve seen indexing rates of
one million docs per minute.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 26, 2018, at 9:44 AM, Jan Høydahl  wrote:
> 
> With DIH you are doing indexing single-threaded. You should be able to 
> configure multiple DIH's on the same collection and then partition the data 
> between them, issuing slightly different SQL to each. But I don't exactly 
> know what that would look like.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 26. sep. 2018 kl. 14:30 skrev Susheel Kumar :
>> 
>> Also are you using Solr data import? That will be much slower compare to if
>> you write our own little indexer which does indexing in batches and with
>> multiple threads.
>> 
>> On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:
>> 
>>> Hi, I know this is the shortest way but, had you tried to add more core or
>>> CPU to your solr instances? How big is you collection in terms of GB and
>>> number of documents?
>>> 
>>> Ciao,
>>> Vincenzo
>>> 
>>> 
 On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
>>> krizellemae.marti...@sas.com> wrote:
 
 Hi.
 
 Our SOLR currently is running approximately 39hours for Full and Delta
>>> Import. I would like to ask for your assistance on how can we shorten the
>>> 39hours run time in any possible solution?
 For SOLR version, we are using solr 5.3.1.
 
 Regards,
 Krizelle Mae M. Hernandez
>>> 
> 



Re: SOLR Index Time Running Optimization

2018-09-26 Thread Jan Høydahl
With DIH you are doing indexing single-threaded. You should be able to 
configure multiple DIH's on the same collection and then partition the data 
between them, issuing slightly different SQL to each. But I don't exactly know 
what that would look like.
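
The batched, multi-threaded indexer Susheel suggests below can be sketched roughly like this, posting to Solr's JSON update endpoint (the URL, batch size, and thread count are placeholders):

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"  # placeholder

def batches(docs, size):
    """Yield successive lists of `size` docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def post_batch(batch):
    """POST one batch of docs as a JSON array; no per-batch commit."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req).status

def index_all(docs, batch_size=500, threads=4):
    """Index all docs using several concurrent batch POSTs."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(post_batch, batches(docs, batch_size)))

# The batching itself, shown without a live Solr instance:
print([len(b) for b in batches(list(range(1200)), 500)])  # [500, 500, 200]
```

Committing once at the end (or relying on autoCommit) rather than per batch keeps the load on Solr down.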

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 26. sep. 2018 kl. 14:30 skrev Susheel Kumar :
> 
> Also, are you using Solr data import? That will be much slower compared to
> writing your own little indexer which does indexing in batches and with
> multiple threads.
> 
> On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:
> 
>> Hi, I know this is the shortest way, but have you tried adding more cores or
>> CPUs to your Solr instances? How big is your collection in terms of GB and
>> number of documents?
>> 
>> Ciao,
>> Vincenzo
>> 
>> 
>>> On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
>> krizellemae.marti...@sas.com> wrote:
>>> 
>>> Hi.
>>> 
>>> Our Solr import currently runs for approximately 39 hours for Full and Delta
>> Import. I would like to ask for your assistance on how we can shorten the
>> 39-hour run time by any possible solution?
>>> For the Solr version, we are using Solr 5.3.1.
>>> 
>>> Regards,
>>> Krizelle Mae M. Hernandez
>> 



Adding docs to solr - slows down searcher requests

2018-09-26 Thread ashoknix
We have solr cloud  4 nodes with 5 zookeepers.

Usually search requests are super fast! But when we add docs to the leader solr
- it starts pushing updates to other nodes - causing search requests to
respond back at a snail's pace :( :( :(

We see tons of such logs for a period of 2-3 mins, and then once it is
completed the searcher is fast again.

65497572 [http-apr-8980-exec-40] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [bbc] webapp=/solr
path=/update params=
{distrib.from=http://ser6.rit.net:8980/solr/bbc/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
(1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
(1612655281814110208), DAA49178A5E74285 (1612655281830887424),
829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
(1612655281859198976), BE0F7354DC30164C (1612655281869684736),
59C4A764BB50B13B (1612655281880170496)]} 0 9

Can any SOLR gurus advise on the best strategy to follow, any
configs to tune, or any other methods?

Thanks a ton!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Rule-based replication or sharing

2018-09-26 Thread Chuck Reynolds
Noble,

Are you saying in the latest version of Solr that this would work with three 
instances of Solr running on each server?

If so how?

Thanks again for your help.

On 9/26/18, 9:11 AM, "Noble Paul"  wrote:

I'm not sure if it is pertinent to ask you to move to the latest Solr
which has the policy based replica placement. Unfortunately, I don't
have any other solution I can think of

On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds  
wrote:
>
> Noble,
>
> So, other than manually moving replicas of shards, do you have a suggestion 
of how one might accomplish multiple availability zones with multiple 
instances of Solr running on each server?
>
> Thanks
>
> On 9/26/18, 12:56 AM, "Noble Paul"  wrote:
>
> The rules suggested by Steve is correct. I tested it locally and I got
> the same errors. That means a bug exists probably.
> All the new development efforts are invested in the new policy feature
> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>
> The old one is going to be deprecated pretty soon. So, I'm not sure if
> we should be investing our resources here
> On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds 
 wrote:
> >
> > Shawn,
> >
> > Thanks for the info. We’ve been running this way for the past 4 
years.
> >
> > We were running on very large hardware, 20 physical cores with 256 
gigs of ram with 3 billion document and it was the only way we could take 
advantage of the hardware.
> >
> > Running 1 Solr instance per server never gave us the throughput we 
needed.
> >
> > So I somewhat disagree with your statement because our test proved 
otherwise.
> >
> > Thanks for the info.
> >
> > Sent from my iPhone
> >
> > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey  
wrote:
> > >
> > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> > >> Each server has three instances of Solr running on it so every 
instance on the server has to be in the same replica set.
> > >
> > > You should be running exactly one Solr instance per server.  When 
evaluating rules for replica placement, SolrCloud will treat each instance as 
completely separate from all others, including others on the same machine.  It 
will not know that those three instances are on the same machine.  One Solr 
instance can handle MANY indexes.
> > >
> > > There is only ONE situation where it makes sense to run multiple 
instances per machine, and in my strong opinion, even that situation should not 
be handled with multiple instances. That situation is this:  When running one 
instance would require a REALLY large heap.  Garbage collection pauses can 
become extreme in that situation, so some people will run multiple instances 
that each have a smaller heap, and divide their indexes between them. In my 
opinion, when you have enough index data on an instance that it requires a huge 
heap, instead of running two or more instances on one server, it's time to add 
more servers.
> > >
> > > Thanks,
> > > Shawn
> > >
>
>
>
> --
> -
> Noble Paul
>
>


-- 
-
Noble Paul




RE: [External] Setting Spellcheck for solr only for zero result

2018-09-26 Thread Dyer, James
Neel,

I do not think there is a way to entirely bypass spellchecking if there are 
results returned, and I'm not so sure performance would noticeably improve if 
it did this.  Clients can easily check to see if results were returned and can 
ignore the spellcheck response in these cases, if desired.

The one exception to this is if you are using "spellcheck.collate=true" with  
"spellcheck.maxCollationTries" set to a value > 0.  In this case, if your main 
query uses "q.op=OR" or a low "mm" value, you might want to force it to only 
return collations with all matching words.  In this case you would use 
something like "spellcheck.collateParam.mm=100%" to be sure it only returned 
re-written queries for which all the words matched.

The "spellcheck.maxResultsForSuggest" parameter is designed to be used in 
conjunction with "spellcheck.alternativeTermCount" to produce 
did-you-mean-style suggestions when a query returns only a few hits and at 
least some of the terms were in the index (but may be misspelled nevertheless).
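
Concretely, the did-you-mean setup described above looks something like these request parameters (the dictionary name and numeric values are illustrative):

```
spellcheck=true
spellcheck.dictionary=default
spellcheck.alternativeTermCount=5      # suggest even for terms that exist in the index
spellcheck.maxResultsForSuggest=5      # only suggest when the query returns <= 5 hits
spellcheck.collate=true
spellcheck.maxCollationTries=10
spellcheck.collateParam.mm=100%        # collations must match on all words
```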

James Dyer
Ingram Content Group

-Original Message-
From: neel choudhury [mailto:findneel2...@gmail.com] 
Sent: Sunday, September 23, 2018 2:58 PM
To: solr-user@lucene.apache.org
Subject: [External] Setting Spellcheck for solr only for zero result

I am trying to set up spellcheck for Solr correctly. For performance
reasons (and to avoid confusion) I don't want to give any suggestions for a
query which returns at least one result. Solr provides the parameter
spellcheck.maxResultsForSuggest. For my use case I need to set it to 0, as I
only want suggestions when no results are returned. However, looking into the
code of SpellCheckComponent in Solr, I saw that a value of 0 for
spellcheck.maxResultsForSuggest is ignored because of the greater-than sign. Is
there a way I can suppress spell suggestions even if one result is returned?

private Integer maxResultsForSuggest(ResponseBuilder rb) {
  SolrParams params = rb.req.getParams();
  float maxResultsForSuggestParamValue =
      params.getFloat(SpellingParams.SPELLCHECK_MAX_RESULTS_FOR_SUGGEST, 0.0f);
  Integer maxResultsForSuggest = null;

  if (maxResultsForSuggestParamValue > 0.0f) {
    // ...
  }

  return maxResultsForSuggest;
}


Re: to cloud or not to cloud

2018-09-26 Thread David Hastings
Agree with Walter.  I personally really like the master slave set up for my use 
cases.  



David J. Hastings | Lead Developer
dhasti...@wshein.com | 716.882.2600 x 176

William S. Hein & Co., Inc.
2350 North Forest Road | Getzville, NY 14068
www.wshein.com/contact-us


From: Walter Underwood 
Sent: Wednesday, September 26, 2018 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: to cloud or not to cloud

Cloud is very useful if you shard or need near real-time indexing.

For non-sharded,  non real time collections, I really like master/slave.
The loose coupling between master and slave makes it trivial to scale
out. Just clone a slave and fire it up.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 26, 2018, at 8:45 AM, Jeff Courtade  wrote:
>
> Hi,
>
> We are considering a move to solr 7.x  my question is Must we use cloud? We
> currently do not and all is well. It seems all work is done referencing
> cloud implementations.
>
> We have
>
> solr 4.3.0 master/slave
> 14 servers RHEL 32 core 96 gb ram 7 shards one replica per shard
> Total index is 333Gb around 47.5 GB per server.
> APX 2million docs per shard
>
>
>
> --
>
> Jeff Courtade
> M: 240.507.6116



Re: Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
Thanks ..!

On Wed, Sep 26, 2018 at 11:44 AM Markus Jelsma 
wrote:

> Indeed, but JDK-8038348 has been fixed very recently for Java 9 or higher.
>
> -Original message-
> > From:Jeff Courtade 
> > Sent: Wednesday 26th September 2018 17:36
> > To: solr-user@lucene.apache.org
> > Subject: Re: Java version 11 for solr 7.5?
> >
> > My concern with using g1 is solely based on finding this.
> > Does anyone have any information on this?
> >
> >
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
> >
> > "Do not, under any circumstances, run Lucene with the G1 garbage
> collector.
> > Lucene's test suite fails with the G1 garbage collector on a regular
> basis,
> > including bugs that cause index corruption. There is no person on this
> > planet that seems to understand such bugs (see
> > https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a
> year), so
> > don't count on the situation changing soon. This information is not out
> of
> > date, and don't think that the next oracle java release will fix the
> > situation."
> >
> >
> > On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood  >
> > wrote:
> >
> > > We’ve been running G1 in prod for at least 18 months. Our biggest
> cluster
> > > is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on our
> > > 4.10.4 master/slave cluster.
> > >
> > > wunder
> > > Walter Underwood
> > > wun...@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > > > On Sep 26, 2018, at 7:37 AM, Jeff Courtade 
> > > wrote:
> > > >
> > > > Thanks for that...
> > > > I am just starting to look at this I was unaware of the license
> debacle.
> > > >
> > > > Automated testing up to 10 is great.
> > > >
> > > > I am still curious about the GC1 being supported now...
> > > >
> > > > On Wed, Sep 26, 2018 at 10:25 AM Zisis T. 
> wrote:
> > > >
> > > >> Jeff Courtade wrote
> > > >>> Can we use GC1 garbage collection yet or do we still need to use
> CMS?
> > > >>
> > > >> I believe you should be safe to go with G1. We've applied it in in a
> > > Solr
> > > >> 6.6 cluster with 10 shards, 3 replicas per shard and an index of
> about
> > > >> 500GB
> > > >> (1,5T counting all replicas) and it works extremely well
> (throughput >
> > > >> 99%).
> > > >> The use-case includes complex search queries and faceting.
> > > >> There is also this post you can use as a starting point
> > > >>
> > > >>
> > >
> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Sent from:
> http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> > > >>
> > > > --
> > > >
> > > > Jeff Courtade
> > > > M: 240.507.6116
> > >
> > > --
> >
> > Jeff Courtade
> > M: 240.507.6116
> >
>
-- 

Jeff Courtade
M: 240.507.6116


Re: to cloud or not to cloud

2018-09-26 Thread Walter Underwood
Cloud is very useful if you shard or need near real-time indexing.

For non-sharded,  non real time collections, I really like master/slave.
The loose coupling between master and slave makes it trivial to scale
out. Just clone a slave and fire it up.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 26, 2018, at 8:45 AM, Jeff Courtade  wrote:
> 
> Hi,
> 
> We are considering a move to solr 7.x  my question is Must we use cloud? We
> currently do not and all is well. It seems all work is done referencing
> cloud implementations.
> 
> We have
> 
> solr 4.3.0 master/slave
> 14 servers RHEL 32 core 96 gb ram 7 shards one replica per shard
> Total index is 333Gb around 47.5 GB per server.
> APX 2million docs per shard
> 
> 
> 
> -- 
> 
> Jeff Courtade
> M: 240.507.6116



to cloud or not to cloud

2018-09-26 Thread Jeff Courtade
Hi,

We are considering a move to Solr 7.x. My question: must we use SolrCloud? We
currently do not, and all is well. It seems all work is done referencing
cloud implementations.

We have

Solr 4.3.0 master/slave
14 servers: RHEL, 32 cores, 96 GB RAM; 7 shards, one replica per shard
Total index is 333 GB, around 47.5 GB per server.
Approx. 2 million docs per shard



-- 

Jeff Courtade
M: 240.507.6116


RE: Java version 11 for solr 7.5?

2018-09-26 Thread Markus Jelsma
Indeed, but JDK-8038348 has been fixed very recently for Java 9 or higher.
 
-Original message-
> From:Jeff Courtade 
> Sent: Wednesday 26th September 2018 17:36
> To: solr-user@lucene.apache.org
> Subject: Re: Java version 11 for solr 7.5?
> 
> My concern with using g1 is solely based on finding this.
> Does anyone have any information on this?
> 
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
> 
> "Do not, under any circumstances, run Lucene with the G1 garbage collector.
> Lucene's test suite fails with the G1 garbage collector on a regular basis,
> including bugs that cause index corruption. There is no person on this
> planet that seems to understand such bugs (see
> https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so
> don't count on the situation changing soon. This information is not out of
> date, and don't think that the next oracle java release will fix the
> situation."
> 
> 
> On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood 
> wrote:
> 
> > We’ve been running G1 in prod for at least 18 months. Our biggest cluster
> > is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on our
> > 4.10.4 master/slave cluster.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Sep 26, 2018, at 7:37 AM, Jeff Courtade 
> > wrote:
> > >
> > > Thanks for that...
> > > I am just starting to look at this I was unaware of the license debacle.
> > >
> > > Automated testing up to 10 is great.
> > >
> > > I am still curious about the GC1 being supported now...
> > >
> > > On Wed, Sep 26, 2018 at 10:25 AM Zisis T.  wrote:
> > >
> > >> Jeff Courtade wrote
> > >>> Can we use GC1 garbage collection yet or do we still need to use CMS?
> > >>
> > >> I believe you should be safe to go with G1. We've applied it in in a
> > Solr
> > >> 6.6 cluster with 10 shards, 3 replicas per shard and an index of about
> > >> 500GB
> > >> (1,5T counting all replicas) and it works extremely well (throughput >
> > >> 99%).
> > >> The use-case includes complex search queries and faceting.
> > >> There is also this post you can use as a starting point
> > >>
> > >>
> > http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> > >>
> > > --
> > >
> > > Jeff Courtade
> > > M: 240.507.6116
> >
> > --
> 
> Jeff Courtade
> M: 240.507.6116
> 


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
My concern with using G1 is based solely on finding this.
Does anyone have any information on this?

https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs

"Do not, under any circumstances, run Lucene with the G1 garbage collector.
Lucene's test suite fails with the G1 garbage collector on a regular basis,
including bugs that cause index corruption. There is no person on this
planet that seems to understand such bugs (see
https://bugs.openjdk.java.net/browse/JDK-8038348, open for over a year), so
don't count on the situation changing soon. This information is not out of
date, and don't think that the next oracle java release will fix the
situation."


On Wed, Sep 26, 2018 at 11:08 AM Walter Underwood 
wrote:

> We’ve been running G1 in prod for at least 18 months. Our biggest cluster
> is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on our
> 4.10.4 master/slave cluster.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Sep 26, 2018, at 7:37 AM, Jeff Courtade 
> wrote:
> >
> > Thanks for that...
> > I am just starting to look at this I was unaware of the license debacle.
> >
> > Automated testing up to 10 is great.
> >
> > I am still curious about the GC1 being supported now...
> >
> > On Wed, Sep 26, 2018 at 10:25 AM Zisis T.  wrote:
> >
> >> Jeff Courtade wrote
> >>> Can we use GC1 garbage collection yet or do we still need to use CMS?
> >>
> >> I believe you should be safe to go with G1. We've applied it in in a
> Solr
> >> 6.6 cluster with 10 shards, 3 replicas per shard and an index of about
> >> 500GB
> >> (1,5T counting all replicas) and it works extremely well (throughput >
> >> 99%).
> >> The use-case includes complex search queries and faceting.
> >> There is also this post you can use as a starting point
> >>
> >>
> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
> >>
> >>
> >>
> >>
> >> --
> >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >>
> > --
> >
> > Jeff Courtade
> > M: 240.507.6116
>
> --

Jeff Courtade
M: 240.507.6116


Re: Rule-based replication or sharing

2018-09-26 Thread Noble Paul
I'm not sure if it is pertinent to ask you to move to the latest Solr,
which has the policy-based replica placement. Unfortunately, I can't
think of any other solution.
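For reference, the policy-based placement mentioned here is configured through the autoscaling API in Solr 7.x. A minimal sketch, assuming a cluster at localhost:8983 (the rule values are illustrative, adapted from the autoscaling guide, not a tested production policy):

```shell
# Cluster-wide policy: fewer than 2 replicas of each shard on any node,
# i.e. never co-locate two replicas of the same shard on one node.
curl -s -X POST 'http://localhost:8983/api/cluster/autoscaling' \
  -H 'Content-Type: application/json' \
  -d '{"set-cluster-policy": [{"replica": "<2", "shard": "#EACH", "node": "#ANY"}]}'
```

Availability-zone spreading can then be expressed with rules keyed on a system property (e.g. a `sysprop.az` attribute) set on each node at startup.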

On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds  wrote:
>
> Noble,
>
> So other than manually moving replicas of shard do you have a suggestion of 
> how one might accomplish the multiple availability zone with multiple 
> instances of Solr running on each server?
>
> Thanks
>
> On 9/26/18, 12:56 AM, "Noble Paul"  wrote:
>
> The rules suggested by Steve is correct. I tested it locally and I got
> the same errors. That means a bug exists probably.
> All the new development efforts are invested in the new policy feature
> 
> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>
> The old one is going to be deprecated pretty soon. So, I'm not sure if
> we should be investing our resources here
> On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds  
> wrote:
> >
> > Shawn,
> >
> > Thanks for the info. We’ve been running this way for the past 4 years.
> >
> > We were running on very large hardware, 20 physical cores with 256 gigs 
> of ram with 3 billion document and it was the only way we could take 
> advantage of the hardware.
> >
> > Running 1 Solr instance per server never gave us the throughput we 
> needed.
> >
> > So I somewhat disagree with your statement because our test proved 
> otherwise.
> >
> > Thanks for the info.
> >
> > Sent from my iPhone
> >
> > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey  wrote:
> > >
> > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> > >> Each server has three instances of Solr running on it so every 
> instance on the server has to be in the same replica set.
> > >
> > > You should be running exactly one Solr instance per server.  When 
> evaluating rules for replica placement, SolrCloud will treat each instance as 
> completely separate from all others, including others on the same machine.  
> It will not know that those three instances are on the same machine.  One 
> Solr instance can handle MANY indexes.
> > >
> > > There is only ONE situation where it makes sense to run multiple 
> instances per machine, and in my strong opinion, even that situation should 
> not be handled with multiple instances. That situation is this:  When running 
> one instance would require a REALLY large heap.  Garbage collection pauses 
> can become extreme in that situation, so some people will run multiple 
> instances that each have a smaller heap, and divide their indexes between 
> them. In my opinion, when you have enough index data on an instance that it 
> requires a huge heap, instead of running two or more instances on one server, 
> it's time to add more servers.
> > >
> > > Thanks,
> > > Shawn
> > >
>
>
>
> --
> -
> Noble Paul
>
>


-- 
-
Noble Paul


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Walter Underwood
We’ve been running G1 in prod for at least 18 months. Our biggest cluster
is 48 machines, each with 36 CPUs, running 6.6.2. We also run it on our
4.10.4 master/slave cluster.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 26, 2018, at 7:37 AM, Jeff Courtade  wrote:
> 
> Thanks for that...
> I am just starting to look at this I was unaware of the license debacle.
> 
> Automated testing up to 10 is great.
> 
> I am still curious about the GC1 being supported now...
> 
> On Wed, Sep 26, 2018 at 10:25 AM Zisis T.  wrote:
> 
>> Jeff Courtade wrote
>>> Can we use GC1 garbage collection yet or do we still need to use CMS?
>> 
>> I believe you should be safe to go with G1. We've applied it in in a Solr
>> 6.6 cluster with 10 shards, 3 replicas per shard and an index of about
>> 500GB
>> (1,5T counting all replicas) and it works extremely well (throughput >
>> 99%).
>> The use-case includes complex search queries and faceting.
>> There is also this post you can use as a starting point
>> 
>> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
>> 
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 
> -- 
> 
> Jeff Courtade
> M: 240.507.6116



Re: Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
Thanks for that...
I am just starting to look at this; I was unaware of the license debacle.

Automated testing up to Java 10 is great.

I am still curious about whether G1 is supported now...

On Wed, Sep 26, 2018 at 10:25 AM Zisis T.  wrote:

> Jeff Courtade wrote
> > Can we use GC1 garbage collection yet or do we still need to use CMS?
>
> I believe you should be safe to go with G1. We've applied it in in a Solr
> 6.6 cluster with 10 shards, 3 replicas per shard and an index of about
> 500GB
> (1,5T counting all replicas) and it works extremely well (throughput >
> 99%).
> The use-case includes complex search queries and faceting.
> There is also this post you can use as a starting point
>
> http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
-- 

Jeff Courtade
M: 240.507.6116


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Zisis T.
Jeff Courtade wrote
> Can we use GC1 garbage collection yet or do we still need to use CMS?

I believe you should be safe to go with G1. We've applied it in a Solr
6.6 cluster with 10 shards, 3 replicas per shard, and an index of about 500 GB
(1.5 TB counting all replicas), and it works extremely well (throughput > 99%).
The use case includes complex search queries and faceting.
There is also this post you can use as a starting point:
http://blog.cloudera.com/blog/2017/06/apache-solr-memory-tuning-for-production/
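For anyone wanting to try this, Solr picks up its collector flags from the GC_TUNE variable in solr.in.sh. A minimal sketch (the heap size and pause target below are illustrative starting points, not the settings from the cluster described above):

```shell
# solr.in.sh -- switch from the default CMS settings to G1.
# Values are illustrative; tune against your own GC logs.
SOLR_HEAP="16g"
GC_TUNE="-XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:MaxGCPauseMillis=250"
```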




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Java version 11 for solr 7.5?

2018-09-26 Thread Alexandre Rafalovitch
The minimum required version, per the CHANGES file, is 1.8 (8.x). I believe
there is automated testing up to Java 10; there were some issues
related to the modules, but AFAIK they were resolved. So you may be OK
up to that version.

However, Java 11 is a bit of a different beast, both because the
public version has just been released and because of the license
change issues. You can see the article at:
https://blog.joda.org/2018/09/do-not-fall-into-oracles-java-11-trap.html
and the technical discussion at:
https://news.ycombinator.com/item?id=18074727

Not sure about GC, somebody else may comment on that.

Regards,
   Alex.

On 26 September 2018 at 10:00, Jeff Courtade  wrote:
> Hello,
>
> We are looking to migrate to solr 7.5 with java 11 from solr 4.3.0 with
> java 7.
>
> I have a couple basic questions...
>
> What version of Java is current solr 7.5 development and testing based on?
>
> Can we use java 11 with solr 7.5? any known issues?
>
> Can we use GC1 garbage collection yet or do we still need to use CMS?
>
> --
> Thanks,
>
> Jeff Courtade
> M: 240.507.6116


Java version 11 for solr 7.5?

2018-09-26 Thread Jeff Courtade
Hello,

We are looking to migrate to Solr 7.5 with Java 11, from Solr 4.3.0 with
Java 7.

I have a couple of basic questions...

What version of Java is current Solr 7.5 development and testing based on?

Can we use Java 11 with Solr 7.5? Any known issues?

Can we use G1 garbage collection yet, or do we still need to use CMS?

--
Thanks,

Jeff Courtade
M: 240.507.6116


Re: Rule-based replication or sharing

2018-09-26 Thread Chuck Reynolds
Noble,

So, other than manually moving shard replicas, do you have a suggestion for
how one might accomplish multiple availability zones with multiple instances
of Solr running on each server?

Thanks

On 9/26/18, 12:56 AM, "Noble Paul"  wrote:

The rules suggested by Steve is correct. I tested it locally and I got
the same errors. That means a bug exists probably.
All the new development efforts are invested in the new policy feature

https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html

The old one is going to be deprecated pretty soon. So, I'm not sure if
we should be investing our resources here
On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds  
wrote:
>
> Shawn,
>
> Thanks for the info. We’ve been running this way for the past 4 years.
>
> We were running on very large hardware, 20 physical cores with 256 gigs 
of ram with 3 billion document and it was the only way we could take advantage 
of the hardware.
>
> Running 1 Solr instance per server never gave us the throughput we needed.
>
> So I somewhat disagree with your statement because our test proved 
otherwise.
>
> Thanks for the info.
>
> Sent from my iPhone
>
> > On Sep 25, 2018, at 4:19 PM, Shawn Heisey  wrote:
> >
> >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> >> Each server has three instances of Solr running on it so every 
instance on the server has to be in the same replica set.
> >
> > You should be running exactly one Solr instance per server.  When 
evaluating rules for replica placement, SolrCloud will treat each instance as 
completely separate from all others, including others on the same machine.  It 
will not know that those three instances are on the same machine.  One Solr 
instance can handle MANY indexes.
> >
> > There is only ONE situation where it makes sense to run multiple 
instances per machine, and in my strong opinion, even that situation should not 
be handled with multiple instances. That situation is this:  When running one 
instance would require a REALLY large heap.  Garbage collection pauses can 
become extreme in that situation, so some people will run multiple instances 
that each have a smaller heap, and divide their indexes between them. In my 
opinion, when you have enough index data on an instance that it requires a huge 
heap, instead of running two or more instances on one server, it's time to add 
more servers.
> >
> > Thanks,
> > Shawn
> >



-- 
-
Noble Paul




Re: checksum failed (hardware problem?)

2018-09-26 Thread simon
I saw something like this a year ago, which I reported as a possible bug (
https://issues.apache.org/jira/browse/SOLR-10840, which has a full
description and stack traces).

This occurred very randomly on an AWS instance; moving the index directory
to a different file system did not fix the problem. Eventually I cloned our
environment to a new AWS instance, which proved to be the solution. Why, I
have no idea...

-Simon

On Mon, Sep 24, 2018 at 1:13 PM, Susheel Kumar 
wrote:

> Got it. I'll have first hardware folks check and if they don't see/find
> anything suspicious then i'll return here.
>
> Wondering if any body has seen similar error and if they were able to
> confirm if it was hardware fault or so.
>
> Thnx
>
> On Mon, Sep 24, 2018 at 1:01 PM Erick Erickson 
> wrote:
>
> > Mind you it could _still_ be Solr/Lucene, but let's check the hardware
> > first ;)
> > On Mon, Sep 24, 2018 at 9:50 AM Susheel Kumar 
> > wrote:
> > >
> > > Hi Erick,
> > >
> > > Thanks so much for your reply.  I'll now look mostly into any possible
> > > hardware issues than Solr/Lucene.
> > >
> > > Thanks again.
> > >
> > > On Mon, Sep 24, 2018 at 12:43 PM Erick Erickson <
> erickerick...@gmail.com
> > >
> > > wrote:
> > >
> > > > There are several of reasons this would "suddenly" start appearing.
> > > > 1> Your disk went bad and some sector is no longer faithfully
> > > > recording the bits. In this case the checksum will be wrong
> > > > 2> You ran out of disk space sometime and the index was corrupted.
> > > > This isn't really a hardware problem.
> > > > 3> Your disk controller is going wonky and not reading reliably.
> > > >
> > > > The "possible hardware issue" message is to alert you that this is
> > > > highly unusual and you should at leasts consider doing integrity
> > > > checks on your disk before assuming it's a Solr/Lucene problem
> > > >
> > > > Best,
> > > > Erick
> > > > On Mon, Sep 24, 2018 at 9:26 AM Susheel Kumar  >
> > > > wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I am still trying to understand the corrupt index exception we saw
> > in our
> > > > > logs. What does the hardware problem comment indicates here?  Does
> > that
> > > > > mean it caused most likely due to hardware issue?
> > > > >
> > > > > We never had this problem in last couple of months. The Solr is
> > 6.6.2 and
> > > > > ZK: 3.4.10.
> > > > >
> > > > > Please share your thoughts.
> > > > >
> > > > > Thanks,
> > > > > Susheel
> > > > >
> > > > > Caused by: org.apache.lucene.index.CorruptIndexException: checksum
> > > > > failed *(hardware
> > > > > problem?)* : expected=db243d1a actual=7a00d3d2
> > > > >
> > > >
> > (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/
> app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs")
> > > > > [slice=_i27s_Lucene50_0.tim])
> > > > >
> > > > > It suddenly started in the logs and before which there was no such
> > error.
> > > > > Searches & ingestions all seems to be working prior to that.
> > > > >
> > > > > 
> > > > >
> > > > > 2018-09-03 17:16:49.056 INFO  (qtp834133664-519872) [c:COLL
> s:shard1
> > > > > r:core_node1 x:COLL_shard1_replica1]
> > > > > o.a.s.u.p.StatelessScriptUpdateProcessorFactory
> > update-script#processAdd:
> > > > >
> > newid=G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-25520
> 08480_1-en_US
> > > > > 2018-09-03 17:16:49.057 ERROR (qtp834133664-519872) [c:COLL
> s:shard1
> > > > > r:core_node1 x:COLL_shard1_replica1] o.a.s.h.RequestHandlerBase
> > > > > org.apache.solr.common.SolrException: Exception writing document
> id
> > > > > G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_
> 1-en_US
> > to
> > > > the
> > > > > index; possible analysis error.
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpd
> ateHandler2.java:206)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.RunUpdateProcessor.processA
> dd(RunUpdateProcessorFactory.java:67)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.UpdateRequestProcessor.proc
> essAdd(UpdateRequestProcessor.java:55)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.DistributedUpdateProcessor.
> doLocalAdd(DistributedUpdateProcessor.java:979)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.DistributedUpdateProcessor.
> versionAdd(DistributedUpdateProcessor.java:1192)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.DistributedUpdateProcessor.
> processAdd(DistributedUpdateProcessor.java:748)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.UpdateRequestProcessor.proc
> essAdd(UpdateRequestProcessor.java:55)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$
> ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProces
> sorFactory.java:380)
> > > > > at
> > > > >
> > > >
> > org.apache.solr.handler.loader.JavabinLoader$1.update(Javabi
> nLoader.java:98)
> > > > > at
> > > > >
> > > >

Re: SOLR Index Time Running Optimization

2018-09-26 Thread Susheel Kumar
Also, are you using Solr's Data Import Handler? That will be much slower
compared to writing your own little indexer, which can index in batches and
with multiple threads.
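The batching and threading suggested here can be sketched as follows; this is a minimal outline, where `send_batch` is a placeholder for whatever client actually posts to Solr (e.g. an HTTP POST to /update, or SolrJ on the Java side):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 500  # documents per update request; tune for your schema

def batches(docs, size):
    """Yield successive fixed-size slices of the document list."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    # Placeholder: a real indexer would POST this batch as JSON to
    # http://<host>:8983/solr/<collection>/update and check the response.
    return len(batch)

docs = [{"id": str(n)} for n in range(1750)]  # toy corpus
with ThreadPoolExecutor(max_workers=4) as pool:
    indexed = sum(pool.map(send_batch, batches(docs, BATCH_SIZE)))
print(indexed)  # prints 1750
```

In a real indexer each batch would be one update request, and commits are best left to Solr's autoCommit settings rather than issued per batch.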

On Wed, Sep 26, 2018 at 8:00 AM Vincenzo D'Amore  wrote:

> Hi, I know this is the shortest way but, had you tried to add more core or
> CPU to your solr instances? How big is you collection in terms of GB and
> number of documents?
>
> Ciao,
> Vincenzo
>
>
> > On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez <
> krizellemae.marti...@sas.com> wrote:
> >
> > Hi.
> >
> > Our SOLR currently is running approximately 39hours for Full and Delta
> Import. I would like to ask for your assistance on how can we shorten the
> 39hours run time in any possible solution?
> > For SOLR version, we are using solr 5.3.1.
> >
> > Regards,
> > Krizelle Mae M. Hernandez
>


Re: SOLR Index Time Running Optimization

2018-09-26 Thread Vincenzo D'Amore
Hi, I know this is the quickest route, but have you tried adding more cores or
CPUs to your Solr instances? How big is your collection in terms of GB and
number of documents?

Ciao,
Vincenzo


> On 26 Sep 2018, at 08:36, Krizelle Mae Hernandez 
>  wrote:
> 
> Hi.
> 
> Our SOLR currently is running approximately 39hours for Full and Delta 
> Import. I would like to ask for your assistance on how can we shorten the 
> 39hours run time in any possible solution?
> For SOLR version, we are using solr 5.3.1.
> 
> Regards,
> Krizelle Mae M. Hernandez


Re: Solr empty highlight entry on match?

2018-09-26 Thread Yasufumi Mizoguchi
Hi,

The documents might be too long to highlight, I think.
See "hl.maxAnalyzedChars" in the Reference Guide:
https://lucene.apache.org/solr/guide/7_4/highlighting.html

Try increasing the hl.maxAnalyzedChars value,
or use hl.alternateField and hl.maxAlternateFieldLength to return
fallback text even when Solr fails to create a snippet.
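Concretely, a request with those parameters might look like the following sketch (the limit and fallback length are illustrative values, not recommendations):

```python
from urllib.parse import urlencode

# Build the /select request from this thread, raising the highlighting
# analysis limit and configuring a fallback field for documents where no
# snippet can be produced.
params = {
    "q": 'mpdv_content_de:"Dynamisches Anwendungsverhalten"',
    "hl": "on",
    "hl.fl": "mpdv_content_de",
    "hl.maxAnalyzedChars": 500000,           # illustrative; raise past long docs
    "hl.alternateField": "mpdv_content_de",  # fallback source field
    "hl.maxAlternateFieldLength": 200,       # cap the fallback text
}
url = "http://localhost:8983/solr/mpdv/select?" + urlencode(params)
print(url)
```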

Thanks,
Yasufumi

2018-09-26 (Wed) 2:51 Zartmann, Matthias :

> Hi i'm new in solr and have a problem with the highlighter. The
> highlighter returns not for every match a highlight text, it works in most
> cases but not in all (see example,the second entry).
> Can anybody help me?
>
> Solr Version: 7.4.0
>
> Query:
>
> http://localhost:8983/solr/mpdv/select?hl.fl=mpdv_content_de&hl=on&q=mpdv_content_de:%22Dynamisches%20Anwendungsverhalten%22
>
>
>
> Result:
>
> "highlighting":{
>
>
> "y:\\MPDVAll\\ProductDocumentations\\de\\Procedures\\MDS_MOC\\MDS-Extensibility.pdf":{
>
>   "mpdv_content_de":[" Dynamisches Anwendungsverhalten
> \n\n* Spezifische Logik zur Sichtbarkeit/Aktivierbarkeit"]},
>
>
> "y:\\MPDVAll\\ProductDocumentations\\de\\FunctionPackages\\MDS-BAS_8.1\\MDS-BAS_81.pdf":{}}}
>
>
> Debugoutput:
>
>
> "debug":{
>
> "rawquerystring":"mpdv_content_de:\"Dynamisches Anwendungsverhalten\"",
>
> "querystring":"mpdv_content_de:\"Dynamisches Anwendungsverhalten\"",
>
> "parsedquery":"PhraseQuery(mpdv_content_de:\"dynamisch
> anwendungsverhalt\")",
>
> "parsedquery_toString":"mpdv_content_de:\"dynamisch anwendungsverhalt\"",
>
> "explain":{
>
>
> "y:\\MPDVAll\\ProductDocumentations\\de\\Procedures\\MDS_MOC\\MDS-Extensibility.pdf":"\n11.151565
> = weight(mpdv_content_de:\"dynamisch anwendungsverhalt\" in 2351)
> [SchemaSimilarity], result of:\n  11.151565 = score(doc=2351,freq=2.0 =
> phraseFreq=2.0\n), product of:\n8.509058 = idf(), sum of:\n
> 2.1873097 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:\n343.0 = docFreq\n3060.0 = docCount\n
> 6.3217487 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:\n5.0 = docFreq\n3060.0 = docCount\n
> 1.3105522 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b
> * fieldLength / avgFieldLength)) from:\n  2.0 = phraseFreq=2.0\n
> 1.2 = parameter k1\n  0.75 = parameter b\n  1436.7814 =
> avgFieldLength\n  1688.0 = fieldLength\n",
>
>
> "y:\\MPDVAll\\ProductDocumentations\\de\\FunctionPackages\\MDS-BAS_8.1\\MDS-BAS_81.pdf":"\n1.0496296
> = weight(mpdv_content_de:\"dynamisch anwendungsverhalt\" in 1372)
> [SchemaSimilarity], result of:\n  1.0496296 = score(doc=1372,freq=2.0 =
> phraseFreq=2.0\n), product of:\n8.509058 = idf(), sum of:\n
> 2.1873097 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:\n343.0 = docFreq\n3060.0 = docCount\n
> 6.3217487 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq
> + 0.5)) from:\n5.0 = docFreq\n3060.0 = docCount\n
> 0.123354375 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b +
> b * fieldLength / avgFieldLength)) from:\n  2.0 = phraseFreq=2.0\n
> 1.2 = parameter k1\n  0.75 = parameter b\n  1436.7814 =
> avgFieldLength\n  53272.0 = fieldLength\n"},
>
> "QParser":"LuceneQParser",
>
> "timing":{
>
>   "time":9.0,
>
>   "prepare":{
>
> "time":0.0,
>
> "query":{
>
>   "time":0.0},
>
> "facet":{
>
>   "time":0.0},
>
> "facet_module":{
>
>   "time":0.0},
>
> "mlt":{
>
>   "time":0.0},
>
> "highlight":{
>
>   "time":0.0},
>
> "stats":{
>
>   "time":0.0},
>
> "expand":{
>
>   "time":0.0},
>
> "terms":{
>
>   "time":0.0},
>
> "debug":{
>
>   "time":0.0}},
>
>   "process":{
>
> "time":8.0,
>
> "query":{
>
>   "time":0.0},
>
> "facet":{
>
>   "time":0.0},
>
> "facet_module":{
>
>   "time":0.0},
>
> "mlt":{
>
>   "time":0.0},
>
> "highlight":{
>
>   "time":7.0},
>
> "stats":{
>
>   "time":0.0},
>
> "expand":{
>
>   "time":0.0},
>
> "terms":{
>
>   "time":0.0},
>
> "debug":{
>
>   "time":1.0}
>
>
> Thank's
> Matthias
>


searching is slow while adding document each time

2018-09-26 Thread Mugeesh Husain
Hi,

We are running a 3-node SolrCloud (4.4) in our production infrastructure. We
recently moved our Solr servers from the SoftLayer host to DigitalOcean
servers with the same configuration as production.

Now we are facing some slowness in the searcher when we index documents: when
we stop indexing, searches are fine, but while adding documents they become
slow. We index on one of the Solr servers and use the other two to serve
search requests.


I am just wondering why searches become slow while indexing, even though we
are using the same configuration as we had in prod.

At the moment we are pushing 500 documents at a time, and this process runs
continuously (adding & deleting).

These are the indexing logs:

65497339 [http-apr-8980-exec-45] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[E4751FCCE977BAC7 (1612655281518411776), 8E712AD1BE76AB63
(1612655281527848960), 789AA5D0FB149A37 (1612655281538334720),
B4F3AA526506F6B7 (1612655281553014784), A9F29F556F6CD1C8
(1612655281566646272), 8D15813305BF7417 (1612655281584472064),
DD13CFA12973E85B (1612655281596006400), 3C93BDBA5DFDE3B3
(1612655281613832192), 96981A0785BFC9BF (1612655281625366528),
D1E52788A466E484 (1612655281636900864)]} 0 9
65497459 [http-apr-8980-exec-22] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[D8AA2E196967D241 (1612655281649483776), E73420772E3235B7
(1612655281666260992), DFDCF1F8325A3EF6 (1612655281680941056),
1B10EF90E7C3695F (1612655281689329664), 51CBD7F59644A718
(1612655281699815424), 1D31EF403AF13E04 (1612655281714495488),
68E1DC3A614B7269 (1612655281723932672), F9BF6A3CF89D74FB
(1612655281737564160), 419E017E1F360EB6 (1612655281749098496),
50EF977E5E873065 (1612655281759584256)]} 0 9
65497572 [http-apr-8980-exec-40] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[B63AD0671A5E57B9 (1612655281772167168), 00B8A4CCFABFA1AC
(1612655281784750080), 9C89A1516C9166E6 (1612655281798381568),
9322E17ECEAADE66 (1612655281803624448), C6DDB4BF8E94DE6B
(1612655281814110208), DAA49178A5E74285 (1612655281830887424),
829C2AE38A3E78E4 (1612655281845567488), 4C7B19756D8E4208
(1612655281859198976), BE0F7354DC30164C (1612655281869684736),
59C4A764BB50B13B (1612655281880170496)]} 0 9
65497724 [http-apr-8980-exec-31] INFO 
org.apache.solr.update.processor.LogUpdateProcessor  – [rn0] webapp=/solr
path=/update
params={distrib.from=http://solrhost:8980/solr/rn0/&update.distrib=FROMLEADER&wt=javabin&version=2&update.chain=dedupe}
{add=[1F694F99367D7CE1 (1612655281895899136), 2AEAAF67A6893ABE
(1612655281911627776), 81E72DC36C7A9EBC (1612655281926307840),
AA71BD9B23548E6D (1612655281939939328), 359E8C4C6EC72AFA
(1612655281954619392), 7FEB6C65A3E23311 (1612655281972445184),
9B5ED0BE7AFDD1D0 (1612655281991319552), 99FE8958F6ED8B91
(1612655282009145344), 2BDC61DC4038E19F (1612655282023825408),
5131AEC4B87FBFE9 (1612655282037456896)]} 0 10




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Rule-based replication or sharing

2018-09-26 Thread Noble Paul
The rules suggested by Steve are correct. I tested them locally and got
the same errors. That probably means a bug exists.
All the new development efforts are invested in the new policy feature:
https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html

The old one is going to be deprecated pretty soon. So, I'm not sure if
we should be investing our resources here.
On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds  wrote:
>
> Shawn,
>
> Thanks for the info. We’ve been running this way for the past 4 years.
>
> We were running on very large hardware, 20 physical cores with 256 gigs of 
> ram with 3 billion document and it was the only way we could take advantage 
> of the hardware.
>
> Running 1 Solr instance per server never gave us the throughput we needed.
>
> So I somewhat disagree with your statement because our test proved otherwise.
>
> Thanks for the info.
>
> Sent from my iPhone
>
> > On Sep 25, 2018, at 4:19 PM, Shawn Heisey  wrote:
> >
> >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
> >> Each server has three instances of Solr running on it so every instance on 
> >> the server has to be in the same replica set.
> >
> > You should be running exactly one Solr instance per server.  When 
> > evaluating rules for replica placement, SolrCloud will treat each instance 
> > as completely separate from all others, including others on the same 
> > machine.  It will not know that those three instances are on the same 
> > machine.  One Solr instance can handle MANY indexes.
> >
> > There is only ONE situation where it makes sense to run multiple instances 
> > per machine, and in my strong opinion, even that situation should not be 
> > handled with multiple instances. That situation is this:  When running one 
> > instance would require a REALLY large heap.  Garbage collection pauses can 
> > become extreme in that situation, so some people will run multiple 
> > instances that each have a smaller heap, and divide their indexes between 
> > them. In my opinion, when you have enough index data on an instance that it 
> > requires a huge heap, instead of running two or more instances on one 
> > server, it's time to add more servers.
> >
> > Thanks,
> > Shawn
> >



-- 
-
Noble Paul
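
As context for the heap discussion quoted above: the per-instance heap is set
in Solr's environment file (solr.in.sh on Linux), so running a single instance
with a moderate heap is a one-line change. A sketch with illustrative values,
not recommendations:

```shell
# solr.in.sh -- one Solr instance per server, moderate heap to keep GC pauses short
SOLR_HEAP="16g"

# Optional GC override; G1 tends to handle larger heaps better than CMS
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"
```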


SOLR Index Time Running Optimization

2018-09-26 Thread Krizelle Mae Hernandez
Hi.

Our Solr full and delta imports currently take approximately 39 hours to run.
I would like to ask for your assistance: how can we shorten the 39-hour run
time? Any possible solution would help.
We are using Solr version 5.3.1.

Regards,
Krizelle Mae M. Hernandez


Solr empty highlight entry on match?

2018-09-26 Thread Zartmann, Matthias
Hi, I'm new to Solr and have a problem with the highlighter. The highlighter
does not return a highlight text for every match; it works in most cases but
not in all (see the example below, the second entry).
Can anybody help me?

Solr Version: 7.4.0

Query:
http://localhost:8983/solr/mpdv/select?hl.fl=mpdv_content_de&hl=on&q=mpdv_content_de:%22Dynamisches%20Anwendungsverhalten%22



Result:

"highlighting":{

"y:\\MPDVAll\\ProductDocumentations\\de\\Procedures\\MDS_MOC\\MDS-Extensibility.pdf":{

  "mpdv_content_de":[" Dynamisches Anwendungsverhalten  \n\n* 
Spezifische Logik zur Sichtbarkeit/Aktivierbarkeit"]},

"y:\\MPDVAll\\ProductDocumentations\\de\\FunctionPackages\\MDS-BAS_8.1\\MDS-BAS_81.pdf":{}}}


Debugoutput:


"debug":{

"rawquerystring":"mpdv_content_de:\"Dynamisches Anwendungsverhalten\"",

"querystring":"mpdv_content_de:\"Dynamisches Anwendungsverhalten\"",

"parsedquery":"PhraseQuery(mpdv_content_de:\"dynamisch anwendungsverhalt\")",

"parsedquery_toString":"mpdv_content_de:\"dynamisch anwendungsverhalt\"",

"explain":{

  
"y:\\MPDVAll\\ProductDocumentations\\de\\Procedures\\MDS_MOC\\MDS-Extensibility.pdf":"\n11.151565
 = weight(mpdv_content_de:\"dynamisch anwendungsverhalt\" in 2351) 
[SchemaSimilarity], result of:\n  11.151565 = score(doc=2351,freq=2.0 = 
phraseFreq=2.0\n), product of:\n8.509058 = idf(), sum of:\n  2.1873097 
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) 
from:\n343.0 = docFreq\n3060.0 = docCount\n  6.3217487 = 
idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  
  5.0 = docFreq\n3060.0 = docCount\n1.3105522 = tfNorm, 
computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / 
avgFieldLength)) from:\n  2.0 = phraseFreq=2.0\n  1.2 = parameter k1\n  
0.75 = parameter b\n  1436.7814 = avgFieldLength\n  1688.0 = 
fieldLength\n",

  
"y:\\MPDVAll\\ProductDocumentations\\de\\FunctionPackages\\MDS-BAS_8.1\\MDS-BAS_81.pdf":"\n1.0496296
 = weight(mpdv_content_de:\"dynamisch anwendungsverhalt\" in 1372) 
[SchemaSimilarity], result of:\n  1.0496296 = score(doc=1372,freq=2.0 = 
phraseFreq=2.0\n), product of:\n8.509058 = idf(), sum of:\n  2.1873097 
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) 
from:\n343.0 = docFreq\n3060.0 = docCount\n  6.3217487 = 
idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n  
  5.0 = docFreq\n3060.0 = docCount\n0.123354375 = tfNorm, 
computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / 
avgFieldLength)) from:\n  2.0 = phraseFreq=2.0\n  1.2 = parameter k1\n  
0.75 = parameter b\n  1436.7814 = avgFieldLength\n  53272.0 = 
fieldLength\n"},

"QParser":"LuceneQParser",

"timing":{

  "time":9.0,

  "prepare":{

"time":0.0,

"query":{

  "time":0.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":0.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"terms":{

  "time":0.0},

"debug":{

  "time":0.0}},

  "process":{

"time":8.0,

"query":{

  "time":0.0},

"facet":{

  "time":0.0},

"facet_module":{

  "time":0.0},

"mlt":{

  "time":0.0},

"highlight":{

  "time":7.0},

"stats":{

  "time":0.0},

"expand":{

  "time":0.0},

"terms":{

  "time":0.0},

"debug":{

  "time":1.0}


Thanks
Matthias