Prefix for IndexBasedSpellChecker in Solr

2020-06-06 Thread Amrit Sarkar
Hi Solr folks, been a while.

I was experimenting with Spell Checkers and adopted IndexBasedSpellChecker.

It works well, except there is no way to configure a *minimum prefix* the way
DirectSolrSpellChecker does (which, obviously, doesn't need any auxiliary index).

I looked at the implementation and understood that *DirectSolrSpellChecker*
builds a *FuzzyTermsEnum* on top of the live index. But using
DirectSolrSpellChecker obviously adds overhead to the same index the standard
queries are made against, and that overhead depends on the use case.

For e-commerce, what is the recommended way of spell checking incoming
queries with a (hard) minimum number of prefix chars using
*IndexBasedSpellChecker*, e.g. so that *mushroom* does not get spell checked
to *washroom*? Is there any? If not, what is the recommended way apart from
using DirectSolrSpellChecker itself? Thanks in advance.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


Re: Limit Solr Disk IO

2020-06-06 Thread Erick Erickson
New segments are created when
1> the RAMBufferSizeMB is exceeded
or
2> a commit happens.

The maximum segment size defaults to 5G, but TieredMergePolicy can be 
configured in solrconfig.xml to have larger max sizes by setting 
maxMergedSegmentMB
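
For example, something along these lines in the indexConfig section of
solrconfig.xml (just a sketch; 10G here is only to illustrate raising the cap):

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- raise the maximum merged segment size from the 5G default -->
    <int name="maxMergedSegmentMB">10000</int>
  </mergePolicyFactory>
</indexConfig>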

Depending on your indexing rate, requiring commits every 100K records may be 
too frequent; I have no idea what your indexing rate is. In general I prefer a 
time-based autocommit policy. Say, for some reason, you stop indexing after 50K 
records. They’ll never be searchable unless you have a time-based commit. 
Besides, it’s much easier to explain to users “it may take 60 seconds for your 
doc to be searchable” than “well, depending on the indexing rate, it may be 
between 10 seconds and 6 hours for your docs to be searchable”. Of course if 
you’re indexing at a very fast rate, that may not matter.
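
A time-based setup in solrconfig.xml might look roughly like this (a sketch; 
the 60-second values are only the example from above, tune them to your needs):

<autoCommit>
  <maxTime>60000</maxTime>          <!-- hard commit every 60 seconds -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>          <!-- docs become searchable within ~60 seconds -->
</autoSoftCommit>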

There’s no such thing as “low disk read during segment merging”. If 5 segments 
need to be read, they all must be read in their entirety and the new segment 
must be completely written out. At best you can try to cut down on the number 
of times segment merges happen, but from what you’re describing that may not be 
feasible. 

Attachments are aggressively stripped by the mail server, your graph did not 
come through.

Once a segment grows to the max size (5G by default), it is not merged again 
unless and until it accumulates quite a number of deleted documents. So one 
question is whether you update existing documents frequently. Is that the case? 
If not, then the index size really shouldn’t matter and your problem is 
something else.

And I sincerely hope that part of your indexing does _NOT_ include 
optimize/forcemerge or expungeDeletes. Those are very expensive operations, and 
prior to Solr 7.5 would leave your index in an awkward state, see: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/.
 There’s a link for how this is different in Solr 7.5+ in that article.

But something smells fishy about this situation. Segment merging is typically 
not very noticeable. Perhaps you just have too much data on too small hardware? 
You’ve got some evidence that segment merging is the root cause, but I wonder 
if what’s happening is you’re just swapping instead? Segment merging will 
certainly increase the I/O pressure, but by and large that shouldn’t really 
affect search speed if the OS memory space is large enough to hold the 
important portions of your index. If the OS memory space isn’t large enough, the 
additional I/O pressure from merging may be enough to start your system swapping, 
which is A Bad Thing.

See: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html 
for how Lucene uses MMapDirectory...

Best,
Erick

> On Jun 6, 2020, at 11:29 AM, Anshuman Singh  wrote:
> 
> Hi Eric, 
> 
> We are looking into TLOG/PULL replicas. But I have some doubts regarding 
> segments. Can you explain what causes creation of a new segment and how large 
> it can grow?
> And this is my index config:
> maxMergeAtOnce - 20
> segmentsPerTier - 20
> ramBufferSizeMB - 512 MB
> 
> Can I configure these settings optimally for low disk read during segment 
> merging? Like increasing segmentsPerTier may help but a large number of 
> segments may impact search. And as per the documentation, ramBufferSizeMB can 
> trigger segment merging so maybe that can be tweaked.
> 
> One more question:
> This graph is representing index time wrt core size (0-100G). Commits were 
> happening automatically at every 100k records.
> 
> 
> 
> As you can see the density of spikes is increasing as the core size is 
> increasing. When our core size becomes ~100 G, indexing becomes really slow. 
> Why is this happening? Do we need to put a limit on how large each core can 
> grow?
> 
> 
> On Fri, Jun 5, 2020 at 5:59 PM Erick Erickson  wrote:
> Have you considered TLOG/PULL replicas rather than NRT replicas? 
> That way, all the indexing happens on a single machine and you can
> use shards.preference to confine the searches happen on the PULL replicas,
> see:  https://lucene.apache.org/solr/guide/7_7/distributed-requests.html
> 
> No, you can’t really limit the number of segments. While that seems like a
> good idea, it quickly becomes counter-productive. Say you require that you
> have 10 segments. Say each one becomes 10G. What happens when the 11th
> segment is created and it’s 100M? Do you rewrite one of the 10G segments just
> to add 100M? Your problem gets worse, not better.
> 
> 
> Best,
> Erick
> 
> > On Jun 5, 2020, at 1:41 AM, Anshuman Singh  
> > wrote:
> > 
> > Hi Nicolas,
> > 
> > Commit happens automatically at 100k documents. We don't commit explicitly.
> > We didn't limit the number of segments. There are 35+ segments in each core.
> > But unrelated to the question, I would like to know if we can limit the
> > number of segments in the core. I tried it in the past but the merge
> > policies don't allow that.
> > The TieredMergePolicy has two parameters, 

Re: Limit Solr Disk IO

2020-06-06 Thread Anshuman Singh
Hi Eric,

We are looking into TLOG/PULL replicas. But I have some doubts regarding
segments. Can you explain what causes creation of a new segment and how
large it can grow?
And this is my index config:
maxMergeAtOnce - 20
segmentsPerTier - 20
ramBufferSizeMB - 512 MB

Can I configure these settings optimally for low disk read during segment
merging? Like increasing segmentsPerTier may help but a large number of
segments may impact search. And as per the documentation, ramBufferSizeMB
can trigger segment merging so maybe that can be tweaked.
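
For reference, the relevant block of our solrconfig.xml looks roughly like
this (a sketch of how those three values are set):

<indexConfig>
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">20</int>
    <int name="segmentsPerTier">20</int>
  </mergePolicyFactory>
</indexConfig>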

One more question:
This graph represents index time w.r.t. core size (0-100G). Commits were
happening automatically at every 100k records.

[image: image.png]

As you can see the density of spikes is increasing as the core size is
increasing. When our core size becomes ~100 G, indexing becomes really
slow. Why is this happening? Do we need to put a limit on how large each
core can grow?


On Fri, Jun 5, 2020 at 5:59 PM Erick Erickson 
wrote:

> Have you considered TLOG/PULL replicas rather than NRT replicas?
> That way, all the indexing happens on a single machine and you can
> use shards.preference to confine the searches happen on the PULL replicas,
> see:  https://lucene.apache.org/solr/guide/7_7/distributed-requests.html
>
> No, you can’t really limit the number of segments. While that seems like a
> good idea, it quickly becomes counter-productive. Say you require that you
> have 10 segments. Say each one becomes 10G. What happens when the 11th
> segment is created and it’s 100M? Do you rewrite one of the 10G segments
> just
> to add 100M? Your problem gets worse, not better.
>
>
> Best,
> Erick
>
> > On Jun 5, 2020, at 1:41 AM, Anshuman Singh 
> wrote:
> >
> > Hi Nicolas,
> >
> > Commit happens automatically at 100k documents. We don't commit
> explicitly.
> > We didn't limit the number of segments. There are 35+ segments in each
> core.
> > But unrelated to the question, I would like to know if we can limit the
> > number of segments in the core. I tried it in the past but the merge
> > policies don't allow that.
> > The TieredMergePolicy has two parameters, maxMergeAtOnce and
> > segmentsPerTier. It seems like we cannot control the total number of
> > segments but only the segments per tier.(
> >
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> > )
> >
> >
> > On Thu, Jun 4, 2020 at 5:48 PM Nicolas Franck 
> > wrote:
> >
> >> The real questions are:
> >>
> >> * how much often do you commit (either explicitly or automatically)?
> >> * how much segments do you allow? If you only allow 1 segment,
> >>  then that whole segment is recreated using the old documents and the
> >> updates.
> >>  And yes, that requires reading the old segment.
> >>  It is common to allow multiple segments when you update often,
> >>  so updating does not interfere with reading the index too often.
> >>
> >>
> >>> On 4 Jun 2020, at 14:08, Anshuman Singh 
> >> wrote:
> >>>
> >>> I noticed that while indexing, when commit happens, there is high disk
> >> read
> >>> by Solr. The problem is that it is impacting search performance when
> the
> >>> index is loaded from the disk with respect to the query, as the disk
> read
> >>> speed is not quite good and the whole index is not cached in RAM.
> >>>
> >>> When no searching is performed, I noticed that disk is usually read
> >> during
> >>> commit operations and sometimes even without commit at low rate. I
> guess
> >> it
> >>> is read due to segment merge operations. Can it be something else?
> >>> If it is merging, can we limit disk IO during merging?
> >>
> >>
>
>


Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Erick,

Thanks for the clarification on the JVM heap space. I will invoke java as
you advise.

The program that I am writing is a Java example that I took off the
internet. The intent of the example is to read an existing core stored in
Solr. I created the core using instructions that I found in a tutorial. I
think the example from the tutorial worked OK, because I can see the core
in Solr that was created using Nutch. So I think my status is that I have a
good core, and I was trying to read and print out the documents in that
core.

My current plan is to try to find and install Nutch 1.17 and then clear and
reinstall Solr 8.5.1 and start over again with a clean slate.

Regards,
Jim


On Sat, Jun 6, 2020 at 10:25 AM Erick Erickson 
wrote:

> I’m not talking about how much memory your machine has,
> the critical bit it’s how much heap space is allocated to the
> JVM to run your app.
>
> You can increase it by specifying -Xmx2G say when you
> invoke Java.
>
> The version difference is suspicious indeed. I’m a little
> confused here. Exactly _what_ program is crashing? An
> independent app you wrote or nutch? If the former, you could
> try compiling your Java app against the Solr jars provided
> with the Solr version that ships with Nutch 1.16 (Solr 7.3.1?).
>
> Best,
> Erick
>
> > On Jun 6, 2020, at 9:30 AM, Jim Anderson 
> wrote:
> >
> > Erick,
> >
> > Thanks for the suggestion. I will keep it in the back of my mind for now.
> > My PC has 8 G-bytes of memory and has roughly 4 G-bytes in use.
> >
> > If the forefront, I'm looking at the recommended solr/nutch combinations.
> > I'm using Solr 8.5.1 with nutch 1.16. The recommendation is to use nutch
> > 1.17 with Solr 8.5.1, but 1.17 has not been released for download.
> > Consequently, I used nutch 1.16. I'm not sure that will make a
> difference,
> > but I am suspicious.
> >
> > Jim
> >
> > On Sat, Jun 6, 2020 at 9:18 AM Erick Erickson 
> > wrote:
> >
> >> I’d look for an OutOfMemory problem before going too much farther.
> >> The simplest way to see if that’s in the right direction would be to
> >> run your SolrJ program with a massive memory size. Perhaps monitor
> >> your program with jconsole or similar to see if there’s any clues about
> >> memory usage.
> >>
> >> OOMs lead to unpredictable behavior, so it’s at least a possibility that
> >> this is the root cause. If so, there’s nothing SolrJ can do about it
> >> exactly
> >> because the state of a program is indeterminate afterwards, even if the
> >> OOM is caught somewhere. I suppose you could also try to catch that
> >> exception in the top-level of your program.
> >>
> >> I’m assuming a stand-alone program here, if you’re running some custom
> >> code in Solr itself, make sure the oom-killer script is running.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Jun 6, 2020, at 8:23 AM, Jim Anderson 
> >> wrote:
> >>>
> >>> Shawn,
> >>>
> >>> Thanks for the explanation. Very good response.
> >>>
> >>> The first paragraph helped clarify what a collection is. I have read
> >> quite
> >>> about about Solr. There is so much to absorb that it is slowly sinking
> >> in.
> >>> Your 2nd paragraph definitely answered my question, i.e. passing a core
> >>> name should be ok when a collection name is specified as a method
> >> argument.
> >>> This is what I did.
> >>>
> >>> Regarding the 3rd paragraph, it is good to know that Solrj is fairly
> >> robust
> >>> and should not be crashing. Nevertheless, that is what is happening.
> The
> >>> call to client.query() is wrapped in a try/catch sequence. Apparently
> no
> >>> exceptions were detected, or the program crashed before the exception
> >> could
> >>> be raised.
> >>>
> >>> My next step is to check where I can report this to the Solr folks and
> >> see
> >>> if they can figure out what it is crashing. BTW, I had not checked my
> >>> output file before this morning. The output file indicates that the
> >> program
> >>> ran to completion, so I am guessing that at least one other thread is
> >> being
> >>> created and that that  thread is crashing.
> >>>
> >>> Regards,
> >>> Jim
> >>>
> >>> On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey 
> >> wrote:
> >>>
>  On 6/5/2020 4:24 PM, Jim Anderson wrote:
> > I am running my first solrj program and it is crashing when I call
> the
> > method
> >
> > client.query("coreName",queryParms)
> >
> > The API doc says the string should be a collection. I'm still not
> sure
> > about the difference between a collection and a core, so what I am
> >> doing
>  is
> > likely illegal. Given that I have created a core, create a collection
>  from
> > it so that I can truly pass a collection name to the query function?
> 
>  The concept of a collection comes from SolrCloud.  A collection is
> made
>  up of one or more shards.  A shard is made up of one or more replicas.
>  Each replica is a core.  If you're not running SolrCloud, then you do
>  not have collections.
> 
>  

Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Erick Erickson
I’m not talking about how much memory your machine has; 
the critical bit is how much heap space is allocated to the
JVM to run your app.

You can increase it by specifying -Xmx2G say when you 
invoke Java.

The version difference is suspicious indeed. I’m a little 
confused here. Exactly _what_ program is crashing? An
independent app you wrote or nutch? If the former, you could
try compiling your Java app against the Solr jars provided
with the Solr version that ships with Nutch 1.16 (Solr 7.3.1?).

Best,
Erick

> On Jun 6, 2020, at 9:30 AM, Jim Anderson  wrote:
> 
> Erick,
> 
> Thanks for the suggestion. I will keep it in the back of my mind for now.
> My PC has 8 G-bytes of memory and has roughly 4 G-bytes in use.
> 
> If the forefront, I'm looking at the recommended solr/nutch combinations.
> I'm using Solr 8.5.1 with nutch 1.16. The recommendation is to use nutch
> 1.17 with Solr 8.5.1, but 1.17 has not been released for download.
> Consequently, I used nutch 1.16. I'm not sure that will make a difference,
> but I am suspicious.
> 
> Jim
> 
> On Sat, Jun 6, 2020 at 9:18 AM Erick Erickson 
> wrote:
> 
>> I’d look for an OutOfMemory problem before going too much farther.
>> The simplest way to see if that’s in the right direction would be to
>> run your SolrJ program with a massive memory size. Perhaps monitor
>> your program with jconsole or similar to see if there’s any clues about
>> memory usage.
>> 
>> OOMs lead to unpredictable behavior, so it’s at least a possibility that
>> this is the root cause. If so, there’s nothing SolrJ can do about it
>> exactly
>> because the state of a program is indeterminate afterwards, even if the
>> OOM is caught somewhere. I suppose you could also try to catch that
>> exception in the top-level of your program.
>> 
>> I’m assuming a stand-alone program here, if you’re running some custom
>> code in Solr itself, make sure the oom-killer script is running.
>> 
>> Best,
>> Erick
>> 
>>> On Jun 6, 2020, at 8:23 AM, Jim Anderson 
>> wrote:
>>> 
>>> Shawn,
>>> 
>>> Thanks for the explanation. Very good response.
>>> 
>>> The first paragraph helped clarify what a collection is. I have read
>> quite
>>> about about Solr. There is so much to absorb that it is slowly sinking
>> in.
>>> Your 2nd paragraph definitely answered my question, i.e. passing a core
>>> name should be ok when a collection name is specified as a method
>> argument.
>>> This is what I did.
>>> 
>>> Regarding the 3rd paragraph, it is good to know that Solrj is fairly
>> robust
>>> and should not be crashing. Nevertheless, that is what is happening. The
>>> call to client.query() is wrapped in a try/catch sequence. Apparently no
>>> exceptions were detected, or the program crashed before the exception
>> could
>>> be raised.
>>> 
>>> My next step is to check where I can report this to the Solr folks and
>> see
>>> if they can figure out what it is crashing. BTW, I had not checked my
>>> output file before this morning. The output file indicates that the
>> program
>>> ran to completion, so I am guessing that at least one other thread is
>> being
>>> created and that that  thread is crashing.
>>> 
>>> Regards,
>>> Jim
>>> 
>>> On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey 
>> wrote:
>>> 
 On 6/5/2020 4:24 PM, Jim Anderson wrote:
> I am running my first solrj program and it is crashing when I call the
> method
> 
> client.query("coreName",queryParms)
> 
> The API doc says the string should be a collection. I'm still not sure
> about the difference between a collection and a core, so what I am
>> doing
 is
> likely illegal. Given that I have created a core, create a collection
 from
> it so that I can truly pass a collection name to the query function?
 
 The concept of a collection comes from SolrCloud.  A collection is made
 up of one or more shards.  A shard is made up of one or more replicas.
 Each replica is a core.  If you're not running SolrCloud, then you do
 not have collections.
 
 Wherever SolrJ docs says "collection" as a parameter for a request, it
 is likely that you can think "core" instead and have it still be
 correct.  If you're running SolrCloud, you'll want to be very careful to
 know the difference.
 
 It seems very odd for a SolrJ query to cause the program to crash.  It
 would be pretty common for it to throw an exception, but that's not the
 same as a crash, unless exception handling is incorrect or missing.
 
 Thanks,
 Shawn
 
>> 
>> 



Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Erick,

Thanks for the suggestion. I will keep it in the back of my mind for now.
My PC has 8 G-bytes of memory and has roughly 4 G-bytes in use.

In the forefront, I'm looking at the recommended Solr/Nutch combinations.
I'm using Solr 8.5.1 with Nutch 1.16. The recommendation is to use Nutch
1.17 with Solr 8.5.1, but 1.17 has not been released for download.
Consequently, I used Nutch 1.16. I'm not sure that will make a difference,
but I am suspicious.

Jim

On Sat, Jun 6, 2020 at 9:18 AM Erick Erickson 
wrote:

> I’d look for an OutOfMemory problem before going too much farther.
> The simplest way to see if that’s in the right direction would be to
> run your SolrJ program with a massive memory size. Perhaps monitor
> your program with jconsole or similar to see if there’s any clues about
> memory usage.
>
> OOMs lead to unpredictable behavior, so it’s at least a possibility that
> this is the root cause. If so, there’s nothing SolrJ can do about it
> exactly
> because the state of a program is indeterminate afterwards, even if the
> OOM is caught somewhere. I suppose you could also try to catch that
> exception in the top-level of your program.
>
> I’m assuming a stand-alone program here, if you’re running some custom
> code in Solr itself, make sure the oom-killer script is running.
>
> Best,
> Erick
>
> > On Jun 6, 2020, at 8:23 AM, Jim Anderson 
> wrote:
> >
> > Shawn,
> >
> > Thanks for the explanation. Very good response.
> >
> > The first paragraph helped clarify what a collection is. I have read
> quite
> > about about Solr. There is so much to absorb that it is slowly sinking
> in.
> > Your 2nd paragraph definitely answered my question, i.e. passing a core
> > name should be ok when a collection name is specified as a method
> argument.
> > This is what I did.
> >
> > Regarding the 3rd paragraph, it is good to know that Solrj is fairly
> robust
> > and should not be crashing. Nevertheless, that is what is happening. The
> > call to client.query() is wrapped in a try/catch sequence. Apparently no
> > exceptions were detected, or the program crashed before the exception
> could
> > be raised.
> >
> > My next step is to check where I can report this to the Solr folks and
> see
> > if they can figure out what it is crashing. BTW, I had not checked my
> > output file before this morning. The output file indicates that the
> program
> > ran to completion, so I am guessing that at least one other thread is
> being
> > created and that that  thread is crashing.
> >
> > Regards,
> > Jim
> >
> > On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey 
> wrote:
> >
> >> On 6/5/2020 4:24 PM, Jim Anderson wrote:
> >>> I am running my first solrj program and it is crashing when I call the
> >>> method
> >>>
> >>> client.query("coreName",queryParms)
> >>>
> >>> The API doc says the string should be a collection. I'm still not sure
> >>> about the difference between a collection and a core, so what I am
> doing
> >> is
> >>> likely illegal. Given that I have created a core, create a collection
> >> from
> >>> it so that I can truly pass a collection name to the query function?
> >>
> >> The concept of a collection comes from SolrCloud.  A collection is made
> >> up of one or more shards.  A shard is made up of one or more replicas.
> >> Each replica is a core.  If you're not running SolrCloud, then you do
> >> not have collections.
> >>
> >> Wherever SolrJ docs says "collection" as a parameter for a request, it
> >> is likely that you can think "core" instead and have it still be
> >> correct.  If you're running SolrCloud, you'll want to be very careful to
> >> know the difference.
> >>
> >> It seems very odd for a SolrJ query to cause the program to crash.  It
> >> would be pretty common for it to throw an exception, but that's not the
> >> same as a crash, unless exception handling is incorrect or missing.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Erick Erickson
I’d look for an OutOfMemory problem before going too much farther.
The simplest way to see if that’s in the right direction would be to
run your SolrJ program with a massive memory size. Perhaps monitor
your program with jconsole or similar to see if there’s any clues about
memory usage.

OOMs lead to unpredictable behavior, so it’s at least a possibility that
this is the root cause. If so, there’s nothing SolrJ can do about it exactly
because the state of a program is indeterminate afterwards, even if the
OOM is caught somewhere. I suppose you could also try to catch that
exception in the top-level of your program.

I’m assuming a stand-alone program here; if you’re running some custom
code in Solr itself, make sure the oom-killer script is running.

Best,
Erick

> On Jun 6, 2020, at 8:23 AM, Jim Anderson  wrote:
> 
> Shawn,
> 
> Thanks for the explanation. Very good response.
> 
> The first paragraph helped clarify what a collection is. I have read quite
> about about Solr. There is so much to absorb that it is slowly sinking in.
> Your 2nd paragraph definitely answered my question, i.e. passing a core
> name should be ok when a collection name is specified as a method argument.
> This is what I did.
> 
> Regarding the 3rd paragraph, it is good to know that Solrj is fairly robust
> and should not be crashing. Nevertheless, that is what is happening. The
> call to client.query() is wrapped in a try/catch sequence. Apparently no
> exceptions were detected, or the program crashed before the exception could
> be raised.
> 
> My next step is to check where I can report this to the Solr folks and see
> if they can figure out what it is crashing. BTW, I had not checked my
> output file before this morning. The output file indicates that the program
> ran to completion, so I am guessing that at least one other thread is being
> created and that that  thread is crashing.
> 
> Regards,
> Jim
> 
> On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey  wrote:
> 
>> On 6/5/2020 4:24 PM, Jim Anderson wrote:
>>> I am running my first solrj program and it is crashing when I call the
>>> method
>>> 
>>> client.query("coreName",queryParms)
>>> 
>>> The API doc says the string should be a collection. I'm still not sure
>>> about the difference between a collection and a core, so what I am doing
>> is
>>> likely illegal. Given that I have created a core, create a collection
>> from
>>> it so that I can truly pass a collection name to the query function?
>> 
>> The concept of a collection comes from SolrCloud.  A collection is made
>> up of one or more shards.  A shard is made up of one or more replicas.
>> Each replica is a core.  If you're not running SolrCloud, then you do
>> not have collections.
>> 
>> Wherever SolrJ docs says "collection" as a parameter for a request, it
>> is likely that you can think "core" instead and have it still be
>> correct.  If you're running SolrCloud, you'll want to be very careful to
>> know the difference.
>> 
>> It seems very odd for a SolrJ query to cause the program to crash.  It
>> would be pretty common for it to throw an exception, but that's not the
>> same as a crash, unless exception handling is incorrect or missing.
>> 
>> Thanks,
>> Shawn
>> 



Re: SolrClient.query take a 'collection' argument

2020-06-06 Thread Jim Anderson
Shawn,

Thanks for the explanation. Very good response.

The first paragraph helped clarify what a collection is. I have read quite
a bit about Solr. There is so much to absorb that it is slowly sinking in.
Your 2nd paragraph definitely answered my question, i.e. passing a core
name should be ok when a collection name is specified as a method argument.
This is what I did.

Regarding the 3rd paragraph, it is good to know that Solrj is fairly robust
and should not be crashing. Nevertheless, that is what is happening. The
call to client.query() is wrapped in a try/catch sequence. Apparently no
exceptions were detected, or the program crashed before the exception could
be raised.

My next step is to check where I can report this to the Solr folks and see
if they can figure out why it is crashing. BTW, I had not checked my
output file before this morning. The output file indicates that the program
ran to completion, so I am guessing that at least one other thread is being
created and that that thread is crashing.

Regards,
Jim

On Fri, Jun 5, 2020 at 10:52 PM Shawn Heisey  wrote:

> On 6/5/2020 4:24 PM, Jim Anderson wrote:
> > I am running my first solrj program and it is crashing when I call the
> > method
> >
> > client.query("coreName",queryParms)
> >
> > The API doc says the string should be a collection. I'm still not sure
> > about the difference between a collection and a core, so what I am doing
> is
> > likely illegal. Given that I have created a core, create a collection
> from
> > it so that I can truly pass a collection name to the query function?
>
> The concept of a collection comes from SolrCloud.  A collection is made
> up of one or more shards.  A shard is made up of one or more replicas.
> Each replica is a core.  If you're not running SolrCloud, then you do
> not have collections.
>
> Wherever SolrJ docs says "collection" as a parameter for a request, it
> is likely that you can think "core" instead and have it still be
> correct.  If you're running SolrCloud, you'll want to be very careful to
> know the difference.
>
> It seems very odd for a SolrJ query to cause the program to crash.  It
> would be pretty common for it to throw an exception, but that's not the
> same as a crash, unless exception handling is incorrect or missing.
>
> Thanks,
> Shawn
>


Re: Faster Vector Highlight

2020-06-06 Thread Yasufumi Mizoguchi
Hi, Kaya.

How about using the hl.maxAnalyzedChars parameter?
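
For example, something like this in the request handler defaults (just a
sketch; the default is 51200 characters, if I remember correctly, so raise it
to cover your longest fields). The same parameter can also be passed on the
query itself.

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <!-- analyze more of each field so long documents still get highlighted -->
    <int name="hl.maxAnalyzedChars">500000</int>
  </lst>
</requestHandler>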

Thanks,
Yasufumi 

> 2020/06/06 午後5:56、Kayak28 のメール:
> 
> Hello, Solr Community:
> 
> I have a question about FasterVectorHighlight.
> I know Solr highlight does not return highlighted text if the text in the
> highlighted field is too long.
> What is the good way to treat long text highlights?
> 
> 
> -- 
> 
> Sincerely,
> Kaya
> github: https://github.com/28kayak


Faster Vector Highlight

2020-06-06 Thread Kayak28
Hello, Solr Community:

I have a question about FastVectorHighlighter.
I know Solr highlighting does not return highlighted text if the text in the
highlighted field is too long.
What is a good way to handle highlighting of long text?


-- 

Sincerely,
Kaya
github: https://github.com/28kayak