What guarantees does solr have for keeping commit deadlines?

2021-02-27 Thread Nándor Mátravölgyi
Hi!

I'm working on building an NRT Solr instance. The schema is designed so
documents can be partially updated. Some documents will need to
receive or lose filter tags in a multi-valued field.

I have to be able to query already-existing documents to add tags to
them or remove tags from them. Obviously, if I (soft-)commit after each
document is added or removed, the serializable consistency would
guarantee that I can see all documents that I might want to change.
However, this is not desirable in terms of performance.

I've come up with a potential solution: if I track the document
updates that I make and only call commit before querying documents
that have been changed recently, performance is not sacrificed and I
still get strict consistency where I need it. For this to work
reliably, the auto-soft-commit interval and the times specified
through commit-within must be strictly honored as configured and
requested.
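A minimal sketch of that tracking idea in Python (the class name, structure, and 15-second value are my own illustration, not an existing Solr client API; it only decides when an explicit soft commit is needed before a query, the actual commit would be an update request to Solr with softCommit=true):

```python
class CommitTracker:
    """Track update/commit times so a query only forces an explicit soft
    commit when recently changed documents might not be visible yet.
    (A sketch of the idea described above; names are hypothetical.)"""

    def __init__(self, soft_commit_interval=15.0):
        self.interval = soft_commit_interval  # autoSoftCommit maxTime, in seconds
        self.last_update = None   # time of most recent document update
        self.last_commit = None   # time of most recent explicit soft commit

    def record_update(self, now):
        self.last_update = now

    def record_commit(self, now):
        self.last_commit = now

    def needs_commit(self, now):
        # Nothing indexed yet: nothing to make visible.
        if self.last_update is None:
            return False
        # An explicit commit issued after the last update already made it visible.
        if self.last_commit is not None and self.last_commit >= self.last_update:
            return False
        # Inside the soft-commit window the auto soft commit may not have fired yet.
        return now - self.last_update < self.interval
```

Note this sketch assumes the auto soft commit really does fire on schedule once the interval has elapsed, which is exactly the guarantee the questions below ask about.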

I have an auto-commit interval of 60 seconds with openSearcher=false
and an auto-soft-commit interval of 15 seconds. Documents will be
submitted through the REST API, and some of them will also have a
commit-within of 2-3 seconds specified.
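For reference, the corresponding solrconfig.xml fragment for these intervals would look roughly like this (a sketch of the standard autoCommit/autoSoftCommit elements; the values mirror the numbers above):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 60s, without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit every 15s makes changes visible to searchers -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>
```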

My questions:
 - After a document indexing request has returned with success, what
level of guarantee do I have that the document will be available after
the configured soft-commit-interval?
 - After a document indexing request with commit-within has returned
with success, what level of guarantee do I have that the document will
be available after the requested commit timeout?
 - Alternatively, if I could query Solr for when the last soft-commit
was done, I could make sure to call a soft-commit myself. Is there an
API to see when, or how long ago, the last (soft-)commit happened?

I'm primarily interested in answers regarding Solr in standalone mode.

Thanks,
Nandor


Re: unified highlighter performance in solr 8.5.1

2020-07-04 Thread Nándor Mátravölgyi
I guess that's fair. Let's have hl.fragsizeIsMinimum=true as default.

On 7/4/20, David Smiley  wrote:
> I doubt that WORD mode is impacted much by hl.fragsizeIsMinimum in terms of
> quality of the highlight since there are vastly more breaks to pick from.
> I think that setting is more useful in SENTENCE mode if you can stand the
> perf hit.  If you agree, then why not just let this one default to "true"?
>
> We agree on better documenting the perf trade-off.
>
> Thanks again for working on these settings, BTW.
>
> ~ David
>
>
> On Fri, Jul 3, 2020 at 1:25 PM Nándor Mátravölgyi 
> wrote:
>
>> Since the issue seems to be affecting the highlighter differently
>> based on which mode it is using, having different defaults for the
>> modes could be explored.
>>
>> WORD may have the new defaults as it has little effect on performance
>> and it creates nicer highlights.
>> SENTENCE should have the defaults that produce reasonable performance.
>> The docs could document this while also mentioning that the UH's
>> performance is highly dependent on the underlying java.text.BreakIterator.
>>
>> One can argue that having different defaults based on mode is
>> confusing. In this case I think the defaults should be changed to have
>> the SENTENCE mode perform better. Maybe the options for nice
>> highlights with WORD mode could be put into the docs in this case as
>> some form of an example.
>>
>> As long as I can use the UH with nicely aligned snippets in WORD mode
>> I'm fine with any defaults. I explicitly set them in the config and in
>> the queries most of the time anyways.
>>
>


Re: unified highlighter performance in solr 8.5.1

2020-07-03 Thread Nándor Mátravölgyi
Since the issue seems to be affecting the highlighter differently
based on which mode it is using, having different defaults for the
modes could be explored.

WORD may have the new defaults as it has little effect on performance
and it creates nicer highlights.
SENTENCE should have the defaults that produce reasonable performance.
The docs could document this while also mentioning that the UH's
performance is highly dependent on the underlying java.text.BreakIterator.

One can argue that having different defaults based on mode is
confusing. In this case I think the defaults should be changed to have
the SENTENCE mode perform better. Maybe the options for nice
highlights with WORD mode could be put into the docs in this case as
some form of an example.

As long as I can use the UH with nicely aligned snippets in WORD mode
I'm fine with any defaults. I explicitly set them in the config and in
the queries most of the time anyways.
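For illustration, explicitly pinning these settings in a query might look like the following sketch (the core name, field, and specific values are hypothetical; only the hl.* parameter names come from the discussion above):

```python
def word_mode_highlight_params(query):
    """Build query parameters for the unified highlighter in WORD mode
    with the fragment settings pinned explicitly (values illustrative)."""
    return {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.bs.type": "WORD",            # base break iterator mode
        "hl.fragsize": "100",
        "hl.fragsizeIsMinimum": "false",
        "hl.fragAlignRatio": "0.5",      # center the match in the snippet
    }

# These would be sent with e.g.
# requests.get("http://localhost:8983/solr/mycore/select",
#              params=word_mode_highlight_params("text:solr"))
```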


Re: unified highlighter performance in solr 8.5.1

2020-06-19 Thread Nándor Mátravölgyi
Hi!

With the provided test I've profiled the preceding() and following()
calls on the base Java iterators in the different options.

=== default highlighter arguments ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 1130 calls of
baseIter.preceding() took 1.039629 seconds in total
- from LengthGoalBreakIterator.following(): 1140 calls of
baseIter.following() took 0.340679 seconds in total
- from LengthGoalBreakIterator.preceding(): 1150 calls of
baseIter.preceding() took 0.099344 seconds in total
- from LengthGoalBreakIterator.preceding(): 1100 calls of
baseIter.following() took 0.015156 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 1200 calls of
baseIter.preceding() took 0.001006 seconds in total
- from LengthGoalBreakIterator.following(): 1700 calls of
baseIter.following() took 0.006278 seconds in total
- from LengthGoalBreakIterator.preceding(): 1710 calls of
baseIter.preceding() took 0.016320 seconds in total
- from LengthGoalBreakIterator.preceding(): 1090 calls of
baseIter.following() took 0.000527 seconds in total

=== hl.fragsizeIsMinimum=true&hl.fragAlignRatio=0 ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 860 calls of
baseIter.following() took 0.012593 seconds in total
- from LengthGoalBreakIterator.preceding(): 870 calls of
baseIter.preceding() took 0.022208 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 1360 calls of
baseIter.following() took 0.004789 seconds in total
- from LengthGoalBreakIterator.preceding(): 1370 calls of
baseIter.preceding() took 0.015983 seconds in total

=== hl.fragsizeIsMinimum=true ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 980 calls of
baseIter.following() took 0.010253 seconds in total
- from LengthGoalBreakIterator.preceding(): 980 calls of
baseIter.preceding() took 0.341997 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 1670 calls of
baseIter.following() took 0.005150 seconds in total
- from LengthGoalBreakIterator.preceding(): 1680 calls of
baseIter.preceding() took 0.013657 seconds in total

=== hl.fragAlignRatio=0 ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 1070 calls of
baseIter.preceding() took 1.312056 seconds in total
- from LengthGoalBreakIterator.following(): 1080 calls of
baseIter.following() took 0.678575 seconds in total
- from LengthGoalBreakIterator.preceding(): 1080 calls of
baseIter.preceding() took 0.020507 seconds in total
- from LengthGoalBreakIterator.preceding(): 1080 calls of
baseIter.following() took 0.006977 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 880 calls of
baseIter.preceding() took 0.000706 seconds in total
- from LengthGoalBreakIterator.following(): 1370 calls of
baseIter.following() took 0.004110 seconds in total
- from LengthGoalBreakIterator.preceding(): 1380 calls of
baseIter.preceding() took 0.014752 seconds in total
- from LengthGoalBreakIterator.preceding(): 1380 calls of
baseIter.following() took 0.000106 seconds in total

There is definitely a big difference between SENTENCE and WORD. I'm
not sure how we can improve the logic on our side while keeping the
features as is. Since the number of calls is roughly the same whether
the performance is good or bad, the cost seems to depend on the text
the iterator is traversing.


Re: unified highlighter performance in solr 8.5.1

2020-05-28 Thread Nándor Mátravölgyi
Hi!

I've not been able to delve into this issue deeply, but it could be
useful to know that "fragsizeIsMinimum" and "fragAlignRatio" are new
parameters with behavior-changing default values.

Leaving those with their default values makes the comparison between
8.4 and 8.5 like apples to oranges in a sense. To have the new UH
behave like the old one as closely as possible use:
fragsizeIsMinimum=false
fragAlignRatio=0
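As a sketch, a query carrying those two overrides might be built like this (the core name and query here are hypothetical; the two hl.* overrides and their values are the ones stated above):

```python
def legacy_like_uh_params(query):
    """Query parameters intended to make the 8.5 unified highlighter
    behave as closely as possible to the 8.4 one, per the two
    overrides above (everything else is illustrative)."""
    return {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fragsizeIsMinimum": "false",
        "hl.fragAlignRatio": "0",
    }

# e.g. requests.get("http://localhost:8983/solr/mycore/select",
#                   params=legacy_like_uh_params("text:solr"))
```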


Re: Atomic update error with JSON handler

2018-05-22 Thread Nándor Mátravölgyi
Hi,
First, I had already tried the request body enclosed in [...] without
success. It turns out that was not the only issue: the path was also
wrong for the atomic updates:

On the v2 API:
localhost:8983/v2/c/testnode/update/json/commands?commit=true Succeeds
localhost:8983/v2/c/testnode/update/json?commit=true Fails
localhost:8983/v2/c/testnode/update?commit=true Fails

On the old API:
localhost:8983/solr/testnode/update/json?commit=true Succeeds
localhost:8983/solr/testnode/update/json/docs?commit=true Fails

Some insight into what caused my confusion:
On the (
https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html#example-updating-part-of-a-document
) page of the Solr Guide, it is not emphasized that for an atomic JSON
update to work you must use the command endpoint. Furthermore, the example
JSONs in that section are not wrapped in the actual command level
(add/delete/commit) the way they are shown on a previous page. (
https://lucene.apache.org/solr/guide/7_3/uploading-data-with-index-handlers.html#sending-json-update-commands
)
Either boldly stating that atomic updates are commands or showing
complete JSON requests as examples would have been much clearer.
It is also surprising to me that the command endpoint accepts the "list of
documents" format, which the guide (at the second link provided) does not
mention.
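Summing up, these are the two request-body shapes the thread found to work for atomic updates (the endpoint paths are the ones reported above; the core name "testnode" and the field values come from the thread):

```python
import json

# 1) Bare list of documents -- accepted at /solr/testnode/update/json:
doc_list = [{"id": "test1", "title": {"set": "Solr Rocks"}}]

# 2) Explicit command form -- accepted at
#    /v2/c/testnode/update/json/commands?commit=true:
commands = {"add": {"doc": {"id": "test1", "title": {"set": "Solr Rocks"}}}}

# Either body would be POSTed as JSON, e.g.
#   requests.post(url, headers={"Content-type": "application/json"},
#                 data=json.dumps(doc_list))
```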

Thank you for pointing me in the right direction!
Nandor

On Tue, May 22, 2018 at 8:14 AM, Yasufumi Mizoguchi <yasufumi0...@gmail.com>
wrote:

> Hi,
>
> At least, it is better to enclose your json body with '[ ]', I think.
>
> Following is the result I tried using curl.
>
> $ curl -XPOST "localhost:8983/solr/test_core/update/json?commit=true"
> --data-binary '{"id":"test1","title":{"set":"Solr Rocks"}}'
> {
>   "responseHeader":{
> "status":400,
> "QTime":18},
>   "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","org.apache.solr.common.SolrException"],
> "msg":"Unknown command 'id' at [5]",
> "code":400}}
> $ curl -XPOST "localhost:8983/solr/test_core/update/json?commit=true"
> --data-binary '[{"id":"test1","title":{"set":"Solr Rocks"}}]'
> {
>   "responseHeader":{
> "status":0,
> "QTime":250}}
>
> Thanks,
> Yasufumi
>
>
> 2018年5月22日(火) 1:26 Nándor Mátravölgyi <nandor.ma...@gmail.com>:
>
> > Hi,
> >
> > I'm trying to build a simple document search core with SolrCloud. I've
> > run into an issue when trying to partially update documents (aka atomic
> > updates). It appears to be a bug, because the semantically same request
> > succeeds in XML format, while it fails as JSON.
> >
> > The body of the XML request:
> > <add><doc><field name="id">test1</field><field name="title" update="set">Solr Rocks</field></doc></add>
> >
> > The body of the JSON request:
> > {"id":"test1","title":{"set":"Solr Rocks"}}
> >
> > I'm using the requests library in Python3 to send the update request.
> > Sending the XML request with the following code works as expected:
> > r = requests.post('http://localhost:8983/v2/c/testnode/update/xml?commit=true',
> >                   headers={'Content-type': 'application/xml'}, data=xml)
> >
> > Sending the JSON request with either of the following lines returns a
> > SolrException:
> > r = requests.post('http://localhost:8983/v2/c/testnode/update/json?commit=true',
> >                   headers={'Content-type': 'application/json'}, data=json)
> > r = requests.post('http://localhost:8983/solr/testnode/update/json/docs?commit=true',
> >                   headers={'Content-type': 'application/json'}, data=json)
> >
> > Using the same lines of code to send a JSON request that is not an atomic
> > update works as expected. Such a JSON request body looks like:
> > {"id":"test1","title":"Solr Rocks"}
> >
> > The error message in the response is: ERROR: [doc=test1] unknown field
> > 'title.set'
> > Here is the log of the exception: https://pastebin.com/raw/VJe5hR25
> >
> > Depending on which API I send the request to, the logs are identical
> > except on lines 27 and 28:
> > This is with v2:
> >   at
> > org.apache.solr.handler.UpdateRequestHandlerApi$1.call(UpdateRequestHandlerApi.java:48)
> >   at org.apache.solr.api.V2HttpCall.execute(V2HttpCall.java:325)
> > and this is with the other:
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
> >   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
> >
> > I'm using Solr 7.3.1 and I believe I do everything according to the
> > documentation. (
> >
> > https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html#atomic-updates
> > )
> > The solrconfig.xml and managed-schema files are fairly simple, they have
> > code snippets from the examples mostly: https://pastebin.com/199JJkp0
> > https://pastebin.com/Dp1YK46k
> >
> > This could be a bug, or I can't fathom what I'm missing. Can anyone
> > help me out?
> > Thanks,
> > Nandor
> >
>


Atomic update error with JSON handler

2018-05-21 Thread Nándor Mátravölgyi
Hi,

I'm trying to build a simple document search core with SolrCloud. I've run
into an issue when trying to partially update documents (aka atomic
updates). It appears to be a bug, because the semantically same request
succeeds in XML format, while it fails as JSON.

The body of the XML request:
<add><doc><field name="id">test1</field><field name="title" update="set">Solr Rocks</field></doc></add>

The body of the JSON request:
{"id":"test1","title":{"set":"Solr Rocks"}}

I'm using the requests library in Python3 to send the update request.
Sending the XML request with the following code works as expected:
r = requests.post('http://localhost:8983/v2/c/testnode/update/xml?commit=true',
                  headers={'Content-type': 'application/xml'}, data=xml)

Sending the JSON request with either of the following lines returns a
SolrException:
r = requests.post('http://localhost:8983/v2/c/testnode/update/json?commit=true',
                  headers={'Content-type': 'application/json'}, data=json)
r = requests.post('http://localhost:8983/solr/testnode/update/json/docs?commit=true',
                  headers={'Content-type': 'application/json'}, data=json)

Using the same lines of code to send a JSON request that is not an atomic
update works as expected. Such a JSON request body looks like:
{"id":"test1","title":"Solr Rocks"}

The error message in the response is: ERROR: [doc=test1] unknown field
'title.set'
Here is the log of the exception: https://pastebin.com/raw/VJe5hR25

Depending on which API I send the request to, the logs are identical except
on lines 27 and 28:
This is with v2:
  at
org.apache.solr.handler.UpdateRequestHandlerApi$1.call(UpdateRequestHandlerApi.java:48)
  at org.apache.solr.api.V2HttpCall.execute(V2HttpCall.java:325)
and this is with the other:
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)

I'm using Solr 7.3.1 and I believe I do everything according to the
documentation. (
https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html#atomic-updates
)
The solrconfig.xml and managed-schema files are fairly simple; they mostly
contain snippets from the examples: https://pastebin.com/199JJkp0
https://pastebin.com/Dp1YK46k

This could be a bug, or I can't fathom what I'm missing. Can anyone help me
out?
Thanks,
Nandor