Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

David Smiley Thu, 18 Feb 2021 05:41:46 -0800

IMO enableLazyFieldLoading is a small optimization for most apps.  It saves
memory in the document cache at the expense of increased latency if your
usage pattern wants a field later that wasn't requested earlier.  You'd
probably need detailed metrics/benchmarks to observe a difference, and you
might reach a conclusion that enableLazyFieldLoading is best at "false" for
you irrespective of the bug.  I suspect it may have been developed for
particularly large document use-cases where you don't normally need some
large text fields for retrieval/highlighting.  For example imagine if you
stored the entire input data as JSON in a _json_ field or some-such.
Nowadays, I'd set large="true" on such a field, which is a much newer
option.
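
To make the trade-off above concrete, here is a toy Python model of a document cache with and without lazy loading. It is only a sketch: the class and method names (ToyStore, ToyDocumentCache, cached_chars) are made up for illustration and do not correspond to Solr's actual implementation or API.

```python
# Toy model only: NOT Solr's implementation, just an illustration of the
# memory/latency trade-off. All names here are hypothetical.


class ToyStore:
    """Stands in for stored fields on disk and counts how often it is read."""

    def __init__(self, docs):
        self._docs = docs
        self.reads = 0

    def load(self, doc_id, fields=None):
        """Return the requested fields of a document (all fields if None)."""
        self.reads += 1
        doc = self._docs[doc_id]
        if fields is None:
            return dict(doc)
        return {name: doc[name] for name in fields if name in doc}


class ToyDocumentCache:
    """Caches documents; in lazy mode only the requested fields are loaded."""

    def __init__(self, store, lazy):
        self.store = store
        self.lazy = lazy
        self._cache = {}

    def get(self, doc_id, fields):
        cached = self._cache.get(doc_id)
        if cached is None:
            # Lazy: load only what was asked for. Eager: load every field.
            cached = self.store.load(doc_id, fields if self.lazy else None)
            self._cache[doc_id] = cached
        missing = [name for name in fields if name not in cached]
        if self.lazy and missing:
            # A field that was not requested earlier costs an extra read.
            cached.update(self.store.load(doc_id, missing))
        return {name: cached[name] for name in fields if name in cached}

    def cached_chars(self):
        """Rough proxy for memory use: total length of cached values."""
        return sum(
            len(str(value))
            for doc in self._cache.values()
            for value in doc.values()
        )
```

With one document holding a small "id" and a large "_json_" field, requesting only "id" leaves the lazy cache much smaller than the eager one, but a later request for "_json_" costs the lazy cache a second read of the store.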


I was able to tweak my test to have only alphabetic IDs, and the test still
failed.  I don't see how the ID's contents/format could cause any effect.
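
For anyone who wants to try this, the sequence discussed in this thread (index a parent with a nested child, run a query that returns only "id", then send an atomic update to the parent) could be scripted roughly as below. This is a sketch under assumptions: the host, port and collection name in SOLR_BASE, the helper names, and the exact "q" used for the id-only query are illustrative and not taken from the actual test. The code only builds the requests and does not send them.

```python
import json
from urllib.parse import urlencode

# Assumption (not from the thread): host, port and collection name.
SOLR_BASE = "http://localhost:8983/solr/techproducts"


def nested_doc(parent_id, child_id):
    """Parent with one nested child, shaped like the example in the thread."""
    return {
        "id": parent_id,
        "utterances": {"id": child_id, "text_en": "Solr is great"},
    }


def id_only_query_url(parent_id):
    """Query that returns only the "id" field (no highlighting involved)."""
    params = {"q": "id:" + parent_id, "fl": "id"}
    return SOLR_BASE + "/select?" + urlencode(params)


def atomic_update(parent_id):
    """Atomic update of the parent, as in the thread's example."""
    return {"id": parent_id, "categories_i": {"add": 1}}


def reproduction_steps(parent_id, child_id):
    """Return (method, url, json_body) tuples in the order they would be sent."""
    update_url = SOLR_BASE + "/update?commit=true"
    return [
        ("POST", update_url, json.dumps([nested_doc(parent_id, child_id)])),
        ("GET", id_only_query_url(parent_id), None),
        ("POST", update_url, json.dumps([atomic_update(parent_id)])),
    ]
```

Calling reproduction_steps("abc", "abcchild") gives the alphabetic-only variant, and reproduction_steps("abc_1", "abc_1-1") gives the IDs from the original report. The JSON bodies would be posted with Content-Type application/json.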

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 18, 2021 at 5:04 AM Nussbaum, Ronen <ronen.nussb...@verint.com>
wrote:

> You're right, I was able to reproduce it without highlighting too.
> Regarding the existing bug, I think there might be an additional issue
> here, because it happens only when the id field contains an underscore
> (I didn't check other special characters).
> Currently I have no choice but to use enableLazyFieldLoading=false.
> I hope it won't have a significant performance impact.
>
> -----Original Message-----
> From: David Smiley <dsmi...@apache.org>
> Sent: Thursday, 18 February 2021 01:03
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
> Loading => Invalid Index
>
> I think the issue is this existing bug, but it needs to refer to
> toSolrInputDocument instead of toSolrDoc:
> https://issues.apache.org/jira/browse/SOLR-13034
> Highlighting isn't involved; you just need to somehow get a document
> cached with lazy fields.  In a test I was able to do this simply by doing a
> query that only returns the "id" field.  No highlighting.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Feb 17, 2021 at 10:28 AM David Smiley <dsmi...@apache.org> wrote:
>
> > Thanks for more details.  I was able to reproduce this locally!  I
> > hacked a test to look similar to what you are doing.  BTW it's okay to
> > fill out a JIRA imperfectly; they can always be edited :-).  Once I
> > better understand the nature of the bug today, I'll file an issue and
> > respond with it here.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Wed, Feb 17, 2021 at 6:36 AM Nussbaum, Ronen
> > <ronen.nussb...@verint.com>
> > wrote:
> >
> >> Hello David,
> >>
> >> Thank you for your reply.
> >> It was very hard, but I finally discovered how to reproduce it. I
> >> thought of filing an issue but wasn't sure about the components and
> >> priority.
> >> I used the "tech products" configset, with the following changes:
> >> 1. Added <field name="_nest_path_" type="_nest_path_" /><fieldType
> >> name="_nest_path_" class="solr.NestPathField" />
> >> 2. Added <field name="text_en" type="text_en" indexed="true"
> >> stored="true" termVectors="true" termOffsets="true" termPositions="true"
> >> required="false" multiValued="true" />
> >> Then I inserted one document with a nested child, e.g.
> >> {id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}
> >>
> >> To reproduce:
> >> Run a search with the surround query parser and the unified highlighter:
> >>
> >> hl.fl=text_en&hl.method=unified&hl=on&q=%7B!surround%7Dtext_en%3A4W("solr"%2C"great")
> >>
> >> Now try to update the parent, e.g. {id:"abc_1", categories_i:{add:1}}
> >>
> >> Important: it happens only when "id" contains underscore characters!
> >> If you use "abc-1", it works.
> >>
> >> Thanks in advance,
> >> Ronen.
> >>
> >> -----Original Message-----
> >> From: David Smiley <dsmi...@apache.org>
> >> Sent: Sunday, 14 February 2021 19:17
> >> To: solr-user <solr-user@lucene.apache.org>
> >> Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy
> >> Field Loading => Invalid Index
> >>
> >> Hello Ronen,
> >>
> >> Can you please file a JIRA issue?  Some quick searches did not turn
> >> anything up.  It would be super helpful to me if you could list a
> >> series of steps with Solr out-of-the-box in 8.8 including what data
> >> to index and query.  Solr already includes the "tech products" sample
> >> data; maybe that can illustrate the problem?  It's not clear if
> >> nested schema or nested docs are actually required in your example.
> >> If you share the JIRA issue with me, I'll chase this one down.
> >>
> >> ~ David Smiley
> >> Apache Lucene/Solr Search Developer
> >> http://www.linkedin.com/in/davidwsmiley
> >>
> >>
> >> On Sun, Feb 14, 2021 at 11:16 AM Ronen Nussbaum <rone...@gmail.com>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > I discovered some strange behaviour with this combination.
> >> > Not only does the atomic update fail, but the child documents are
> >> > also no longer properly indexed, and you can't use highlights on
> >> > their text fields. Currently there is no workaround other than
> >> > reindexing.
> >> >
> >> > Checked on 8.3.0, 8.6.1 and 8.8.0.
> >> > 1. Configure nested schema.
> >> > 2. enableLazyFieldLoading is true (default).
> >> > 3. Run a search with hl.method=unified and hl.fl=<one of child text
> >> > fields>
> >> > 4. Try to do an atomic update on some of the *parents* fields of
> >> > the returned documents from #3.
> >> >
> >> > You get an error: "TransactionLog doesn't know how to serialize
> >> > class org.apache.lucene.document.LazyDocument$LazyField".
> >> >
> >> > Now trying to run #3 again yields an error message that the text
> >> > field is indexed without positions.
> >> >
> >> > If enableLazyFieldLoading is false or if using the default
> >> > highlighter this doesn't happen.
> >> >
> >> > Ronen.
> >> >
> >>
> >>
> >> This electronic message may contain proprietary and confidential
> >> information of Verint Systems Inc., its affiliates and/or
> >> subsidiaries. The information is intended to be for the use of the
> >> individual(s) or
> >> entity(ies) named above. If you are not the intended recipient (or
> >> authorized to receive this e-mail for the intended recipient), you
> >> may not use, copy, disclose or distribute to anyone this message or
> >> any information contained in this message. If you have received this
> >> electronic message in error, please notify us by replying to this
> >> e-mail.
> >>
> >
>
