Re: Atomic Update (nested), Unified Highlighter and Lazy Field Loading => Invalid Index

David Smiley Fri, 19 Feb 2021 07:58:14 -0800

Even if you could do an "fl" with the ability to exclude certain fields, it
raises the question of what goes into the document cache.  The doc cache is
document-oriented, not field-oriented, so there needs to be some sort of
stand-in value for any field you don't want to cache... and that ends up
being LazyField if you have that feature enabled, or possibly wasted space
if you don't.  So I don't think the ability to exclude fields in "fl" would
make enableLazyFieldLoading obsolete, which I think is what you are
implying?
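
A rough way to picture this is the toy Python model below.  It is only an
illustration under simplifying assumptions, not Solr's implementation:
DocumentCache, LazyField, serialize and to_input_document are hypothetical
names (LazyField merely echoes Lucene's LazyDocument$LazyField).  The first
request for a document decides what gets cached.  With lazy loading on,
fields missing from "fl" become placeholders that are read on demand; with
it off, every stored field is cached.  serialize loosely imitates a log
writer that rejects placeholders, which is the shape of the TransactionLog
error quoted further down.

```python
# Toy model only: class and function names are hypothetical and do not
# mirror Solr's or Lucene's Java classes.
from typing import Callable, Dict, Iterable


class LazyField:
    """Placeholder for a stored field that was not requested up front."""

    def __init__(self, name: str, loader: Callable[[], object]) -> None:
        self.name = name
        self._loader = loader
        self._loaded = False
        self._value: object = None

    def value(self) -> object:
        # The stored value is read only on first access (extra latency later).
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


class DocumentCache:
    """Caches whole documents by id: document-oriented, not field-oriented."""

    def __init__(self, stored: Dict[str, Dict[str, object]], lazy: bool) -> None:
        self._stored = stored  # simulated stored fields on disk
        self._lazy = lazy      # plays the role of enableLazyFieldLoading
        self._cache: Dict[str, Dict[str, object]] = {}

    def get(self, doc_id: str, fl: Iterable[str]) -> Dict[str, object]:
        # The first request decides what is cached; later requests reuse it.
        if doc_id not in self._cache:
            wanted = set(fl)
            doc: Dict[str, object] = {}
            for name, val in self._stored[doc_id].items():
                if name in wanted or not self._lazy:
                    doc[name] = val  # eager: unrequested fields still take space
                else:
                    # lazy: cache a stand-in that reads the value on demand
                    doc[name] = LazyField(
                        name, lambda d=doc_id, n=name: self._stored[d][n]
                    )
            self._cache[doc_id] = doc
        return self._cache[doc_id]


def serialize(doc: Dict[str, object]) -> Dict[str, object]:
    """Simplified log writer that only understands plain values."""
    for name, val in doc.items():
        if isinstance(val, LazyField):
            raise TypeError(f"don't know how to serialize LazyField {name!r}")
    return dict(doc)


def to_input_document(doc: Dict[str, object]) -> Dict[str, object]:
    """Copies a cached document for an update, resolving placeholders first."""
    return {n: v.value() if isinstance(v, LazyField) else v for n, v in doc.items()}
```

For example, with lazy=True, caching a document through a request for
fl=["id"] leaves its other stored fields as LazyField placeholders, so
serialize(cached) raises TypeError while serialize(to_input_document(cached))
succeeds.  Resolving placeholders before copying is an assumption about what
a fix needs, not Solr's actual patch.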


~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Feb 19, 2021 at 10:10 AM Gus Heck <gus.h...@gmail.com> wrote:

> Actually I suspect it's there because the ability to exclude fields
> rather than include them is still pending...
> https://issues.apache.org/jira/browse/SOLR-3191
> See also
> https://issues.apache.org/jira/browse/SOLR-10367
> https://issues.apache.org/jira/browse/SOLR-9467
>
> All of these and lazy field loading are motivated by the case where you
> have a very large stored field that you sometimes don't want, but you do
> want everything else, and an explicit list of fields is not convenient
> (i.e., the field list would have to be hard-coded in the application, or
> building it would require some sort of schema parsing to list the possible
> fields, or other severe ugliness).
>
> -Gus
>
> On Thu, Feb 18, 2021 at 8:42 AM David Smiley <dsmi...@apache.org> wrote:
>
> > IMO enableLazyFieldLoading is a small optimization for most apps.  It
> > saves memory in the document cache at the expense of increased latency
> > if your usage pattern wants a field later that wasn't requested earlier.
> > You'd probably need detailed metrics/benchmarks to observe a difference,
> > and you might reach the conclusion that enableLazyFieldLoading is best at
> > "false" for you irrespective of the bug.  I suspect it may have been
> > developed for particularly large document use-cases where you don't
> > normally need some large text fields for retrieval/highlighting.  For
> > example, imagine if you stored the entire input data as JSON in a _json_
> > field or some such.  Nowadays, I'd set large="true" on such a field,
> > which is a much newer option.
> >
> > I was able to tweak my test to have only alphabetic IDs, and the test
> > still failed.  I don't see how the ID's contents/format could have any
> > effect.
> >
> > On Thu, Feb 18, 2021 at 5:04 AM Nussbaum, Ronen <ronen.nussb...@verint.com> wrote:
> >
> > > You're right, I was able to reproduce it too without highlighting.
> > > Regarding the existing bug, I think there might be an additional issue
> > > here because it happens only when the id field contains an underscore
> > > (I didn't check other special characters).
> > > Currently I have no choice but to use enableLazyFieldLoading=false.
> > > I hope it won't have a significant performance impact.
> > >
> > > -----Original Message-----
> > > From: David Smiley <dsmi...@apache.org>
> > > Sent: Thursday, 18 February 2021 01:03
> > > To: solr-user <solr-user@lucene.apache.org>
> > > Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
> > > Loading => Invalid Index
> > >
> > > I think the issue is this existing bug, but it needs to refer to
> > > toSolrInputDocument instead of toSolrDoc:
> > > https://issues.apache.org/jira/browse/SOLR-13034
> > > Highlighting isn't involved; you just need to somehow get a document
> > > cached with lazy fields.  In a test I was able to do this simply by
> > > doing a query that only returns the "id" field.  No highlighting.
> > >
> > > On Wed, Feb 17, 2021 at 10:28 AM David Smiley <dsmi...@apache.org> wrote:
> > >
> > > > Thanks for more details.  I was able to reproduce this locally!  I
> > > > hacked a test to look similar to what you are doing.  BTW it's okay to
> > > > fill out a JIRA imperfectly; they can always be edited :-).  Once I
> > > > better understand the nature of the bug today, I'll file an issue and
> > > > respond with it here.
> > > >
> > > > On Wed, Feb 17, 2021 at 6:36 AM Nussbaum, Ronen <ronen.nussb...@verint.com> wrote:
> > > >
> > > >> Hello David,
> > > >>
> > > >> Thank you for your reply.
> > > >> It was very hard, but I finally discovered how to reproduce it.  I
> > > >> thought of filing an issue but wasn't sure about the components and
> > > >> priority.
> > > >> I used the "tech products" configset, with the following changes:
> > > >> 1. Added <field name="_nest_path_" type="_nest_path_" />
> > > >>    <fieldType name="_nest_path_" class="solr.NestPathField" />
> > > >> 2. Added <field name="text_en" type="text_en" indexed="true"
> > > >>    stored="true" termVectors="true" termOffsets="true"
> > > >>    termPositions="true" required="false" multiValued="true" />
> > > >> Then I inserted one document with a nested child, e.g.
> > > >> {id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}
> > > >>
> > > >> To reproduce:
> > > >> Do a search with surround and the unified highlighter:
> > > >>
> > > >> hl.fl=text_en&hl.method=unified&hl=on&q=%7B!surround%7Dtext_en%3A4W("solr"%2C"great")
> > > >>
> > > >> Now, try to update the parent, e.g. {id:"abc_1", categories_i:{add:1}}
> > > >>
> > > >> Important: it happens only when "id" contains underscore characters!
> > > >> If you use "abc-1" it works.
> > > >>
> > > >> Thanks in advance,
> > > >> Ronen.
> > > >>
> > > >> -----Original Message-----
> > > >> From: David Smiley <dsmi...@apache.org>
> > > >> Sent: Sunday, 14 February 2021 19:17
> > > >> To: solr-user <solr-user@lucene.apache.org>
> > > >> Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy
> > > >> Field Loading => Invalid Index
> > > >>
> > > >> Hello Ronen,
> > > >>
> > > >> Can you please file a JIRA issue?  Some quick searches did not turn
> > > >> anything up.  It would be super helpful to me if you could list a
> > > >> series of steps with Solr out-of-the-box in 8.8, including what data
> > > >> to index and query.  Solr already includes the "tech products" sample
> > > >> data; maybe that can illustrate the problem?  It's not clear if a
> > > >> nested schema or nested docs are actually required in your example.
> > > >> If you share the JIRA issue with me, I'll chase this one down.
> > > >>
> > > >> On Sun, Feb 14, 2021 at 11:16 AM Ronen Nussbaum <rone...@gmail.com> wrote:
> > > >>
> > > >> > Hi All,
> > > >> >
> > > >> > I discovered a strange behaviour with this combination.
> > > >> > Not only does the atomic update fail, but the child documents are
> > > >> > also not properly indexed, and you can't use highlighting on their
> > > >> > text fields.  Currently there is no workaround other than reindexing.
> > > >> >
> > > >> > Checked on 8.3.0, 8.6.1 and 8.8.0.
> > > >> > 1. Configure a nested schema.
> > > >> > 2. enableLazyFieldLoading is true (the default).
> > > >> > 3. Run a search with hl.method=unified and hl.fl=<one of the child
> > > >> >    text fields>.
> > > >> > 4. Try to do an atomic update on some of the *parent* fields of the
> > > >> >    documents returned in #3.
> > > >> >
> > > >> > You get an error: "TransactionLog doesn't know how to serialize
> > > >> > class org.apache.lucene.document.LazyDocument$LazyField".
> > > >> >
> > > >> > Now trying to run #3 again yields an error message that the text
> > > >> > field is indexed without positions.
> > > >> >
> > > >> > If enableLazyFieldLoading is false, or if the default highlighter is
> > > >> > used, this doesn't happen.
> > > >> >
> > > >> > Ronen.
> > > >> >
> > > >>
> > > >>
> > > >> This electronic message may contain proprietary and confidential
> > > >> information of Verint Systems Inc., its affiliates and/or
> > > >> subsidiaries. The information is intended to be for the use of the
> > > >> individual(s) or
> > > >> entity(ies) named above. If you are not the intended recipient (or
> > > >> authorized to receive this e-mail for the intended recipient), you
> > > >> may not use, copy, disclose or distribute to anyone this message or
> > > >> any information contained in this message. If you have received this
> > > >> electronic message in error, please notify us by replying to this
> > > e-mail.
> > > >>
> > > >
> > >
> > >
> > >
> >
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

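For anyone retracing Ronen's reproduction steps quoted above, the Python
sketch below only builds the three HTTP requests (index the nested
document, run the surround query with the unified highlighter, send the
atomic update) without sending them.  The base URL, the collection name
"techproducts" and the use of commit=true are assumptions about a local
"tech products" setup with the two schema additions already applied; the
documents and highlighting parameters are taken from the thread.

```python
# Builds (but does not send) the requests from the reproduction steps.
# Assumptions: a local Solr at BASE with a "techproducts" collection that
# already has the _nest_path_ and text_en schema additions.
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8983/solr/techproducts"  # assumed location


def json_update(docs):
    """POST a JSON array of documents to the /update handler and commit."""
    return urllib.request.Request(
        f"{BASE}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Step 1: a parent whose id contains an underscore, with one nested child.
index_req = json_update([
    {"id": "abc_1", "utterances": {"id": "abc_1-1", "text_en": "Solr is great"}}
])

# Step 2: surround query plus the unified highlighter on the child text field.
params = {
    "q": '{!surround}text_en:4W("solr","great")',
    "hl": "on",
    "hl.method": "unified",
    "hl.fl": "text_en",
}
search_url = f"{BASE}/select?{urllib.parse.urlencode(params)}"

# Step 3: atomic update of a field on the parent document.
update_req = json_update([{"id": "abc_1", "categories_i": {"add": 1}}])

if __name__ == "__main__":
    for req in (index_req, update_req):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    print("GET", search_url)
    # To actually send a request: urllib.request.urlopen(index_req)
```

Step 3 is where the thread reports the TransactionLog LazyField error.
Per the later replies, the highlighting request itself is not essential;
any query that caches the parent with lazy fields (e.g. one returning only
the "id" field) reportedly suffices.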