Re: Issues with the handling of NULLs in Streaming Expressions

Joel Bernstein Thu, 06 Jun 2019 17:38:47 -0700

Interesting questions. I suspect we need to beef up our test cases that
deal with nulls and make sure they behave in a consistent manner.


One of the things that likely needs to be looked at more carefully is how
string literals are handled as opposed to nulls. In some cases I believe if
null is encountered it's treated as a string literal and doesn't preserve
the null. So I think it's worth creating a ticket outlining your findings
and we can think about solutions.

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Jun 6, 2019 at 9:22 AM Oleksandr Chornyi <oleksandrchor...@gmail.com>
wrote:

> Hi guys!
>
> I'm working on a generic query builder for Streaming Expressions which
> allows building various requests containing row level expressions (i.e.
> evaluators), aggregations/metrics, sorts, etc. On this way, I bumped into
> many issues related to the handling of NULL values by the engine. Here are
> the issues in the descending order of their severity (from my standpoint):
>
> 1. *There is no way to check if a value in a tuple is NULL* because
> *eq* function
> fails to accept *null *as an argument:
>
> > *eq(1,null) *
>
> fails with
>
> > "Unable to check eq(...) because a null value was found"
>
> even though the documentation says
> <
> https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#eq
> >
> that "If any any parameters are null and there is at least one parameter
> that is not null then false will be returned."
> This issue makes it impossible to evaluate an expression from the *if*
> function
> documentation
> <
> https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#if
> >
> :
>
> > if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then
> > null else fieldA / fieldB
>
> I think that the root cause of the issue is coming from the fact that
> *EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that
> none of the arguments is *null*, but I don't think that's what we want
> here. *Can you confirm that what I see is a bug and I should file it?*
>
> 2. The fact that *FieldValueEvaluator returns a field name when a value is
> null* breaks any evaluator/decorator which otherwise would handle
> *nulls*. Consider
> these examples (I'm using *cartesianProduct *on an integer array to get
> several tuples with integers because I couldn't find a way to do so
> directly):
>
> > cartesianProduct(
> >     tuple(a=array(1,null,3)),
> >     a
> > )
>
> returns values preserving *nulls: *
>
> > "docs": [
> >   {"a": 1},
> >   {"a": null},
> >   {"a": 3},
> > ...]
>
> If I just execute *add(1, null) *it works as expected and returns *null.*
> Now,
> if I'm trying to apply any stream evaluator which should work fine with
> *nulls* to this stream:
>
> > select(
> >     cartesianProduct(
> >         tuple(a=array(1,null,3)),
> >         a
> >     ),
> >     add(a, 1) as a
> > )
>
> it fails to process the second record saying that:
>
> > "docs": [
> >   {"a": 2},
> >   {
> >     "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric
> > value expected but found type java.lang.String for value a",
> >     ...
> >   }
> > ]
>
> It looks even more confusing when running the following query:
>
> > select(
> >     cartesianProduct(
> >         tuple(a=array(1,null,3)),
> >         a
> >     ),
> >     coalesce(a, 42) as a
> > )
>
> produces
>
> > "docs": [
> >   {"a": 1},
> >   {"a": "a"},
> >   {"a": 3},
> > ...]
>
>  instead of
>
> > "docs": [
> >   {"a": 1},
> >   {"a": *42*},
> >   {"a": 3},
> > ...]
>
> As I mentioned in the issue description, I think the issue lies in these
> lines of *FieldValueEvaluator:*
>
> > if(value == null) {
> >    return fieldName;
> > }
>
> I consider this to be very counterintuitive. *Can you confirm that this is
> a bug, rather than a designed feature?*
>
> 3. *Most Boolean Stream Evaluators* state that they *don't work with
> NULLs.* However,
> it's very inconvenient and there is no other way to work around it (see
> item #1)*. *I'm talking about the following evaluators: *and, eor, or, gt,
> lt, gteq, lteq. *At the moment these evaluators just throw exceptions when
> an argument is *null. **Have you considered making their behavior more
> SQL-like?* When the behavior is like this:
>
>    - *gt, lt, gteq, lteq *evaluators return *null* if any of the arguments
>    is *null*
>    - *or(true, null)* returns *true*
>    - *and(true, null)* returns *false*
>    - *having* decorator treats *null* returned by *booleanEvaluator* as
>    *false*
>
> 4. Some *inconsistencies in evaluators behavior* and/or documentation:
>
>    - *div(1, null)* fails while *mult(1, null), add(1, null), sub(1, null)*
>    return *null*. *Should I file a bug for div?*
>    - documentation for *not *says that "The function will fail to execute
>    if the parameter is non-boolean or null" however it returns *null*
> for *not(null).
>    **Should I create a task to fix the doc?*
>
> I know I mixed many questions into one thread, however for me they are all
> interrelated. Thank in advance for your help.
>
> --
> Best Regards,
> Alex Chornyi
>

Re: Issues with the handling of NULLs in Streaming Expressions

Reply via email to