Issues with the handling of NULLs in Streaming Expressions

Oleksandr Chornyi Thu, 06 Jun 2019 06:23:22 -0700

Hi guys!

I'm working on a generic query builder for Streaming Expressions which
allows building various requests containing row level expressions (i.e.
evaluators), aggregations/metrics, sorts, etc. On this way, I bumped into
many issues related to the handling of NULL values by the engine. Here are
the issues in the descending order of their severity (from my standpoint):


1. *There is no way to check if a value in a tuple is NULL* because
*eq* function
fails to accept *null *as an argument:

> *eq(1,null) *

fails with

> "Unable to check eq(...) because a null value was found"

even though the documentation says
<https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#eq>
that "If any any parameters are null and there is at least one parameter
that is not null then false will be returned."
This issue makes it impossible to evaluate an expression from the *if* function
documentation
<https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#if>
:

> if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then
> null else fieldA / fieldB

I think that the root cause of the issue is coming from the fact that
*EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that
none of the arguments is *null*, but I don't think that's what we want
here. *Can you confirm that what I see is a bug and I should file it?*

2. The fact that *FieldValueEvaluator returns a field name when a value is
null* breaks any evaluator/decorator which otherwise would handle
*nulls*. Consider
these examples (I'm using *cartesianProduct *on an integer array to get
several tuples with integers because I couldn't find a way to do so
directly):

> cartesianProduct(
>     tuple(a=array(1,null,3)),
>     a
> )

returns values preserving *nulls: *

> "docs": [
>   {"a": 1},
>   {"a": null},
>   {"a": 3},
> ...]

If I just execute *add(1, null) *it works as expected and returns *null.* Now,
if I'm trying to apply any stream evaluator which should work fine with
*nulls* to this stream:

> select(
>     cartesianProduct(
>         tuple(a=array(1,null,3)),
>         a
>     ),
>     add(a, 1) as a
> )

it fails to process the second record saying that:

> "docs": [
>   {"a": 2},
>   {
>     "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric
> value expected but found type java.lang.String for value a",
>     ...
>   }
> ]

It looks even more confusing when running the following query:

> select(
>     cartesianProduct(
>         tuple(a=array(1,null,3)),
>         a
>     ),
>     coalesce(a, 42) as a
> )

produces

> "docs": [
>   {"a": 1},
>   {"a": "a"},
>   {"a": 3},
> ...]

 instead of

> "docs": [
>   {"a": 1},
>   {"a": *42*},
>   {"a": 3},
> ...]

As I mentioned in the issue description, I think the issue lies in these
lines of *FieldValueEvaluator:*

> if(value == null) {
>    return fieldName;
> }

I consider this to be very counterintuitive. *Can you confirm that this is
a bug, rather than a designed feature?*

3. *Most Boolean Stream Evaluators* state that they *don't work with
NULLs.* However,
it's very inconvenient and there is no other way to work around it (see
item #1)*. *I'm talking about the following evaluators: *and, eor, or, gt,
lt, gteq, lteq. *At the moment these evaluators just throw exceptions when
an argument is *null. **Have you considered making their behavior more
SQL-like?* When the behavior is like this:

   - *gt, lt, gteq, lteq *evaluators return *null* if any of the arguments
   is *null*
   - *or(true, null)* returns *true*
   - *and(true, null)* returns *false*
   - *having* decorator treats *null* returned by *booleanEvaluator* as
   *false*

4. Some *inconsistencies in evaluators behavior* and/or documentation:

   - *div(1, null)* fails while *mult(1, null), add(1, null), sub(1, null)*
   return *null*. *Should I file a bug for div?*
   - documentation for *not *says that "The function will fail to execute
   if the parameter is non-boolean or null" however it returns *null*
for *not(null).
   **Should I create a task to fix the doc?*

I know I mixed many questions into one thread, however for me they are all
interrelated. Thank in advance for your help.

-- 
Best Regards,
Alex Chornyi

Issues with the handling of NULLs in Streaming Expressions

Reply via email to