Interesting questions. I suspect we need to beef up our test cases that deal with nulls and make sure they behave in a consistent manner.
One of the things that likely needs to be looked at more carefully is how string literals are handled as opposed to nulls. In some cases I believe if null is encountered it's treated as a string literal and doesn't preserve the null. So I think it's worth creating a ticket outlining your findings and we can think about solutions. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 6, 2019 at 9:22 AM Oleksandr Chornyi <oleksandrchor...@gmail.com> wrote: > Hi guys! > > I'm working on a generic query builder for Streaming Expressions which > allows building various requests containing row level expressions (i.e. > evaluators), aggregations/metrics, sorts, etc. On this way, I bumped into > many issues related to the handling of NULL values by the engine. Here are > the issues in the descending order of their severity (from my standpoint): > > 1. *There is no way to check if a value in a tuple is NULL* because > *eq* function > fails to accept *null *as an argument: > > > *eq(1,null) * > > fails with > > > "Unable to check eq(...) because a null value was found" > > even though the documentation says > < > https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#eq > > > that "If any any parameters are null and there is at least one parameter > that is not null then false will be returned." > This issue makes it impossible to evaluate an expression from the *if* > function > documentation > < > https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#if > > > : > > > if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then > > null else fieldA / fieldB > > I think that the root cause of the issue is coming from the fact that > *EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that > none of the arguments is *null*, but I don't think that's what we want > here. *Can you confirm that what I see is a bug and I should file it?* > > 2. The fact that *FieldValueEvaluator returns a field name when a value is > null* breaks any evaluator/decorator which otherwise would handle > *nulls*. Consider > these examples (I'm using *cartesianProduct *on an integer array to get > several tuples with integers because I couldn't find a way to do so > directly): > > > cartesianProduct( > > tuple(a=array(1,null,3)), > > a > > ) > > returns values preserving *nulls: * > > > "docs": [ > > {"a": 1}, > > {"a": null}, > > {"a": 3}, > > ...] > > If I just execute *add(1, null) *it works as expected and returns *null.* > Now, > if I'm trying to apply any stream evaluator which should work fine with > *nulls* to this stream: > > > select( > > cartesianProduct( > > tuple(a=array(1,null,3)), > > a > > ), > > add(a, 1) as a > > ) > > it fails to process the second record saying that: > > > "docs": [ > > {"a": 2}, > > { > > "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric > > value expected but found type java.lang.String for value a", > > ... > > } > > ] > > It looks even more confusing when running the following query: > > > select( > > cartesianProduct( > > tuple(a=array(1,null,3)), > > a > > ), > > coalesce(a, 42) as a > > ) > > produces > > > "docs": [ > > {"a": 1}, > > {"a": "a"}, > > {"a": 3}, > > ...] > > instead of > > > "docs": [ > > {"a": 1}, > > {"a": *42*}, > > {"a": 3}, > > ...] > > As I mentioned in the issue description, I think the issue lies in these > lines of *FieldValueEvaluator:* > > > if(value == null) { > > return fieldName; > > } > > I consider this to be very counterintuitive. *Can you confirm that this is > a bug, rather than a designed feature?* > > 3. *Most Boolean Stream Evaluators* state that they *don't work with > NULLs.* However, > it's very inconvenient and there is no other way to work around it (see > item #1)*. *I'm talking about the following evaluators: *and, eor, or, gt, > lt, gteq, lteq. *At the moment these evaluators just throw exceptions when > an argument is *null. **Have you considered making their behavior more > SQL-like?* When the behavior is like this: > > - *gt, lt, gteq, lteq *evaluators return *null* if any of the arguments > is *null* > - *or(true, null)* returns *true* > - *and(true, null)* returns *false* > - *having* decorator treats *null* returned by *booleanEvaluator* as > *false* > > 4. Some *inconsistencies in evaluators behavior* and/or documentation: > > - *div(1, null)* fails while *mult(1, null), add(1, null), sub(1, null)* > return *null*. *Should I file a bug for div?* > - documentation for *not *says that "The function will fail to execute > if the parameter is non-boolean or null" however it returns *null* > for *not(null). > **Should I create a task to fix the doc?* > > I know I mixed many questions into one thread, however for me they are all > interrelated. Thank in advance for your help. > > -- > Best Regards, > Alex Chornyi >