Hi guys! I'm working on a generic query builder for Streaming Expressions which allows building various requests containing row level expressions (i.e. evaluators), aggregations/metrics, sorts, etc. On this way, I bumped into many issues related to the handling of NULL values by the engine. Here are the issues in the descending order of their severity (from my standpoint):
1. *There is no way to check if a value in a tuple is NULL* because *eq* function fails to accept *null *as an argument: > *eq(1,null) * fails with > "Unable to check eq(...) because a null value was found" even though the documentation says <https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#eq> that "If any any parameters are null and there is at least one parameter that is not null then false will be returned." This issue makes it impossible to evaluate an expression from the *if* function documentation <https://lucene.apache.org/solr/guide/7_7/stream-evaluator-reference.html#if> : > if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then > null else fieldA / fieldB I think that the root cause of the issue is coming from the fact that *EqualToEvaluator* extends *RecursiveBooleanEvaluator* which checks that none of the arguments is *null*, but I don't think that's what we want here. *Can you confirm that what I see is a bug and I should file it?* 2. The fact that *FieldValueEvaluator returns a field name when a value is null* breaks any evaluator/decorator which otherwise would handle *nulls*. Consider these examples (I'm using *cartesianProduct *on an integer array to get several tuples with integers because I couldn't find a way to do so directly): > cartesianProduct( > tuple(a=array(1,null,3)), > a > ) returns values preserving *nulls: * > "docs": [ > {"a": 1}, > {"a": null}, > {"a": 3}, > ...] If I just execute *add(1, null) *it works as expected and returns *null.* Now, if I'm trying to apply any stream evaluator which should work fine with *nulls* to this stream: > select( > cartesianProduct( > tuple(a=array(1,null,3)), > a > ), > add(a, 1) as a > ) it fails to process the second record saying that: > "docs": [ > {"a": 2}, > { > "EXCEPTION": "Failed to evaluate expression add(a,val(1)) - Numeric > value expected but found type java.lang.String for value a", > ... > } > ] It looks even more confusing when running the following query: > select( > cartesianProduct( > tuple(a=array(1,null,3)), > a > ), > coalesce(a, 42) as a > ) produces > "docs": [ > {"a": 1}, > {"a": "a"}, > {"a": 3}, > ...] instead of > "docs": [ > {"a": 1}, > {"a": *42*}, > {"a": 3}, > ...] As I mentioned in the issue description, I think the issue lies in these lines of *FieldValueEvaluator:* > if(value == null) { > return fieldName; > } I consider this to be very counterintuitive. *Can you confirm that this is a bug, rather than a designed feature?* 3. *Most Boolean Stream Evaluators* state that they *don't work with NULLs.* However, it's very inconvenient and there is no other way to work around it (see item #1)*. *I'm talking about the following evaluators: *and, eor, or, gt, lt, gteq, lteq. *At the moment these evaluators just throw exceptions when an argument is *null. **Have you considered making their behavior more SQL-like?* When the behavior is like this: - *gt, lt, gteq, lteq *evaluators return *null* if any of the arguments is *null* - *or(true, null)* returns *true* - *and(true, null)* returns *false* - *having* decorator treats *null* returned by *booleanEvaluator* as *false* 4. Some *inconsistencies in evaluators behavior* and/or documentation: - *div(1, null)* fails while *mult(1, null), add(1, null), sub(1, null)* return *null*. *Should I file a bug for div?* - documentation for *not *says that "The function will fail to execute if the parameter is non-boolean or null" however it returns *null* for *not(null). **Should I create a task to fix the doc?* I know I mixed many questions into one thread, however for me they are all interrelated. Thank in advance for your help. -- Best Regards, Alex Chornyi