Mandeep,

Thanks for reporting these! I put up a JIRA [1] that covers the cases #2 and 
#4, and a Pull Request to address it.
I was not able to replicate #1. The result that I got back was [{"world": 
"hello"}] - which is what I expect. Perhaps you
can share a template that shows your configuration?

Re: #4. This is correct. We certainly could try to do something where we detect 
a column name of EXPR$<number> and
change it to EXPR<number> or something like that... though I wonder if we 
should? When running a database query, for
example, it probably makes sense to get back EXPR$1 because it's because read 
in the context of someone who just
submitted the query. On an automated system, I would think that the column is 
not very meaningful unless appropriately
named. That said, I can understand the desire to rename it. If you'd like to 
submit a JIRA and a PR to do it, then more power
to you :) We should make sure, though, that if we do that, there's a property 
in the Processor that turns that on and off, so as
to maintain backward compatibility of flows.

Thanks
-Mark


[1] https://issues.apache.org/jira/browse/NIFI-5809


On Nov 9, 2018, at 10:15 AM, Mandeep Gill 
<mand...@nstack.com<mailto:mand...@nstack.com>> wrote:

Hi there,

Thanks so much for your help and the quick turnaround! Can confirm that the 
patch works well and resolves the main issues we were hitting with nulls.

We have been hitting a few more issues with QueryProcessor that I've listed 
below - there's a couple and I'm happy to create bug reports for each and help 
fix them if/where we can - where would be the best place to submit them going 
forwards?

1. When using the JSONRecordSetWriter service with QueryProcessor, strings are 
truncated to their first character, so "select 'hello' as world" returns 
"[{"world": "h"}]"
2. count work slightly odd with nulls, so given the following data,

      {"id": "129984bf31e025599c0e9232df5c7b7c", "price": 19.47},
      {"id": "a6cfcb7c7178b9d18c50f2f2dc41dab3", "price": null}

the query "select count(*) as c from flowfile where price is null" returns 
[{"c": 0}]. This may be a upstream Calcite issue however.

3. When returning the result of any SQL functions, e.g. "select count(*) from 
flowfile", the auto-generated fields names, e.g. "EXPR$1" are not valid Avro 
field names and so causes an exception [1] unless explicitly renamed as per the 
previous example

4. When returning a single-width column, any rows that are null are silently 
discarded. As an example, given the data as above, the query "select price from 
flowfile" will drop the second row.

Please let me know if can provide any further information, we'd love to help 
and start contributing although we're still getting familiar with Java and the 
NiFi codebase.

Cheers!
Mandeep

[1] Using JsonRecordSetWriter:
org.apache.nifi.processor.exception.ProcessException: IOException thrown from 
QueryRecord[id=f3ba8f30-0166-1000-d46d-d8f7ddfd96b6]: java.io.IOException: 
org.apache.avro.SchemaParseException: Illegal character in: EXPR$0
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
at 
org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.avro.SchemaParseException: Illegal 
character in: EXPR$0
at 
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
... 12 common frames omitted
Caused by: org.apache.avro.SchemaParseException: Illegal character in: EXPR$0
at org.apache.avro.Schema.validateName(Schema.java:1151)
at org.apache.avro.Schema.access$200(Schema.java:81)
at org.apache.avro.Schema$Field.<init>(Schema.java:403)
at org.apache.avro.Schema$Field.<init>(Schema.java:423)
at org.apache.avro.Schema$Field.<init>(Schema.java:415)
at org.apache.nifi.avro.AvroTypeUtil.buildAvroField(AvroTypeUtil.java:122)
at org.apache.nifi.avro.AvroTypeUtil.buildAvroSchema(AvroTypeUtil.java:113)
at org.apache.nifi.avro.AvroTypeUtil.extractAvroSchema(AvroTypeUtil.java:93)
at 
org.apache.nifi.schema.access.WriteAvroSchemaAttributeStrategy.getAttributes(WriteAvroSchemaAttributeStrategy.java:62)
at org.apache.nifi.json.WriteJsonResult.writeRecord(WriteJsonResult.java:137)
at 
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
at 
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52)
at 
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
... 13 common frames omitted


On Thu, 8 Nov 2018 at 21:46 Pierre Villard 
<pierre.villard...@gmail.com<mailto:pierre.villard...@gmail.com>> wrote:
Hi Mandeep,

Thanks for reporting this issue! Koji filed the JIRA [1] and submitted a PR for 
it [2]. I just merged it into master and it will be released with NiFi 1.9.0. 
You can also build the standard processors NAR from the master branch if you 
need the fix quickly.

[1] https://issues.apache.org/jira/browse/NIFI-5802
[2] https://github.com/apache/nifi/pull/3158

Pierre

Le mer. 7 nov. 2018 à 12:54, Mandeep Gill 
<mand...@nstack.com<mailto:mand...@nstack.com>> a écrit :
Hi,

We're hitting a couple of issues working with nulls when using QueryRecord 
using both NiFi 1.7.1 and 1.8.0.

Things work as expected for strings, however when using other primitive types 
as defined by the avro schema, such as boolean, long, and double, null values 
in the input data aren't converted to NULLs within the SQL engine / Calcite. 
Instead they appear to remain as java null values and throw NPEs when 
attempting to use them within a query or simply return them as the output.

To give some examples, given the following record data and schema (tested using 
both JSON and Avro record reader/writers)

[ {  "str_test" : "hello1",  "bool_test" : true }, {  "str_test" : null,  
"bool_test" : null } ]

{
  "type": "record",
  "name": "schema",
  "fields": [
    {
      "name": "str_test",
      "type": [ "string", "null" ],
      "default": null
    },
    {
      "name": "bool_test",
      "type": [ "boolean", "null" ],
      "default": null
    }
  ]
}

The following queries return the empty resultset,

select 'res' as res from FLOWFILE where bool_test IS NULL
select 'res' as res from FLOWFILE where bool_test IS UNKNOWN

and the query below returns a resultset of count 2,

select 'res' from FLOWFILE where bool_test IS NOT NULL


The query below works as expected, suggesting things work fine for strings

select 'res' as res from FLOWFILE where str_test IS NULL

However, finally the following query throws a NullPointerException (see [1]) on 
trying to convert the null to a boolean within the output writer

select * from FLOWFILE where bool_test IS NOT NULL

The null values for these types seem to be treated as distinct to the NULLs 
within the SQL engine, as the following query returns the empty resultset.

select 'res' as res from FLOWFILE where CAST(NULL as boolean) IS DISTINCT FROM 
bool_test


and the following query gives an RuntimeException (see [2]),

select (COALESCE(bool_test, TRUE)) as res from flowfile

Given all this we're unable to make use of datasets with nulls, are nulls only 
supported for strings or is there perhaps something we're doing wrong here in 
our setup/config. One thing we've noticed when running a simple "SELECT * from 
FLOWFILE" returns a nullable type for strings in the output avro schema but not 
for other primitives, even if they were nullable in the input schema - which 
could be related.

Cheers,
Mandeep

[1] org.apache.nifi.processor.exception.ProcessException: IOException thrown 
from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]: java.io.IOException: 
org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: null of boolean in field bool_test of 
org.apache.nifi.nifiRecord
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
at 
org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: 
org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: null of boolean in field bool_test of 
org.apache.nifi.nifiRecord
at 
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
... 12 common frames omitted
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.NullPointerException: null of boolean in field bool_test of 
org.apache.nifi.nifiRecord
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308)
at 
org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:61)
at 
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59)
at 
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52)
at 
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
... 13 common frames omitted
Caused by: java.lang.NullPointerException: null of boolean in field bool_test 
of org.apache.nifi.nifiRecord
at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132)
at 
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302)
... 17 common frames omitted
Caused by: java.lang.NullPointerException: null
at 
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:121)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153)
at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143)
at 
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105)
... 20 common frames omitted


[2] org.apache.nifi.processor.exception.ProcessException: IOException thrown 
from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]: java.io.IOException: 
java.lang.RuntimeException: Cannot convert null to boolean
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667)
at 
org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309)
at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.RuntimeException: Cannot convert null 
to boolean
at 
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327)
at 
org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648)
... 12 common frames omitted
Caused by: java.lang.RuntimeException: Cannot convert null to boolean
at org.apache.calcite.runtime.SqlFunctions.cannotConvert(SqlFunctions.java:1460)
at org.apache.calcite.runtime.SqlFunctions.toBoolean(SqlFunctions.java:1483)
at Baz$1$1.current(Unknown Source)
at org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:684)
at org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46)
at org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
at 
org.apache.nifi.serialization.record.ResultSetRecordSet.next(ResultSetRecordSet.java:84)
at 
org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:51)
at 
org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324)
... 13 common frames omitted

--

Mandeep Gill

nstack.com<http://www.nstack.com/> / +44 7961822575<tel:+44%207961%20822575>

--

Mandeep Gill

nstack.com<http://www.nstack.com/> / +44 7961822575

Reply via email to