Mandeep, Thanks for reporting these! I put up a JIRA [1] that covers the cases #2 and #4, and a Pull Request to address it. I was not able to replicate #1. The result that I got back was [{"world": "hello"}] - which is what I expect. Perhaps you can share a template that shows your configuration?
Re: #4. This is correct. We certainly could try to do something where we detect a column name of EXPR$<number> and change it to EXPR<number> or something like that... though I wonder if we should? When running a database query, for example, it probably makes sense to get back EXPR$1 because it's because read in the context of someone who just submitted the query. On an automated system, I would think that the column is not very meaningful unless appropriately named. That said, I can understand the desire to rename it. If you'd like to submit a JIRA and a PR to do it, then more power to you :) We should make sure, though, that if we do that, there's a property in the Processor that turns that on and off, so as to maintain backward compatibility of flows. Thanks -Mark [1] https://issues.apache.org/jira/browse/NIFI-5809 On Nov 9, 2018, at 10:15 AM, Mandeep Gill <mand...@nstack.com<mailto:mand...@nstack.com>> wrote: Hi there, Thanks so much for your help and the quick turnaround! Can confirm that the patch works well and resolves the main issues we were hitting with nulls. We have been hitting a few more issues with QueryProcessor that I've listed below - there's a couple and I'm happy to create bug reports for each and help fix them if/where we can - where would be the best place to submit them going forwards? 1. When using the JSONRecordSetWriter service with QueryProcessor, strings are truncated to their first character, so "select 'hello' as world" returns "[{"world": "h"}]" 2. count work slightly odd with nulls, so given the following data, {"id": "129984bf31e025599c0e9232df5c7b7c", "price": 19.47}, {"id": "a6cfcb7c7178b9d18c50f2f2dc41dab3", "price": null} the query "select count(*) as c from flowfile where price is null" returns [{"c": 0}]. This may be a upstream Calcite issue however. 3. When returning the result of any SQL functions, e.g. "select count(*) from flowfile", the auto-generated fields names, e.g. "EXPR$1" are not valid Avro field names and so causes an exception [1] unless explicitly renamed as per the previous example 4. When returning a single-width column, any rows that are null are silently discarded. As an example, given the data as above, the query "select price from flowfile" will drop the second row. Please let me know if can provide any further information, we'd love to help and start contributing although we're still getting familiar with Java and the NiFi codebase. Cheers! Mandeep [1] Using JsonRecordSetWriter: org.apache.nifi.processor.exception.ProcessException: IOException thrown from QueryRecord[id=f3ba8f30-0166-1000-d46d-d8f7ddfd96b6]: java.io.IOException: org.apache.avro.SchemaParseException: Illegal character in: EXPR$0 at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667) at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309) at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: org.apache.avro.SchemaParseException: Illegal character in: EXPR$0 at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327) at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648) ... 12 common frames omitted Caused by: org.apache.avro.SchemaParseException: Illegal character in: EXPR$0 at org.apache.avro.Schema.validateName(Schema.java:1151) at org.apache.avro.Schema.access$200(Schema.java:81) at org.apache.avro.Schema$Field.<init>(Schema.java:403) at org.apache.avro.Schema$Field.<init>(Schema.java:423) at org.apache.avro.Schema$Field.<init>(Schema.java:415) at org.apache.nifi.avro.AvroTypeUtil.buildAvroField(AvroTypeUtil.java:122) at org.apache.nifi.avro.AvroTypeUtil.buildAvroSchema(AvroTypeUtil.java:113) at org.apache.nifi.avro.AvroTypeUtil.extractAvroSchema(AvroTypeUtil.java:93) at org.apache.nifi.schema.access.WriteAvroSchemaAttributeStrategy.getAttributes(WriteAvroSchemaAttributeStrategy.java:62) at org.apache.nifi.json.WriteJsonResult.writeRecord(WriteJsonResult.java:137) at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59) at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52) at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324) ... 13 common frames omitted On Thu, 8 Nov 2018 at 21:46 Pierre Villard <pierre.villard...@gmail.com<mailto:pierre.villard...@gmail.com>> wrote: Hi Mandeep, Thanks for reporting this issue! Koji filed the JIRA [1] and submitted a PR for it [2]. I just merged it into master and it will be released with NiFi 1.9.0. You can also build the standard processors NAR from the master branch if you need the fix quickly. [1] https://issues.apache.org/jira/browse/NIFI-5802 [2] https://github.com/apache/nifi/pull/3158 Pierre Le mer. 7 nov. 2018 à 12:54, Mandeep Gill <mand...@nstack.com<mailto:mand...@nstack.com>> a écrit : Hi, We're hitting a couple of issues working with nulls when using QueryRecord using both NiFi 1.7.1 and 1.8.0. Things work as expected for strings, however when using other primitive types as defined by the avro schema, such as boolean, long, and double, null values in the input data aren't converted to NULLs within the SQL engine / Calcite. Instead they appear to remain as java null values and throw NPEs when attempting to use them within a query or simply return them as the output. To give some examples, given the following record data and schema (tested using both JSON and Avro record reader/writers) [ { "str_test" : "hello1", "bool_test" : true }, { "str_test" : null, "bool_test" : null } ] { "type": "record", "name": "schema", "fields": [ { "name": "str_test", "type": [ "string", "null" ], "default": null }, { "name": "bool_test", "type": [ "boolean", "null" ], "default": null } ] } The following queries return the empty resultset, select 'res' as res from FLOWFILE where bool_test IS NULL select 'res' as res from FLOWFILE where bool_test IS UNKNOWN and the query below returns a resultset of count 2, select 'res' from FLOWFILE where bool_test IS NOT NULL The query below works as expected, suggesting things work fine for strings select 'res' as res from FLOWFILE where str_test IS NULL However, finally the following query throws a NullPointerException (see [1]) on trying to convert the null to a boolean within the output writer select * from FLOWFILE where bool_test IS NOT NULL The null values for these types seem to be treated as distinct to the NULLs within the SQL engine, as the following query returns the empty resultset. select 'res' as res from FLOWFILE where CAST(NULL as boolean) IS DISTINCT FROM bool_test and the following query gives an RuntimeException (see [2]), select (COALESCE(bool_test, TRUE)) as res from flowfile Given all this we're unable to make use of datasets with nulls, are nulls only supported for strings or is there perhaps something we're doing wrong here in our setup/config. One thing we've noticed when running a simple "SELECT * from FLOWFILE" returns a nullable type for strings in the output avro schema but not for other primitives, even if they were nullable in the input schema - which could be related. Cheers, Mandeep [1] org.apache.nifi.processor.exception.ProcessException: IOException thrown from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667) at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309) at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327) at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648) ... 12 common frames omitted Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:308) at org.apache.nifi.avro.WriteAvroResultWithSchema.writeRecord(WriteAvroResultWithSchema.java:61) at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:59) at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:52) at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324) ... 13 common frames omitted Caused by: java.lang.NullPointerException: null of boolean in field bool_test of org.apache.nifi.nifiRecord at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:132) at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:126) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:60) at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:302) ... 17 common frames omitted Caused by: java.lang.NullPointerException: null at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:121) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73) at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:153) at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:143) at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:105) ... 20 common frames omitted [2] org.apache.nifi.processor.exception.ProcessException: IOException thrown from QueryRecord[id=43ee29ff-0166-1000-28bd-06dd07c1425d]: java.io.IOException: java.lang.RuntimeException: Cannot convert null to boolean at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2667) at org.apache.nifi.processors.standard.QueryRecord.onTrigger(QueryRecord.java:309) at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165) at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203) at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.RuntimeException: Cannot convert null to boolean at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:327) at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2648) ... 12 common frames omitted Caused by: java.lang.RuntimeException: Cannot convert null to boolean at org.apache.calcite.runtime.SqlFunctions.cannotConvert(SqlFunctions.java:1460) at org.apache.calcite.runtime.SqlFunctions.toBoolean(SqlFunctions.java:1483) at Baz$1$1.current(Unknown Source) at org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:684) at org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46) at org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217) at org.apache.nifi.serialization.record.ResultSetRecordSet.next(ResultSetRecordSet.java:84) at org.apache.nifi.serialization.AbstractRecordSetWriter.write(AbstractRecordSetWriter.java:51) at org.apache.nifi.processors.standard.QueryRecord$1.process(QueryRecord.java:324) ... 13 common frames omitted -- Mandeep Gill nstack.com<http://www.nstack.com/> / +44 7961822575<tel:+44%207961%20822575> -- Mandeep Gill nstack.com<http://www.nstack.com/> / +44 7961822575