[jira] [Created] (NIFI-6295) NiFiRecordSerDe in PutHive3StreamingProcessor does not handle types contained in arrays

2019-05-13 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-6295:
--

 Summary: NiFiRecordSerDe in PutHive3StreamingProcessor does not 
handle types contained in arrays
 Key: NIFI-6295
 URL: https://issues.apache.org/jira/browse/NIFI-6295
 Project: Apache NiFi
  Issue Type: Bug
  Components: Extensions
Affects Versions: 1.9.2
 Environment: - Rhel 7.5
- JDK 8
Reporter: Gideon Korir


NiFiRecordSerDe does not correctly handle objects contained in arrays, causing 
the HiveStreaming writer to fail when it tries to flush its contents.

 

The culprit:
{code:java}
// NiFiRecordSerDe.extractCurrentField
case LIST:
    val = Arrays.asList(record.getAsArray(fieldName));
    break;
{code}
The user gets an unhelpful error:
{code:java}
java.lang.NullPointerException: null
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.io.Text.set(Text.java:225)
    at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
    at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
    at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
    at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
    at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
    at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:556)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.close(OrcRecordUpdater.java:557)
    at org.apache.hive.streaming.AbstractRecordWriter.close(AbstractRecordWriter.java:360)
    at org.apache.hive.streaming.HiveStreamingConnection$TransactionBatch.closeImpl(HiveStreamingConnection.java:979)
    at org.apache.hive.streaming.HiveStreamingConnection$TransactionBatch.close(HiveStreamingConnection.java:970)
    at org.apache.hive.streaming.HiveStreamingConnection$TransactionBatch.markDead(HiveStreamingConnection.java:833)
    at org.apache.hive.streaming.HiveStreamingConnection$TransactionBatch.write(HiveStreamingConnection.java:814)
    at org.apache.hive.streaming.HiveStreamingConnection.write(HiveStreamingConnection.java:533)
    at org.apache.nifi.processors.hive.PutHive3Streaming.onTrigger(PutHive3Streaming.java:414)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
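
A possible direction for a fix (my sketch, not a committed patch) is to convert 
each array element to a Hive-writable value instead of passing the raw NiFi 
objects through. convertElement() and elementTypeInfo below are hypothetical 
placeholders for the per-type conversion that extractCurrentField already 
performs for scalar fields:
{code:java}
// Hypothetical sketch: convert each element before building the list, so
// nested records and strings reach the ORC writer as Hive-writable objects.
case LIST: {
    final Object[] array = record.getAsArray(fieldName);
    if (array == null) {
        val = null;
    } else {
        final List<Object> converted = new ArrayList<>(array.length);
        for (final Object element : array) {
            converted.add(convertElement(element, elementTypeInfo)); // placeholders
        }
        val = converted;
    }
    break;
}
{code}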



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5963) Test failure in TestFileSystemRepository caused by DiskUtils.deleteRecursively

2019-01-18 Thread Gideon Korir (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gideon Korir updated NIFI-5963:
---
Issue Type: Bug  (was: Improvement)

> Test failure in TestFileSystemRepository caused by DiskUtils.deleteRecursively
> --
>
> Key: NIFI-5963
> URL: https://issues.apache.org/jira/browse/NIFI-5963
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.8.0
> Environment: Rhel 7.5
> NiFi 1.8
> JDK 1.8.0_175
>Reporter: Gideon Korir
>Priority: Major
>
> I'm trying to build NiFi but I keep getting the following errors:
> {code:java}
> //@Before
> java.lang.AssertionError: Unable to delete target/content_repository/1 expected null, but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotNull(Assert.java:755)
> at org.junit.Assert.assertNull(Assert.java:737)
> at org.apache.nifi.controller.repository.util.DiskUtils.deleteRecursively(DiskUtils.java:47)
> at org.apache.nifi.controller.repository.TestFileSystemRepository.setup(TestFileSystemRepository.java:77)
> //@After
> java.lang.NullPointerException
> at org.apache.nifi.controller.repository.TestFileSystemRepository.shutdown(TestFileSystemRepository.java:87){code}
>  
> I've traced this to multiple test methods modifying the content repository 
> concurrently. When I run the tests one by one they all pass; to make them 
> work with the Maven build I've made the following change:
> {code:java}
> private final File rootFile = new File("target/content_repository/" + UUID.randomUUID().toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5963) Test failure in TestFileSystemRepository caused by DiskUtils.deleteRecursively

2019-01-18 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5963:
--

 Summary: Test failure in TestFileSystemRepository caused by 
DiskUtils.deleteRecursively
 Key: NIFI-5963
 URL: https://issues.apache.org/jira/browse/NIFI-5963
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Affects Versions: 1.8.0
 Environment: Rhel 7.5
NiFi 1.8
JDK 1.8.0_175
Reporter: Gideon Korir


I'm trying to build NiFi but I keep getting the following errors:
{code:java}
//@Before
java.lang.AssertionError: Unable to delete target/content_repository/1 expected null, but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotNull(Assert.java:755)
at org.junit.Assert.assertNull(Assert.java:737)
at org.apache.nifi.controller.repository.util.DiskUtils.deleteRecursively(DiskUtils.java:47)
at org.apache.nifi.controller.repository.TestFileSystemRepository.setup(TestFileSystemRepository.java:77)

//@After
java.lang.NullPointerException
at org.apache.nifi.controller.repository.TestFileSystemRepository.shutdown(TestFileSystemRepository.java:87){code}
 

I've traced this to multiple test methods modifying the content repository 
concurrently. When I run the tests one by one they all pass; to make them work 
with the Maven build I've made the following change:
{code:java}
private final File rootFile = new File("target/content_repository/" + UUID.randomUUID().toString());
{code}
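
An alternative per-method variant of the same idea (my sketch, not part of the 
original report) could use JUnit 4's TemporaryFolder rule, which gives each 
test method an isolated directory and deletes it afterwards:
{code:java}
// Sketch assuming JUnit 4: each test method gets its own repository root
// under target/, so concurrent test methods never share a path.
@Rule
public TemporaryFolder tempFolder = new TemporaryFolder(new File("target"));

private File rootFile;

@Before
public void setup() throws IOException {
    rootFile = tempFolder.newFolder("content_repository");
    // ... initialize the FileSystemRepository against rootFile ...
}
{code}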



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (NIFI-5946) Hive3Streaming memory leak

2019-01-10 Thread Gideon Korir (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gideon Korir resolved NIFI-5946.

   Resolution: Duplicate
Fix Version/s: 1.9.0

> Hive3Streaming memory leak
> --
>
> Key: NIFI-5946
> URL: https://issues.apache.org/jira/browse/NIFI-5946
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.8.0
> Environment: Rhel 7.5
> Open jdk 1.8
> NiFi 1.8
> HDP 3
>Reporter: Gideon Korir
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: nifi_memory_leak.png
>
>
> Our NiFi instance has been leaking memory since we started using the 
> Hive3Streaming processor, and we are now seeing memory usage of 60+ GB. 
> Looking at the processor code, it seems we are:
>  # Creating a new StreamingConnection for each flow file
>  # Registering a shutdown hook via ShutdownHookManager.addShutdownHook that 
> should close the connection
>  # Closing the connection but not removing the shutdown hook.
> The Runnable captures the connection and conf objects, and since our NiFi 
> instance has been running continuously and the JVM hasn't shut down (3+ 
> weeks), the ShutdownHookManager is still holding references to every 
> connection created since startup, causing the memory leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5946) Hive3Streaming memory leak

2019-01-10 Thread Gideon Korir (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739420#comment-16739420
 ] 

Gideon Korir commented on NIFI-5946:


[~pvillard] Closing the issue as a duplicate; sorry, I only searched for 
"Hive3Streaming" and didn't get a result back. I can confirm it's fixed.

> Hive3Streaming memory leak
> --
>
> Key: NIFI-5946
> URL: https://issues.apache.org/jira/browse/NIFI-5946
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.8.0
> Environment: Rhel 7.5
> Open jdk 1.8
> NiFi 1.8
> HDP 3
>Reporter: Gideon Korir
>Priority: Major
> Attachments: nifi_memory_leak.png
>
>
> Our NiFi instance has been leaking memory since we started using the 
> Hive3Streaming processor, and we are now seeing memory usage of 60+ GB. 
> Looking at the processor code, it seems we are:
>  # Creating a new StreamingConnection for each flow file
>  # Registering a shutdown hook via ShutdownHookManager.addShutdownHook that 
> should close the connection
>  # Closing the connection but not removing the shutdown hook.
> The Runnable captures the connection and conf objects, and since our NiFi 
> instance has been running continuously and the JVM hasn't shut down (3+ 
> weeks), the ShutdownHookManager is still holding references to every 
> connection created since startup, causing the memory leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5946) Hive3Streaming memory leak

2019-01-10 Thread Gideon Korir (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739412#comment-16739412
 ] 

Gideon Korir commented on NIFI-5946:


Question: why do we need to add a shutdown hook at all, given that 
HiveStreamingConnection already implements this logic correctly?

> Hive3Streaming memory leak
> --
>
> Key: NIFI-5946
> URL: https://issues.apache.org/jira/browse/NIFI-5946
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.8.0
> Environment: Rhel 7.5
> Open jdk 1.8
> NiFi 1.8
> HDP 3
>Reporter: Gideon Korir
>Priority: Major
> Attachments: nifi_memory_leak.png
>
>
> Our NiFi instance has been leaking memory since we started using the 
> Hive3Streaming processor, and we are now seeing memory usage of 60+ GB. 
> Looking at the processor code, it seems we are:
>  # Creating a new StreamingConnection for each flow file
>  # Registering a shutdown hook via ShutdownHookManager.addShutdownHook that 
> should close the connection
>  # Closing the connection but not removing the shutdown hook.
> The Runnable captures the connection and conf objects, and since our NiFi 
> instance has been running continuously and the JVM hasn't shut down (3+ 
> weeks), the ShutdownHookManager is still holding references to every 
> connection created since startup, causing the memory leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5946) Hive3Streaming memory leak

2019-01-10 Thread Gideon Korir (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739390#comment-16739390
 ] 

Gideon Korir commented on NIFI-5946:


[Similar 
issue|http://apache-nifi-users-list.2361937.n4.nabble.com/Fwd-Nifi-1-7-1-PutHive3Streaming-Connection-shutDownHook-Entry-not-collected-td6340.html]

> Hive3Streaming memory leak
> --
>
> Key: NIFI-5946
> URL: https://issues.apache.org/jira/browse/NIFI-5946
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.8.0
> Environment: Rhel 7.5
> Open jdk 1.8
> NiFi 1.8
> HDP 3
>Reporter: Gideon Korir
>Priority: Major
> Attachments: nifi_memory_leak.png
>
>
> Our NiFi instance has been leaking memory since we started using the 
> Hive3Streaming processor, and we are now seeing memory usage of 60+ GB. 
> Looking at the processor code, it seems we are:
>  # Creating a new StreamingConnection for each flow file
>  # Registering a shutdown hook via ShutdownHookManager.addShutdownHook that 
> should close the connection
>  # Closing the connection but not removing the shutdown hook.
> The Runnable captures the connection and conf objects, and since our NiFi 
> instance has been running continuously and the JVM hasn't shut down (3+ 
> weeks), the ShutdownHookManager is still holding references to every 
> connection created since startup, causing the memory leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5946) Hive3Streaming memory leak

2019-01-10 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5946:
--

 Summary: Hive3Streaming memory leak
 Key: NIFI-5946
 URL: https://issues.apache.org/jira/browse/NIFI-5946
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Affects Versions: 1.8.0
 Environment: Rhel 7.5
Open jdk 1.8
NiFi 1.8
HDP 3
Reporter: Gideon Korir
 Attachments: nifi_memory_leak.png

Our NiFi instance has been leaking memory since we started using the 
Hive3Streaming processor, and we are now seeing memory usage of 60+ GB. Looking 
at the processor code, it seems we are:
 # Creating a new StreamingConnection for each flow file
 # Registering a shutdown hook via ShutdownHookManager.addShutdownHook that 
should close the connection
 # Closing the connection but not removing the shutdown hook.

The Runnable captures the connection and conf objects, and since our NiFi 
instance has been running continuously and the JVM hasn't shut down (3+ weeks), 
the ShutdownHookManager is still holding references to every connection created 
since startup, causing the memory leak.
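
A leak-free pattern (my sketch, assuming Hive's 
org.apache.hive.common.util.ShutdownHookManager with its static 
addShutdownHook/removeShutdownHook methods) would unregister the hook once the 
connection is closed normally:
{code:java}
// Hypothetical sketch: pair every addShutdownHook with a removeShutdownHook
// so a closed connection (and the conf it captures) becomes collectible.
final Runnable onShutdown = () -> closeQuietly(connection); // closeQuietly is a placeholder
ShutdownHookManager.addShutdownHook(onShutdown);
try {
    // ... write the flow file's records through the connection ...
} finally {
    connection.close();
    ShutdownHookManager.removeShutdownHook(onShutdown);
}
{code}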



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NIFI-5891) NiFi record serde throws NPE when decimal field has null value

2018-12-13 Thread Gideon Korir (JIRA)


[ 
https://issues.apache.org/jira/browse/NIFI-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720227#comment-16720227
 ] 

Gideon Korir commented on NIFI-5891:


This issue seems to affect all Avro logical types (decimal, date, timestamp) as 
well as arrays.

> NiFi record serde throws NPE when decimal field has null value
> --
>
> Key: NIFI-5891
> URL: https://issues.apache.org/jira/browse/NIFI-5891
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.8.0
> Environment: RHEL 7.5
> Open JDK 1.8
>Reporter: Gideon Korir
>Priority: Major
>
> When an incoming record has a null decimal field, NiFiRecordSerDe (in 
> org.apache.hive.streaming) throws an NPE when trying to create a HiveDecimal. 
> This causes the PutHive3Streaming processor to skip the record and log a 
> warning.
>  
> The line:
> {code:java}
> val = HiveDecimal.create(record.getAsDouble(fieldName));
> {code}
> should be changed to something along the lines of:
> {code:java}
> Double d = record.getAsDouble(fieldName);
> val = d == null ? null : HiveDecimal.create(d);{code}
> The bug is insidious: the null Double returned by getAsDouble is auto-unboxed 
> to a primitive double when passed to HiveDecimal.create(double), which is 
> where the NPE is thrown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5891) NiFi record serde throws NPE when decimal field has null value

2018-12-11 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5891:
--

 Summary: NiFi record serde throws NPE when decimal field has null 
value
 Key: NIFI-5891
 URL: https://issues.apache.org/jira/browse/NIFI-5891
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Affects Versions: 1.8.0
 Environment: RHEL 7.5
Open JDK 1.8

Reporter: Gideon Korir


When an incoming record has a null decimal field, NiFiRecordSerDe (in 
org.apache.hive.streaming) throws an NPE when trying to create a HiveDecimal. 
This causes the PutHive3Streaming processor to skip the record and log a 
warning.
 
The line:
{code:java}
val = HiveDecimal.create(record.getAsDouble(fieldName));
{code}
should be changed to something along the lines of:
{code:java}
Double d = record.getAsDouble(fieldName);
val = d == null ? null : HiveDecimal.create(d);{code}
The bug is insidious: the null Double returned by getAsDouble is auto-unboxed 
to a primitive double when passed to HiveDecimal.create(double), which is where 
the NPE is thrown.
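
The trap in isolation (illustrative only, not code from NiFi):
{code:java}
Double d = null;       // what getAsDouble returns for a null field
double primitive = d;  // NPE thrown here, during auto-unboxing,
                       // before any HiveDecimal code runs
{code}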



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (NIFI-5888) QueryRecord processor handling timestamp

2018-12-11 Thread Gideon Korir (JIRA)


 [ 
https://issues.apache.org/jira/browse/NIFI-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gideon Korir updated NIFI-5888:
---
Description: 
Add the test below to `org.apache.nifi.processors.standard.TestQueryRecord`

 
{code:java}
@Test
public void testTimestampColumns() throws InitializationException {
    final MockRecordParser parser = new MockRecordParser();
    parser.addSchemaField("name", RecordFieldType.STRING);
    parser.addSchemaField("date_added", RecordFieldType.TIMESTAMP);
    parser.addRecord("Tom", Timestamp.valueOf("2018-12-03 09:12:00"));
    parser.addRecord("Jerry", Timestamp.valueOf("2018-12-04 10:26:00"));
    parser.addRecord("Tom", Timestamp.valueOf("2017-01-03 11:22:00"));

    final List<String> colNames = new ArrayList<>();
    colNames.add("name");
    colNames.add("day_added");
    final ResultSetValidatingRecordWriter writer = new ResultSetValidatingRecordWriter(colNames);

    TestRunner runner = getRunner();
    runner.addControllerService("parser", parser);
    runner.enableControllerService(parser);
    runner.addControllerService("writer", writer);
    runner.enableControllerService(writer);
    runner.setProperty(REL_NAME, "select name, {fn YEAR(date_added)} as day_added from FLOWFILE");
    runner.setProperty(QueryRecord.RECORD_READER_FACTORY, "parser");
    runner.setProperty(QueryRecord.RECORD_WRITER_FACTORY, "writer");
    runner.enqueue("");
    runner.run();

    runner.assertTransferCount(REL_NAME, 1);
}
{code}
 

*Expected*: the test passes.

*Actual*: the test fails with a ClassCastException: "java.sql.Timestamp cannot 
be cast to java.lang.Long".

I also tried the SQL "select name, CAST(date_added, DATE) as day_added from 
FLOWFILE" and it failed the same way.

 

I think I've traced the issue to NiFi's usage of Apache Calcite, based on 
Julian Hyde's comment on [this 
issue|https://issues.apache.org/jira/browse/CALCITE-1427].

Disclaimer: my knowledge of Calcite is next to none, so I could be completely 
wrong.

Currently, QueryRecord uses FlowFileTable, which only implements QueryableTable 
and TranslatableTable and therefore cannot work with java.sql.Timestamp (at 
least based on Hyde's comment on that issue). This would mean almost all 
RecordReader services (Avro, CSV) fail with QueryRecord when timestamps are 
involved.

 

  was:
Add the test below to `org.apache.nifi.processors.standard.TestQueryRecord`

 
{code:java}
@Test
public void testTimestampColumns() throws InitializationException {
    final MockRecordParser parser = new MockRecordParser();
    parser.addSchemaField("name", RecordFieldType.STRING);
    parser.addSchemaField("date_added", RecordFieldType.TIMESTAMP);
    parser.addRecord("Tom", Timestamp.valueOf("2018-12-03 09:12:00"));
    parser.addRecord("Jerry", Timestamp.valueOf("2018-12-04 10:26:00"));
    parser.addRecord("Tom", Timestamp.valueOf("2017-01-03 11:22:00"));

    final List<String> colNames = new ArrayList<>();
    colNames.add("name");
    colNames.add("day_added");
    final ResultSetValidatingRecordWriter writer = new ResultSetValidatingRecordWriter(colNames);

    TestRunner runner = getRunner();
    runner.addControllerService("parser", parser);
    runner.enableControllerService(parser);
    runner.addControllerService("writer", writer);
    runner.enableControllerService(writer);
    runner.setProperty(REL_NAME, "select name, {fn YEAR(date_added)} as day_added from FLOWFILE");
    runner.setProperty(QueryRecord.RECORD_READER_FACTORY, "parser");
    runner.setProperty(QueryRecord.RECORD_WRITER_FACTORY, "writer");
    runner.enqueue("");
    runner.run();

    runner.assertTransferCount(REL_NAME, 1);
}
{code}
 

*Expected*: the test passes.

*Actual*: the test fails with a ClassCastException: "java.sql.Timestamp cannot 
be cast to java.lang.Long".

I also tried the SQL "select name, CAST(date_added, DATE) as day_added from 
FLOWFILE" and it failed the same way.

 

I think I've traced the issue to NiFi's usage of Apache Calcite, based on 
Julian Hyde's comment on [this 
issue|https://issues.apache.org/jira/browse/CALCITE-1427].

Disclaimer: my knowledge of Calcite is next to none, so I could be completely 
wrong.

Currently, QueryRecord uses FlowFileTable, which only implements QueryableTable 
and TranslatableTable and therefore cannot work with java.sql.Timestamp (at 
least based on Hyde's comment on that issue). This would mean almost all 
RecordReader services (Avro, CSV) fail with QueryRecord when timestamps are 
involved.

 


> QueryRecord processor handling timestamp
> 
>
> Key: NIFI-5888
> URL: https://issues.apache.org/jira/browse/NIFI-5888
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.8.0
>Reporter: Gideon Korir
>Priority: Major
>
> Add the test below to `org.apache.nifi.processors.standard.TestQueryRecord`
>  
> {code:java}
> @Test public void testTimestampColumns() throws InitializationException { 
> final MockRecordParser parser = new MockRecordParser(); 
> 

[jira] [Created] (NIFI-5888) QueryRecord processor handling timestamp

2018-12-11 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5888:
--

 Summary: QueryRecord processor handling timestamp
 Key: NIFI-5888
 URL: https://issues.apache.org/jira/browse/NIFI-5888
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Affects Versions: 1.8.0
Reporter: Gideon Korir


Add the test below to `org.apache.nifi.processors.standard.TestQueryRecord`

 
{code:java}
@Test
public void testTimestampColumns() throws InitializationException {
    final MockRecordParser parser = new MockRecordParser();
    parser.addSchemaField("name", RecordFieldType.STRING);
    parser.addSchemaField("date_added", RecordFieldType.TIMESTAMP);
    parser.addRecord("Tom", Timestamp.valueOf("2018-12-03 09:12:00"));
    parser.addRecord("Jerry", Timestamp.valueOf("2018-12-04 10:26:00"));
    parser.addRecord("Tom", Timestamp.valueOf("2017-01-03 11:22:00"));

    final List<String> colNames = new ArrayList<>();
    colNames.add("name");
    colNames.add("day_added");
    final ResultSetValidatingRecordWriter writer = new ResultSetValidatingRecordWriter(colNames);

    TestRunner runner = getRunner();
    runner.addControllerService("parser", parser);
    runner.enableControllerService(parser);
    runner.addControllerService("writer", writer);
    runner.enableControllerService(writer);
    runner.setProperty(REL_NAME, "select name, {fn YEAR(date_added)} as day_added from FLOWFILE");
    runner.setProperty(QueryRecord.RECORD_READER_FACTORY, "parser");
    runner.setProperty(QueryRecord.RECORD_WRITER_FACTORY, "writer");
    runner.enqueue("");
    runner.run();

    runner.assertTransferCount(REL_NAME, 1);
}
{code}
 

*Expected*: the test passes.

*Actual*: the test fails with a ClassCastException: "java.sql.Timestamp cannot 
be cast to java.lang.Long".

I also tried the SQL "select name, CAST(date_added, DATE) as day_added from 
FLOWFILE" and it failed the same way.

 

I think I've traced the issue to NiFi's usage of Apache Calcite, based on 
Julian Hyde's comment on [this 
issue|https://issues.apache.org/jira/browse/CALCITE-1427].

Disclaimer: my knowledge of Calcite is next to none, so I could be completely 
wrong.

Currently, QueryRecord uses FlowFileTable, which only implements QueryableTable 
and TranslatableTable and therefore cannot work with java.sql.Timestamp (at 
least based on Hyde's comment on that issue). This would mean almost all 
RecordReader services (Avro, CSV) fail with QueryRecord when timestamps are 
involved.
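
To illustrate the mismatch (my reading of CALCITE-1427, so treat this as an 
assumption rather than confirmed NiFi behavior): Calcite's enumerable runtime 
represents SQL TIMESTAMP values as longs (milliseconds since epoch), so a table 
backing QueryRecord would need to emit the converted value, not the Timestamp 
object itself:
{code:java}
// Convert a cell to the representation Calcite appears to expect per
// CALCITE-1427: TIMESTAMP values as epoch milliseconds (long), not as
// java.sql.Timestamp objects.
static Object toCalciteValue(final Object cell) {
    if (cell instanceof java.sql.Timestamp) {
        return ((java.sql.Timestamp) cell).getTime();
    }
    return cell;
}
{code}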

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5662) AvroTypeUtil Decimal support using Fixed Error

2018-10-05 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5662:
--

 Summary: AvroTypeUtil Decimal support using Fixed Error
 Key: NIFI-5662
 URL: https://issues.apache.org/jira/browse/NIFI-5662
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Affects Versions: 1.7.1
 Environment: RHEL 7.5
JDK 1.8.182
Reporter: Gideon Korir


When a decimal is specified as fixed in the Avro schema, AvroTypeUtil converts 
the decimal into a ByteBuffer instead of a GenericFixed.

The code:
{code:java}
return new Conversions.DecimalConversion().toBytes(decimal, fieldSchema, logicalType);
{code}
should be:
{code:java}
return fieldSchema.getType() == Type.BYTES
    ? new Conversions.DecimalConversion().toBytes(decimal, fieldSchema, logicalType)
    : new Conversions.DecimalConversion().toFixed(decimal, fieldSchema, logicalType);
{code}
The former causes the AvroRecordSetWriter to fail with the error: 

_org.apache.avro.file.DataFileWriter$AppendWriteException: 
java.lang.ClassCastException: java.nio.HeapByteBuffer cannot be cast to 
org.apache.avro.generic.GenericFixed_
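
For context, the fixed-backed decimal case can be constructed like this (an 
illustrative sketch using Avro's SchemaBuilder, not code from NiFi):
{code:java}
// A decimal(10,2) logical type backed by fixed(8) rather than bytes;
// writing such a field requires DecimalConversion.toFixed(), not toBytes().
Schema fixedDecimal = LogicalTypes.decimal(10, 2)
        .addToSchema(SchemaBuilder.fixed("amount").size(8));
{code}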



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5584) Reorder if statement in DataTypeUtils.toTimestamp so that Timestamp comes before Date

2018-09-10 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5584:
--

 Summary: Reorder if statement in DataTypeUtils.toTimestamp so that 
Timestamp comes before Date
 Key: NIFI-5584
 URL: https://issues.apache.org/jira/browse/NIFI-5584
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Affects Versions: 1.7.1
 Environment: RHEL, JDK 8
Reporter: Gideon Korir


The method DataTypeUtils.toTimestamp in the nifi-record module has its if 
statements ordered as follows:
{code:java}
public static Timestamp toTimestamp(final Object value, final Supplier<DateFormat> format, final String fieldName) {
    if (value == null) {
        return null;
    }

    if (value instanceof java.util.Date) {
        return new Timestamp(((java.util.Date) value).getTime());
    }

    if (value instanceof Timestamp) {
        return (Timestamp) value;
    }
{code}
Since Timestamp extends java.util.Date, a Timestamp value always matches the 
first instanceof check and a new Timestamp object is allocated needlessly. The 
Timestamp check should come before the java.util.Date check.
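
The reordered checks would look like this (a sketch of the proposed change, not 
the committed patch):
{code:java}
// Test the more specific type first so an existing Timestamp is returned as-is.
if (value instanceof Timestamp) {
    return (Timestamp) value;
}

if (value instanceof java.util.Date) {
    return new Timestamp(((java.util.Date) value).getTime());
}
{code}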
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (NIFI-5572) Support decimal as a RecordFieldType

2018-09-04 Thread Gideon Korir (JIRA)
Gideon Korir created NIFI-5572:
--

 Summary: Support decimal as a RecordFieldType
 Key: NIFI-5572
 URL: https://issues.apache.org/jira/browse/NIFI-5572
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Core Framework
Affects Versions: 1.7.1
Reporter: Gideon Korir


I'm building a custom RecordReader to import financial data from extracts in a 
custom file format. Everything works, except that when converting from an Avro 
Schema to a RecordSchema, the decimal type information is silently converted to 
double, causing the reader to interpret the transaction values as doubles.

It would be nice to have native support for decimals; as it stands, I have to 
read the values as strings and then run a Spark job to convert them to doubles.

My suggestion (see the sketch after this list) would be:
 # Add a RecordFieldType.DECIMAL value
 # Add a DecimalDataType class that has `getPrecision()` and `getScale()` 
getters
 # Add a `RecordFieldType.getDecimalDataType(precision, scale)` method to 
create an instance of the RecordFieldType.DECIMAL value
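
A hypothetical sketch of the proposed additions (RecordFieldType.DECIMAL and 
this class do not exist in NiFi 1.7.1, and the DataType super-constructor 
signature is assumed from the existing nifi-record data types):
{code:java}
// Carries precision and scale alongside the proposed DECIMAL field type.
public class DecimalDataType extends DataType {
    private final int precision;
    private final int scale;

    public DecimalDataType(final int precision, final int scale) {
        super(RecordFieldType.DECIMAL, null); // no text format applies
        this.precision = precision;
        this.scale = scale;
    }

    public int getPrecision() {
        return precision;
    }

    public int getScale() {
        return scale;
    }
}
{code}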



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)