RE: PutParquet fails to convert avro logical decimal type.

2018-02-15 Thread mohit.jain
Hi Juan,

 

I’m using NiFi 1.4.0.

 

 

From: Juan Pablo Gardella [mailto:gardellajuanpa...@gmail.com] 
Sent: 16 February 2018 12:10
To: users@nifi.apache.org
Subject: Re: PutParquet fails to convert avro logical decimal type.

 

Are you using NiFi 1.5.0? If not, try with it first. There are bugs in older 
versions related to Record/Avro.

 

On Fri, 16 Feb 2018 at 02:46 wrote:

Hi all, 

 

I am using QueryDatabaseTable to extract records from MySQL and have set Use 
Avro Logical Types to true. I am using the PutParquet processor to write to 
HDFS, and it is not able to convert the logical decimal type.

 

It throws this exception:

2018-02-15 17:59:05,189 ERROR [Timer-Driven Process Thread-10] 
o.a.nifi.processors.parquet.PutParquet 
PutParquet[id=01611011-e4a8-106a-f933-eb66d923cfd1] Failed to write due to 
org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
Cannot convert value 1234567.2 of type class java.lang.Double because no 
compatible types exist in the UNION for field dectype: {}

org.apache.nifi.serialization.record.util.IllegalTypeConversionException: 
Cannot convert value 1234567.2 of type class java.lang.Double because no 
compatible types exist in the UNION for field dectype


Please let me know if I’m doing anything wrong.

 

Regards,

Mohit Jain



Re: PutParquet fails to convert avro logical decimal type.

2018-02-15 Thread Juan Pablo Gardella
Are you using NiFi 1.5.0? If not, try with it first. There are bugs in
older versions related to Record/Avro.

On Fri, 16 Feb 2018 at 02:46 wrote:

> Hi all,
>
>
>
> I am using QueryDatabaseTable to extract records from MySQL and have set
> Use Avro Logical Types to true. I am using the PutParquet processor to write
> to HDFS, and it is not able to convert the logical decimal type.
>
>
>
> It throws this exception:
>
> 2018-02-15 17:59:05,189 ERROR [Timer-Driven Process Thread-10]
> o.a.nifi.processors.parquet.PutParquet
> PutParquet[id=01611011-e4a8-106a-f933-eb66d923cfd1] Failed to write due to
> org.apache.nifi.serialization.record.util.IllegalTypeConversionException:
> Cannot convert value 1234567.2 of type class java.lang.Double because no
> compatible types exist in the UNION for field dectype: {}
>
> org.apache.nifi.serialization.record.util.IllegalTypeConversionException:
> Cannot convert value 1234567.2 of type class java.lang.Double because no
> compatible types exist in the UNION for field dectype
>
> Please let me know if I’m doing anything wrong.
>
>
>
> Regards,
>
> Mohit Jain
>


PutParquet fails to convert avro logical decimal type.

2018-02-15 Thread mohit.jain
Hi all, 

 

I am using QueryDatabaseTable to extract records from MySQL and have set
Use Avro Logical Types to true. I am using the PutParquet processor to write to
HDFS, and it is not able to convert the logical decimal type.

 

It throws this exception:

2018-02-15 17:59:05,189 ERROR [Timer-Driven Process Thread-10]
o.a.nifi.processors.parquet.PutParquet
PutParquet[id=01611011-e4a8-106a-f933-eb66d923cfd1] Failed to write due to
org.apache.nifi.serialization.record.util.IllegalTypeConversionException:
Cannot convert value 1234567.2 of type class java.lang.Double because no
compatible types exist in the UNION for field dectype: {}

org.apache.nifi.serialization.record.util.IllegalTypeConversionException:
Cannot convert value 1234567.2 of type class java.lang.Double because no
compatible types exist in the UNION for field dectype

   at org.apache.nifi.avro.AvroTypeUtil.convertUnionFieldValue(AvroTypeUtil.java:667)
   at org.apache.nifi.avro.AvroTypeUtil.convertToAvroObject(AvroTypeUtil.java:572)
   at org.apache.nifi.avro.AvroTypeUtil.createAvroRecord(AvroTypeUtil.java:432)
   at org.apache.nifi.processors.parquet.record.AvroParquetHDFSRecordWriter.write(AvroParquetHDFSRecordWriter.java:43)
   at org.apache.nifi.processors.hadoop.record.HDFSRecordWriter.write(HDFSRecordWriter.java:48)
   at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
   at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2174)
   at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2144)
   at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:360)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1678)
   at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
   at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
   at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
   at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
   at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
   at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
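
For reference, Avro's decimal logical type annotates a bytes (or fixed) schema, so the union generated for a DECIMAL column has no floating-point branch, and a value that arrives as java.lang.Double matches nothing in it. A minimal Python sketch of such a field (the field name comes from the log above; precision and scale are hypothetical):

```python
# Minimal sketch of the union an Avro schema typically carries for a DECIMAL
# column when logical types are enabled; precision/scale here are made up.
import json

dectype_field = {
    "name": "dectype",
    "type": [
        "null",
        {"type": "bytes", "logicalType": "decimal",
         "precision": 10, "scale": 2},
    ],
}
# The union accepts null or decimal-annotated bytes -- there is no "double"
# branch, which is why a java.lang.Double value raises the
# IllegalTypeConversionException shown above.
print(json.dumps(dectype_field, indent=2))
```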

 

 

Please let me know if I'm doing anything wrong.

 

Regards,

Mohit Jain



PutHiveStreaming NullPointerException error

2018-02-15 Thread Michal Tomaszewski
Hi All,

PutHiveStreaming throws lots of errors like the one below:

ERROR [Timer-Driven Process Thread-11] hive.log Got exception: 
java.lang.NullPointerException null
java.lang.NullPointerException: null

The errors appear while PutHiveStreaming is running, anywhere from once every 
few seconds to a few times per second.

Hive streaming does write data to the database, but it is extremely slow 
(roughly 2 MB per 5 minutes on a 6-core/24 GB NiFi node).
Tested on NiFi 1.4, 1.5, and the current 1.6 snapshot.

The Hadoop/Hive cluster is a Hortonworks HDP 2.6.4 installation in HA mode 
without any security.
NiFi runs as a 3-server cluster without security, compiled with -Phortonworks 
and the proper library definitions.
The NiFi conf directory contains the actual Hadoop configuration files: 
core-site.xml, hbase-site.xml, hdfs-site.xml, hive-site.xml, yarn-site.xml.
HiveQL queries in NiFi (both SelectHiveQL and PutHiveQL) work properly and 
fast.

The same error occurs with Hortonworks' original NiFi 1.5 build, installed 
automatically by Ambari from the HDF 3.1 pack.

nifi-app.log:

2018-02-15 17:42:29,889 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.hooks.ATSHook Created ATS Hook
2018-02-15 17:42:29,889 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.log.PerfLogger 
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.log.PerfLogger 
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.Driver Resetting the caller context to
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.log.PerfLogger 
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.Driver OK
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.log.PerfLogger 
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
o.a.hadoop.hive.ql.lockmgr.DbTxnManager Stopped heartbeat for query: 
nifi_20180215174229_58046dbe-0af7-41ae-94c4-7bb692053d67
2018-02-15 17:42:29,890 INFO [put-hive-streaming-0] 
o.a.hadoop.hive.ql.lockmgr.DbLockManager releaseLocks: [lockid:5010336 
queryId=nifi_20180215174229_58046dbe-0af7-41ae-94c4-7bb692053d67 txnid:0]
2018-02-15 17:42:29,894 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.log.PerfLogger 
2018-02-15 17:42:29,894 INFO [put-hive-streaming-0] 
org.apache.hadoop.hive.ql.log.PerfLogger 
2018-02-15 17:42:29,898 WARN [Timer-Driven Process Thread-11] hive.metastore 
Unexpected increment of user count beyond one: 2 HCatClient: thread: 132 
users=2 expired=false closed=false
2018-02-15 17:42:29,901 ERROR [Timer-Driven Process Thread-11] hive.log Got 
exception: java.lang.NullPointerException null
java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1116)
at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:469)
at sun.reflect.GeneratedMethodAccessor111.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:174)
at com.sun.proxy.$Proxy341.isOpen(Unknown Source)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:269)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:558)
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:94)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:82)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:60)
at org.apache.nifi.util.hive.HiveWriter.getRecordWriter(HiveWriter.java:85)
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:72)
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:1036)
at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:947)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$null$8(PutHiveStreaming.java:743)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$12(PutHiveStreaming.java:740)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2175)

Re: Deployment of Flow Files help

2018-02-15 Thread Bryan Bende
Ok, I'm assuming by "flow files in git repo" you mean template XML
files that you exported from NiFi and saved in Git?

I think the approach with NiFi Registry would be much better than
using templates, but for what you are trying to do right now, you'll
need to give some info about what the issues are... error messages,
stack traces, etc.

On Thu, Feb 15, 2018 at 12:18 PM, Sean Marciniak  wrote:
> Hi Bryan,
>
> We are not currently using NiFi Registry. We are deploying flow files using
> the RESTful API; however, we are having issues trying to do this with NiFi in
> secure mode.
>
> I'm assuming you are asking how you can automate this so that you can spin
> up a new cluster and then deploy a flow automatically, is that correct?
>
> Yeah, we currently have our flow files inside a git repo that are versioned,
> and we are hoping to deploy from our CI/CD server.
>
>
> The final plan is to move NiFi into a scalable Kubernetes cluster with
> automated deployment of flow files.
>
> On 15 February 2018 at 4:56:04 pm, Bryan Bende (bbe...@gmail.com) wrote:
>
> Hi Sean,
>
> Are you using NiFi Registry [1]?  That should be your primary way of
> deploying flows.
>
> I'm assuming you are asking how you can automate this so that you can spin
> up a new cluster and then deploy a flow automatically, is that correct?
>
> If you are not talking about automation, then everything you need for a
> secure setup between NiFi and NiFi Registry should already exist.
>
> If you are trying to do automation, then you may be interested in some of
> the work we've been doing on NIFI-4839 [2].
>
> Using this CLI, the starting point would be that somewhere you have the JSON
> of a versioned flow from a NiFi registry, presumably from your DEV instance
> where you developed everything. The JSON could be in a file, or posted
> somewhere on an accessible URL.
>
> In your new environment, you would then do something like the following
> (these are CLI commands):
>
> registry create-bucket -b "My Bucket"

> registry create-flow -b <bucket-id> -fn "My Flow"

> registry import-flow-version -f <flow-id> -i <input-file>

> nifi pg-import -b <bucket-id> -f <flow-id> -fv <flow-version>

> nifi pg-start -pgid <process-group-id>
>
> Obviously this CLI is still under development and subject to change, as it
> has not yet been reviewed and merged to NiFi's master branch.
>
> Thanks,
>
> Bryan
>
> [1] https://nifi.apache.org/registry.html
> [2] https://issues.apache.org/jira/browse/NIFI-4839
>
>
> On Thu, Feb 15, 2018 at 10:36 AM, Sean Marciniak  wrote:
>>
>> Hi team,
>>
>> I am trying to figure out how best to deploy flow files to a NiFi instance
>> set up in 'Secure Clustered Mode'.
>>
>> Any advice and examples of doing so would be greatly appreciated. Once we
>> get this working, we can look at using this inside Kubernetes.
>>
>> I had asked inside the Hipchat room and was told to send an email instead:
>>
>> ```
>> [2:08 PM] Sean Marciniak: Hey team, I have a question regarding deploying
>> flow files when NiFi is in secure clustered mode.
>> Has this been done before? Is there a working example that you know of?
>> We know we can deploy to NiFi with secure mode turned off but we require
>> this to be working
>> [2:12 PM] Sean Marciniak: We are trying to use NiFi v1.5 if that helps
>> ```
>> Original question.
>>
>> Thank you,
>>
>> Sean.
>>
>>
>> --
>>
>> Sean Marciniak
>>
>> s...@beamery.com
>>
>> www.beamery.com
>>
>> Are you ready for GDPR? GDPR: The Complete Guide for Recruiting Teams
>
>


Re: Deployment of Flow Files help

2018-02-15 Thread Sean Marciniak
Hi Bryan,

We are not currently using NiFi Registry. We are deploying flow files using
the RESTful API; however, we are having issues trying to do this with NiFi in
secure mode.

I'm assuming you are asking how you can automate this so that you can spin
up a new cluster and then deploy a flow automatically, is that correct?

Yeah, we currently have our flow files inside a git repo that are versioned,
and we are hoping to deploy from our CI/CD server.

The final plan is to move NiFi into a scalable Kubernetes cluster with
automated deployment of flow files.

On 15 February 2018 at 4:56:04 pm, Bryan Bende (bbe...@gmail.com) wrote:

Hi Sean,

Are you using NiFi Registry [1]?  That should be your primary way of
deploying flows.

I'm assuming you are asking how you can automate this so that you can spin
up a new cluster and then deploy a flow automatically, is that correct?

If you are not talking about automation, then everything you need for a
secure setup between NiFi and NiFi Registry should already exist.

If you are trying to do automation, then you may be interested in some of
the work we've been doing on NIFI-4839 [2].

Using this CLI, the starting point would be that somewhere you have the
JSON of a versioned flow from a NiFi registry, presumably from your DEV
instance where you developed everything. The JSON could be in a file, or
posted somewhere on an accessible URL.

In your new environment, you would then do something like the following
(these are CLI commands):

registry create-bucket -b "My Bucket"

registry create-flow -b <bucket-id> -fn "My Flow"

registry import-flow-version -f <flow-id> -i <input-file>

nifi pg-import -b <bucket-id> -f <flow-id> -fv <flow-version>

nifi pg-start -pgid <process-group-id>

Obviously this CLI is still under development and subject to change, as it
has not yet been reviewed and merged to NiFi's master branch.

Thanks,

Bryan

[1] https://nifi.apache.org/registry.html
[2] https://issues.apache.org/jira/browse/NIFI-4839


On Thu, Feb 15, 2018 at 10:36 AM, Sean Marciniak  wrote:

> Hi team,
>
> I am trying to figure out how best to deploy flow files to a NiFi instance
> set up in 'Secure Clustered Mode'.
>
> Any advice and examples of doing so would be greatly appreciated. Once we
> get this working, we can look at using this inside Kubernetes.
>
> I had asked inside the Hipchat room and was told to send an email instead:
>
> ```
> [2:08 PM] Sean Marciniak: Hey team, I have a question regarding deploying
> flow files when NiFi is in secure clustered mode.
> Has this been done before? Is there a working example that you know of?
> We know we can deploy to NiFi with secure mode turned off but we require
> this to be working
> [2:12 PM ] Sean
> Marciniak: We are trying to use NiFi v1.5 if that helps
> ```
> Original question.
>
> Thank you,
>
> Sean.
>
>
> --
> 
>
> Sean Marciniak
>
> s...@beamery.com
>
> www.beamery.com
>
> Are you ready for GDPR? GDPR: The Complete Guide for Recruiting Teams
>


Re: Deployment of Flow Files help

2018-02-15 Thread Bryan Bende
Hi Sean,

Are you using NiFi Registry [1]?  That should be your primary way of
deploying flows.

I'm assuming you are asking how you can automate this so that you can spin
up a new cluster and then deploy a flow automatically, is that correct?

If you are not talking about automation, then everything you need for a
secure setup between NiFi and NiFi Registry should already exist.

If you are trying to do automation, then you may be interested in some of
the work we've been doing on NIFI-4839 [2].

Using this CLI, the starting point would be that somewhere you have the
JSON of a versioned flow from a NiFi registry, presumably from your DEV
instance where you developed everything. The JSON could be in a file, or
posted somewhere on an accessible URL.

In your new environment, you would then do something like the following
(these are CLI commands):

registry create-bucket -b "My Bucket"

registry create-flow -b <bucket-id> -fn "My Flow"

registry import-flow-version -f <flow-id> -i <input-file>

nifi pg-import -b <bucket-id> -f <flow-id> -fv <flow-version>

nifi pg-start -pgid <process-group-id>

Obviously this CLI is still under development and subject to change, as it
has not yet been reviewed and merged to NiFi's master branch.

Thanks,

Bryan

[1] https://nifi.apache.org/registry.html
[2] https://issues.apache.org/jira/browse/NIFI-4839


On Thu, Feb 15, 2018 at 10:36 AM, Sean Marciniak  wrote:

> Hi team,
>
> I am trying to figure out how best to deploy flow files to a NiFi instance
> set up in 'Secure Clustered Mode'.
>
> Any advice and examples of doing so would be greatly appreciated. Once we
> get this working, we can look at using this inside Kubernetes.
>
> I had asked inside the Hipchat room and was told to send an email instead:
>
> ```
> [2:08 PM] Sean Marciniak: Hey team, I have a question regarding deploying
> flow files when NiFi is in secure clustered mode.
> Has this been done before? Is there a working example that you know of?
> We know we can deploy to NiFi with secure mode turned off but we require
> this to be working
> [2:12 PM ] Sean
> Marciniak: We are trying to use NiFi v1.5 if that helps
> ```
> Original question.
>
> Thank you,
>
> Sean.
>
>
> --
> 
>
> Sean Marciniak
>
> s...@beamery.com
>
> www.beamery.com
>
> Are you ready for GDPR? GDPR: The Complete Guide for Recruiting Teams
>


Deployment of Flow Files help

2018-02-15 Thread Sean Marciniak
Hi team,

I am trying to figure out how best to deploy flow files to a NiFi instance
set up in 'Secure Clustered Mode'.

Any advice and examples of doing so would be greatly appreciated. Once we
get this working, we can look at using this inside Kubernetes.

I had asked inside the Hipchat room and was told to send an email instead:

```
[2:08 PM] Sean Marciniak: Hey team, I have a question regarding deploying
flow files when NiFi is in secure clustered mode.
Has this been done before? Is there a working example that you know of?
We know we can deploy to NiFi with secure mode turned off but we require
this to be working
[2:12 PM ] Sean
Marciniak: We are trying to use NiFi v1.5 if that helps
```
Original question.

Thank you,

Sean.


-- 


Sean Marciniak

s...@beamery.com

www.beamery.com

Are you ready for GDPR? GDPR: The Complete Guide for Recruiting Teams


Re: [Data Flow] File content not read completely

2018-02-15 Thread Mark Payne
Valencia,

The SplitText processor does not change the ‘filename’ attribute of the 
FlowFile, so you will end up with multiple FlowFiles having the same name. 
PutFile may well be overwriting the same file many times - or failing to 
write the files due to filename conflicts. You can resolve this, if it’s your 
problem, by adding an UpdateAttribute to your flow just before PutFile and 
changing the filename to something unique like ${UUID()} or 
${filename}.${nextInt()}.
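
A tiny Python sketch of the collision and the fix (the filenames and the batch of three splits are hypothetical), mirroring what ${filename}.${nextInt()} produces:

```python
# SplitText copies the parent's 'filename' attribute to every split, so
# PutFile sees the same target name over and over; a unique suffix fixes it.
from itertools import count

splits = [{"filename": "data.txt"} for _ in range(3)]  # three splits, one name
next_int = count()  # stands in for NiFi's ${nextInt()} expression
for f in splits:
    f["filename"] = "%s.%d" % (f["filename"], next(next_int))

print([f["filename"] for f in splits])  # ['data.txt.0', 'data.txt.1', 'data.txt.2']
```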

Hope this helps!

-Mark

Sent from my iPhone

On Feb 15, 2018, at 4:59 AM, Valencia Serrao wrote:


Hi All,

I've started hands-on with NiFi. I was able to build basic flows without any
issues, but currently I've tried adding more steps to the flow.

Flow intent: get a local file, split the text on new lines, extract text based 
on regex, put matched/unmatched data on the respective Kafka topics, and
finally write the Kafka contents to the local targets set in PutFile.
Current flow steps: GetFile, SplitText, ExtractText, PutKafka (2 of them, one 
for matched and one for unmatched), and 2 PutFile components.

The issue I'm facing is that after the flow execution I see only one entry in 
each of the 2 PutFile targets, and the rest of the content is not written to 
them even when the criteria are matched. I feel it's not looping through the 
whole file or something like that. But I had read that a NiFi flow is executed 
for all contents in source files. Maybe I've missed some config somewhere.

It would be really helpful if anyone could help with this issue.

Regards,
Valencia


Re: Object not recognized in ExecuteScript

2018-02-15 Thread Mike Thomsen
Jim,

You need to call session.commit() periodically if you want to progressively
push flowfiles out, though you should think about the failure scenarios your
script has and whether you really want to sacrifice the ability to do a
rollback in the event things go wrong.
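
A rough Jython sketch of what that can look like inside ExecuteScript. It relies on the processor's standard session and REL_SUCCESS bindings, so it only runs inside NiFi; the source directory and batch size are hypothetical:

```python
import os

ROOT = '/tmp/indir'   # hypothetical source directory
BATCH_SIZE = 100      # hypothetical; tune to taste

created = 0
for dirpath, _, names in os.walk(ROOT):
    for name in names:
        flowFile = session.create()
        flowFile = session.putAttribute(flowFile, 'source.path',
                                        os.path.join(dirpath, name))
        session.transfer(flowFile, REL_SUCCESS)
        created += 1
        if created % BATCH_SIZE == 0:
            # Pushes the batch downstream right away, at the cost of
            # losing the ability to roll these FlowFiles back later.
            session.commit()
```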


On Thu, Feb 15, 2018 at 6:27 AM, James McMahon  wrote:

> Matt, I have noticed another behavior pattern I hope to correct, but I
> don't fully understand it and so hope that you can provide some insights.
>
> I have embedded the flowFile related commands to create the flowFiles
> within my file processing loop. My expectation was that I would see the
> output queue from ExecuteScript increment as each file was created. But I
> don't see that behavior. Instead, I see a zero in the output queue for
> several minutes, and then suddenly the count jumps to the total set of
> flowFiles created.
>
> Because working back through the directory structure in time can be a
> time-consuming process, I hope to output incremental results to the flow as files
> are found. How can I make that happen? Is there a means to periodically
> flush what is in the output buffer every N files, perhaps?
>
> Thanks again for your insights.  -Jim
>
> On Mon, Feb 12, 2018 at 12:24 PM, James McMahon 
> wrote:
>
>> I see. Thank you Matt. -Jim
>>
>> On Mon, Feb 12, 2018 at 12:10 PM, Matt Burgess 
>> wrote:
>>
>>> Jim,
>>>
>>> In this case I don't think it's as much that the modules aren't being
>>> found, rather that the datetime module in Jython returns
>>> java.sql.Timestamp (Java) objects, rather than Jython/Python datetime
>>> objects, and the former do not support the methods/attributes of the
>>> latter, including timetuple(). Apparently [1] this change was made
>>> around Jython 2.5, and NiFi uses 2.7.1.
>>>
>>> Looks like you'll need to write your own timetuple() function, using
>>> the java.sql.Timestamp [2] and related APIs.
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] http://www.jython.org/javadoc/com/ziclix/python/sql/Jython22DataHandler.html
>>> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html
>>>
>>> On Mon, Feb 12, 2018 at 11:58 AM, James McMahon 
>>> wrote:
>>> > Good afternoon. I have a Python script that I can execute from the
>>> > command line via my Python interpreter. In it, I do this:
>>> >
>>> > myTime = time.mktime(myDateTime.timetuple())
>>> >
>>> > When I try to run it from my ExecuteScript processor in NiFi, this is not
>>> > recognized. This error gets thrown:
>>> >
>>> > 'java.sql.Timestamp' object has no attribute 'timetuple' in 
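
A sketch of the workaround Matt describes, assuming the incoming value really is a java.sql.Timestamp: its getTime() already returns epoch milliseconds, so the time.mktime(timetuple()) round-trip can be skipped entirely. This is Jython-only, and the sample value is hypothetical:

```python
# Jython: java.sql.Timestamp exposes getTime() -> milliseconds since epoch,
# so epoch seconds fall out directly without a Python timetuple().
from java.sql import Timestamp

myDateTime = Timestamp(1518716345189)   # hypothetical sample value
myTime = myDateTime.getTime() / 1000.0  # epoch seconds
print(myTime)
```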

Re: Object not recognized in ExecuteScript

2018-02-15 Thread James McMahon
Matt, I have noticed another behavior pattern I hope to correct, but I
don't fully understand it and so hope that you can provide some insights.

I have embedded the flowFile related commands to create the flowFiles
within my file processing loop. My expectation was that I would see the
output queue from ExecuteScript increment as each file was created. But I
don't see that behavior. Instead, I see a zero in the output queue for
several minutes, and then suddenly the count jumps to the total set of
flowFiles created.

Because working back through the directory structure in time can be a
time-consuming process, I hope to output incremental results to the flow as files
are found. How can I make that happen? Is there a means to periodically
flush what is in the output buffer every N files, perhaps?

Thanks again for your insights.  -Jim

On Mon, Feb 12, 2018 at 12:24 PM, James McMahon 
wrote:

> I see. Thank you Matt. -Jim
>
> On Mon, Feb 12, 2018 at 12:10 PM, Matt Burgess 
> wrote:
>
>> Jim,
>>
>> In this case I don't think it's as much that the modules aren't being
>> found, rather that the datetime module in Jython returns
>> java.sql.Timestamp (Java) objects, rather than Jython/Python datetime
>> objects, and the former do not support the methods/attributes of the
>> latter, including timetuple(). Apparently [1] this change was made
>> around Jython 2.5, and NiFi uses 2.7.1.
>>
>> Looks like you'll need to write your own timetuple() function, using
>> the java.sql.Timestamp [2] and related APIs.
>>
>> Regards,
>> Matt
>>
>> [1] http://www.jython.org/javadoc/com/ziclix/python/sql/Jython22DataHandler.html
>> [2] https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html
>>
>> On Mon, Feb 12, 2018 at 11:58 AM, James McMahon 
>> wrote:
>> > Good afternoon. I have a Python script that I can execute from the
>> > command line via my Python interpreter. In it, I do this:
>> >
>> > myTime = time.mktime(myDateTime.timetuple())
>> >
>> > When I try to run it from my ExecuteScript processor in NiFi, this is not
>> > recognized. This error gets thrown:
>> >
>> > 'java.sql.Timestamp' object has no attribute 'timetuple' in 

[Data Flow] File content not read completely

2018-02-15 Thread Valencia Serrao


Hi All,

I've started hands-on with NiFi. I was able to build basic flows without any
issues, but currently I've tried adding more steps to the flow.

Flow intent: get a local file, split the text on new lines, extract text
based on regex, put matched/unmatched data on the respective Kafka topics, and
finally write the Kafka contents to the local targets set in PutFile.
Current flow steps: GetFile, SplitText, ExtractText, PutKafka (2 of
them, one for matched and one for unmatched), and 2 PutFile components.

The issue I'm facing is that after the flow execution I see only one
entry in each of the 2 PutFile targets, and the rest of the content is not
written to them even when the criteria are matched. I feel it's not looping
through the whole file or something like that. But I had read that a NiFi
flow is executed for all contents in source files. Maybe I've missed some
config somewhere.

It would be really helpful if anyone could help with this issue.

Regards,
Valencia