Re: zookeeper error message - nifi 1.11.1/zookeeper 3.5.6

2020-02-13 Thread Jeff
Dan,

I believe the issue (if there is one) would be between Zookeeper and
Curator, but NiFi uses Curator 4.x which is the correct version to use with
ZK 3.5.x.

On Thu, Feb 13, 2020 at 12:28 AM dan young  wrote:

> Thank you for your email. Looking at the ZooKeeper docs, it looks like with
> 3.5.0 the format may have changed to support dynamic reconfiguration. It
> seems that ZooKeeper is sending back a format that NiFi isn't expecting,
> i.e. the :participant suffix.
>
> https://zookeeper.apache.org/doc/r3.5.6/zookeeperReconfig.html
>
>
>
> On Wed, Feb 12, 2020, 5:53 PM 노대호Daeho Ro 
> wrote:
>
>> In my memory,
>>
>> ZooKeeper 3.5.6 needs the new form of the ZooKeeper server string, such as
>>
>> server.1=0.0.0.0:2888:3888;2181
>>
>>
>> where the IP is yours.
>>
>> Hope this helps you.
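
For reference, a minimal sketch of a three-server zoo.cfg using the 3.5.x
syntax described above, where each server line carries its client port after
a semicolon (hostnames and paths here are placeholders):

tickTime=2000
dataDir=/var/lib/zookeeper
initLimit=10
syncLimit=5
server.1=zk1.example.com:2888:3888;2181
server.2=zk2.example.com:2888:3888;2181
server.3=zk3.example.com:2888:3888;2181

With the older 3.4-style configuration, the client port would instead live in
a separate clientPort=2181 line.
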
>>
>>
>> On Thu, Feb 13, 2020 at 1:55 AM, dan young wrote:
>>
>>> Sorry Joe,
>>>
>>> Yes, I'll file a JIRA...here's the email again
>>>
>>> We're seeing the following messages in nifi logs on our cluster nodes.
>>> Using
>>> Nifi 1.11.1 and zookeeper (not embedded) version 3.5.6
>>>
>>> Functionality seems not to be impacted, but wondering if there's
>>> something else
>>> going on or the version of zookeeper we're using is causing this.
>>>
>>> 2020-02-12 15:36:43,959 ERROR [main-EventThread]
>>> o.a.c.framework.imps.EnsembleTracker Invalid config event received:
>>> {server.1=10.190.3.170:2888:3888:participant, version=0,
>>> server.3=10.190.3.91:2888:3888:participant, server.2=10.190.3.172:2888
>>> :3888:participant}
>>>
>>> Regards,
>>>
>>> Dano
>>>
>>> On Wed, Feb 12, 2020 at 9:49 AM Joe Witt  wrote:
>>>
 Dan,

 Not sure what others see but for me your email cuts off in the middle
 of a line.

 You might want to file a JIRA with your observation/logs.

 Thanks

 On Wed, Feb 12, 2020 at 11:46 AM dan young  wrote:

> Hello,
>
> We're seeing the following messages in nifi logs on our cluster nodes.  
> Using
> Nifi 1.11.1 and zookeeper (not embedded) version 3.5.6
>
> Functionality seems not to be impacted, but wondering if there's 
> something else
> going on or the version of zookeeper we're using is causing this.
>
> 2020-02-12 15:36:43,959 ERROR [main-EventThread] 
> o.a.c.framework.imps.EnsembleTracker Invalid config event received: 
> {server.1=10.190.3.170:2888:3888:participant, version=0, 
> server.3=10.190.3.91:2888:3888:participant, 
> server.2=10.190.3.172:2888:3888:participant}
>
> Regards,
>
> Dano
>
>
>


Re: Influence about removing RequiresInstanceClassLoading from AbstractHadoopProcessor processor

2019-11-20 Thread Jeff
By Hai Luo,

I haven't personally attempted to use NiFi with Hadoop in the way you
described.  If your Hadoop clusters are not secured I don't think you'll
have issues with UGI.  I wouldn't recommend running NiFi and Hadoop
clusters unsecured, however.

On Tue, Nov 12, 2019 at 8:40 PM abellnotring  wrote:

> Hi Jeff,
> There is no Kerberos authentication in my Hadoop clusters, but I
> find UGI is initialized with an ExtendConfiguration (extending the Hadoop
> Configuration) when those processors' instances are first scheduled. I would
> use NiFi to connect to different Hadoop clusters; will it run into any
> issues? (I'm running tests for this.)
>
>
> Thanks
>  By Hai Luo
> On 11/13/2019 07:01,Jeff  wrote:
>
> If you remove the @RequiresInstanceClassloading, the UserGroupInformation
> class from Hadoop (hadoop-common, if I remember correctly) will be shared
> across all instances that come from a particular NAR (such as PutHDFS,
> ListHDFS, FetchHDFS, etc, from nifi-hadoop-nar-x.y.z.nar).  If you are
> using Kerberos in those processors and configured different principals
> across the various processors, you could run into issues when the
> processors attempt to acquire new TGTs, most likely the first time a
> relogin is attempted.  UGI has some static state and
> @RequiresInstanceClassloading makes sure each instance of a processor with
> that annotation has its own classloader to keep that kind of state from
> being shared across instances.
>
> On Mon, Nov 11, 2019 at 9:41 PM abellnotring 
> wrote:
>
>> Hi Peter & all,
>>  I'm using Kylo to manage the NiFi flow (called a feed in Kylo), and
>> there are 4200 instances (600+ of them extending
>>  AbstractHadoopProcessor) on my NiFi canvas. The NiFi non-heap memory has
>> increased by more than 6GB after some days of running, which is extremely
>> abnormal. I have analyzed the classes loaded into the Compressed Class Space
>> and found most of the CCS was used by classes related to
>> AbstractHadoopProcessor.
>>So I think removing RequiresInstanceClassLoading from
>> AbstractHadoopProcessor may be a solution for reducing the CCS
>> used.
>>Do you have any ideas about this?
>>
>>
>>   Thanks
>>
>>
>> By Hai Luo
>> On 11/12/2019 02:17,Shawn Weeks
>>  wrote:
>>
>> I'm assuming you're talking about the Snappy problem. If you use CompressContent
>> prior to PutHDFS you can compress with Snappy, as it uses the Java
>> native Snappy lib. The HDFS processors are limited to the actual Hadoop
>> libraries, so they'd have to change from native to get around this. I'm
>> pretty sure we need instance loading to handle the other issues mentioned.
>>
>>
>>
>> Thanks
>>
>> Shawn
>>
>>
>>
>> *From: *Joe Witt 
>> *Reply-To: *"users@nifi.apache.org" 
>> *Date: *Monday, November 11, 2019 at 8:56 AM
>> *To: *"users@nifi.apache.org" 
>> *Subject: *Re: Influence about removing RequiresInstanceClassLoading
>> from AbstractHadoopProcessor processor
>>
>>
>>
>> Peter
>>
>>
>>
>> The most common challenge is if two isolated instances both want to use a
>> native lib.  No two native libs with the same name can be in the same jvm.
>> We need to solve that for sure.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Mon, Nov 11, 2019 at 9:53 AM Peter Turcsanyi 
>> wrote:
>>
>> Hi Hai Luo,
>>
>>
>>
>> @RequiresInstanceClassLoading makes it possible to configure separate /
>> isolated "Additional Classpath Resources" settings on your HDFS processors
>> (e.g. the S3 storage driver on one of your PutHDFS processors and Azure Blob on another).
>>
>>
>>
>> Is there any specific reason / use case why you are considering removing
>> it?
>>
>>
>>
>> Regards,
>>
>> Peter Turcsanyi
>>
>>
>>
>> On Mon, Nov 11, 2019 at 3:30 PM abellnotring 
>> wrote:
>>
>> Hi all,
>>
>>  I'm considering removing the RequiresInstanceClassLoading annotation
>> from the class AbstractHadoopProcessor.
>>
>>  Does anybody know the potential influence?
>>
>>
>>
>> Thanks
>>
>> By Hai Luo
>>
>>


Re: Influence about removing RequiresInstanceClassLoading from AbstractHadoopProcessor processor

2019-11-11 Thread Jeff
If you remove the @RequiresInstanceClassloading, the UserGroupInformation
class from Hadoop (hadoop-common, if I remember correctly) will be shared
across all instances that come from a particular NAR (such as PutHDFS,
ListHDFS, FetchHDFS, etc, from nifi-hadoop-nar-x.y.z.nar).  If you are
using Kerberos in those processors and configured different principals
across the various processors, you could run into issues when the
processors attempt to acquire new TGTs, most likely the first time a
relogin is attempted.  UGI has some static state and
@RequiresInstanceClassloading makes sure each instance of a processor with
that annotation has its own classloader to keep that kind of state from
being shared across instances.

On Mon, Nov 11, 2019 at 9:41 PM abellnotring  wrote:

> Hi Peter & all,
>  I'm using Kylo to manage the NiFi flow (called a feed in Kylo), and there
> are 4200 instances (600+ of them extending AbstractHadoopProcessor)
> on my NiFi canvas. The NiFi non-heap memory has increased by more than 6GB
> after some days of running, which is extremely abnormal. I have analyzed the
> classes loaded into the Compressed Class Space and found most of the CCS was
> used by classes related to AbstractHadoopProcessor.
>So I think removing RequiresInstanceClassLoading from
> AbstractHadoopProcessor may be a solution for reducing the CCS
> used.
>Do you have any ideas about this?
>
>
>   Thanks
>
>
> By Hai Luo
> On 11/12/2019 02:17,Shawn Weeks
>  wrote:
>
> I'm assuming you're talking about the Snappy problem. If you use CompressContent
> prior to PutHDFS you can compress with Snappy, as it uses the Java
> native Snappy lib. The HDFS processors are limited to the actual Hadoop
> libraries, so they'd have to change from native to get around this. I'm
> pretty sure we need instance loading to handle the other issues mentioned.
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From: *Joe Witt 
> *Reply-To: *"users@nifi.apache.org" 
> *Date: *Monday, November 11, 2019 at 8:56 AM
> *To: *"users@nifi.apache.org" 
> *Subject: *Re: Influence about removing RequiresInstanceClassLoading from
> AbstractHadoopProcessor processor
>
>
>
> Peter
>
>
>
> The most common challenge is if two isolated instances both want to use a
> native lib.  No two native libs with the same name can be in the same jvm.
> We need to solve that for sure.
>
>
>
> Thanks
>
>
>
> On Mon, Nov 11, 2019 at 9:53 AM Peter Turcsanyi 
> wrote:
>
> Hi Hai Luo,
>
>
>
> @RequiresInstanceClassLoading makes it possible to configure separate /
> isolated "Additional Classpath Resources" settings on your HDFS processors
> (e.g. the S3 storage driver on one of your PutHDFS processors and Azure Blob on another).
>
>
>
> Is there any specific reason / use case why you are considering removing
> it?
>
>
>
> Regards,
>
> Peter Turcsanyi
>
>
>
> On Mon, Nov 11, 2019 at 3:30 PM abellnotring 
> wrote:
>
> Hi all,
>
>  I'm considering removing the RequiresInstanceClassLoading annotation
> from the class AbstractHadoopProcessor.
>
>  Does anybody know the potential influence?
>
>
>
> Thanks
>
> By Hai Luo
>
>


Re: NiFi 1.10.0 and ZooKeeper 3.5.5

2019-11-08 Thread Jeff Zemerick
Pierre, thanks for confirming what I was seeing.

Jeff

On Fri, Nov 8, 2019 at 12:08 PM Pierre Villard 
wrote:

> Hi Jeff,
>
> I came to the same observation on my side as well. Upgrading to ZK 3.5.5
> did solve my issues.
>
> Pierre
>
> Le ven. 8 nov. 2019 à 18:04, Jeff Zemerick  a
> écrit :
>
>> Hi,
>>
>> I've been using NiFi 1.9.2 with an external ZooKeeper 3.4. After
>> upgrading to NiFi 1.10.0 it appears that the external ZooKeeper needs to be
>> updated to 3.5.5 per NIFI-6578 [1]. (I get lots of ZooKeeper-related errors
>> in 1.10.0's log with 3.4 but 3.5.5 works fine.) I just wanted to verify
>> this is correct and I'm not running into some other issue with NiFi 1.10.0
>> and ZooKeeper 3.4. I kind of expected the 3.5 client to be backward
>> compatible with 3.4 but maybe it is not.
>>
>> Thanks,
>> Jeff
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-6578
>>
>


NiFi 1.10.0 and ZooKeeper 3.5.5

2019-11-08 Thread Jeff Zemerick
Hi,

I've been using NiFi 1.9.2 with an external ZooKeeper 3.4. After upgrading
to NiFi 1.10.0 it appears that the external ZooKeeper needs to be updated
to 3.5.5 per NIFI-6578 [1]. (I get lots of ZooKeeper-related errors in
1.10.0's log with 3.4 but 3.5.5 works fine.) I just wanted to verify this
is correct and I'm not running into some other issue with NiFi 1.10.0 and
ZooKeeper 3.4. I kind of expected the 3.5 client to be backward compatible
with 3.4 but maybe it is not.

Thanks,
Jeff

[1] https://issues.apache.org/jira/browse/NIFI-6578


Re: JAVA 11

2019-10-15 Thread Jeff
Hello Clay,

Currently you can run NiFi 1.9.2 on Java 11, though the binaries are built
with Java 8.

Java 11 build support is in the master branch, and is scheduled to be
released in NiFi 1.10.0.  Keep an eye out for the 1.10.0 release candidate,
on which work is currently progressing.

On Tue, Oct 15, 2019 at 10:42 AM Clay Teahouse 
wrote:

> Hello All,
>
> Does anyone know when NiFi will be available with Java 11? I have the
> latest version, 1.9.2.
>
> thanks
> Clay
>
>


Re: Problem with Context Path Whitelisting

2019-10-11 Thread Jeff
Swarup,

Agreed with Kevin, very nice write-up on the scenario!

Would you please provide the original request as sent by Nginx, along with
your configuration pertaining to NiFi in Nginx?  We can set up some test
cases to reproduce what's happening and get a JIRA filed if there's an edge
case not being handled by NiFi.
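
For anyone comparing notes, a minimal sketch of the sort of Nginx location
block typically placed in front of NiFi, setting the proxy headers NiFi looks
for (the upstream address, hostnames, and paths are placeholders, not the
actual configuration from this thread):

location /pie/ip/ {
    proxy_pass         http://nifi-backend:8080/;
    proxy_set_header   X-ProxyScheme        $scheme;
    proxy_set_header   X-ProxyHost          $host;
    proxy_set_header   X-ProxyPort          $server_port;
    proxy_set_header   X-ProxyContextPath   /pie/ip;
}

The value sent in X-ProxyContextPath has to match one of the entries in
nifi.web.proxy.context.path for the whitelist check described below to pass.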

On Fri, Oct 11, 2019 at 9:30 AM Kevin Doran  wrote:

> Swarup,
>
> First, thanks for the great email. Nice job troubleshooting this and
> sharing your findings with the community.
>
> I'm more familiar with how these types of things get configured on
> NiFi Registry than NiFi, so I'm not as much help as others. But I did
> take a look and one thing I noticed was a difference between the
> startup config and the per-request config.
>
> On Startup, the whitelisted context paths are coming from the
> ServletContext FilterConfig [1].
>
> During request handling, the whitelisted context paths are coming from
> the ApplicationContext, directly from NiFi Properties [2]
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-web-utils/src/main/java/org/apache/nifi/web/filter/SanitizeContextPathFilter.java#L41
> [2]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/api/ApplicationResource.java#L165
>
> Ultimately, my assumption is that both of these property values
> *should* be backed by the same nifi.properties file. But it appears
> something is happening in your case/environment/situation that is
> causing the ServletContext and ApplicationContext to get
> configured/initialized differently. This could be something specific
> to your environment or it could be uncovering an edge-case bug in
> NiFi.
>
> I think others on this mailing list who are more familiar with how the
> ServletContext gets setup in NiFi might be able to help further on
> this and determine if there is a solution/workaround or bug that needs
> patching.
>
> Thanks,
> Kevin
>
> On Fri, Oct 11, 2019 at 4:55 AM Swarup Karavadi  wrote:
> >
> > Greetings,
> >
> > I have deployed a single node unsecured NiFi cluster (I say cluster
> because nifi.cluster.is.node is set to "true") as a stateful set on
> Kubernetes (AWS EKS to be specific). The NiFi cluster sits behind an Nginx
> ingress. I have configured the Nginx ingress to forward the appropriate
> headers to NiFi (when deployed behind a reverse proxy) as described in the
> documentation.
> >
> > The path on the Nginx ingress which proxies traffic to the NiFi UI is
> "/pie/ip". This same path has been whitelisted by setting the
> "nifi.web.proxy.context.path" property to "/pie/ip". The way I am expecting
> this setup to work is that when users navigate to http://foo.com/pie/ip
> in the browser, they are shown a simple HTML page with redirect info and
> then automatically redirected to http://foo.com/pie/ip/nifi where they
> can view the NiFi canvas. Instead, the users are being redirected to
> http://foo.com/nifi which results in a 404 response because there is no
> '/nifi' path that has been configured on the Nginx ingress.
> >
> > I set the NiFi and Jetty Server log levels to DEBUG to understand what
> was happening under the hood and this is what I got -
> >
> > On Startup (when the SanitizeContextPathFilter is initialized) -
> > 2019-10-11 06:07:26,206 DEBUG [main]
> o.a.n.w.filter.SanitizeContextPathFilter SanitizeContextPathFilter received
> provided whitelisted context paths from NiFi properties: /pie/ip
> >
> > On Request (when the actual request is made) -
> > 2019-10-11 06:45:45,556 DEBUG [NiFi Web Server-23]
> org.apache.nifi.web.util.WebUtils Context path:
> > 2019-10-11 06:45:45,556 DEBUG [NiFi Web Server-23]
> org.apache.nifi.web.util.WebUtils On the request, the following context
> paths were parsed from headers:
> >  X-ProxyContextPath: /pie/ip
> > X-Forwarded-Context: null
> > X-Forwarded-Prefix: null
> > 2019-10-11 06:45:45,556 DEBUG [NiFi Web Server-23]
> org.apache.nifi.web.util.WebUtils Determined context path: /pie/ip
> > 2019-10-11 06:45:45,556 ERROR [NiFi Web Server-23]
> org.apache.nifi.web.util.WebUtils The provided context path [/pie/ip] was
> not whitelisted []
> > 2019-10-11 06:45:45,556 ERROR [NiFi Web Server-23]
> org.apache.nifi.web.util.WebUtils Error determining context path on JSP
> page: The provided context path [/pie/ip] was not whitelisted []
> > 2019-10-11 06:45:45,556 DEBUG [NiFi Web Server-23]
> o.a.n.w.filter.SanitizeContextPathFilter SanitizeContextPathFilter set
> contextPath:
> >
> > You will notice from the above log entries that the path '/pie/ip' was
> successfully whitelisted. Yet, when handling the request, the whitelisted
> context paths array is empty and this causes the wrong redirect to happen
> on the browser - and I can't figure out why this is happening or how I can
> fix it. Has anyone come across this kind of problem before? Any help on
> this is much appreciated.
> >
> > Cheers,
> > Swarup.
>


Re: Nifi errors - FetchFile and UnpackContent

2019-10-03 Thread Jeff
Hello Tomislav,

Are these processors running in a multi-node cluster?  Is FetchFile
downstream from a ListFile processor that is scheduled to run on all nodes
versus Primary Node only?  Is FetchFile's Completion Strategy set to "Move
File" or "Delete File"?  Typically, source processors should be scheduled
to run on the primary node, otherwise when reading from the same source
across multiple nodes, for example a shared network drive, each source
processor might pull the same data.  In a situation like this, the same
file could be listed by each node, and the FetchFile processor on each node
may attempt to fetch the same file.

If you set the source processor to run on Primary Node only, you can
load-balance the connection between the source processor and FetchFile to
distribute the load of fetching the files across the cluster.

On Thu, Oct 3, 2019 at 2:32 AM Tomislav Novosel 
wrote:

> Hi all,
>
> I'm getting errors from FetchFile and UnpackContent processors.
> I have pipeline where I fetch zip files as they come continuously on
> shared network drive
> with Minimum file age set to 30 sec to avoid fetching file before it is
> written to disk completely.
>
> Sometimes I get this error from FetchFile:
>
> FetchFile[id=c741187c-1172-1166-e752-1f79197a8029] Could not fetch file
> \\avl01\ATGRZ\TestFactory\02 Dep Service\01
> Processdata\Backup\dfs_atfexport\MANA38\ANA_12_BPE7347\ANA_12_BPE7347_TDL_HL_1\measurement_file.atf.zip
> from file system for
> StandardFlowFileRecord[uuid=e7a5e3c4-0981-4ff3-85ea-91e41f0c3c0e,claim=,offset=0,name=PEI_BPE7347_TDLHL1new_826_20191001161312.atf.zip,size=0]
> because the existence of the file cannot be verified; routing to failure
>
>
> And from UnpackContent sometimes I get this error:
>
>
> UnpackContent[id=0164106c-d3b7-1e3f-c770-6e6e07f9259d] Unable to unpack
> StandardFlowFileRecord[uuid=4a019d58-fe45-4276-a161-e46cd8b1667c,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1570052741201-5000,
> container=default, section=904], offset=1651,
> length=28417768],offset=0,name=measurement.atf.zip,size=28417768] due to
> IOException thrown from
> UnpackContent[id=0164106c-d3b7-1e3f-c770-6e6e07f9259d]:
> java.io.IOException: Truncated ZIP file; routing to failure:
>
> org.apache.nifi.processor.exception.ProcessException: IOException thrown
> from UnpackContent[id=0164106c-d3b7-1e3f-c770-6e6e07f9259d]:
> java.io.IOException: Truncated ZIP file
>
>
> After getting this error from UnpackContent I tried to fetch the file again
> and to unpack it. It went well, without any errors.
> So what do these errors mean? I spoke to colleagues who are producing these
> files on the source side and they said the files are OK, not corrupted or
> anything like that.
>
> Please help or give advice.
>
> Thanks in advance.
> Tom
>


Re: Question on MergeContent "Max bin age"

2019-08-09 Thread Jeff
I like the idea of seeing the details of the reason for eviction/merge in
the details of a provenance event.  Those same details could be provided in
an attribute as well.  If a log statement was also created, it should
probably be at the DEBUG level.

On Fri, Aug 9, 2019 at 9:57 AM Mark Payne  wrote:

> I don’t believe this information is made available. It would certainly be
> a useful improvement to include the reason that the “bin” was evicted and
> merged - due to timeout, minimum threshold reached, maximum threshold
> reached, or due to running out of space for a new bin. Please do file a
> jira for that improvement.
>
> What do you think is the most useful way to relay this information? Logs?
> Attribute on the merged flowfile? Details of the provenance event?
>
> Thanks
> -Mark
>
> Sent from my iPhone
>
> On Aug 9, 2019, at 8:20 AM, Jean-Sebastien Vachon 
> wrote:
>
> Hi all,
>
> Is there a way to know if a MergeContent module has timed out because it
> reached the "Max bin age" setting?
>
> Thanks
>
>


Re: Kerberos Ticket Renewal (when not updating Hadoop user)

2019-06-13 Thread Jeff
James,

No worries!  At least you now have another point of reference. :)

On Thu, Jun 13, 2019 at 1:11 PM James Srinivasan 
wrote:

> Err, my bad - meant to send this to the Accumulo list!
>
> Sorry!
>
> On Thu, 13 Jun 2019, 18:07 Jeff,  wrote:
>
>> Hello James,
>>
>> For our Hadoop processors, we generally don't do any explicit
>> relogins/TGT renewal.  It's handled implicitly by the Hadoop libs.  PR 2360
>> [1] is the primary change-set to allow this in NiFi, and several NiFI JIRAs
>> (mainly NIFI-3472 [2]) are referenced in that pull request if you are
>> interested in doing further reading.
>>
>> HiveConnectionPool (across various Hive versions) is the only component
>> that comes to mind where we explicitly try to do a relogin.  NIFI-5134 [3]
>> contains more information on that.
>>
>> Hope this information helps!
>>
>> [1] https://github.com/apache/nifi/pull/2360
>> [2] https://issues.apache.org/jira/browse/NIFI-3472
>> [3] https://issues.apache.org/jira/browse/NIFI-5134
>>
>> On Wed, Jun 12, 2019 at 4:06 PM James Srinivasan <
>> james.sriniva...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm finally getting around to fixing up some deprecation issues with
>>> our use of Kerberos with Accumulo and GeoMesa
>>> (https://github.com/locationtech/geomesa/). Because I didn't know any
>>> better at the time, I used the KerberosToken ctor specifying that the
>>> Hadoop user should be replaced. Combined with a thread to periodically
>>> renew the ticket (calling
>>> UserGroupInformation.getCurrentUser.checkTGTAndReloginFromKeytab()),
>>> this has worked nicely for us.
>>>
>>> However, there are some unfortunate side effects of updating the
>>> Hadoop user - for instance, subsequent HDFS operations use the new
>>> user, who may not have the same permissions as the original user in a
>>> Zeppelin-type notebook environment. Plus the replaceCurrentUser param
>>> is deprecated and removed in Accumulo 2.0. So I'm keen on not
>>> replacing the Hadoop user, but how do I handle ticket renewal?
>>>
>>> Thanks very much,
>>>
>>> James
>>>
>>


Re: Kerberos Ticket Renewal (when not updating Hadoop user)

2019-06-13 Thread Jeff
Hello James,

For our Hadoop processors, we generally don't do any explicit relogins/TGT
renewal.  It's handled implicitly by the Hadoop libs.  PR 2360 [1] is the
primary change-set to allow this in NiFi, and several NiFI JIRAs (mainly
NIFI-3472 [2]) are referenced in that pull request if you are interested in
doing further reading.

HiveConnectionPool (across various Hive versions) is the only component
that comes to mind where we explicitly try to do a relogin.  NIFI-5134 [3]
contains more information on that.

Hope this information helps!

[1] https://github.com/apache/nifi/pull/2360
[2] https://issues.apache.org/jira/browse/NIFI-3472
[3] https://issues.apache.org/jira/browse/NIFI-5134

On Wed, Jun 12, 2019 at 4:06 PM James Srinivasan 
wrote:

> Hi all,
>
> I'm finally getting around to fixing up some deprecation issues with
> our use of Kerberos with Accumulo and GeoMesa
> (https://github.com/locationtech/geomesa/). Because I didn't know any
> better at the time, I used the KerberosToken ctor specifying that the
> Hadoop user should be replaced. Combined with a thread to periodically
> renew the ticket (calling
> UserGroupInformation.getCurrentUser.checkTGTAndReloginFromKeytab()),
> this has worked nicely for us.
>
> However, there are some unfortunate side effects of updating the
> Hadoop user - for instance, subsequent HDFS operations use the new
> user, who may not have the same permissions as the original user in a
> Zeppelin-type notebook environment. Plus the replaceCurrentUser param
> is deprecated and removed in Accumulo 2.0. So I'm keen on not
> replacing the Hadoop user, but how do I handle ticket renewal?
>
> Thanks very much,
>
> James
>


Re: Number and size of nodes

2019-06-01 Thread Jeff
Christian,

Another factor to consider for the NiFi nodes is the disk hardware and
configuration.  Simply put, the faster your disks are (SSD, RAID0, etc),
the faster NiFi will be able to perform.  This is dependent on the needs of
your flow; the number of CPUs/cores isn't the only consideration for
performance.  NiFi can use multiple repositories (for content, flowfiles,
etc) to allow "striping" across multiple locations (directories, mount
points, separate physical disks, etc) which can help alleviate disk IO
utilization concerns.
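
As a rough sketch of that striping, the content repository can be spread over
several mount points with additional entries in nifi.properties along these
lines (the names and paths are placeholders):

nifi.content.repository.directory.default=/repos/disk1/content_repository
nifi.content.repository.directory.disk2=/repos/disk2/content_repository
nifi.content.repository.directory.disk3=/repos/disk3/content_repository

The provenance repository supports the same pattern via
nifi.provenance.repository.directory.*, while the flowfile repository takes a
single directory.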

It may help you to create a prototype flow on currently available hardware
to take a look at CPU/IO/disk utilization to get a starting point on what
hardware you'll need.

On Sat, Jun 1, 2019 at 3:12 AM Christian Andreasen <
andreasenchrist...@gmail.com> wrote:

> Thank you for your input, Martijn. It def helps us in our decision-making.
>
> Den tor. 30. maj 2019 kl. 11.21 skrev Martijn Dekkers <
> mart...@dekkers.org.uk>:
>
>> In clusters, odd numbers of nodes are generally preferred (depending on
>> clustering implementation) to avoid (to an extent) split-brain scenarios
>> and generally manage quorum. I stand to be corrected, but in the current
>> implementation in NiFi I don't think this is an issue.
>>
>> Additional nodes will give you increased IO throughput for most cases. IO
>> will, in most cases, be your bottleneck.
>> Core/thread count per node will have an impact on scheduling. Matt Clarke
>> wrote an excellent article on thread usage in NiFi:
>> https://community.hortonworks.com/articles/221808/understanding-nifi-max-thread-pools-and-processor.html
>>
>> "Optimal" Cluster design will come down to your anticipated use-cases.
>> Having said that, most run-of-the-mill "decent" systems will deliver great
>> performance for most systems. If your needs are more towards the "we *must*
>> have very high performance", or "we *must* process x messages per second"
>> to a degree of business criticality, you should probably make sure you
>> design your flow and then design and implement a system to meet the needs
>> of that flow.
>>
>> On Thu, 30 May 2019, at 09:52, Christian Andreasen wrote:
>>
>> We are planning to build a NiFi cluster and have two questions that we
>> hope you could help us answer.
>>
>>1. When having our NiFi cluster configured to run with an external
>>Zookeeper cluster (i.e. not using the default embedded ZK mode) is it then
>>still best practice to have an odd number of NiFi nodes? If so, why is 
>> that?
>>2. Keeping all other things constant, is there then any advantage of
>>running a setup with 3 NiFi nodes each having 8 cores compared to a setup
>>with 6 nodes each having 4 cores?
>>
>> Any input much appreciated.
>>
>> Thanks,
>> Christian
>>
>>
>>


Re: migrate state for local to cluster transition

2019-05-07 Thread Jeff
Hello Ben,

In NiFi Toolkit, the Zookeeper Migrator exists to move state from one ZK
quorum to another, but to my knowledge, there's no local-to-cluster
migrator.  There is a JIRA [1] for this functionality, but it's currently
unassigned, and I haven't heard about any work being done for it.

[1] https://issues.apache.org/jira/browse/NIFI-4088

On Tue, May 7, 2019 at 11:30 AM Benjamin Garrett <
benjamingarret...@gmail.com> wrote:

> Hi,
>
> We want to change a nifi 1.9.2 instance from local to clustered mode.  We
> want to maintain 'state' when this happens.   What are our options?  We
> tried google, etc. and haven't turned up anything definitive in the
> documentation or nifi tools.
>
> We experimented with transitioning a 1.9.2 instance from local to
> clustered and it appeared that it remembered the local state BUT it did not
> seem to actually use that state when the processor ran.
>
> Otherwise we are thinking of writing our own migration tool piecing
> together the state provider code for local vs. zookeeper, etc.
>
> Thanks in advance!
> Ben
>


Re: Apply Zookeeper ACL to Existing NiFi Cluster

2019-02-15 Thread Jeff
Ryan,

Sorry for the late reply.  Are you still looking for a way to do this?

If I understand correctly what you're trying to do, you should be able to
use the zk-migrator tool to do this.  I haven't done this personally, but
here is a rough outline of steps you can follow:

- Stop the flow on the unsecured NiFi
- Export your current NiFi ZK state nodes under "/nifi/components" to a
JSON file, for example "zk-source-data.json" (see the example commands after
this list)
- Configure NiFi and the ZK quorum for Kerberos
- Configure NiFi to use a different root node than the one used while NiFi
was running unsecured with an unsecured ZK (or an unsecured root node)
- Start the newly kerberos-enabled NiFi, leaving the flow stopped, so that
NiFi can create the cluster nodes in ZK under the new ZK root node
- Import "zk-source-data.json to the new root node, using a JAAS config to
allow the migrator to create the CREATOR-ONLY ACLs with NiFi as the owner
of the nodes
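
As a rough sketch, the export and import steps above might look something
like the following with the toolkit's zk-migrator script (hosts, root paths,
and the JAAS file location are placeholders; double-check the option names
against the script's help output for your toolkit version):

# export state from the old, unsecured root node
zk-migrator.sh -r -z old-zk-host:2181/nifi/components -f zk-source-data.json

# import into the new root node, authenticating with a Kerberos JAAS config
zk-migrator.sh -s -z new-zk-host:2181/nifi-secured/components -f zk-source-data.json -k /path/to/jaas-config.conf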

At this point, you should be able to verify that the state for the
processors has been imported into ZK under the new root node by
right-clicking on processor such as ListHDFS and clicking "View State".
You should be able to start the flow to have it pick up where it left off,
based on the imported state.

You could do these for each NiFi cluster, providing a different root node
for each cluster.

Hopefully this helps, and again, sorry for the delay in response!  Please
let us know if you need more information.

- Jeff

On Sat, Jan 19, 2019 at 6:23 PM Ryan H 
wrote:

> Hi All,
>
> I've also posted this question to the Zookeeper Users DL, but thought I
> would also put the question out here as well since it is related to NiFi.
>
> We currently have a centralized external Zookeeper cluster that is being
> used for multiple NiFi clusters. There wasn't any initial security set up
> (shame on us) and now want to add something in such that each NiFi cluster
> should only be able to see it's own ZK data (CreatorOnly).
>
> Can an ACL be put in place (either Kerberos or Username/Password) to an
> existing ZK tree that isn't currently under any kind of ACL? Example being,
> could I stop one of the NiFi clusters, add in Username/Password info and
> CreatorOnly to the state-management.xml file, restart the cluster, and then
> that ZK tree will then be only accessible by that cluster? Would this be a
> case where the migration tool would need to be used? I couldn't really find
> much in way of documentation for this specific case and just want to
> understand what options there are without breaking any of the clusters and
> get some security in there.
>
> Any info is always appreciated!
>
> Cheers,
>
> Ryan H
>


Re: jdbc impala url

2019-01-09 Thread Jeff
Hello,

I'm working on some NiFi/Impala integration examples currently, and will be
taking a look at kerberized Impala in a few days.  I'll try to follow up
with you on this list after I have things working.  If you beat me to it,
please let us know!

- Jeff
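
For reference: with the Cloudera Impala JDBC driver, KrbServiceName is
normally the service principal name of the Impala service (typically
"impala") rather than an end-user name, so a kerberized URL usually takes
roughly the following shape (host, port, and realm are placeholders, and this
is a general sketch rather than a verified fix for the setup quoted below):

jdbc:impala://impala-host.example.com:21050;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=impala-host.example.com;KrbServiceName=impala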

On Mon, Jan 7, 2019 at 8:32 PM PasLe Choix  wrote:

> I need to create a  DBCPConnectionPool to connect to my table on Impala
> in Cloudera cluster, wondering who can provide a sample URL for the jdbc
> connection?
> At the moment my URL is as below and is not working:
> jdbc:impala://nydc-pcdhwrk01:21050;
> AuthMech=1;KrbRealm=BD.NOVANTAS.PRI;KrbHostFQDN=
> nydc-pcdhwrk01.novantas.com;KrbServiceName=pchoix
>
>
> Thank you very much.
>
> Sincerely yours,
>
> PC
>


Add NiFi Registry client

2019-01-01 Thread Jeff Zemerick
Hi,

Is it possible to add a NiFi Registry client via a properties file/CLI? I'm
trying to automate the configuration of NiFi so that the instances deployed
are preconfigured with a registry client. I looked in the properties files
and didn't see it but apologies if it's there somewhere and I missed it.

Thanks,
Jeff
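
For reference: the registry client isn't read from nifi.properties; it is
stored in the flow itself. One scripted route is the NiFi Toolkit CLI, which
can create a registry client against a running instance. A rough sketch is
below; the exact command and option names are recalled from memory here and
should be confirmed against the toolkit's built-in help:

./bin/cli.sh nifi create-reg-client -u http://nifi-host:8080 \
    --registryClientName "Default Registry" \
    --registryClientUrl http://registry-host:18080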


Re: InvokeHTTP failure to schedule fro CRON?

2018-12-26 Thread Jeff
There was a bit of a grammar issue with my previous message...  InvokeHTTP
should be *presented* with a cert that's signed by one of the CAs in
cacerts.  You can use your browser to go to the website/URL that you've
configured in InvokeHTTP, and take a look at the certificate for the site.

For example, going to www.google.com using Google Chrome, you can click on
the padlock icon next to the URL, and click on "Certificate", which should
show you information about the cert that was presented for www.google.com,
and you can see that the root CA is from "GlobalSign".  The owner/issuer of
that "GlobalSign" root CA is:

   - CN=GlobalSign, O=GlobalSign, OU=GlobalSign Root CA - R2

Then, you can look in your JDK's cacerts to see if that CA is included:

   - keytool -storepass changeit -keystore
   
/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/jre/lib/security/cacerts
   -list -v | grep -i "GlobalSign"

You should see output similar to the following:

Alias name: globalsignr2ca [jdk]
Owner: CN=GlobalSign, O=GlobalSign, OU=GlobalSign Root CA - R2
Issuer: CN=GlobalSign, O=GlobalSign, OU=GlobalSign Root CA - R2
 [URIName: http://crl.globalsign.net/root-r2.crl]
Alias name: globalsigneccrootcar4 [jdk]
Owner: CN=GlobalSign, O=GlobalSign, OU=GlobalSign ECC Root CA - R4
Issuer: CN=GlobalSign, O=GlobalSign, OU=GlobalSign ECC Root CA - R4
Alias name: globalsignca [jdk]
Owner: CN=GlobalSign Root CA, OU=Root CA, O=GlobalSign nv-sa, C=BE
Issuer: CN=GlobalSign Root CA, OU=Root CA, O=GlobalSign nv-sa, C=BE
Alias name: globalsignr3ca [jdk]
Owner: CN=GlobalSign, O=GlobalSign, OU=GlobalSign Root CA - R3
Issuer: CN=GlobalSign, O=GlobalSign, OU=GlobalSign Root CA - R3
Alias name: globalsigneccrootcar5 [jdk]
Owner: CN=GlobalSign, O=GlobalSign, OU=GlobalSign ECC Root CA - R5
Issuer: CN=GlobalSign, O=GlobalSign, OU=GlobalSign ECC Root CA - R5

You can see that the cert with an owner/issuer of "CN=GlobalSign,
O=GlobalSign, OU=GlobalSign Root CA - R2" is in the alias "globalsignr2ca
[jdk]" in the output, which should match the root CA with which
www.google.com's cert is signed.  You can look at more details of that cert
with keytool:

keytool -storepass changeit -keystore
/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/jre/lib/security/cacerts
-list -v -alias "globalsignr2ca [jdk]"

I won't list that output here, but you can compare it with the detailed
view provided by your browser if you click on the root certificate.

If the issuer of the cert presented by a website cannot be found in the
truststore of the SSLContextService that InvokeHTTP is using, you'll see
that "PKIX path" exception.  This can happen with self-signed certs, if the
issuer's cert has not been added to the truststore.  Not all public CAs are
present in the JDK's cacerts by default, and like self-signed certs, the CA
cert will need to be added to a truststore.  I don't recommend adding certs
to cacerts, but Andy may have a different view on this.  If you need to
access sites with certs that are not part of cacerts, and you trust the
issuer of the cert for the site being accessed, you should create a custom
truststore that contains the issuer's cert, and configure an
SSLContextService that uses the custom truststore.
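
As a sketch of that last step, importing an issuer's certificate into a
custom truststore might look like the following (the file names, alias, and
password are placeholders; keytool creates the JKS file if it doesn't already
exist):

keytool -importcert -noprompt -alias my-issuing-ca \
    -keystore /path/to/custom-truststore.jks -storepass mySecretPassword \
    -file /path/to/issuing-ca.pem

That truststore path, password, and type are then what the SSLContextService's
truststore properties should point at.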

This isn't a comprehensive guide on CA certs, but I hope it helps you to
work through the issue.  Please let us know if we can help further!

- Jeff


On Mon, Dec 24, 2018 at 9:34 AM l vic  wrote:

> Could you try using an explicit path to the cacerts provided by your
> JDK/JRE, instead of referring to $JAVA_HOME?
> Tried without success...
> Were you able to successfully start the SSLContextService after
> configuring it?
> Yes
> InvokeHTTP needs to present a certificate that is signed by a CA that is
> in the default cacerts
> Not sure how to identify one that is supposed to be presented
>
>
> On Sun, Dec 23, 2018 at 1:32 PM Jeff  wrote:
>
>> Could you try using an explicit path to the cacerts provided by your
>> JDK/JRE, instead of referring to $JAVA_HOME?  Andy gave an example of
>> "/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/security/cacerts",
>> which you would update with the path to the JDK you are using.  Referencing
>> an environment variable (without using EL) will not work for a NiFi
>> property.  It does not appear that EL is supported for the keystore and
>> truststore properties, as that could lead to security issues.  Those
>> properties have validators that should also verify that the
>> keystore/truststore exists and is readable.  Were you able to successfully
>> start the SSLContextService after configuring it?
>>
>> Also, as Andy mentioned, the URL you are using in InvokeHTTP needs to
>> present a certificate that is signed by a CA that is in the de

Re: InvokeHTTP failure to schedule fro CRON?

2018-12-23 Thread Jeff
Could you try using an explicit path to the cacerts provided by your
JDK/JRE, instead of referring to $JAVA_HOME?  Andy gave an example of
"/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/security/cacerts",
which you would update with the path to the JDK you are using.  Referencing
an environment variable (without using EL) will not work for a NiFi
property.  It does not appear that EL is supported for the keystore and
truststore properties, as that could lead to security issues.  Those
properties have validators that should also verify that the
keystore/truststore exists and is readable.  Were you able to successfully
start the SSLContextService after configuring it?

Also, as Andy mentioned, the URL you are using in InvokeHTTP needs to
present a certificate that is signed by a CA that is in the default
cacerts.  Can you please verify this?  You can get a list of what is
contained in cacerts by using keytool, and specifying the path to cacerts,
the password, and the list command.  For example:

keytool -storepass changeit -keystore
/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/jre/lib/security/cacerts
-list

- Jeff

On Fri, Dec 21, 2018 at 2:55 PM l vic  wrote:

> I put "default" parameters for trust-store:
> Path: $JAVA_HOME/jre/lib/security/cacerts
> Password: changeit (default)
> Type: JKS
>  and got "invalid path" exception ( see below)
> How does that missing cert file should look like?
> Thanks again...
>
> 2018-12-21 14:46:00,021 ERROR [Timer-Driven Process Thread-1]
> o.a.nifi.processors.standard.InvokeHTTP
> InvokeHTTP[id=0929346d-d742-1fd9-e41a-8e4324b73349] Yielding processor due
> to exception encountered as a source processor:
> javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find
> valid certification path to requested target: {}
>
> javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find
> valid certification path to requested target
>
>at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
>
>at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1964)
>
>at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:328)
>
>at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
>
>at
> sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1614)
>
>at
> sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
>
>at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
>
>at sun.security.ssl.Handshaker.process_record(Handshaker.java:987)
>
>at
> sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1072)
>
>at
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
>
>at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
>
>at
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
>
>at
> okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:267)
>
>at
> okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:237)
>
>at
> okhttp3.internal.connection.RealConnection.connect(RealConnection.java:148)
>
>at
> okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:186)
>
>at
> okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
>
>at
> okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
>
>at
> okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
>
>at
> okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>
>at
> okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>
>at
> okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
>
>at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterc
> eptorChain.java:92)
>
>at
> okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
>
>at
> okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
>
>at
> okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
>
>at
> okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
>
>  

Re: Question regarding krb tgt renewal for Hive processors and connection pools

2018-12-23 Thread Jeff
Pat,

I apologize for not seeing this thread until today!  I'm glad there was a
commit available for you to cherry-pick and resolve your issue.  Also,
thanks to Shawn and Bryan the helpful comments!

- Jeff

On Sun, Dec 23, 2018 at 10:13 AM Pat White  wrote:

> Update, cherrypicking the fix from NIFI-5134 into 1.6.0 looks good to
> resolve hive connectionpool tgt renew/fetch issue we're seeing.
> Thanks again to Shawn and Bryan for the pointers, and to Jeff for the
> original PR.
>
> patw
>
> On Wed, Dec 19, 2018 at 5:22 PM Shawn Weeks 
> wrote:
>
>> It’s nifi-5134 that fixes this issue. Prior to that the hive connection
>> pool did not renew its Kerberos ticket correctly.
>>
>> Sent from my iPhone
>>
>> On Dec 19, 2018, at 5:15 PM, Pat White  wrote:
>>
>> Thanks much Bryan and Shawn, we're currently on 1.6.0 with some
>> cherrypicks from 1.8.0 jiras.
>> Will check the archives as mentioned, thanks again.
>>
>> patw
>>
>> On Wed, Dec 19, 2018 at 4:45 PM Shawn Weeks 
>> wrote:
>>
>>> There is a bug for this but I’m not sure which release fixed it.
>>> Something after 1.5 I think. The patch is in the hortonworks hdf 3.1.2
>>> release.
>>>
>>> If you go search for me in the archives I mentioned it a few months
>>> back.
>>>
>>> Thanks
>>> Shawn
>>>
>>> Sent from my iPhone
>>>
>>> > On Dec 19, 2018, at 3:59 PM, Pat White  wrote:
>>> >
>>> > Hi Folks,
>>> >
>>> > Using kerberos auth in Nifi clusters communicating with hdfs and for
>>> hive access, the ticket life is 24 hours. Hdfs works fine, however we're
>>> seeing issues with hive where the tgt doesn't seem to renew, or fetch a new
>>> ticket, as the 24hr limit approaches. Hence, hive access works fine until
>>> the 24hrs expires and then fails to authenticate. For example, a
>>> SelectHiveQL processor using the Hive Database Connection Pooling Service
>>> will work for 24 hours after a cluster restart but then fail with:
>>> >
>>> > org.ietf.jgss.GSSException: No valid credentials provided
>>> > (Mechanism level: Failed to find any Kerberos tgt)
>>> >
>>> > Enabled krb debugging, which shows the ticket is found but no renew,
>>> or new fetch attempt, seems to have been made. Krb docs discuss setting
>>> javax.security.auth.useSubjectCredsOnly=false in order to allow the
>>> underlying mechanism to obtain credentials, however the bootstrap.conf
>>> explicitly sets this to 'true', to inhibit JAAS from using any fallback
>>> methods to authenticate.
>>> >
>>> > Trying an experiment with useSubjectCredsOnly=false but would
>>> appreciate if anyone has some guidance on this, how to get hive's
>>> connection pools to renew tgt or fetch a new ticket ? Thank you.
>>> >
>>> > patw
>>> >
>>> >
>>> >
>>>
>>


HiveQL processors in Nifi 1.7.0

2018-11-05 Thread jeff whybark
Hello,

We recently upgraded from Nifi 1.4 to Nifi 1.7.0 and are suddenly having
some issues with both the SelectHiveQL and PutHiveQL processors.  Currently
we are hosting Nifi 1.7.0 in AWS and are connecting to Hive in AWS EMR
(Hive Version 2.1.1).


Since the upgrade we’ve noticed that the SelectHiveQL and PutHiveQL
processors have been acting a bit erratically with a series of different
strange patterns.


SelectHiveQL Processor -

1.  Processor fails, yet the flow file is passed to the Successful
connection.  Typically the error is something like "Error during database query
or conversion of records".

2.  A simple query erratically fails with the error "transfer relationship not
specified".

3.  Queries that yield zero rows erratically fail.


PutHiveQL Processor -

1.  Query fails and flow file is not routed to failure connection. Instead
the flow file is retried over and over even though the “Retry” connection
is not specified.

2.  An "Insert into" query yielded zero records yet advanced to
“Successful” connection.  Unfortunately I cannot prove exactly what
happened here, but it is my belief that the query must have failed yet
advanced to “Successful”.  When rerun a second time, rows were successfully
written out to the target as it should have.


Mainly, I just wanted to see if anyone else was seeing this kind of odd
behavior with Nifi 1.7.0 and HiveQL processors.  Since these are all a bit
erratic, it has been difficult to troubleshoot and recreate on the fly.
Also, if anyone didn’t see any behavior problems when upgrading to Nifi
1.7.0 in AWS, I’d be really interested to hear which version of EMR/Hive
you are using.


Thank you for any suggestions,

Jeff


Re: [EXT] ReplaceText cannot consume messages if Regex does not match

2018-10-18 Thread Jeff
Is the actualSettlementDate attribute's value in the "MM/dd/yyyy" format,
with no other text in front of the date?  For instance, "10/18/2018
12:30:00" is parsable by the "MM/dd/yyyy" format, but "12:30:00 10/18/2018"
is not.

On Thu, Oct 18, 2018 at 12:02 PM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> At *search value*:(?s)(^.*$)
>
> At *Replacement value*:
>
> *<?xml version="1.0" encoding="UTF-8"?>${actualSettlementDate:toDate('MM/dd/yyyy'):format("yyyy-MM-dd'T'00:00:00.000")}*
>
> The actualSettlementDate is a flowfile attribute. The problem is the
> replacement value is evaluated inside the processor and the *toDate *method
> fails.
>
> Hope it's clear now.
>
>
> On Thu, 18 Oct 2018 at 12:51 Shawn Weeks 
> wrote:
>
>> I’m still trying to understand your actual issue, can your provide a
>> screenshot of the ReplaceText config like the attached, I need to see
>> exactly where you’re putting the expression. A template would also be
>> really helpful.
>>
>>
>>
>> Thanks
>>
>> Shawn Weeks
>>
>>
>>
>> *From:* Juan Pablo Gardella 
>>
>> *Sent:* Thursday, October 18, 2018 10:45 AM
>>
>>
>> *To:* users@nifi.apache.org
>> *Subject:* Re: [EXT] ReplaceText cannot consume messages if Regex does
>> not match
>>
>>
>>
>> At ReplaceText
>> processor
>> we have:
>>
>>
>>
>> [image: image.png]
>>
>> As you can see, only if a *StackOverflowError* is raised during the
>> evaluation is the flowfile sent to the failure relationship. I would like to
>> update the code to use Exception or NifiExpressionFailedException (if it
>> exists).
>>
>>
>>
>> Juan
>>
>>
>>
>> On Thu, 18 Oct 2018 at 12:33 Shawn Weeks 
>> wrote:
>>
>> What processor are you defining your expression in? I also may be
>> misunderstanding the problem because I don’t see any regular expressions
>> anywhere. Can you create a sample workflow showing your issue so I can take
>> a look at it.
>>
>>
>>
>> Thanks
>>
>> Shawn Weeks
>>
>>
>>
>> *From:* Juan Pablo Gardella 
>> *Sent:* Thursday, October 18, 2018 10:27 AM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: [EXT] ReplaceText cannot consume messages if Regex does
>> not match
>>
>>
>>
>> No, it's not a valid date. If an error happens, I would
>> like to route the flowfile to failure and continue.
>>
>>
>>
>> On Thu, 18 Oct 2018 at 12:19 Shawn Weeks 
>> wrote:
>>
>> Any expression language syntax has to be correct or the processor won’t
>> run. I’m not sure there is any way to work around that except to explicitly
>> check that the value you are trying to evaluate is valid. Is the attribute
>> “tradeDate” coming from the contents of a flow file or is it defined
>> somewhere else. Can you ensure it is a valid date in that format before
>> hand?
>>
>>
>>
>> Thanks
>>
>> Shawn Weeks
>>
>>
>>
>> *From:* Juan Pablo Gardella 
>>
>> *Sent:* Thursday, October 18, 2018 10:13 AM
>>
>>
>> *To:* users@nifi.apache.org
>> *Subject:* Re: [EXT] ReplaceText cannot consume messages if Regex does
>> not match
>>
>>
>>
>> Hi, the error is not in the processor itself. It's in the expression used
>> against flowfile attributes. For example inside the text, I have:
>>
>>
>>
>>
${tradeDate:toDate('MM/dd/yyyy'):format("yyyy-MM-dd'T'00:00:00.000")}
>>
>> And that is the root issue. If it's unable to convert it, the flowfile cannot
>> be consumed. How can I evaluate attributes in a non-blocking way?
>>
>>
>>
>> Juan
>>
>>
>>
>> On Thu, 18 Oct 2018 at 12:07 Shawn Weeks 
>> wrote:
>>
>> Where is your expression? That’s not the entire configuration for that
>> processor.
>>
>>
>>
>> Thanks
>>
>> Shawn Weeks
>>
>>
>>
>> *From:* Juan Pablo Gardella 
>> *Sent:* Thursday, October 18, 2018 10:03 AM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: [EXT] ReplaceText cannot consume messages if Regex does
>> not match
>>
>>
>>
>> Configuration:
>>
>> Replacement Strategy: Always replace
>>
>> EvaluationMode: Entire text
>>
>>
>>
>>
>>
>> On Thu, 18 Oct 2018 at 12:01 Juan Pablo Gardella <
>> gardellajuanpa...@gmail.com> wrote:
>>
>> Hortonworks nifi based on 1.5.0:
>>
>>
>>
>> Configuration:
>>
>> Thanks
>>
>>
>>
>> On Thu, 18 Oct 2018 at 11:56 Peter Wicks (pwicks) 
>> wrote:
>>
>> Hi Juan,
>>
>>
>>
>> What version of NiFi are you running on?
>>
>> What mode are you running ReplaceText in, all text or line by line?
>>
>> Other settings that might be important? What’s your RegEx look like (if
>> your able to share).
>>
>>
>>
>> --Peter
>>
>>
>>
>>
>>
>> *From:* Juan Pablo Gardella [mailto:gardellajuanpa...@gmail.com]
>> *Sent:* Thursday, October 18, 2018 8:53 AM
>> *To:* users@nifi.apache.org
>> *Subject:* [EXT] ReplaceText cannot consume messages if Regex does not
>> match
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I'm seeing that ReplaceText is not able to consume messages that do not
>> match the regex. It keeps all the messages in the input queue instead of
>> 

Re: question about ConsumeKafka metrics for incoming data

2018-10-18 Thread Jeff
Dominique,

The "bytes in" value represents how many bytes have been received from an
incoming connection.  Since ConsumeKafka is directly consuming data from
Kafka (a "source" processor), and not from a connection from an upstream
processor, it won't show a value for "bytes in".  If you want to see how
much data ConsumeKafka has sent downstream, you can look at "bytes out".
The value should be close to how many bytes have been consumed from Kafka.
I'm not familiar with the Kafka processors, but I assume the messages are
encrypted/compressed.

On Thu, Oct 18, 2018 at 11:25 AM Dominique De Vito 
wrote:

> Hi,
>
> While running a ConsumeKafka processor, I noticed there are metrics for
> incoming data, but there was no value (like "bytes in") provided in the UI.
>
> And as far as I have dug into the ConsumeKafka processor, I didn't
> find any code related to feeding the metrics for the incoming data.
>
> Is it true that ConsumeKafka processor provides no value for the metrics
> related to the incoming data ?
>
> Is it the same for all processors dealing with data coming from outside
> Nifi ?
>
> Thanks.
>
> Regards,
> Dominique
>
>


Re: Whitelisting Proxy Host values in a Container Environment?

2018-10-14 Thread Jeff
Jon,

I'll start off by saying that I'm still new to k8s, and don't know the
specifics of most of this.  You should be able to inject the host and port
of the proxy into the NiFi configuration before starting the NiFi
instances, with a Helm chart, for instance.  I've done similar things with
docker compose.

You are correct, though, that nifi.web.proxy.host and
nifi.web.proxy.context.path need to be set in nifi.properties before
starting NiFi.  This is required for security purposes, and allows NiFi to
return responses which are based on the proxy host and context path values
so that references to resources hosted by NiFi are made through the proxy,
instead of exposing the internal URI to a resource.
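
As a concrete sketch of those two properties (the hostnames, ports, and path
here are placeholders for whatever the ingress actually exposes):

nifi.web.proxy.host=nifi.example.com:443,nifi.example.com
nifi.web.proxy.context.path=/nifi-proxy

Both properties accept comma-separated lists, so every host/port combination
and context path the proxy might present needs to be enumerated before NiFi
starts.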

- Jeff

On Fri, Oct 12, 2018 at 12:42 PM Jon Logan  wrote:

> We are running into issues with NiFi not allowing secure connections in a
> container due to the proxy...the only documentation we've found on this
> involves whitelisting specific proxy addresses. Is this the only solution?
> Specifically, we're concerned about the fact that we don't know the proxy
> address ahead of time to whitelist -- the port is an arbitrary assigned at
> runtime port, and the proxy name could be any of the nodes of our
> Kubernetes cluster.
>
> Are we missing something?
>
>
> Thanks!
>


Re: [EXT] Re: Hive w/ Kerberos Authentication starts failing after a week

2018-07-29 Thread Jeff
Peter,

They're in separate NARs, and are isolated by different ClassLoaders, so
their state regarding UGI will be separate.  There shouldn't be a problem
there.  The only way I could think of that might create a problem is if
Atlas JARs were added to HDFS using the Additional Classpath Resources
property (from memory, I don't think the Hive processors have that
property), but that also uses a separate (descendant) ClassLoader, and
shouldn't create a problem either.

On Fri, Jul 27, 2018 at 1:29 PM Peter Wicks (pwicks) 
wrote:

> As an aside, while digging around in the code, I noticed that the Atlas
> Reporting Task has its own Hadoop Kerberos authentication logic
> (org.apache.nifi.atlas.security.Kerberos). I’m not using this, but it made
> me wonder if this could cause trouble if Hive (synchronized) and Atlas
> (separate, unsynchronized) were both trying to login from Keytab at the
> same time.
>
>
>
> --Peter
>
>
>
> *From:* Shawn Weeks [mailto:swe...@weeksconsulting.us]
>
> *Sent:* Friday, July 27, 2018 10:29 AM
>
>
> *To:* users@nifi.apache.org
> *Subject:* Re: [EXT] Re: Hive w/ Kerberos Authentication starts failing
> after a week
>
>
>
> If you're using the Hortonworks distribution it's fixed in the latest HDF
> 3.x release I think.
>
>
>
> Thanks
>
> Shawn
>
>
> --
>
> *From:* Peter Wicks (pwicks) 
> *Sent:* Friday, July 27, 2018 10:58 AM
> *To:* users@nifi.apache.org
> *Subject:* RE: [EXT] Re: Hive w/ Kerberos Authentication starts failing
> after a week
>
>
>
> Thanks Shawn. Looks like this was fixed in 1.7.0. Will have to upgrade.
>
>
>
> *From:* Shawn Weeks [mailto:swe...@weeksconsulting.us]
> *Sent:* Friday, July 27, 2018 8:07 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: [EXT] Re: Hive w/ Kerberos Authentication starts failing
> after a week
>
>
>
> See NIFI-5134 as there was a known bug with the Hive Connection Pool that
> made it fail once the Kerberos Tickets expired and you lost your connection
> from Hive. If you don't have this patch in your version once the Kerberos
> Tickets reaches the end of it's lifetime the connection pool won't work
> till you restart NiFi.
>
>
>
> Thanks
>
> Shawn
> --
>
> *From:* Peter Wicks (pwicks) 
> *Sent:* Friday, July 27, 2018 8:51:54 AM
> *To:* users@nifi.apache.org
> *Subject:* RE: [EXT] Re: Hive w/ Kerberos Authentication starts failing
> after a week
>
>
>
> I don’t believe that is how this code works. Not to say that might not
> work, but I don’t believe that the Kerberos authentication used by NiFi
> processors relies in any way on the tickets that appear in klist.
>
>
>
> While we are only using a single account on this particular server, many
> of our servers use several Kerberos principals/keytab’s. I don’t think that
> doing kinit’s for all of them would work either.
>
>
>
> Thanks,
>
>   Peter
>
>
>
> *From:* Sivaprasanna [mailto:sivaprasanna...@gmail.com
> ]
> *Sent:* Friday, July 27, 2018 3:12 AM
> *To:* users@nifi.apache.org
> *Subject:* [EXT] Re: Hive w/ Kerberos Authentication starts failing after
> a week
>
>
>
> Did you try executing 'klist' to see if the tickets are there and renewed?
> If expired, try manual kinit and see if that fixes.
>
>
>
> On Fri, Jul 27, 2018 at 1:51 AM Peter Wicks (pwicks) 
> wrote:
>
> We are seeing frequent failures of our Hive DBCP connections after a week
> of use when using Kerberos with Principal/Keytab. We’ve tried with both the
> Credential Service and without (though in looking at the code, there should
> be no difference).
>
>
>
> It looks like the tickets are expiring and renewal is not happening?
>
>
>
> javax.security.sasl.SaslException: GSS initiate failed
>
> at
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>
> at
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>
> at
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
>
> at
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>
> at
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>
> at
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>
> at
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>
> at
> org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)
>
> at
> org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:176)
>
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>
> at
> 

Re: NiFi as a service

2018-06-15 Thread Jeff Zemerick
Tim, thanks a lot! I will give that a try.

Mike, I used the "nifi.sh install" as Tim described. My workaround was just
to remove /etc/init.d/nifi but your docker suggestion will help, too.
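
For anyone else who runs into this, a minimal systemd unit is another way to
avoid the generated init.d script entirely. This is only a sketch; the install
path (/opt/nifi) and the nifi user are assumptions to adjust for your own setup:

[Unit]
Description=Apache NiFi
After=network.target

[Service]
Type=forking
User=nifi
ExecStart=/opt/nifi/bin/nifi.sh start
ExecStop=/opt/nifi/bin/nifi.sh stop
Restart=on-failure

[Install]
WantedBy=multi-user.target

Dropping that into /etc/systemd/system/nifi.service and running "systemctl
enable nifi" bypasses the insserv dependency calculation that apt was tripping
over.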

Thanks!
Jeff


On Fri, Jun 15, 2018 at 1:55 PM Tim Dean  wrote:

> Mike -
>
> The standard build of NiFi includes the bin/nifi.sh script, and one of
> the options for that script is the install command. When executed, that
> command writes a file called /etc/init.d/nifi and also manipulates a few
> links. I believe that something within that file and link creation causes
> subsequent apt-based commands to fail on Ubuntu - or at least on Ubuntu
> 16.04. I'm not well-versed enough in different Linux startup variations to
> know exactly where this is going wrong, but the workaround I provided in a
> previous response is how we've avoided using the built-in install command.
>
> On Fri, Jun 15, 2018 at 9:52 AM, Mike Thomsen 
> wrote:
>
>> Jeff,
>>
>> How did you get install NiFi as a service? I'm not aware of us offering
>> any systemd configurations for NiFi as part of the standard build from
>> Apache.
>>
>> In the mean time, I would suggest using Docker to run Mongo with NiFi if
>> this is a blocker for you and the goal here is to get a development
>> environment set up. If you want to do that, this is all you need:
>>
>> docker run -d -p 27017:27017 --name mongo1 mongo:latest
>>
>> docker exec -it mongo1 mongo DB_NAME_HERE
>>
>> Thanks,
>>
>> Mike
>>
>>
>> On Fri, Jun 15, 2018 at 9:44 AM Jeff Zemerick 
>> wrote:
>>
>>> Hi all,
>>>
>>> Running NiFi 1.6.0 on Ubuntu 16.04 and I installed it as a service and
>>> it worked great. On the same box, I then installed MongoDB (via apt-get).
>>> The MongoDB installation failed because of errors indicating a loop in the
>>> services:
>>>
>>> insserv: There is a loop between service nifi and mountdevsubfs if
>>> started
>>> insserv:  loop involving service mountdevsubfs at depth 2
>>> insserv:  loop involving service udev at depth 1
>>> insserv: Starting nifi depends on plymouth and therefore on system
>>> facility `$all' which can not be true!
>>>
>>> My knowledge around services is pretty limited. I removed the NiFi
>>> service and the MongoDB install finished ok. Wondering if there is anything
>>> I should do differently? If other information is needed please let me know.
>>>
>>> Thanks,
>>> Jeff
>>>
>>>
>


NiFi as a service

2018-06-15 Thread Jeff Zemerick
Hi all,

Running NiFi 1.6.0 on Ubuntu 16.04 and I installed it as a service and it
worked great. On the same box, I then installed MongoDB (via apt-get). The
MongoDB installation failed because of errors indicating a loop in the
services:

insserv: There is a loop between service nifi and mountdevsubfs if started
insserv:  loop involving service mountdevsubfs at depth 2
insserv:  loop involving service udev at depth 1
insserv: Starting nifi depends on plymouth and therefore on system facility
`$all' which can not be true!

My knowledge around services is pretty limited. I removed the NiFi service
and the MongoDB install finished ok. Wondering if there is anything I
should do differently? If other information is needed please let me know.

Thanks,
Jeff


Re: Hive connection Pool error

2018-05-30 Thread Jeff
Vishal Dutt,

Your issue relates to an existing JIRA [1] and as luck would have it, it's
already resolved! :) The fix for that JIRA [1] has been merged to master
and will be in the next NiFi release.

[1] https://issues.apache.org/jira/browse/NIFI-5134
[2] https://github.com/apache/nifi/pull/2667

On Wed, May 30, 2018 at 3:42 AM Pierre Villard 
wrote:

> Hi,
>
> Could you share additional details about the processor/CS configuration as
> well?
>
> Thanks
>
> 2018-05-30 7:03 GMT+02:00 Koji Kawamura :
>
>> Hello,
>>
>> Although I encountered various Kerberos related error, I haven't
>> encountered that one.
>> I tried to reproduce the same error by changing Kerberos related
>> configuration, but to no avail.
>> I recommend enabling Kerberos debug option for further debugging.
>>
>> You can add the option at nifi/conf/bootstrap.conf:
>> java.arg.19=-Dsun.security.krb5.debug=true
>>
>> Then debug logs are written to nifi/logs/nifi-bootstap.log
>>
>> Thanks,
>> Koji
>>
>> On Tue, May 29, 2018 at 10:31 PM, Vishal Dutt 
>> wrote:
>> > Hi ,
>> >
>> >
>> >
>> > We  are getting below error on randomly for few minutes and then goes
>> away,
>> > its coming in PUThiveql
>> >
>> >
>> >
>> >
>> >
>> > 2018-05-29 01:01:07,279 INFO [Timer-Driven Process Thread-95]
>> > org.apache.hive.jdbc.HiveConnection Will try to open client transport
>> with
>> > JDBC Uri:
>> > jdbc:hive2://
>> ctcl-hdpmaster2.msoit.com:1/default;principal=hive/_h...@msoit.com
>> >
>> > 2018-05-29 01:01:07,281 ERROR [Timer-Driven Process Thread-95]
>> > o.apache.thrift.transport.TSaslTransport SASL negotiation failure
>> >
>> > javax.security.sasl.SaslException: GSS initiate failed
>> >
>> > at
>> >
>> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>> >
>> > at
>> >
>> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
>> >
>> > at
>> > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
>> >
>> > at
>> >
>> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>> >
>> > at
>> >
>> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>> >
>> > at
>> >
>> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>> >
>> > at java.security.AccessController.doPrivileged(Native Method)
>> >
>> > at javax.security.auth.Subject.doAs(Subject.java:422)
>> >
>> > at
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>> >
>> > at
>> >
>> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
>> >
>> > at
>> >
>> org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)
>> >
>> > at
>> > org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:176)
>> >
>> > at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>> >
>> > at
>> >
>> org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
>> >
>> > at
>> >
>> org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
>> >
>> > at
>> >
>> org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
>> >
>> > at
>> >
>> org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
>> >
>> > at
>> >
>> org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
>> >
>> > at
>> >
>> org.apache.nifi.dbcp.hive.HiveConnectionPool.lambda$getConnection$0(HiveConnectionPool.java:355)
>> >
>> > at java.security.AccessController.doPrivileged(Native Method)
>> >
>> > at javax.security.auth.Subject.doAs(Subject.java:422)
>> >
>> > at
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>> >
>> > at
>> >
>> org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:355)
>> >
>> > at sun.reflect.GeneratedMethodAccessor393.invoke(Unknown Source)
>> >
>> > at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >
>> > at java.lang.reflect.Method.invoke(Method.java:498)
>> >
>> > at
>> >
>> org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:89)
>> >
>> > at com.sun.proxy.$Proxy97.getConnection(Unknown Source)
>> >
>> > at
>> >
>> org.apache.nifi.processors.hive.PutHiveQL.lambda$new$1(PutHiveQL.java:191)
>> >
>> > at
>> org.apache.nifi.processor.util.pattern.Put.onTrigger(Put.java:96)
>> >
>> > at
>> >
>> org.apache.nifi.processors.hive.PutHiveQL.lambda$onTrigger$6(PutHiveQL.java:274)
>> >
>> > at
>> >
>> 

Re: TLS Toolkit Certs: Knox to NiFi

2018-03-08 Thread Jeff
Hi Ryan,

I responded to your question over on the Knox user list, but I can include
my response here as well.

I'm glad you're using the TLS Toolkit, I was going to suggest you give that
a try, initially.  The cert from the keystore generated by the toolkit that
identifies the cert to use for Knox needs to be added to gateway.jks, along
with the nifi-cert key from the truststore.  Just importing both the
keystore and truststore generated by the toolkit for Knox should be all you
have to do there, since the toolkit generates those stores with just the
nifi-key and nifi-cert in the keystore and truststore respectively.  You
should end up with three keys in gateway.jks afterward; the
gateway-identity, nifi-key, and nifi-cert keys.  Once both of those are
added to gateway.jks, and you have configured the service definition for
NiFi in your topology with useTwoWaySsl set to true, the two-way SSL
handshake should succeed.

Also, you will want to add the DN from that nifi-key as a node identity (in
the same place you set the initial admin identity) so that NiFi can create
a "user" to represent the Knox node and add a policy for you to allow that
node/identity to proxy requests, if you haven't already done so.

In nifi.properties, set nifi.web.proxy.context.path to
"/gateway/sandbox/nifi-app".  The host and port of the Knox service should
also be set for nifi.web.proxy.host.

After adding the keystore and truststore material to gateway.jks, adding a
user and policy so that NiFi can identify and authorize Knox for proxying, and
updating nifi.properties as mentioned above, Knox should be able to proxy NiFi
securely.
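
For illustration only (the alias names match what the toolkit generates, but
the file locations and passwords are placeholders for your environment), the
imports into gateway.jks can be done with keytool:

keytool -importkeystore -srckeystore keystore.jks -srcstorepass <keystorePass> \
    -destkeystore gateway.jks -deststorepass <gatewayPass> -srcalias nifi-key
keytool -importkeystore -srckeystore truststore.jks -srcstorepass <truststorePass> \
    -destkeystore gateway.jks -deststorepass <gatewayPass> -srcalias nifi-cert

And the corresponding nifi.properties entries, with the Knox host and port as a
placeholder:

nifi.web.proxy.context.path=/gateway/sandbox/nifi-app
nifi.web.proxy.host=knox.example.com:8443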

On Thu, Mar 8, 2018 at 8:15 AM Ryan H 
wrote:

> Hi All,
>
> I have been working on getting a secure NiFi cluster to work with Knox. I
> would like to have Knox be the entry point to NiFi. I have a NiFi cluster
> running in secure mode without error. Now I would like to place Knox in
> front of the Cluster. I have KnoxSSO setup which is configured with an
> external OpenID provider for which users are redirected to authN. This
> setup works fine when NiFi cluster is insecure.
>
> The error that I am getting is on the Knox side:
> ...
> *Caused by: sun.security.validator.ValidatorException: PKIX path building
> failed: sun.security.provider.certpath.SunCertPathBuilderException: unable
> to find valid certification path to requested target*
> ...
>
> I am pretty sure it is a cert issue (I reached out to the Knox Users Group
> and they think that it is a cert issue). I used the TLS Toolkit
> (Client/Server mode) to generate certs for the Knox machine. I imported the
> keystore.jks and truststore.jks to the Knox gateway.jks keystore. This did
> not solve the issue though. Is there something else that I should be
> importing into the Knox gateway.jks store based on what is generated by the
> TLS Toolkit?
>
> Any help is appreciated!
>
> Cheers,
>
> Ryan
>


Re: NiFi 1.5 with Knox 1.0.0

2018-03-08 Thread Jeff
I can confirm what Larry said.  A header, X-ProxiedEntitiesChain, is
required when proxying to NiFi secured with two-way SSL, requiring the DNs
of all the identities (parties/participants) involved in the proxying of a
request.
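
For example (the DNs below are made up), the chain carries each identity
wrapped in angle brackets, starting with the end user and followed by each
proxy:

X-ProxiedEntitiesChain: <CN=alice, OU=Users, O=Example><CN=knox.example.com, OU=Gateway, O=Example>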

The initial admin would be the DN (which NiFi uses as the identity) that
can be authenticated by Knox and that you would like to have the initial admin
privileges.  It's representative of the end user.

On Wed, Mar 7, 2018 at 4:35 PM larry mccay <lmc...@apache.org> wrote:

> The effective user will be the enduser authenticated by Knox not the knox
> user.
> I actually believe that you have the whole chain of users when proxying -
> so you won't lose either.
>
> On Wed, Mar 7, 2018 at 4:14 PM, Ryan H <ryan.howell.developm...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Yes, some additional documentation would be great for Knox integration.
>> Another question I have based on the two options above:
>>
>> If users will access NiFi via Knox (rather than accessing NiFi directly
>> and then auth to Knox), once a user authenticates to Knox (and subsequently
>> to whatever provider is configured for KnoxSSO), will NiFi only see the
>> user as the Knox identity or will NiFi see the user as the user that
>> authenticated to Knox? In this setup would Knox be the initial admin
>> identity or would it be the user I have set up in my IDP (
>> someu...@somemail.com)? I’m just wondering if accessing NiFi thru Knox
>> will result in losing the concept of users. Hopefully this makes sense!
>>
>> Cheers,
>>
>> Ryan
>>
>> On Sun, Mar 4, 2018 at 1:33 PM Jeff <jtsw...@gmail.com> wrote:
>>
>>> Hello Ryan,
>>>
>>> I am not on my development laptop right now, but I can send you an
>>> example Knox topology that uses Knox, SSO, and NiFi.
>>>
>>> Regarding the two options you listed above, both can be used
>>> simultaneously.  If you only want to use option 1, you can set the Knox
>>> properties in nifi.properties and NiFi will be able to redirect users to
>>> log in through Knox.  For option 2, you do not have to set those
>>> properties, but you will have to generate a cert for Knox to identify
>>> itself to NiFi, and add the DN from that cert as a node identity in NiFi
>>> (grant that identity proxy privileges).
>>>
>>> The main concern between option 1 and 2 is if you'd like users to be
>>> able to access NiFi directly, or you'd like to force them to go through a
>>> security gateway (Knox) first.
>>>
>>> Looking at the Knox documentation in the NiFi Admin Guide, we do need to
>>> add a section for configuring Knox to proxy to NiFI with Knox doing the
>>> authentication.  I've created a JIRA [1] and will work on adding the
>>> documentation.
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-4931
>>>
>>> On Sat, Mar 3, 2018 at 4:14 PM Ryan H <ryan.howell.developm...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am trying to set up a secure NiFi cluster (or just a single node to
>>>> start with rather) that uses Knox for AuthN. I want to configure Knox with
>>>> an OpenID provider. From what I can tell I have two options:
>>>> 1. Access NiFi directly which would then kick back to Knox for Auth
>>>> (which is then configured with the OpenID provider)
>>>> 2. Access NiFi thru Knox (would not directly access NiFi but rather
>>>> proxy thru Knox always).
>>>>
>>>> I understand that I can just configure NiFi to use the OpenID provider
>>>> and not use Knox. However, there are some issues with this (for my use
>>>> case), specifically if I want to automate scaling up/down cluster nodes
>>>> (redirect url for OpenID has to be explicitly granted with the provider for
>>>> each callback url which is troublesome if dynamically scaling, and the way
>>>> I am exposing the service and the limitation with the NiFi Host Header with
>>>> 1.5).
>>>>
>>>> Based on the 2 assumed options listed above, is there a preference over
>>>> one or the other? I've found a couple blogs on configuring NiFi with Knox,
>>>> but it mostly leaves me with more questions (may just be my lack of
>>>> experience with Knox). Can anyone provide clear and concise direction on
>>>> what is exactly required for NiFi to work with Knox? Any sample Knox
>>>> configs? Is anything else req'd for NiFi config other than the Knox props
>>>> in the nifi.properties file?
>>>>
>>>> Any help is appreciated!
>>>>
>>>> Cheers,
>>>>
>>>> Ryan
>>>>
>>>
>


Re: NiFi 1.5 with Knox 1.0.0

2018-03-04 Thread Jeff
Hello Ryan,

I am not on my development laptop right now, but I can send you an example
Knox topology that uses Knox, SSO, and NiFi.

Regarding the two options you listed above, both can be used
simultaneously.  If you only want to use option 1, you can set the Knox
properties in nifi.properties and NiFi will be able to redirect users to
log in through Knox.  For option 2, you do not have to set those
properties, but you will have to generate a cert for Knox to identify
itself to NiFi, and add the DN from that cert as a node identity in NiFi
(grant that identity proxy privileges).
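
For reference, the Knox-related entries in nifi.properties for option 1 look
roughly like the following; the KnoxSSO URL, public key path, and audience
value are placeholders you would replace with your own deployment's values:

nifi.security.user.knox.url=https://knox.example.com:8443/gateway/knoxsso/api/v1/websso
nifi.security.user.knox.publicKey=/path/to/knoxsso-signing-cert.pem
nifi.security.user.knox.cookieName=hadoop-jwt
nifi.security.user.knox.audiences=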

The main concern between option 1 and 2 is if you'd like users to be able
to access NiFi directly, or you'd like to force them to go through a
security gateway (Knox) first.

Looking at the Knox documentation in the NiFi Admin Guide, we do need to
add a section for configuring Knox to proxy to NiFI with Knox doing the
authentication.  I've created a JIRA [1] and will work on adding the
documentation.

[1] https://issues.apache.org/jira/browse/NIFI-4931

On Sat, Mar 3, 2018 at 4:14 PM Ryan H 
wrote:

> Hi All,
>
> I am trying to set up a secure NiFi cluster (or just a single node to
> start with rather) that uses Knox for AuthN. I want to configure Knox with
> an OpenID provider. From what I can tell I have two options:
> 1. Access NiFi directly which would then kick back to Knox for Auth (which
> is then configured with the OpenID provider)
> 2. Access NiFi thru Knox (would not directly access NiFi but rather proxy
> thru Knox always).
>
> I understand that I can just configure NiFi to use the OpenID provider and
> not use Knox. However, there are some issues with this (for my use case),
> specifically if I want to automate scaling up/down cluster nodes (redirect
> url for OpenID has to be explicitly granted with the provider for each
> callback url which is troublesome if dynamically scaling, and the way I am
> exposing the service and the limitation with the NiFi Host Header with
> 1.5).
>
> Based on the 2 assumed options listed above, is there a preference over
> one or the other? I've found a couple blogs on configuring NiFi with Knox,
> but it mostly leaves me with more questions (may just be my lack of
> experience with Knox). Can anyone provide clear and concise direction on
> what is exactly required for NiFi to work with Knox? Any sample Knox
> configs? Is anything else req'd for NiFi config other than the Knox props
> in the nifi.properties file?
>
> Any help is appreciated!
>
> Cheers,
>
> Ryan
>


Re: Large number of NiFi processors

2018-01-29 Thread Jeff
Hello Jon,

The number of processors is virtually unlimited, provided that you have
enough CPU to sustain the number of concurrent tasks allocated to NiFi to
provide acceptable performance, enough disk space for the NiFi
repositories, and enough RAM to cover the overhead of the processor
instances.  With 1,000 processors, you'll need enough concurrent tasks to
keep the latency of data flow to an acceptably low level.  Depending on the
processors you're using, and the design of your flow, you may need to
configure NiFi with additional heap space to cover the requirements of that
many processors.
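
The heap settings live in conf/bootstrap.conf; the values below are purely
illustrative and should be sized against the RAM actually available on the
host and what the flow needs:

java.arg.2=-Xms2g
java.arg.3=-Xmx4g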

NiFi 1.x has added quite a few features that are not available in the 0.x
version, like multi-tenancy support, additional user authentication options
(OpenID/Knox/improved LDAP configuration), improved proxy support, NiFi
Registry support, new processors, increased security, and bug fixes, among
other things.  Take a look at some of the release notes from the 1.x line
[1] for more details.

There are improved repository implementations, along with Record-based
processors that improve throughput of data through NiFi, in the 1.x line
that I don't think were back-ported to 0.x.

Are you running a single instance (non-clustered) of NiFi?  You may want to
create a NiFi cluster to set up distributed processing of data if the
resources of the host on which you're running NiFi are not able to keep up.
How has your current configuration been performing?

I know this answer is very high-level, and would be happy to dive into some
details if you'd like.

[1] https://cwiki.apache.org/confluence/display/NIFI/Release+Notes

On Mon, Jan 29, 2018 at 5:24 PM Jon Allen  wrote:

> Is there a limit to the number of processors that can be put into a NiFi
> flow?
>
> I'm assuming there isn't an actual hard limit but what's likely to break
> as the number of processors increases, and what's considered a large number?
>
> We currently have a few hundred processors in our graph but it's looking
> like this will head towards 1,000 in the near future. Does anyone have any
> suggestions for tuning the system to handle this? Are there any papers
> available describing what I should be looking at?
>
> We're currently running with NiFi 0.7.4. Are there any changes in later
> releases that improve things in this area?
>
> Thanks,
> Jon
>


Re: Generalizing a List / Fetch workflow

2018-01-28 Thread Jeff
It might take a bit more than allowing input on that processor.  One thing
you might want to check is how the processor retains state for the path
given in the configuration.  If this path changes regularly due to EL or
incoming flowfiles, the state management for the processor might not be
consistent.  From memory, I believe the processor only stores timestamps
related to the path, but not the path itself.  That's most likely one of
the reasons why the processor does not allow incoming flowfiles; the state
of multiple directories would have to be tracked with no way to
automatically clear unused/stale state, without NiFi being able to
determine whether a path's state is still needed.  If you allow input in
ListFile, you'll have to make sure that state of the various paths gets
tracked properly.  There's potentially a sort of memory leak at that point,
although admittedly you'd have to send a lot of unique paths through that
processor before it became a problem.  At the moment, I don't have a good
strategy for when the "stale" paths could be removed from state.

On Sat, Jan 27, 2018 at 7:21 PM James McMahon  wrote:

> This sounds promising. I can give it a try. Assuming this is the proper
> place to download the code for ListFile, can you tell me how to pull the
> correct source code? I've not yet done this. I would definitely go the
> route of fixed folder path with Expression Language support. -Jim
>
> On Sat, Jan 27, 2018 at 7:09 PM, Mike Thomsen 
> wrote:
>
>> Its input requirement is set to INPUT_FORBIDDEN. It shouldn't be too
>> hard to set that to INPUT_ALLOWED and make it able to handle a flowfile or
>> a fixed folder path (or even better, fixed folder path w/ EL support). If
>> you do a patch, I'll try to find time to do a review.
>>
>> On Fri, Jan 26, 2018 at 4:03 PM, James McMahon 
>> wrote:
>>
>>> My customer has posted a requirement for a generalized List / Fetch
>>> workflow. I would not be able to express the target directory in the
>>> ListFile as a fixed value because I would need to reset the input directory
>>> each workflow cycle, and the target input directories will not share any
>>> common parent.
>>>
>>> Has anyone recently explored an approach to feed the ListFile or
>>> equivalent its target input directory dynamically? It does not seem
>>> possible to do that. I've failed to figure this out before, not being able
>>> to determine a way to precede the ListFile in the workflow.
>>>
>>
>>
>


Re: MiNiFi contents-repository config problem

2018-01-17 Thread Jeff Zemerick
Sumanth,

MiNiFi 0.3.0 added a no-op provenance repository that does not store any
events. To use it specify the no-op provenance repository implementation
like:

Provenance Repository:
  provenance rollover time: 1 min
  implementation: org.apache.nifi.provenance.NoOpProvenanceRepository

Jeff



On Wed, Jan 17, 2018 at 12:14 AM, Sumanth Chinthagunta <xmlk...@gmail.com>
wrote:

> I have the same problem. I am running MiNiFi in Docker on OpenShift and they
> have a limit of 10GB of storage per container. I really want to disable the
> provenance repository and limit the size of the repos.
>
> -Sumanth
>
> On Jan 16, 2018, at 7:55 AM, Aldrin Piri <aldrinp...@gmail.com> wrote:
>
> Great, thanks! Will put that on my list to scope out and will move
> conversation over to the issue.
>
> On Tue, Jan 16, 2018 at 10:32 AM, Saulo <saulo.e...@gmail.com> wrote:
>
>> Hi Aldrin,
>>
>>
>> Sure, I have just created the issue with the information we shared and
>> attached the file.
>>
>> Issue link: https://issues.apache.org/jira/browse/MINIFI-425
>>
>>
>> Thank you a lot.
>>
>>
>>
>>
>> --
>> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>>
>
>


Re: Nifi: how to split logs?

2017-11-25 Thread Jeff
Sally,

What are you trying to parse out of nifi-app.log?  If you are looking for
provenance information, it may be easier and more straightforward to query
provenance directly.

On Fri, Nov 24, 2017 at 6:13 AM sally  wrote:

> I want to use the NiFi logs (I mean nifi-app.log data), but when I try to get
> these files I usually get loads of content for one error, and I want to know
> how I can filter all this data so that I get only the useful information.
>
> 1. Should I change some configuration in logback.xml?
> 2. Or should I use SplitText, SplitContent, or related NiFi processors?
> Here is an example of my log file data:
> 2017-11-13 18:29:09,824 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.lucene.UpdateMinimumEventId Updated Minimum Event ID for Provenance
> Event Repository - Minimum Event ID now 4529967
> 2017-11-13 18:29:09,824 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.PersistentProvenanceRepository Successfully performed Expiration
> Action org.apache.nifi.provenance.lucene.UpdateMinimumEventId@5c633b27 on
> Provenance Event file ./provenance_repository/4529212.prov.gz in 3 millis
> 2017-11-13 18:29:09,824 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.lucene.DeleteIndexAction Removed expired Provenance Event file
> ./provenance_repository/4529212.prov.gz
> 2017-11-13 18:29:09,824 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.lucene.DeleteIndexAction Removed expired Provenance
> Table-of-Contents file ./provenance_repository/toc/4529212.toc
> 2017-11-13 18:29:09,824 INFO [Provenance Maintenance Thread-3]
> o.a.n.p.PersistentProvenanceRepository Successfully performed Expiration
> Action org.apache.nifi.provenance.expiration.FileRemovalAction@3f1fd7b on
> Provenance Event file ./provenance_repository/4529212.prov.gz in 233296 nanos
> 2017-11-13 18:29:07,862 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.lucene.SimpleIndexManager Index Writer for
> ./provenance_repository/index-1509974708000 has been returned to Index
> Manager and is no longer in use. Closing Index Writer
> 2017-11-13 18:29:07,911 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.PersistentProvenanceRepository Successfully merged 16 journal files
> (103463 records) into single Provenance Log File
> ./provenance_repository/9476034.prov in 6719 milliseconds
> 2017-11-13 18:29:07,911 INFO [Provenance Repository Rollover Thread-1]
> o.a.n.p.PersistentProvenanceRepository Successfully Rolled over
>
>
>
> --
> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>


Re: Wait for N Files before starting to process

2017-11-20 Thread Jeff
Manish,

You're welcome, and thanks to Koji for writing up that blog!

On Mon, Nov 20, 2017 at 9:27 AM Manish Gupta 8 <mgupt...@sapient.com> wrote:

> Thank Jeff. This is exactly what I was searching for. Thank You.
>
>
>
>
>
> *From:* Jeff [mailto:jtsw...@gmail.com]
> *Sent:* Monday, November 20, 2017 7:41 PM
>
>
> *To:* users@nifi.apache.org
> *Subject:* Re: Wait for N Files before starting to process
>
>
>
> Hello Manish,
>
>
>
> I was answering a dev-list question and provided a URL to Koji's blog
> about Wait/Notify processors [1].  Please take a look and feel free to ask
> questions about how you can integrate the Wait/Notify processors into
> your flow.
>
>
>
> [1]
> http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/#alternative-solution-waitnotify
>
>
>
> On Sat, Nov 18, 2017 at 8:28 AM Manish Gupta 8 <mgupt...@sapient.com>
> wrote:
>
> Thanks Andy. I will search for that thread.
>
>
>
> *From:* Andy Loughran [mailto:andylock...@gmail.com]
> *Sent:* Saturday, November 18, 2017 2:53 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Wait for N Files before starting to process
>
>
>
> You can chain together the wait and notify - there’s a message on this list
> for how to do that from about 4 months ago. - I asked the same question :)
>
>
>
> Andy
>
>
>
>
>
> Sent from my iPhone
>
>
> On 18 Nov 2017, at 06:32, Manish Gupta 8 <mgupt...@sapient.com> wrote:
>
> Hi,
>
>
>
> I am working on building a flow in NiFi that involves cloning a file to 2
> or more processing flows, then waiting (on some processor) for all the
> parallel flows to finish (in parallel), and then executing another flow.
>
>
>
> Is there any processor in NiFi which can do this out of the box, i.e. wait
> for N files in the input queue and then start its processing, and time out if
> the expected N files didn’t arrive in some time T? If not, what’s a good way
> of achieving this?
>
>
>
> I have such a flow implemented in Akka, and want to migrate it to NiFi. I
> am using NiFi 1.3
>
>
>
> 
>
>
>
> Thanks,
>
>
>
> *Manish Gupta*
>
> *Senior Specialist Platform | AI & Data Engineering Practice*
>
>
>
>
> “Oxygen”, Tower C, Ground - 3rd floor,
>
> Plot No. 7, Sector 144 Expressway, Noida, UP, India
>
>
>
> *Mobile:* +91 981 059 1361
>
> *Office:* +91 (120) 479 5000  *Ext : *75398
>
> *Email:* mgupt...@sapient.com
>
> *sapientconsulting.com* <https://www.sapientconsulting.com/>
>
>
>
>
>
>
>
>


Re: Wait for N Files before starting to process

2017-11-20 Thread Jeff
Hello Manish,

I was answering a dev-list question and provided a URL to Koji's blog about
Wait/Notify processors [1].  Please take a look and feel free to ask
questions about how you can integrate the Wait/Notify processors into
your flow.

[1]
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/#alternative-solution-waitnotify

On Sat, Nov 18, 2017 at 8:28 AM Manish Gupta 8  wrote:

> Thanks Andy. I will search for that thread.
>
>
>
> *From:* Andy Loughran [mailto:andylock...@gmail.com]
> *Sent:* Saturday, November 18, 2017 2:53 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Wait for N Files before starting to process
>
>
>
> You can chain together the wait and notify - there’s a message on this list
> for how to do that from about 4 months ago. - I asked the same question :)
>
>
>
> Andy
>
>
>
>
>
> Sent from my iPhone
>
>
> On 18 Nov 2017, at 06:32, Manish Gupta 8  wrote:
>
> Hi,
>
>
>
> I am working on building a flow in NiFi that involves cloning a file to 2
> or more processing flows, then waiting (on some processor) for all the
> parallel flows to finish (in parallel), and then executing another flow.
>
>
>
> Is there any processor in NiFi which can do this out of the box, i.e. wait
> for N files in the input queue and then start its processing, and time out if
> the expected N files didn’t arrive in some time T? If not, what’s a good way
> of achieving this?
>
>
>
> I have such a flow implemented in Akka, and want to migrate it to NiFi. I
> am using NiFi 1.3
>
>
>
> 
>
>
>
> Thanks,
>
>
>
> *Manish Gupta*
>
> *Senior Specialist Platform | AI & Data Engineering Practice*
>
>
>
>
> “Oxygen”, Tower C, Ground - 3rd floor,
>
> Plot No. 7, Sector 144 Expressway, Noida, UP, India
>
>
>
> *Mobile:* +91 981 059 1361
>
> *Office:* +91 (120) 479 5000  *Ext : *75398
>
> *Email:* mgupt...@sapient.com
>
> *sapientconsulting.com* 
>
>
>
>
>
>
>
>


[ANNOUNCE] Apache NiFi 1.4.0 release.

2017-10-03 Thread Jeff
Hello,

The Apache NiFi team would like to announce the release of Apache NiFi
1.4.0.

Apache NiFi is an easy to use, powerful, and reliable system to process and
distribute
data.  Apache NiFi was made for dataflow.  It supports highly configurable
directed graphs
of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available here:
https://repository.apache.org/content/repositories/releases/org/apache/nifi/

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12340589

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.4.0

Thank you
The Apache NiFi team


Re: NiFi ram usage

2017-08-31 Thread Jeff
Adam,

Mike brings up a good point...  When your VM has started, and you haven't
started NiFi yet, how much memory is free in the system?  An instance of
NiFi with an empty flow should have no trouble running in 512mb of heap
space.  I have a flow with a few processors on it and the heap usage
averages around 250mb for me, with a default bootstrap.conf.
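
For reference, the default heap settings in conf/bootstrap.conf are:

java.arg.2=-Xms512m
java.arg.3=-Xmx512m

so anything above that in your 1.1GB figure is likely JVM overhead (metaspace,
threads, native buffers) plus whatever the OS and other services on the
instance need.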

On Thu, Aug 31, 2017 at 7:59 AM Pierre Villard <pierre.villard...@gmail.com>
wrote:

> As Jeff, I'm a bit surprised by what you are experiencing. I've never
> changed the default values of 512MB when working with NiFi on my laptop and
> never hit OOM errors. Are you sure that 1GB is available on the VM before
> starting NiFi?
>
> Pierre
>
> 2017-08-31 13:52 GMT+02:00 Mike Thomsen <mikerthom...@gmail.com>:
>
>> Adam,
>>
>> I cannot say exactly why the default settings won't work for you on a
>> clean installation, but it likely has to do with how small the VM is. The
>> OS overhead alone is probably a few hundred MB of RAM. If you have anything
>> else running, even just MySQL or MongoDB it's entirely possible that you
>> actually don't have enough memory to give even 512MB to NiFi.
>>
>> My recommendation would be 4GB of RAM for the VM with Xms1G and Xmx2G for
>> the heap sizes. That's very reasonable for experimenting with something
>> like NiFi. The ram usage is very difficult to calculate in advance because
>> it's based entirely on what you're doing with NiFi.
>>
>> Mike
>>
>> On Wed, Aug 30, 2017 at 11:45 PM, Adam Lamar <adamond...@gmail.com>
>> wrote:
>>
>>> Jeff,
>>>
>>> This was a new installation so I actually hadn't set up any flows yet.
>>> NiFi wouldn't start immediately after installation (before I could
>>> configure any flows) because the system had too little ram. The 1.1GB
>>> figure is private (RSS) memory usage, which exceeded the 1GB instance limit
>>> (and the instance had no swap configured).
>>>
>>> Is there any system requirements documentation? I couldn't find any docs
>>> on minimum system specs, so I guess I'm wondering if the ram usage is known
>>> and expected, and if there are any ways to get the ram usage down.
>>>
>>> Thanks in advance,
>>> Adam
>>>
>>> ​
>>>
>>
>>
>


Re: NiFi ram usage

2017-08-30 Thread Jeff
HI Adam,

Can you provide some more detail about what your NiFi flow is like?  Are
you using custom processors?  I regularly use NiFi with the default
bootstrap settings without issue, but if you're bringing lots of data into
memory, have lots of flowfiles being processed concurrently, etc, memory
usage can ramp up.  Flow design can have quite a bit of impact on how much
ram you need allocated to the JVM to avoid OOMEs.

On Wed, Aug 30, 2017 at 3:15 PM Adam Lamar  wrote:

> Hi everybody,
>
> I recently started up a new cloud Linux instance with 1GB of ram to do
> some quick tasks in NiFi. I noticed NiFi kept dying without much
> information in the logs - it just seemed to stop during startup.
>
> Eventually I realized the system was running out of memory and OOM killing
> the process, hence the lack of information in the NiFi logs. Empirically
> version 1.3.0 needs about 1.1GB of RAM to start, and my flow caused an
> additional 200MB of ram usage.
>
> Are there any recommendations to get NiFi running with a lighter
> footprint? I noted the default 512MB heap limits in the bootstrap config
> (which I didn't change) so I'm guessing the ram usage is related to NiFi's
> plethora of processors.
>
> Cheers,
> Adam
>


Re: Nifi 1.4.0

2017-08-18 Thread Jeff
I'd be happy to undertake RM duties for the 1.4.0 release.

On Wed, Aug 16, 2017 at 3:31 PM BD International <
b.deep.internatio...@gmail.com> wrote:

> Hello,
>
> Just wondering if there was a release date planned for nifi 1.4.0?
>
> There are some really good features and fixes in it.
>
> Thanks
>
> Brian
>


Re: publishmqtt with SSL

2017-08-17 Thread Oxenberg, Jeff
Hey Andy,

Thanks for getting back to me.  I’ve linked to the log files below.  I do see 
in nifi-bootstrap.log that the cert is trusted but like you said it doesn’t 
look to be an SSL-specific issue.  I will work on a remote debug session and 
see if that gives me any additional clues.
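
(Noting for my own reference: enabling the remote debugger should just be a
matter of uncommenting the JDWP line in conf/bootstrap.conf, something along
the lines of:

java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000

with the port changed if 8000 is already taken.)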

2017-08-17 10:17:49,763 INFO [NiFi logging handler] org.apache.nifi.StdOut 
adding as trusted cert:
2017-08-17 10:17:49,763 INFO [NiFi logging handler] org.apache.nifi.StdOut   
Subject: CN=*.azure-devices.net
2017-08-17 10:17:49,763 INFO [NiFi logging handler] org.apache.nifi.StdOut   
Issuer:  CN=Microsoft IT SSL SHA2, OU=Microsoft IT, O=Microsoft Corporation, 
L=Redmond, ST=Washington, C=US
2017-08-17 10:17:49,763 INFO [NiFi logging handler] org.apache.nifi.StdOut   
Algorithm: RSA; Serial number: 0x5a0008405e4aa32ff9d2f237710008405e
2017-08-17 10:17:49,763 INFO [NiFi logging handler] org.apache.nifi.StdOut   
Valid from Thu May 11 20:25:52 CDT 2017 until Mon May 07 12:03:30 CDT 2018

https://www.dropbox.com/s/8ddzmezx0sporwa/nifi-bootstrap.log?dl=0
https://www.dropbox.com/s/609jgeqaxwnukun/nifi-app.log?dl=0

Thanks,

Jeff Oxenberg

From: Andy LoPresto <alopre...@apache.org>
Reply-To: <users@nifi.apache.org>
Date: Wednesday, August 16, 2017 at 3:05 PM
To: <users@nifi.apache.org>
Subject: Re: publishmqtt with SSL

Hi Jeff,

Sorry you are having issues with this. Can you provide a full nifi-app.log 
which includes all the stacktraces? If you can enable 
“java.arg.15=-Djavax.net.debug=ssl,handshake” in your conf/bootstrap.conf, 
please also include nifi-bootstrap.log as this will contain the JSSE SSL/TLS 
output. From your stacktrace, it does not appear that this is a specific 
SSL/TLS issue, but it may be exposed by code related to that, so I can take a 
look.

Usually, "InvocationTargetException: null” means that a NullPointerException 
was generated when trying to invoke the method on a null object. If you can do 
a remote debug session, I would look at PublishMQTT:131 and check if an 
exception is being generated there (or catch Throwable on line 338 rather than 
specific MQTTException).

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Aug 16, 2017, at 3:55 PM, Oxenberg, Jeff 
<jeff.oxenb...@hpe.com> wrote:

Bumping this up as I’m still having an issue here; has anyone gotten 
publishmqtt working with SSL?

Jeff Oxenberg

From: Oxenberg, Jeff
Sent: Tuesday, August 08, 2017 8:33 PM
To: users@nifi.apache.org
Subject: publishmqtt with SSL

Hey,

I’m trying to get NiFi to send mqtt messages to the Azure IoT Hub.  The IoT Hub 
uses SSL certificates, and I’m having trouble getting it working with the 
publishmqtt processor.  I create a StandardSSLContextService pointing the 
truststore at /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts.  I 
made sure (I think) that the chain was trusted by importing it manually into 
the cacerts:
openssl s_client -showcerts -connect gsetest.azure-devices.net:8883 > msft.cert
keytool -import -noprompt -trustcacerts -alias azure -file msft.cert -keystore 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts -storepass changeit

When I start the processor, I immediately get the below error.  This all works 
when I do it manually outside of NiFi using mosquitto_pub, so I know that my 
various settings (username, password, etc) are correct.  Has anyone done 
something similar, or can anyone offer any help here?

2017-08-08 17:20:28,570 ERROR [StandardProcessScheduler Thread-6] 
o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due 
to java.lang.RuntimeException: Failed while executing one of processor's 
OnScheduled task.
java.lang.RuntimeException: Failed while executing one of processor's 
OnScheduled task.
at 
org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1480)
at 
org.apache.nifi.controller.StandardProcessorNode.access$000(StandardProcessorNode.java:100)
at 
org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1301)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
j

RE: publishmqtt with SSL

2017-08-16 Thread Oxenberg, Jeff
Bumping this up as I’m still having an issue here; has anyone gotten 
publishmqtt working with SSL?

Jeff Oxenberg

From: Oxenberg, Jeff
Sent: Tuesday, August 08, 2017 8:33 PM
To: users@nifi.apache.org
Subject: publishmqtt with SSL

Hey,

I’m trying to get NiFi to send mqtt messages to the Azure IoT Hub.  The IoT Hub 
uses SSL certificates, and I’m having trouble getting it working with the 
publishmqtt processor.  I create a StandardSSLContextService pointing the 
truststore at /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts.  I 
made sure (I think) that the chain was trusted by importing it manually into 
the cacerts:
openssl s_client -showcerts -connect gsetest.azure-devices.net:8883 > msft.cert
keytool -import -noprompt -trustcacerts -alias azure -file msft.cert -keystore 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts -storepass changeit

When I start the processor, I immediately get the below error.  This all works 
when I do it manually outside of NiFi using mosquitto_pub, so I know that my 
various settings (username, password, etc) are correct.  Has anyone done 
something similar, or can anyone offer any help here?

2017-08-08 17:20:28,570 ERROR [StandardProcessScheduler Thread-6] 
o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due 
to java.lang.RuntimeException: Failed while executing one of processor's 
OnScheduled task.
java.lang.RuntimeException: Failed while executing one of processor's 
OnScheduled task.
at 
org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1480)
at 
org.apache.nifi.controller.StandardProcessorNode.access$000(StandardProcessorNode.java:100)
at 
org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1301)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at 
org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1463)
... 9 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

Thanks,


Jeff Oxenberg


Nifi controller passwords stored at rest

2017-08-11 Thread jeff whybark
Hello, I am looking to find information on how passwords are stored at rest
in Nifi 1.3 to satisfy security requirements for my organization.  This is
mainly related to the passwords in the controller settings.  Are they
encrypted by default?  Are they stored in the flow.xml.gz file?  Thanks for
your help.  Jeff


publishmqtt with SSL

2017-08-08 Thread Oxenberg, Jeff
Hey,

I’m trying to get NiFi to send mqtt messages to the Azure IoT Hub.  The IoT Hub 
uses SSL certificates, and I’m having trouble getting it working with the 
publishmqtt processor.  I create a StandardSSLContextService pointing the 
truststore at /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts.  I 
made sure (I think) that the chain was trusted by importing it manually into 
the cacerts:
openssl s_client -showcerts -connect gsetest.azure-devices.net:8883 > msft.cert
keytool -import -noprompt -trustcacerts -alias azure -file msft.cert -keystore 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts -storepass changeit

When I start the processor, I immediately get the below error.  This all works 
when I do it manually outside of NiFi using mosquitto_pub, so I know that my 
various settings (username, password, etc) are correct.  Has anyone done 
something similar, or can anyone offer any help here?

2017-08-08 17:20:28,570 ERROR [StandardProcessScheduler Thread-6] 
o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due 
to java.lang.RuntimeException: Failed while executing one of processor's 
OnScheduled task.
java.lang.RuntimeException: Failed while executing one of processor's 
OnScheduled task.
at 
org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1480)
at 
org.apache.nifi.controller.StandardProcessorNode.access$000(StandardProcessorNode.java:100)
at 
org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1301)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: 
java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at 
org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1463)
... 9 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)

Thanks,


Jeff Oxenberg


Re: Using QueryDatabaseTable processor in MiNiFi

2017-07-27 Thread Jeff Zemerick
I went through the steps again and it worked fine so it was total user
failure somewhere on my part. I am curious as to where I went wrong so I am
trying to reproduce it to see if it's worth any documentation updates.

Thanks,
Jeff


On Thu, Jul 27, 2017 at 10:09 AM, Aldrin Piri <aldrinp...@gmail.com> wrote:

> Hey Jeff,
>
> Could you please provide a full nifi-app.log of the startup?
>
> Thanks,
> Aldrin
>
> On Thu, Jul 27, 2017 at 10:04 AM, Jeff Zemerick <jzemer...@apache.org>
> wrote:
>
>> I have a small flow that was exported from NiFi and it uses a
>> QueryDatabaseTable processor. When the flow is converted to yaml and set in
>> MiNiFi I get the following error in MiNiFi's app log:
>>
>> ERROR [main] o.apache.nifi.controller.FlowController Could not create
>> Processor of type org.apache.nifi.processors.standard.QueryDatabaseTable
>> for ID 9570c06a-837b-3ac0--; creating "Ghost"
>> implementation
>> org.apache.nifi.controller.exception.ProcessorInstantiationException:
>> Unable to find bundle for coordinate default:unknown:unversioned
>> at org.apache.nifi.controller.FlowController.instantiateProcess
>> or(FlowController.java:1162)
>> at org.apache.nifi.controller.FlowController.createProcessor(Fl
>> owController.java:1080)
>> at org.apache.nifi.controller.FlowController.createProcessor(Fl
>> owController.java:1053)
>> at org.apache.nifi.controller.StandardFlowSynchronizer.addProce
>> ssGroup(StandardFlowSynchronizer.java:1054)
>> at org.apache.nifi.controller.StandardFlowSynchronizer.addProce
>> ssGroup(StandardFlowSynchronizer.java:1175)
>> at org.apache.nifi.controller.StandardFlowSynchronizer.sync(Sta
>> ndardFlowSynchronizer.java:312)
>> at org.apache.nifi.controller.FlowController.synchronize(FlowCo
>> ntroller.java:1544)
>> at org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.
>> load(StandardXMLFlowConfigurationDAO.java:84)
>> at org.apache.nifi.controller.StandardFlowService.loadFromBytes
>> (StandardFlowService.java:720)
>> at org.apache.nifi.controller.StandardFlowService.load(Standard
>> FlowService.java:533)
>> at org.apache.nifi.minifi.MiNiFiServer.start(MiNiFiServer.java:113)
>> at org.apache.nifi.minifi.MiNiFi.<init>(MiNiFi.java:140)
>> at org.apache.nifi.minifi.MiNiFi.main(MiNiFi.java:239)
>>
>> I verified in the log that the QueryDatabaseTable processor is being
>> loaded during MiNiFi's start up. Just above this message I can see that the
>> DBCP controller service seems to be ok:
>>
>> 2017-07-27 09:32:54,877 INFO [main] 
>> o.a.n.c.s.StandardControllerServiceProvider
>> Created Controller Service of type org.apache.nifi.dbcp.DBCPConnectionPool
>> with identifier c42fe292-fe4b-3423--
>> 2017-07-27 09:32:54,925 INFO [main] o.a.nifi.groups.StandardProcessGroup
>> DBCPConnectionPool[id=c42fe292-fe4b-3423--] added to
>> StandardProcessGroup[identifier=f56cb9ff-f706-328b--]
>>
>> Did I forget to do something?
>>
>> Thanks,
>> Jeff
>>
>>
>


Using QueryDatabaseTable processor in MiNiFi

2017-07-27 Thread Jeff Zemerick
I have a small flow that was exported from NiFi and it uses a
QueryDatabaseTable processor. When the flow is converted to yaml and set in
MiNiFi I get the following error in MiNiFi's app log:

ERROR [main] o.apache.nifi.controller.FlowController Could not create
Processor of type org.apache.nifi.processors.standard.QueryDatabaseTable
for ID 9570c06a-837b-3ac0--; creating "Ghost" implementation
org.apache.nifi.controller.exception.ProcessorInstantiationException:
Unable to find bundle for coordinate default:unknown:unversioned
at
org.apache.nifi.controller.FlowController.instantiateProcessor(FlowController.java:1162)
at
org.apache.nifi.controller.FlowController.createProcessor(FlowController.java:1080)
at
org.apache.nifi.controller.FlowController.createProcessor(FlowController.java:1053)
at
org.apache.nifi.controller.StandardFlowSynchronizer.addProcessGroup(StandardFlowSynchronizer.java:1054)
at
org.apache.nifi.controller.StandardFlowSynchronizer.addProcessGroup(StandardFlowSynchronizer.java:1175)
at
org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:312)
at
org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1544)
at
org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:84)
at
org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:720)
at
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:533)
at org.apache.nifi.minifi.MiNiFiServer.start(MiNiFiServer.java:113)
at org.apache.nifi.minifi.MiNiFi.<init>(MiNiFi.java:140)
at org.apache.nifi.minifi.MiNiFi.main(MiNiFi.java:239)

I verified in the log that the QueryDatabaseTable processor is being loaded
during MiNiFi's start up. Just above this message I can see that the DBCP
controller service seems to be ok:

2017-07-27 09:32:54,877 INFO [main]
o.a.n.c.s.StandardControllerServiceProvider Created Controller Service of
type org.apache.nifi.dbcp.DBCPConnectionPool with identifier
c42fe292-fe4b-3423--
2017-07-27 09:32:54,925 INFO [main] o.a.nifi.groups.StandardProcessGroup
DBCPConnectionPool[id=c42fe292-fe4b-3423--] added to
StandardProcessGroup[identifier=f56cb9ff-f706-328b--]

Did I forget to do something?

Thanks,
Jeff


Re: ListS3 duration

2017-07-21 Thread Jeff
Laurens,

I think I have a working Cloudtrail flow on my other computer...  I'll try
to fire that up today and see what I get.  I used 1.3.0 the last time I
looked at Cloudtrail data.

On Thu, Jul 20, 2017 at 4:56 PM Laurens Vets  wrote:

> Please see inline for my answers and some additional information.
>
> > It sounds like you are doing the right troubleshooting steps.  A few
> > more ideas off the top of my head:
> >
> > * When you tested with the s3 cli, did you use the same credentials,
> > from the same machine NiFi is running on?  The CloudTrail events are
> > written by AWS, so the ownership and permissions might be tricky.
>
> Same credentials, not the same machine.
>
> > * As an experiment, try creating one or more new directory/objects as
> > the NiFi user and configuring ListS3's prefix to target only these new
> > objects (you might want to copy/paste ListS3 or be sure to wipe out the
> > state later).
>
> I'll try this as well.
>
> > * You are sure the prefix is blank?  You might try setting it to
> > "AWSLogs/" for a while to see if it's different.
>
> Tried with a blank prefix, with "/" and "AWSLogs" now, no change. Or
> should I wait a while first?
> If I set the prefix to a directory containing actual log objects
> (*.json.gz files), ListS3 is able to list them almost immediately. The
> prefix used is "AWSLogs//CloudTrail/ap-northeast-1/2017/07/03/"
> in this case.
> It seems ListS3 doesn't recurse?
>
> > * Do you have CloudTrail set up to record S3 data events, or can you
> > set this up?  This is usually very tedious, but sometimes there is no
> > substitute.
>
> I'll doublecheck. I believe I set this up.
>
> Kind regards,
> Laurens
>
> > On Thu, Jul 20, 2017 at 11:56 AM, Joe Witt  wrote:
> >
> >> Looking at the code it suggests the two cases where it would come up
> >> with nothing for listing (when there are items to list) is if there is
> >> state already tracking lastModified of a previously pulled object or
> >> previously pulled object with the same key.  Since you're not even
> >> getting to the point where state is being persisted it suggests it
> >> really is getting nothing back on the listing request.
> >>
> >> Just in looking at the docs I wonder if you'll need to explicitly set
> >> the prefix value to something like '/'?
> >>
> >> JeffStorck/JamesWing: Any ideas?
> >>
> >> We should update the code to provide debug information when listed
> >> objects are skipped.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Thu, Jul 20, 2017 at 2:44 PM, Laurens Vets 
> >> wrote:
> >>> I enabled DEBUG logging and I see the following:
> >>>
> >>>
> >>> 2017-07-20 11:39:08,670 DEBUG [StandardProcessScheduler Thread-1]
> >>> org.apache.nifi.processors.aws.s3.ListS3
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] Using aws credentials
> >>> for
> >>> creating client
> >>> 2017-07-20 11:39:08,670 INFO [StandardProcessScheduler Thread-1]
> >>> org.apache.nifi.processors.aws.s3.ListS3
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] Creating client with
> >>> AWS
> >>> credentials
> >>> 2017-07-20 11:39:08,672 INFO [StandardProcessScheduler Thread-1]
> >>> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] to run with 1 threads
> >>> 2017-07-20 11:39:08,674 DEBUG [Timer-Driven Process Thread-4]
> >>> org.apache.nifi.processors.aws.s3.ListS3
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] Returning CLUSTER
> >>> State:
> >>> StandardStateMap[version=-1, values={}]
> >>> 2017-07-20 11:39:09,089 INFO [Flow Service Tasks Thread-2]
> >>> o.a.nifi.controller.StandardFlowService Saved flow controller
> >>> org.apache.nifi.controller.FlowController@7c10f421 // Another save
> >>> pending =
> >>> false
> >>> 2017-07-20 11:39:09,249 INFO [Timer-Driven Process Thread-4]
> >>> org.apache.nifi.processors.aws.s3.ListS3
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] Successfully listed
> >>> S3
> >>> bucket BUCKETNAME in 575 millis
> >>> 2017-07-20 11:39:09,249 DEBUG [Timer-Driven Process Thread-4]
> >>> org.apache.nifi.processors.aws.s3.ListS3
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] No new objects in S3
> >>> bucket
> >>> BUCKETNAME to list. Yielding.
> >>> 2017-07-20 11:39:09,249 DEBUG [Timer-Driven Process Thread-4]
> >>> org.apache.nifi.processors.aws.s3.ListS3
> >>> ListS3[id=6119854d-015d-1000-341f-b294838980af] has chosen to yield
> >>> its
> >>> resources; will not be scheduled to run again for 1000 milliseconds
> >>> 2017-07-20 11:39:10,246 INFO [Write-Ahead Local State Provider
> >>> Maintenance]
> >>> org.wali.MinimalLockingWriteAheadLog
> >>> org.wali.MinimalLockingWriteAheadLog@2480acc3 checkpointed with 0
> >>> Records
> >>> and 0 Swap Files in 9 milliseconds (Stop-the-world time = 1
> >>> milliseconds,
> >>> Clear Edit Logs time = 0 millis), max Transaction ID -1
> >>> 2017-07-20 11:39:10,250 DEBUG [Timer-Driven Process Thread-4]
> >>> 

Re: S3 Cloudtrail log ingestion to Kafka via NiFi

2017-06-07 Thread Jeff
Hello Laurens,

I set up a flow to test this as well, and also saw that the unzipped json
looks like it contains references to other S3Objects.  I'm not familiar
with the formats used by CloudTrail or how the actual logging data is
stored.  I'll have to read up on it, but I think we can set up a flow to
split the json, grab the referenced S3Object values, and route them to
another FetchS3Object processor to pull back the actual logs and decompress
them.

I'll be away from my computer for most of the night, but will hopefully get
back to you tomorrow after doing some more research.

On Wed, Jun 7, 2017 at 1:30 PM James Wing  wrote:

> Are you seeing errors, or just unexpected results?  ListS3 only returns
> references to objects on S3, but FetchS3Object should return the object
> content.  I recommend looking at the output of FetchS3Object to make sure
> it is right (in size and content type) before trying to unzip it.
>
> Thanks,
>
> James
>
> On Wed, Jun 7, 2017 at 9:56 AM, Laurens Vets  wrote:
>
>> Hello,
>>
>> Has anyone been able to ingest S3 Cloudtrail logs into Kafka with NiFi? I
>> got as far ListS3 -> FetchS3Object -> Gunzip, but I'm stuck here. It seems
>> I'm not actually unzipping the logs, but references to the S3 objects?
>>
>> Any help would be appreciated.
>>
>
>


Re: Phantom node

2017-05-03 Thread Jeff
Can you provide some information on the configuration (nifi.properties) of
the nodes in your cluster?  Can each node in your cluster ping all the
other nodes?  Are you running embedded ZooKeeper, or an external one?
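
For reference, the cluster-related entries in nifi.properties that usually
matter for this kind of issue look roughly like this (hostnames and ports are
illustrative):

    nifi.web.http.host=node1.example.com
    nifi.web.http.port=8080
    nifi.cluster.is.node=true
    nifi.cluster.node.address=node1.example.com
    nifi.cluster.node.protocol.port=11443
    nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    nifi.state.management.embedded.zookeeper.start=false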

On Wed, May 3, 2017 at 8:11 PM Neil Derraugh <
neil.derra...@intellifylearning.com> wrote:

> I can't load the canvas right now on our cluster.  I get this error from
> one of the nodes nifi-app.logs
>
> 2017-05-03 23:40:30,207 WARN [Replicate Request Thread-2]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET
> /nifi-api/flow/current-user to 10.80.53.39:31212 due to {}
> com.sun.jersey.api.client.ClientHandlerException:
> java.net.NoRouteToHostException: Host is unreachable (Host unreachable)
> at
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
> ~[jersey-client-1.19.jar:1.19]
> at com.sun.jersey.api.client.Client.handle(Client.java:652)
> ~[jersey-client-1.19.jar:1.19]
> at
> com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123)
> ~[jersey-client-1.19.jar:1.19]
> at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
> ~[jersey-client-1.19.jar:1.19]
> at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
> ~[jersey-client-1.19.jar:1.19]
> at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
> ~[jersey-client-1.19.jar:1.19]
> at
> org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:579)
> ~[nifi-framework-cluster-1.1.2.jar:1.1.2]
> at
> org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:771)
> ~[nifi-framework-cluster-1.1.2.jar:1.1.2]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_121]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> Caused by: java.net.NoRouteToHostException: Host is unreachable (Host
> unreachable)
> at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_121]
> at
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
> ~[na:1.8.0_121]
> at
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> ~[na:1.8.0_121]
> at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> ~[na:1.8.0_121]
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> ~[na:1.8.0_121]
> at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_121]
> at sun.net.NetworkClient.doConnect(NetworkClient.java:175) ~[na:1.8.0_121]
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
> ~[na:1.8.0_121]
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
> ~[na:1.8.0_121]
> at sun.net.www.http.HttpClient.<init>(HttpClient.java:211) ~[na:1.8.0_121]
> at sun.net.www.http.HttpClient.New(HttpClient.java:308) ~[na:1.8.0_121]
> at sun.net.www.http.HttpClient.New(HttpClient.java:326) ~[na:1.8.0_121]
> at
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
> ~[na:1.8.0_121]
> at
> sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
> ~[na:1.8.0_121]
> at
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
> ~[na:1.8.0_121]
> at
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
> ~[na:1.8.0_121]
> at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
> ~[na:1.8.0_121]
> at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
> ~[na:1.8.0_121]
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> ~[na:1.8.0_121]
> at
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
> ~[jersey-client-1.19.jar:1.19]
> at
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
> ~[jersey-client-1.19.jar:1.19]
> ... 12 common frames omitted
> 2017-05-03 23:40:30,207 WARN [Replicate Request Thread-2]
> o.a.n.c.c.h.r.ThreadPoolRequestReplicator
> com.sun.jersey.api.client.ClientHandlerException:
> java.net.NoRouteToHostException: Host is unreachable (Host unreachable)
> at
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
> ~[jersey-client-1.19.jar:1.19]
> at com.sun.jersey.api.client.Client.handle(Client.java:652)
> ~[jersey-client-1.19.jar:1.19]
> at
> 

Re: Organizing the flows in the web interface

2017-04-30 Thread Jeff
There is no "New" button to create a blank canvas that would function as a
separate flow.  You can organize your flows into separate Process Groups,
though.  It's a visual distinction, and still subject to the policy
hierarchy of Process Groups.  Other parts of your flow can send and receive
data between Process Groups via input and output ports.  You can achieve
what you're looking for by creating a Process Group and moving your
existing flow into it, and then creating a second Process Group to create
your "second flow".

On Sun, Apr 30, 2017 at 10:29 PM Buntu Dev  wrote:

> Is there a 'New' button to start creating a new flow/template? I was only
> able to move the existing flow and start adding new processors to build a
> new flow, but I'm wondering if there is anything I'm missing that lets me
> start with a new, blank canvas for building a new flow.
>
>
> Thanks!
>


Re: How to identify MiNiFi source edge devices

2017-04-25 Thread Jeff Zemerick
Aldrin,

To simplify it, the situation is analogous to a deployment of temperature
sensors. Each sensor has a unique ID that is assigned by us at deployment
time and each sensor periodically adds a new row to a database table that
is stored on the sensor. Each sensor uses the same database schema so if
you combined all the rows you couldn't tell which rows originated from
which sensor. In NiFi, I need to do different things based on where the
data originated and I need to associate the sensor's ID with its data.
(Such as inserting the data into DynamoDB with the sensor ID as the Hash
key and a timestamp as the Range key.) The goal is to use the same MiNiFi
configuration for all devices.

I can easily use the ExecuteSQL processor to grab the new rows. But I need
some way to attach an attribute to the data that identifies where it
originated. That was what led to the initial question in this thread. The
Variable Registry along with the UpdateAttribute processor appears to
satisfy that need cleaner than a custom processor.

I hope that explains the situation a bit!
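
As a rough sketch of that approach (the file name, property key, and attribute
name are all illustrative; how a custom variable registry file is wired up
differs between NiFi's nifi.properties and MiNiFi's own configuration):

    # conf/device.properties - deployed per device with its assigned ID
    device.id=sensor-0042

    # NiFi side: reference the file from nifi.properties
    nifi.variable.registry.properties=./conf/device.properties

    # UpdateAttribute dynamic property, so every flowfile carries the ID
    deviceId = ${device.id}

Downstream processors can then use ${deviceId}, for example as the DynamoDB
hash key value.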

Thanks,
Jeff



On Tue, Apr 25, 2017 at 11:17 AM, Aldrin Piri <aldrinp...@gmail.com> wrote:

> Jeff,
>
> Could you expand upon what a device id is in your case?  Something
> intrinsic to the device? The agent?  Are these generated and assigned
> during provisioning?   How are you making use of these when the data
> arrives at its desired destination?
>
> What you are expressing is certainly a common need.  Would welcome any
> perspective on what your deployment looks like such that we can frame uses
> people are rolling out to guide assumptions that get made during our
> development and design processes.
>
> Thanks for diving in and exploring!
> --Aldrin
>
>
> On Tue, Apr 25, 2017 at 11:05 AM, Andre <andre-li...@fucs.org> wrote:
>
>> Jeff,
>>
>> That would be next suggestion. :-)
>>
>> Cheers
>>
>> On Wed, Apr 26, 2017 at 1:04 AM, Jeff Zemerick <jzemer...@apache.org>
>> wrote:
>>
>>> It is possible. I will take a look to see if the hostname is sufficient
>>> for the device ID.
>>>
>>> I just learned about the Variable Registry. It seems if I use the
>>> Variable Registry to store the device ID it would be available to the
>>> UpdateAttribute processor. Is that correct?
>>>
>>> Thanks,
>>> Jeff
>>>
>>>
>>> On Tue, Apr 25, 2017 at 10:48 AM, Andre <andre-li...@fucs.org> wrote:
>>>
>>>> Jeff,
>>>>
>>>> Would it be feasible for you to use UpdateAttribute (which I believe is
>>>> part of MiNiFi core processors) and use the ${hostname(true)} Expression
>>>> language function?
>>>>
>>>> More about it can be found here:
>>>>
>>>> https://nifi.apache.org/docs/nifi-docs/html/expression-langu
>>>> age-guide.html#hostname
>>>>
>>>> Cheers
>>>>
>>>> On Wed, Apr 26, 2017 at 12:39 AM, Jeff Zemerick <jzemer...@apache.org>
>>>> wrote:
>>>>
>>>>> When processing data in NiFi that was received via MiNiFi edge devices
>>>>> I need to be able to identify the source of the data. All of the data on
>>>>> the edge devices will be pulled from a database and will not contain any
>>>>> data that self-identifies the source. My attempt to solve this was to 
>>>>> write
>>>>> a processor that reads a configuration file on the edge device to get its
>>>>> device ID and put that ID as an attribute in the flowfile. This appears to
>>>>> work, but, I was wondering if there is a more recommended approach?
>>>>>
>>>>> Thanks,
>>>>> Jeff
>>>>>
>>>>
>>>>
>>>
>>
>


Re: How to identify MiNiFi source edge devices

2017-04-25 Thread Jeff Zemerick
It is possible. I will take a look to see if the hostname is sufficient for
the device ID.

I just learned about the Variable Registry. It seems if I use the Variable
Registry to store the device ID it would be available to the
UpdateAttribute processor. Is that correct?

Thanks,
Jeff


On Tue, Apr 25, 2017 at 10:48 AM, Andre <andre-li...@fucs.org> wrote:

> Jeff,
>
> Would it be feasible for you to use UpdateAttribute (which I believe is part
> of MiNiFi core processors) and use the ${hostname(true)} Expression
> language function?
>
> More about it can be found here:
>
> https://nifi.apache.org/docs/nifi-docs/html/expression-
> language-guide.html#hostname
>
> Cheers
>
> On Wed, Apr 26, 2017 at 12:39 AM, Jeff Zemerick <jzemer...@apache.org>
> wrote:
>
>> When processing data in NiFi that was received via MiNiFi edge devices I
>> need to be able to identify the source of the data. All of the data on the
>> edge devices will be pulled from a database and will not contain any data
>> that self-identifies the source. My attempt to solve this was to write a
>> processor that reads a configuration file on the edge device to get its
>> device ID and put that ID as an attribute in the flowfile. This appears to
>> work, but, I was wondering if there is a more recommended approach?
>>
>> Thanks,
>> Jeff
>>
>
>


How to identify MiNiFi source edge devices

2017-04-25 Thread Jeff Zemerick
When processing data in NiFi that was received via MiNiFi edge devices I
need to be able to identify the source of the data. All of the data on the
edge devices will be pulled from a database and will not contain any data
that self-identifies the source. My attempt to solve this was to write a
processor that reads a configuration file on the edge device to get its
device ID and put that ID as an attribute in the flowfile. This appears to
work, but, I was wondering if there is a more recommended approach?

Thanks,
Jeff


Re: MiNiFi's differentiator

2017-04-20 Thread Jeff Zemerick
Thanks for that info, Aldrin! I will be glad to look over it.

Jeff

On Thu, Apr 20, 2017 at 12:09 PM, Aldrin Piri <aldrinp...@gmail.com> wrote:

> Hey Jeff,
>
> Just to hone in a bit more, MiNiFi would likely consume from some of the
> work surrounding the Command and Control [1] efforts.
>
> There is some initial work under way which should serve as a nice
> foundation to start exploring a more full featured and richer
> implementation.  Certainly plan to further expand and update docs as we
> learn a bit more through design and seeing what supports the common needs.
> Would certainly invite you to evaluate how it would apply to your uses and
> comment, if it is of interest.
>
> Thanks!
>
> [1] https://cwiki.apache.org/confluence/display/MINIFI/
> MiNiFi+Command+and+Control
>
> On Thu, Apr 20, 2017 at 11:51 AM, Jeff Zemerick <jzemer...@apache.org>
> wrote:
>
>> Hi Joe,
>>
>> No, the documentation is clear that it is currently the only available
>> implementation. Since it was designed to be extendable, I was wondering if
>> there were any example cases in which using the WholeConfigDifferentiator
>>  might not be the ideal differentiator and I should provide my own
>> implementation. WholeConfigDifferentiator's implementation seems simple,
>> fast, and to the point. I wasn't aware of the NiFi-Registry project so that
>> helps me see the bigger picture.
>>
>> Thanks,
>> Jeff
>>
>>
>> On Thu, Apr 20, 2017 at 10:36 AM, Joe Percivall <jperciv...@apache.org>
>> wrote:
>>
>>> Hello Jeff,
>>>
>>> Glad to hear the WholeConfigDifferentiator is working well. While it is
>>> configurable, that is currently the only implementation. Is there a
>>> specific place that suggests there are currently more implementations? The
>>> documentation lists it as the only one[1]. When I wrote it up I implemented
>>> it in such a way so that we could easily add other differentiators in the
>>> future. One such example is once we have the NiFi-Registry in place, a
>>> differentiator that takes advantage of whatever the uniqueness scheme that
>>> is implemented for that.
>>>
>>> [1] https://github.com/apache/nifi-minifi/blob/master/minifi
>>> -docs/src/main/markdown/System_Admin_Guide.md#automatic-warm-redeploy
>>>
>>> Joe
>>>
>>> On Thu, Apr 20, 2017 at 10:06 AM, Jeff Zemerick <jzemer...@apache.org>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> MiNiFi's WholeConfigDifferentiator along with
>>>> the PullHttpChangeIngestor is working well for me. I see that
>>>> the differentiator is configurable and additional implementations can be
>>>> provided. Are there any examples of circumstances in which using the
>>>> WholeConfigDifferentiator would not be the best choice?
>>>>
>>>> Thanks,
>>>> Jeff
>>>>
>>>
>>>
>>>
>>> --
>>> *Joe Percivall*
>>> linkedin.com/in/Percivall
>>> e: jperciv...@apache.com
>>>
>>
>>
>


Re: MiNiFi's differentiator

2017-04-20 Thread Jeff Zemerick
Hi Joe,

No, the documentation is clear that it is currently the only available
implementation. Since it was designed to be extendable, I was wondering if
there were any example cases in which using the WholeConfigDifferentiator might
not be the ideal differentiator and I should provide my own implementation.
WholeConfigDifferentiator's implementation seems simple, fast, and to the
point. I wasn't aware of the NiFi-Registry project so that helps me see the
bigger picture.
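
For anyone following along, the warm-redeploy pieces are wired up in MiNiFi's
conf/bootstrap.conf along these lines (host, port, path, and period are
illustrative, and the exact property names should be checked against the
System Admin Guide linked below for the version in use):

    nifi.minifi.notifier.ingestors=org.apache.nifi.minifi.bootstrap.configuration.ingestors.PullHttpChangeIngestor
    nifi.minifi.notifier.ingestors.pull.http.hostname=config-server.example.com
    nifi.minifi.notifier.ingestors.pull.http.port=8080
    nifi.minifi.notifier.ingestors.pull.http.path=/minifi/config.yml
    nifi.minifi.notifier.ingestors.pull.http.period.ms=300000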

Thanks,
Jeff


On Thu, Apr 20, 2017 at 10:36 AM, Joe Percivall <jperciv...@apache.org>
wrote:

> Hello Jeff,
>
> Glad to hear the WholeConfigDifferentiator is working well. While it is
> configurable, that is currently the only implementation. Is there a
> specific place that suggests there are currently more implementations? The
> documentation lists it as the only one[1]. When I wrote it up I implemented
> it in such a way so that we could easily add other differentiators in the
> future. One such example is once we have the NiFi-Registry in place, a
> differentiator that takes advantage of whatever the uniqueness scheme that
> is implemented for that.
>
> [1] https://github.com/apache/nifi-minifi/blob/master/
> minifi-docs/src/main/markdown/System_Admin_Guide.md#
> automatic-warm-redeploy
>
> Joe
>
> On Thu, Apr 20, 2017 at 10:06 AM, Jeff Zemerick <jzemer...@apache.org>
> wrote:
>
>> Hi all,
>>
>> MiNiFi's WholeConfigDifferentiator along with the PullHttpChangeIngestor
>> is working well for me. I see that the differentiator is configurable and
>> additional implementations can be provided. Are there any examples of
>> circumstances in which using the WholeConfigDifferentiator would not be the
>> best choice?
>>
>> Thanks,
>> Jeff
>>
>
>
>
> --
> *Joe Percivall*
> linkedin.com/in/Percivall
> e: jperciv...@apache.com
>


Re: Configure ExecuteSQL in MiNiFi

2017-04-13 Thread Jeff Zemerick
Thank you, Aldrin! That is great to hear.

Jeff


On Thu, Apr 13, 2017 at 10:02 AM, Aldrin Piri <aldrinp...@gmail.com> wrote:

> Hi Jeff,
>
> This support was provided and is slated for 0.2.0 via MINIFI-154 but has
> not yet been released pending NiFi 1.2.0.  From your question you had on
> another issue yesterday, I believe you are building from master so you
> should have the requisite functionality needed.
>
> For SQL work, relevant docs have some notes on that process [1].
>
> Please let us know if you run into any issues along the way.
>
> Thanks!
>
> [1] https://github.com/apache/nifi-minifi/blob/master/
> minifi-docs/src/main/markdown/minifi-java-agent-quick-start.md
>
> On Thu, Apr 13, 2017 at 9:34 AM, Jeff Zemerick <jzemer...@apache.org>
> wrote:
>
>> How do I configure the database properties for the ExecuteSQL processor
>> in MiNiFi? I created my flow in NiFi and used the tool to convert it for
>> MiNiFi before I saw in the documentation that controller services are not
>> supported by MiNiFi.
>>
>> Thanks for any pointers!
>>
>> Jeff
>>
>>
>


Configure ExecuteSQL in MiNiFi

2017-04-13 Thread Jeff Zemerick
How do I configure the database properties for the ExecuteSQL processor in
MiNiFi? I created my flow in NiFi and used the tool to convert it for
MiNiFi before I saw in the documentation that controller services are not
supported by MiNiFi.
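
For reference, once Controller Service support is available, the converted
config.yml ends up with roughly this shape (the YAML keys and property names
are from memory and should be checked against the MiNiFi docs and the
converted file; the JDBC details are illustrative):

    Controller Services:
    - name: DBCPConnectionPool
      type: org.apache.nifi.dbcp.DBCPConnectionPool
      Properties:
        Database Connection URL: jdbc:postgresql://dbhost:5432/mydb
        Database Driver Class Name: org.postgresql.Driver
        Database User: nifi

    Processors:
    - name: ExecuteSQL
      class: org.apache.nifi.processors.standard.ExecuteSQL
      Properties:
        Database Connection Pooling Service: DBCPConnectionPool
        SQL select query: SELECT * FROM readings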

Thanks for any pointers!

Jeff


Re: How can datetime to month conversion failed in french language?

2017-04-12 Thread Jeff
Prabhu,

Are you saying that the data you're working with contains the English names
for the months?

On Wed, Apr 12, 2017 at 6:03 AM prabhu Mahendran <prabhuu161...@gmail.com>
wrote:

> The output of the breakdown of the functions is 'Mai'. But my local file
> contains 'May'. While processing, 'May' (English) seems to be converted to
> 'Mai' (French).
>
> Is there is any expression language to convert French language into
> English?
>
> On Mon, Apr 10, 2017 at 8:02 PM, prabhu Mahendran <prabhuu161...@gmail.com
> > wrote:
>
> I have store that result in another attribute using updateAttribute
> processor.
>
> While incoming flowfiles into updateAttribute processor i have faced that
> error.
> On 10-Apr-2017 6:52 PM, "Andre" <andre-li...@fucs.org> wrote:
>
> Prabhu,
>
> Thanks for the breakdown of the functions but what does
> *${input.4:substringBefore('-'):toDate('MMM')}* output? :-)
>
> May? Mai? something else?
>
> Cheers
>
> On Mon, Apr 10, 2017 at 10:39 PM, prabhu Mahendran <
> prabhuu161...@gmail.com> wrote:
>
> Andre,
>
>
>
>
>
>
>
> *1,12990,Mahe,May-17*
> input.1 -> 1
> input.2 -> 12990
> input.3 -> Mahe
> input.4 -> May-17
> *${input.4:substringBefore('-'):toDate('MMM')}*
>
> substringBefore('-') --> gets the string portion before the '-' symbol. It
> returns 'May'.
>
> toDate('MMM') --> converts that string into a month (date) value.
>
> format('MM') --> converts the month into a number: 'Jan' becomes '01', so
> 'May' becomes '05'.
>
> Here I have converted the month into a number.
>
> Cheers
>
> On Mon, Apr 10, 2017 at 5:58 PM, Andre <andre-li...@fucs.org> wrote:
>
> Prabhu,
>
> What is the output of *${input.4:substringBefore('-'):toDate('MMM')} *?
>
> Cheers
>
> On Mon, Apr 10, 2017 at 3:15 PM, prabhu Mahendran <prabhuu161...@gmail.com
> > wrote:
>
> Jeff,
>
> My actual data is in English(US).
>
> consider sample data,
>
>
> *1,12990,Mahe,May-17*
> In this line i have get "May-17" and split it as 'May' and '17'.
>
> Using below expression language..,
>
>
>
> *${input.4:substringBefore('-'):toDate('MMM'):format('MM')}*
>
> In the above query it converts 'May' into the value '05'.
>
> That can be work in my windows (English(US)).
>
> That Same query not work in French OS windows(French(Swiss)).
>
> It shows below error.
>
> *org.apache.nifi.expression.language.exception.IllegalAttributeExpression:Cannot
> parse attribute value as date:dateformat:MMM;attribute value:Mai*
>
> In that exception the attribute value shown is 'Mai'. That value is in
> French, but the data I provided is 'May' (English only).
>
> Can you suggest way to avoid this exception?
>
>
> On Fri, Apr 7, 2017 at 11:40 PM, Jeff <jtsw...@gmail.com> wrote:
>
> Prabhu,
>
> I'll have to try this in NiFi myself.  I'll let you know what I find.
> What is the result of the EL you're using when you are trying it with
> French?
>
> On Fri, Apr 7, 2017 at 1:03 AM prabhu Mahendran <prabhuu161...@gmail.com>
> wrote:
>
> jeff,
>
> Thanks for your reply.
>
> Attribute 'ds' having the '07/04/2017'.
>
> And  convert that into month using UpdateAttribute.
>
> ${ds:toDate('dd/MM/yyyy'):format('MMM')}.
>
> if i use that code in windows having language English(India) then it
> worked.
>
> If i use that code in windows having language French(OS) it couldn't work.
>
> Can you suggest any way to solve that problem?
>
> On Fri, Apr 7, 2017 at 1:28 AM, Jeff <jtsw...@gmail.com> wrote:
>
> What is the expression language statement that you're attempting to use?
>
> On Thu, Apr 6, 2017 at 3:12 AM prabhu Mahendran <prabhuu161...@gmail.com>
> wrote:
>
> In NiFi How JVM Check language of machine?
>
> is that take any default language like English(US) else System DateTime
> Selected language?
>
> I face issue while converting datetime format into Month using expression
> language with NiFi package installed with French OS.
>
> But it worked in English(US) Selected language.
>
> Can anyone help me to resolve this?
>
>
>
>
>
>
>
>


Re: MaxFileSize in timeBasedFileNamingAndTriggeringPolicy not work?

2017-04-12 Thread Jeff
Prabhu,

You're right, NiFi has been using logback 1.1.3 for quite a while now.
Looks like we need to investigate upgrading.

I created a JIRA to address this issue:
https://issues.apache.org/jira/browse/NIFI-3699

On Wed, Apr 12, 2017 at 7:25 AM prabhu Mahendran <prabhuu161...@gmail.com>
wrote:

> For your information, I have tried the same procedure in NiFi-1.1.1 also.
>
> In that also NiFi have logback-classic-1.1.3 and logback-core-1.1.3 jar
> only present.
>
> In conf\logback.xml, MaxFileSize and maxHistory in
> *TimeBasedRollingPolicy* do not work.
>
>
>
> On Wed, Apr 12, 2017 at 3:58 PM, prabhu Mahendran <prabhuu161...@gmail.com
> > wrote:
>
> Jeff,
>
> Thanks for your mail.
>
> I need to use log back 1.1.7 in NiFi-0.6.1 .
>
> Can you suggest any way to change 1.1.3 into 1.1.7?
>
> On Wed, Apr 12, 2017 at 9:42 AM, Jeff <jtsw...@gmail.com> wrote:
>
> Hello Prabhu,
>
> I think you're running into a logback bug [1] that is fixed with 1.1.7.
> Unfortunately, it looks like NiFi 0.6.1 is using logback 1.1.3.
>
> [1] https://jira.qos.ch/browse/LOGBACK-747
>
> On Tue, Apr 11, 2017 at 8:37 AM prabhu Mahendran <prabhuu161...@gmail.com>
> wrote:
>
> In NiFi-0.6.1, I have tried to reduce the size of the nifi-app.log stored in
> the local directory.
>
> In conf\logback.xml I have configured "MaxFileSize" to be 1MB. I expected this
> to keep nifi-app.log under 1 MB, but it doesn't work like that; it always
> stores every log.
>
>  class="ch.qos.logback.core.rolling.RollingFileAppender">
> logs/nifi-app.log
>  class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
> 
> 
> ./logs/nifi-app_%d{-MM-dd_HH}.%i.log
>  class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
> 1MB
> 
> 
> 1
> 
> 
> %date %level [%thread] %logger{40} %msg%n
> true
> 
> 
>
> Now I need to set a 1MB size limit for nifi-app.log.
>
> *How do I set a size limit for nifi-app.log?*
>
>
>
>


Re: MaxFileSize in timeBasedFileNamingAndTriggeringPolicy not work?

2017-04-11 Thread Jeff
Hello Prabhu,

I think you're running into a logback bug [1] that is fixed with 1.1.7.
Unfortunately, it looks like NiFi 0.6.1 is using logback 1.1.3.

[1] https://jira.qos.ch/browse/LOGBACK-747

On Tue, Apr 11, 2017 at 8:37 AM prabhu Mahendran 
wrote:

> In NiFi-0.6.1, I have tried to reduce the size of the nifi-app.log stored in
> the local directory.
>
> In conf\logback.xml I have configured "MaxFileSize" to be 1MB. I expected this
> to keep nifi-app.log under 1 MB, but it doesn't work like that; it always
> stores every log.
>
>  class="ch.qos.logback.core.rolling.RollingFileAppender">
> logs/nifi-app.log
>  class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
> 
> 
> ./logs/nifi-app_%d{-MM-dd_HH}.%i.log
>  class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
> 1MB
> 
> 
> 1
> 
> 
> %date %level [%thread] %logger{40} %msg%n
> true
> 
> 
>
> Now I need to set a 1MB size limit for nifi-app.log.
>
> *How do I set a size limit for nifi-app.log?*
>


Re: Is it possible to dynamically spawn a Processor Group ?

2017-04-10 Thread Jeff
Stephen,

Glad it has been narrowed down for you!

One other thing to try is adjusting "Run Duration" under the "Scheduling"
tab in your processor, if it supports it (I believe @SupportsBatching
enables this).  Increasing this value should result in higher throughput
for your processor, but flowfiles may be delayed a bit before they reach
the downstream processors since a batch needs to be completed before the
flowfiles are available to those processors.
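
As a rough illustration (the processor name and relationship here are
hypothetical), a processor opts into that behavior with the @SupportsBatching
annotation, which is what exposes the Run Duration slider in the UI:

    import java.util.Collections;
    import java.util.Set;

    import org.apache.nifi.annotation.behavior.SupportsBatching;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    // Hypothetical processor named after this thread; @SupportsBatching lets
    // the framework combine session commits, enabling the Run Duration slider.
    @SupportsBatching
    public class MyImportProcessor extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .description("Rows that were posted successfully")
                .build();

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(REL_SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            // ... post the record to the external service here ...
            session.transfer(flowFile, REL_SUCCESS);
        }
    }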

On Sat, Apr 8, 2017 at 5:08 AM Stephen-Talk <stephen.schem...@talktalk.net>
wrote:

> Jeff,
>
> You hit the nail on the head.
>
> My "Concurrent Tasks" is set to 1.
>
> I shall have a fiddle with the numbers for both the threads (currently set
> to 10) and the concurrent tasks, and see if it helps.
>
> Thanks for your valuable assistance.
>
> Have a great weekend...
> Stephen
>
> On 07/04/2017 18:58, Jeff wrote:
> > Stephen,
> >
> > It would be good to see screenshots of your flow, and get a little more
> > information about your NiFi installation to help you get better
> > throughput of your data.  Are you running NiFi as a single node?  At
> > which processor in your flow are you noticing the queues backing up?  In
> > the global settings menu, under "Controller Settings", what is "Maximum
> > Timer Driven Thread Count" set to?  On your "MyImportProcessor" config,
> > on the "Scheduling" tab, what is "Concurrent Tasks" set to?
> >
> > In general terms, having more threads available to a processor means
> > you'll get greater throughput of the data, provided that your IO
> > configuration (disk read/write speed) can keep up.  The number of
> > threads that NiFi is configured to use are made available to processors
> > as flowfiles are presented to a processor via an incoming queue based on
> > the number of concurrent tasks for which a processor is configured.
> >
> > In the UI, you can see how many tasks are currently being executed by
> > each processor, which will never be more than the "Maximum Timer Driven
> > Thread Count" (for processors configured to use timer-based scheduling).
> >
> > If you are experiencing backpressure on the incoming queue for
> > "MyImportProcessor", try increasing the number of "Concurrent Tasks"
> > available to that processor, and you may also want to increase the
> > number of "Maximum Timer Driven Tread Count".
> >
> > These are just some of the basics of getting more throughput in NiFi.
> >
> > On Thu, Apr 6, 2017 at 4:25 PM Stephen-Talk
> > <stephen.schem...@talktalk.net <mailto:stephen.schem...@talktalk.net>>
> > wrote:
> >
> > Thanks for the quick reply.
> >
> > Yes, that is quite correct.
> > The scenario is the following:
> >
> > The input flow is a "GetFile" process that collects csv files
> > (>100,000 lines) which in turn queues the file and parses each line
> to a
> > locally built processor (MyImportProcessor say) that submits them via
> > the REST API to a Drupal website.
> > The process works fine, but it is very slow, and would like to speed
> it
> > up by splitting the csv file into chunks so that it can then spawn
> > "MyImportProcessor" as many times as required.
> >
> >
> > On 06/04/2017 20:47, Jeff wrote:
> > > Hello Stephen,
> > >
> > > It's possible to watch the status of NiFi, and upon observing a
> > > particular status in which you're interested, you can use the REST
> API
> > > to create new processor groups.  You'd also have to populate that
> > > processor group with processors and other components.  Based on the
> > > scenario you mentioned, though, it sounds like you are looking at
> > being
> > > able to scale up available processing (via more concurrent
> threads, or
> > > more nodes in a cluster) once a certain amount of data is queued
> > up and
> > > waiting to be processed, rather than adding components to the
> existing
> > > flow.  Is that correct?
> > >
> > > On Thu, Apr 6, 2017 at 3:30 PM Stephen-Talk
> > > <stephen.schem...@talktalk.net
> > <mailto:stephen.schem...@talktalk.net>
> > <mailto:stephen.schem...@talktalk.net
> > <mailto:stephen.schem...@talktalk.net>>>
> > > wrote:
> > >
> > > Hi, I am just a Nifi Inquisitor,
> > >
> > > Is it, or could it be possible to Dynamically spawn a
> > "Processor Group"
> > > when the input flow reaches a certain threshold.
> > >
> > > Thanking you in anticipation.
> > > Stephen
> > >
> >
>


Re: Latency v Throughput

2017-04-07 Thread Jeff
It looks like the way I think about it might be a bit off base. :)

On Fri, Apr 7, 2017 at 2:31 PM Joe Witt <joe.w...@gmail.com> wrote:

> The concept of run duration there is one of the ways we allow users to
> hint to the framework what their preference is.  In general all users
> want the thing to 'go fast'.  But what 'fast' means for you is
> throughput and what fast means for someone else is low latency.
>
> What this really means under the covers at this point is that for
> processors which are willing to delegate the responsibility of 'when
> to commit what they've done in a transactional sense' to the framework
> then the framework can use that knowledge to automatically combine one
> or more transactions into a single transaction.  This has the effect
> of trading off some very small latency for what is arguably higher
> throughput because what that means is we can do a single write to our
> flowfile repository instead of many.  This reduces burden on various
> locks, the file system/interrupts, etc..  It is in general just a bit
> more friendly and does indeed have the effect of higher throughput.
>
> Now, with regard to what should be the default value we cannot really
> know whether one prefers, generically speaking, to have the system
> operate more latency sensitive or more throughput sensitive.  Further,
> it isn't really that tight of a relationship.  Also, consider that in
> a given NiFi cluster it can have and handle flows from numerous teams
> and organizations at the same time.  Each with its own needs and
> interests and preferences.  So, we allow it to be selected.
>
> As to the question about some processors supporting it and some not
> the reason for this is simply that sometimes the processor cannot and
> is not willing to let the framework choose when to commit the session.
> Why?  Because they might have operations which are not 'side effect
> free' meaning once they've done something the environment has been
> altered in ways that cannot be recovered from.  Take for example a
> processor which sends data via SFTP.  Once a given file is sent we
> cannot 'unsend it' nor can we simply repeat that process without a
> side effect.  By allowing the framework to handle it for the processor
> the point is that the operation can be easily undone/redone within the
> confines of NiFi and not have changed some external system state.  So,
> this is a really important thing to appreciate.
>
> Thanks
> Joe
>
> On Fri, Apr 7, 2017 at 2:18 PM, Jeff <jtsw...@gmail.com> wrote:
> > James,
> >
> > The way I look at it (abstractly speaking) is that the slider represents
> how
> > long a processor will be able to use a thread to work on flowfiles (from
> its
> > inbound queue, allowing onTrigger to run more times to generate more
> > outbound flowfiles, etc).  Moving that slider towards higher throughput,
> the
> > processor will do more work, but will hog that thread for a longer
> period of
> > time before another processor can use it.  So, overall latency could go
> > up, because flowfiles will sit in other queues for possibly longer
> periods
> > of time before another processor gets a thread to start doing work, but
> that
> > particular processor will probably see higher throughput.
> >
> > That's in pretty general terms, though.
> >
> > On Fri, Apr 7, 2017 at 9:49 AM James McMahon <jsmcmah...@gmail.com>
> wrote:
> >>
> >> I see that some processors provide a slider to set a balance between
> >> Latency and Throughput. Not all processors provide this, but some do.
> They
> >> seem to be inversely related.
> >>
> >> I also notice that the default appears to be Lower latency, implying
> also
> >> lower throughput. Why is that the default? I would think that being a
> >> workflow, maximizing throughput would be the ultimate goal. Yet it seems
> >> that the processors opt for defaults to lowest latency, lowest
> throughput.
> >>
> >> What is the relationship between Latency and Throughput? Do most folks
> in
> >> the user group typically go in and change that to Highest on
> throughput? Is
> >> that something to avoid because of demands on CPU, RAM, and disk IO?
> >>
> >> Thanks very much. -Jim
>


Re: Latency v Throughput

2017-04-07 Thread Jeff
James,

The way I look at it (abstractly speaking) is that the slider represents
how long a processor will be able to use a thread to work on flowfiles
(from its inbound queue, allowing onTrigger to run more times to generate
more outbound flowfiles, etc).  Moving that slider towards higher
throughput, the processor will do more work, but will hog that thread for a
longer period of time before another processor can use it.  So, overall
latency could go down, because flowfiles will sit in other queues for
possibly longer periods of time before another processor gets a thread to
start doing work, but that particular processor will probably see higher
throughput.

That's in pretty general terms, though.

On Fri, Apr 7, 2017 at 9:49 AM James McMahon  wrote:

> I see that some processors provide a slider to set a balance between
> Latency and Throughput. Not all processors provide this, but some do. They
> seem to be inversely related.
>
> I also notice that the default appears to be Lower latency, implying also
> lower throughput. Why is that the default? I would think that being a
> workflow, maximizing throughput would be the ultimate goal. Yet it seems
> that the processors opt for defaults to lowest latency, lowest throughput.
>
> What is the relationship between Latency and Throughput? Do most folks in
> the user group typically go in and change that to Highest on throughput? Is
> that something to avoid because of demands on CPU, RAM, and disk IO?
>
> Thanks very much. -Jim
>


Re: How can datetime to month conversion failed in french language?

2017-04-07 Thread Jeff
Prabhu,

I'll have to try this in NiFi myself.  I'll let you know what I find.  What
is the result of the EL you're using when you are trying it with French?

On Fri, Apr 7, 2017 at 1:03 AM prabhu Mahendran <prabhuu161...@gmail.com>
wrote:

> jeff,
>
> Thanks for your reply.
>
> Attribute 'ds' having the '07/04/2017'.
>
> And  convert that into month using UpdateAttribute.
>
> ${ds:toDate('dd/MM/yyyy'):format('MMM')}.
>
> if i use that code in windows having language English(India) then it
> worked.
>
> If i use that code in windows having language French(OS) it couldn't work.
>
> Can you suggest any way to solve that problem?
>
> On Fri, Apr 7, 2017 at 1:28 AM, Jeff <jtsw...@gmail.com> wrote:
>
> What is the expression language statement that you're attempting to use?
>
> On Thu, Apr 6, 2017 at 3:12 AM prabhu Mahendran <prabhuu161...@gmail.com>
> wrote:
>
> In NiFi How JVM Check language of machine?
>
> is that take any default language like English(US) else System DateTime
> Selected language?
>
> I face issue while converting datetime format into Month using expression
> language with NiFi package installed with French OS.
>
> But it worked in English(US) Selected language.
>
> Can anyone help me to resolve this?
>
>
>


Re: Is it possible to dynamically spawn a Processor Group ?

2017-04-07 Thread Jeff
Stephen,

It would be good to see screenshots of your flow, and get a little more
information about your NiFi installation to help you get better throughput
of your data.  Are you running NiFi as a single node?  At which processor
in your flow are you noticing the queues backing up?  In the global
settings menu, under "Controller Settings", what is "Maximum Timer Driven
Thread Count" set to?  On your "MyImportProcessor" config, on the
"Scheduling" tab, what is "Concurrent Tasks" set to?

In general terms, having more threads available to a processor means you'll
get greater throughput of the data, provided that your IO configuration
(disk read/write speed) can keep up.  The number of threads that NiFi is
configured to use are made available to processors as flowfiles are
presented to a processor via an incoming queue based on the number of
concurrent tasks for which a processor is configured.

In the UI, you can see how many tasks are currently being executed by each
processor, which will never be more than the "Maximum Timer Driven Thread
Count" (for processors configured to use timer-based scheduling).

If you are experiencing backpressure on the incoming queue for
"MyImportProcessor", try increasing the number of "Concurrent Tasks"
available to that processor, and you may also want to increase the number
of "Maximum Timer Driven Tread Count".

These are just some of the basics of getting more throughput in NiFi.

On Thu, Apr 6, 2017 at 4:25 PM Stephen-Talk <stephen.schem...@talktalk.net>
wrote:

> Thanks for the quick reply.
>
> Yes, that is quite correct.
> The scenario is the following:
>
> The input flow is a "GetFile" process that collects csv files
> (>100,000 lines) which in turn queues the file and parses each line to a
> locally built processor (MyImportProcessor say) that submits them via
> the REST API to a Drupal website.
> The process works fine, but it is very slow, and would like to speed it
> up by splitting the csv file into chunks so that it can then spawn
> "MyImportProcessor" as many times as required.
>
>
> On 06/04/2017 20:47, Jeff wrote:
> > Hello Stephen,
> >
> > It's possible to watch the status of NiFi, and upon observing a
> > particular status in which you're interested, you can use the REST API
> > to create new processor groups.  You'd also have to populate that
> > processor group with processors and other components.  Based on the
> > scenario you mentioned, though, it sounds like you are looking at being
> > able to scale up available processing (via more concurrent threads, or
> > more nodes in a cluster) once a certain amount of data is queued up and
> > waiting to be processed, rather than adding components to the existing
> > flow.  Is that correct?
> >
> > On Thu, Apr 6, 2017 at 3:30 PM Stephen-Talk
> > <stephen.schem...@talktalk.net <mailto:stephen.schem...@talktalk.net>>
> > wrote:
> >
> > Hi, I am just a Nifi Inquisitor,
> >
> > Is it, or could it be possible to Dynamically spawn a "Processor
> Group"
> > when the input flow reaches a certain threshold.
> >
> > Thanking you in anticipation.
> > Stephen
> >
>


Re: How can datetime to month conversion failed in french language?

2017-04-06 Thread Jeff
What is the expression language statement that you're attempting to use?

On Thu, Apr 6, 2017 at 3:12 AM prabhu Mahendran 
wrote:

> In NiFi How JVM Check language of machine?
>
> is that take any default language like English(US) else System DateTime
> Selected language?
>
> I face issue while converting datetime format into Month using expression
> language with NiFi package installed with French OS.
>
> But it worked in English(US) Selected language.
>
> Can anyone help me to resolve this?
>


Re: Is it possible to dynamically spawn a Processor Group ?

2017-04-06 Thread Jeff
Hello Stephen,

It's possible to watch the status of NiFi, and upon observing a particular
status in which you're interested, you can use the REST API to create new
processor groups.  You'd also have to populate that processor group with
processors and other components.  Based on the scenario you mentioned,
though, it sounds like you are looking at being able to scale up available
processing (via more concurrent threads, or more nodes in a cluster) once a
certain amount of data is queued up and waiting to be processed, rather
than adding components to the existing flow.  Is that correct?
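
For illustration, creating a process group through the REST API looks roughly
like this (host, port, group name, and position are placeholders; if memory
serves, 'root' addresses the top-level group):

    curl -X POST -H 'Content-Type: application/json' \
      -d '{"revision":{"version":0},"component":{"name":"spawned-group","position":{"x":0.0,"y":0.0}}}' \
      http://nifi-host:8080/nifi-api/process-groups/root/process-groups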

On Thu, Apr 6, 2017 at 3:30 PM Stephen-Talk 
wrote:

> Hi, I am just a Nifi Inquisitor,
>
> Is it, or could it be possible to Dynamically spawn a "Processor Group"
> when the input flow reaches a certain threshold.
>
> Thanking you in anticipation.
> Stephen
>


Re: What happens if Nifi JVM die? is there any way to monitor and restart it?

2017-01-08 Thread Jeff
Hello,

The NiFi System Administrator Guide [1] has a section on Bootstrap
Properties [2], through which you can set up a Notification Service [3] for
when NiFi dies.  At this time, the only Notification Service provided by
NiFi is an email notification service, as noted in the guide.

The Bootstrap process itself should restart NiFi automatically when it
detects that NiFi has died.

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
[2]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#bootstrap_properties
[3]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#notification_services
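
As a quick sketch of how that is wired up (the SMTP details are placeholders;
see the guide above for the full property list), bootstrap.conf points at a
notification services file and names which services fire on each event:

    # conf/bootstrap.conf
    notification.services.file=./conf/bootstrap-notification-services.xml
    nifi.start.notification.services=email-notification
    nifi.stop.notification.services=email-notification
    nifi.dead.notification.services=email-notification

    # conf/bootstrap-notification-services.xml
    <services>
        <service>
            <id>email-notification</id>
            <class>org.apache.nifi.bootstrap.notification.email.EmailNotificationService</class>
            <property name="SMTP Hostname">smtp.example.com</property>
            <property name="From">nifi@example.com</property>
            <property name="To">ops@example.com</property>
        </service>
    </services>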

On Sun, Jan 8, 2017 at 8:27 AM kant kodali  wrote:

> What happens if the NiFi JVM dies? Is there any way to monitor and restart
> it? In other words, is there some sort of resource management?
>
> Thanks!
>


Re: Wait for child file to process successfully

2016-12-24 Thread Jeff
Brian,

Take a look at these two JIRAs [1] [2], specifically the first, which
should be released in 1.2.0. The second is mostly an FYI so that you are
aware of some of the other aggregation capabilities in NiFi. It has not yet
been merged to master.

There is a way to implement a flow that would handle this scneario before
1.2.0 is released.  You could stash the file in a temp/staging directory,
while creating an attribute with the original filename with
UpdateAttribute, perform your splits on the original CSV file, transform
each split to AVRO, post the AVRO FFs to your web service, and then route
those AVRO FFs to MergeContent.  When that merge completes, you should
still be able to access the original filename attribute (since it's a
common attribute on all the split FFs and will be retained if you set
MergeContent to keep common attributes), and you could use FetchFile to
retrieve the file from the temp/staging dir and delete it (which can be
handled by FetchFile itself) or do further processing on it.

[1] https://issues.apache.org/jira/browse/NIFI-190
[2] https://issues.apache.org/jira/browse/NIFI-2735
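
A condensed sketch of that pre-1.2.0 flow (property values and the
original.path attribute are illustrative; filename, absolute.path, and the
fragment.* attributes are the standard ones written by NiFi):

    GetFile (staging dir, Keep Source File = true)
      -> UpdateAttribute:  original.path = ${absolute.path}${filename}
      -> split the CSV, convert each row to Avro, post to the web service
      -> MergeContent:     Merge Strategy = Defragment,
                           Attribute Strategy = Keep Only Common Attributes
      -> FetchFile:        File to Fetch = ${original.path},
                           Completion Strategy = Delete File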


On Fri, Dec 23, 2016 at 3:33 AM BD International <
b.deep.internatio...@gmail.com> wrote:

> Jeff,
>
> Thanks for that just tried it out and it works perfectly!
>
> On a similar topic I have a flow which picks up a CSV and turns each row
> into an AVRO object and posts that to a web service I've setup. I would
> like to do something similar where I dont delete the original CSV file
> until i have successfully posted all the avro objects.
>
> I would prefer to handle this within NiFi but can't seem to work out
> how without writing custom code.
>
> Thanks
>
> Brian
>
> On 22 Dec 2016 19:04, "Jeff" <jtsw...@gmail.com> wrote:
>
> Brian,
>
> You can use MergeContent in Defragment mode.  Just be sure to set the
> number of bins used by MergeContent equal to or greater than the number of
> concurrent merges you expect to have going on in your flow, and to route
> successfully processed and failed flowfiles (after they've been gracefully
> handled, however it suits your use case) to the MergeContent processor.  If
> a fragment (one of the child flowfiles) is not sent to MergeContent, it
> will never be able to complete the defragmentation since MergeContent would
> not have received all the fragments.
>
> UnpackContent keeps track of the "batch" of files that are unpacked from
> the original archive by assigning to each child flowfile a set of fragment
> attributes that provide an ID to correlate merging (defragmenting in this
> case), the total number of fragments, and the fragment index.
>
> After the merge is complete, you'll have a recreation of the original zip
> file, and it signifies that all the child flowfiles have completed
> processing.
>
> - Jeff
>
> On Thu, Dec 22, 2016 at 12:29 PM BD International <
> b.deep.internatio...@gmail.com> wrote:
>
> Hello,
>
> I've got a data flow which picks up a zip file and uses UnpackContent to
> extract the contents. The subsequent files are then converted to json and
> stored in a database.
>
> I would like to store the original zip file and only delete the file once
> all the extracted files have been stored correctly, has anyone else come
> across a way to do this?
>
> Thanks in advance,
>
> Brian
>
>


Re: Wait for child file to process successfully

2016-12-22 Thread Jeff
Brian,

You can use MergeContent in Defragment mode.  Just be sure to set the
number of bins used by MergeContent equal to or greater than the number of
concurrent merges you expect to have going on in your flow, and to route
successfully processed and failed flowfiles (after they've been gracefully
handled, however it suits your use case) to the MergeContent processor.  If
a fragment (one of the child flowfiles) is not sent to MergeContent, it
will never be able to complete the defragmentation since MergeContent would
not have received all the fragments.

UnpackContent keeps track of the "batch" of files that are unpacked from
the original archive by assigning to each child flowfile a set of fragment
attributes that provide an ID to correlate merging (defragmenting in this
case), the total number of fragments, and the fragment index.

After the merge is complete, you'll have a recreation of the original zip
file, and it signifies that all the child flowfiles have completed
processing.

- Jeff

On Thu, Dec 22, 2016 at 12:29 PM BD International <
b.deep.internatio...@gmail.com> wrote:

> Hello,
>
> I've got a data flow which picks up a zip file and uses UnpackContent to
> extract the contents. The subsequent files are then converted to json and
> stored in a database.
>
> I would like to store the original zip file and only delete the file once
> all the extracted files have been stored correctly, has anyone else come
> across a way to do this?
>
> Thanks in advance,
>
> Brian
>


Re: Load-balancing web api in cluster

2016-12-20 Thread Jeff
Greg,

Again, I have to apologize.  You're right, the host:port in ZK are for the
cluster, not the NiFi UI.  Also, I was told those nodes in ZK are being
created by Curator, NiFi isn't explicitly creating them, so I'd be hesitant
to rely on that information.  The cluster node protocol is
production-stable, in my opinion, but it's not part of the public API.

I created a feature request JIRA to add the hostnames and UI ports of the
nodes in a NiFi cluster [1].

[1] https://issues.apache.org/jira/browse/NIFI-3237
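
For what it's worth, once at least one node is reachable, the cluster roster
can also be pulled from the REST API; each node entry includes its address and
API port, if memory serves on the field names (host and port below are
placeholders):

    curl -s http://nifi-node1:8080/nifi-api/controller/cluster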

On Tue, Dec 20, 2016 at 2:18 PM Hart, Greg <greg.h...@thinkbiganalytics.com>
wrote:

> Hi Jeff,
>
> I saw this and looked into it. The data in those nodes are
> the nifi.cluster.node.address and nifi.cluster.node.protocol.port values.
> In order to get the nifi.web.http.host and nifi.web.http.port values, it
> seems I would have to connect first using the cluster node protocol and
> pretend to be a NiFi node so that I can query the cluster coordinator for
> the list of NodeIdentifier objects. Is this cluster node protocol stable
> enough to use in a production application? It doesn’t seem to be documented
> anywhere so I was assuming it may change in a minor release without much
> notice.
>
> Thanks!
> -Greg
>
> From: Jeff <jtsw...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Tuesday, December 20, 2016 at 11:10 AM
>
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Load-balancing web api in cluster
>
> Greg,
>
> That first statement in my previous email should read "which nodes can be
> the primary or cluster coordinator".  I apologize for any confusion!
>
> - Jeff
>
> On Tue, Dec 20, 2016 at 2:04 PM Jeff <jtsw...@gmail.com> wrote:
>
> Greg,
>
> NiFi does store which nodes are the primary and coordinator.  Relevant
> nodes in ZK are (for instance, in a cluster I'm running locally):
> /nifi/leaders/Primary
> Node/_c_c94f1eb8-e5ac-443c-9643-2668b6f685b2-lock-000553,
> /nifi/leaders/Primary
> Node/_c_7cd14bd5-85f5-4ea9-b849-121496269ef4-lock-000554,
> /nifi/leaders/Primary
> Node/_c_99b79311-495f-4619-b316-9e842d445a8d-lock-000552,
> /nifi/leaders/Cluster
> Coordinator/_c_dc449a75-1a14-42d6-98ab-2cef3e74d616-lock-005967,
> /nifi/leaders/Cluster
> Coordinator/_c_2fbc68df-c9cd-4ecd-99d2-234b7b801110-lock-005966,
> /nifi/leaders/Cluster
> Coordinator/_c_a2b9c2be-c0fd-4bf7-a479-e011a7792fc3-lock-005968
>
> The data on each of these nodes should have the host:port.  These are the
> candidate nodes for being elected the Primary or Cluster Coordinator.  I
> don't think that the current active Primary and Cluster Coordinator is
> stored in ZK, just the nodes that are candidates to fulfill those roles.
> I'll have to get back to you on that for sure, though.
>
> - Jeff
>
> On Tue, Dec 20, 2016 at 1:45 PM Hart, Greg <
> greg.h...@thinkbiganalytics.com> wrote:
>
> Hi Jeff,
>
> My application communicates with the NiFi REST API to import templates,
> instantiate flows from templates, edit processor properties, and a few
> other things. I’m currently using Jersey to send calls to one NiFi node but
> if that node goes down then my application has to be manually reconfigured
> with the hostname and port of another NiFi node. HAProxy would handle
> failover but it still must be manually reconfigured when a NiFi node is
> added or removed from the cluster.
>
> I was hoping that NiFi would use ZooKeeper similarly to other applications
> (Hive or HBase) where a client can easily get the hostname and port of the
> cluster coordinator (or active master). Unfortunately, the information in
> ZooKeeper does not include the value of nifi.rest.http.host and
> nifi.rest.http.port of any NiFi nodes.
>
> It sounds like HAProxy might be the better solution for now. Luckily,
> adding or removing nodes from a cluster shouldn’t be a daily occurrence. If
> you have any other ideas please let me know.
>
> Thanks!
> -Greg
>
> From: Jeff <jtsw...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Tuesday, December 20, 2016 at 8:56 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Load-balancing web api in cluster
>
> Hello Greg,
>
> You can use the REST API on any of the nodes in the cluster.  Could you
> provide more details on what you're trying to accomplish?  If, for
> instance, you are posting data to a ListenHTTP processor and you want to
> balance POSTs across the instances of ListenHTTP on your cluster, then
> haproxy would probably be a good idea.  If you're trying to distribute the
> processing load once the data i

Re: Load-balancing web api in cluster

2016-12-20 Thread Jeff
Greg,

That first statement in my previous email should read "which nodes can be
the primary or cluster coordinator".  I apologize for any confusion!

- Jeff

On Tue, Dec 20, 2016 at 2:04 PM Jeff <jtsw...@gmail.com> wrote:

> Greg,
>
> NiFi does store which nodes are the primary and coordinator.  Relevant
> nodes in ZK are (for instance, in a cluster I'm running locally):
> /nifi/leaders/Primary
> Node/_c_c94f1eb8-e5ac-443c-9643-2668b6f685b2-lock-000553,
> /nifi/leaders/Primary
> Node/_c_7cd14bd5-85f5-4ea9-b849-121496269ef4-lock-000554,
> /nifi/leaders/Primary
> Node/_c_99b79311-495f-4619-b316-9e842d445a8d-lock-000552,
> /nifi/leaders/Cluster
> Coordinator/_c_dc449a75-1a14-42d6-98ab-2cef3e74d616-lock-005967,
> /nifi/leaders/Cluster
> Coordinator/_c_2fbc68df-c9cd-4ecd-99d2-234b7b801110-lock-005966,
> /nifi/leaders/Cluster
> Coordinator/_c_a2b9c2be-c0fd-4bf7-a479-e011a7792fc3-lock-005968
>
> The data on each of these nodes should have the host:port.  These are the
> candidate nodes for being elected the Primary or Cluster Coordinator.  I
> don't think that the current active Primary and Cluster Coordinator is
> stored in ZK, just the nodes that are candidates to fulfill those roles.
> I'll have to get back to you on that for sure, though.
>
> - Jeff
>
> On Tue, Dec 20, 2016 at 1:45 PM Hart, Greg <
> greg.h...@thinkbiganalytics.com> wrote:
>
> Hi Jeff,
>
> My application communicates with the NiFi REST API to import templates,
> instantiate flows from templates, edit processor properties, and a few
> other things. I’m currently using Jersey to send calls to one NiFi node but
> if that node goes down then my application has to be manually reconfigured
> with the hostname and port of another NiFi node. HAProxy would handle
> failover but it still must be manually reconfigured when a NiFi node is
> added or removed from the cluster.
>
> I was hoping that NiFi would use ZooKeeper similarly to other applications
> (Hive or HBase) where a client can easily get the hostname and port of the
> cluster coordinator (or active master). Unfortunately, the information in
> ZooKeeper does not include the value of nifi.rest.http.host and
> nifi.rest.http.port of any NiFi nodes.
>
> It sounds like HAProxy might be the better solution for now. Luckily,
> adding or removing nodes from a cluster shouldn’t be a daily occurrence. If
> you have any other ideas please let me know.
>
> Thanks!
> -Greg
>
> From: Jeff <jtsw...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Tuesday, December 20, 2016 at 8:56 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Load-balancing web api in cluster
>
> Hello Greg,
>
> You can use the REST API on any of the nodes in the cluster.  Could you
> provide more details on what you're trying to accomplish?  If, for
> instance, you are posting data to a ListenHTTP processor and you want to
> balance POSTs across the instances of ListenHTTP on your cluster, then
> haproxy would probably be a good idea.  If you're trying to distribute the
> processing load once the data is received, you can use a Remote Process
> Group to distribute the data across the cluster.  Pierre Villard has
> written a nice blog about setting up a cluster and configuring a flow using
> a Remote Process Group to distribute the processing load [1].  It details
> creating a Remote Process Group to send data back to an Input Port in the
> same NiFi cluster, and allows NiFi to distribute the processing load across
> all the nodes in your cluster.
>
> You can use a combination of haproxy and Remote Process Group to load
> balance connections to the REST API on each NiFi node and to balance the
> processing load across the cluster.
>
> [1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
>
> - Jeff
>
> On Mon, Dec 19, 2016 at 9:25 PM Hart, Greg <
> greg.h...@thinkbiganalytics.com> wrote:
>
> Hi all,
>
> What's the recommended way for communicating with the NiFi REST API in a
> cluster? I see that NiFi uses ZooKeeper so is it possible to get the
> Cluster Coordinator hostname and API port from ZooKeeper, or should I use
> something like haproxy?
>
> Thanks!
> -Greg
>
>


Re: Load-balancing web api in cluster

2016-12-20 Thread Jeff
Greg,

NiFi does store which nodes are the primary and coordinator.  Relevant
nodes in ZK are (for instance, in a cluster I'm running locally):
/nifi/leaders/Primary
Node/_c_c94f1eb8-e5ac-443c-9643-2668b6f685b2-lock-000553,
/nifi/leaders/Primary
Node/_c_7cd14bd5-85f5-4ea9-b849-121496269ef4-lock-000554,
/nifi/leaders/Primary
Node/_c_99b79311-495f-4619-b316-9e842d445a8d-lock-000552,
/nifi/leaders/Cluster
Coordinator/_c_dc449a75-1a14-42d6-98ab-2cef3e74d616-lock-005967,
/nifi/leaders/Cluster
Coordinator/_c_2fbc68df-c9cd-4ecd-99d2-234b7b801110-lock-005966,
/nifi/leaders/Cluster
Coordinator/_c_a2b9c2be-c0fd-4bf7-a479-e011a7792fc3-lock-005968

The data on each of these nodes should have the host:port.  These are the
candidate nodes for being elected the Primary or Cluster Coordinator.  I
don't think that the current active Primary and Cluster Coordinator is
stored in ZK, just the nodes that are candidates to fulfill those roles.
I'll have to get back to you on that for sure, though.
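
For anyone who wants to poke at these znodes directly, here is a minimal
sketch using the kazoo Python client. The connect string and the /nifi root
path below are assumptions; adjust them to your own ZooKeeper setup:

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk-host:2181')   # placeholder connect string
    zk.start()
    path = '/nifi/leaders/Cluster Coordinator'
    for child in zk.get_children(path):
        data, stat = zk.get(path + '/' + child)
        # each candidate node stores its host:port as the znode data
        print(child, data.decode('utf-8'))
    zk.stop()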

- Jeff

On Tue, Dec 20, 2016 at 1:45 PM Hart, Greg <greg.h...@thinkbiganalytics.com>
wrote:

> Hi Jeff,
>
> My application communicates with the NiFi REST API to import templates,
> instantiate flows from templates, edit processor properties, and a few
> other things. I’m currently using Jersey to send calls to one NiFi node but
> if that node goes down then my application has to be manually reconfigured
> with the hostname and port of another NiFi node. HAProxy would handle
> failover but it still must be manually reconfigured when a NiFi node is
> added or removed from the cluster.
>
> I was hoping that NiFi would use ZooKeeper similarly to other applications
> (Hive or HBase) where a client can easily get the hostname and port of the
> cluster coordinator (or active master). Unfortunately, the information in
> ZooKeeper does not include the value of nifi.rest.http.host and
> nifi.rest.http.port of any NiFi nodes.
>
> It sounds like HAProxy might be the better solution for now. Luckily,
> adding or removing nodes from a cluster shouldn’t be a daily occurrence. If
> you have any other ideas please let me know.
>
> Thanks!
> -Greg
>
> From: Jeff <jtsw...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Tuesday, December 20, 2016 at 8:56 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Load-balancing web api in cluster
>
> Hello Greg,
>
> You can use the REST API on any of the nodes in the cluster.  Could you
> provide more details on what you're trying to accomplish?  If, for
> instance, you are posting data to a ListenHTTP processor and you want to
> balance POSTs across the instances of ListenHTTP on your cluster, then
> haproxy would probably be a good idea.  If you're trying to distribute the
> processing load once the data is received, you can use a Remote Process
> Group to distribute the data across the cluster.  Pierre Villard has
> written a nice blog about setting up a cluster and configuring a flow using
> a Remote Process Group to distribute the processing load [1].  It details
> creating a Remote Process Group to send data back to an Input Port in the
> same NiFi cluster, and allows NiFi to distribute the processing load across
> all the nodes in your cluster.
>
> You can use a combination of haproxy and Remote Process Group to load
> balance connections to the REST API on each NiFi node and to balance the
> processing load across the cluster.
>
> [1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
>
> - Jeff
>
> On Mon, Dec 19, 2016 at 9:25 PM Hart, Greg <
> greg.h...@thinkbiganalytics.com> wrote:
>
> Hi all,
>
> What's the recommended way for communicating with the NiFi REST API in a
> cluster? I see that NiFi uses ZooKeeper so is it possible to get the
> Cluster Coordinator hostname and API port from ZooKeeper, or should I use
> something like haproxy?
>
> Thanks!
> -Greg
>
>


Re: Load-balancing web api in cluster

2016-12-20 Thread Jeff
Hello Greg,

You can use the REST API on any of the nodes in the cluster.  Could you
provide more details on what you're trying to accomplish?  If, for
instance, you are posting data to a ListenHTTP processor and you want to
balance POSTs across the instances of ListenHTTP on your cluster, then
haproxy would probably be a good idea.  If you're trying to distribute the
processing load once the data is received, you can use a Remote Process
Group to distribute the data across the cluster.  Pierre Villard has
written a nice blog about setting up a cluster and configuring a flow using
a Remote Process Group to distribute the processing load [1].  It details
creating a Remote Process Group to send data back to an Input Port in the
same NiFi cluster, and allows NiFi to distribute the processing load across
all the nodes in your cluster.

You can use a combination of haproxy and Remote Process Group to load
balance connections to the REST API on each NiFi node and to balance the
processing load across the cluster.

[1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/
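
If running a dedicated proxy is not an option, the same failover behaviour
can be approximated on the client side. A rough Python sketch; the node URLs
and the endpoint below are placeholders, not specific recommendations:

    import requests

    NODES = ['https://nifi-1:8443', 'https://nifi-2:8443', 'https://nifi-3:8443']

    def api_get(path):
        last_error = None
        for base in NODES:
            try:
                response = requests.get(base + path, timeout=5)
                response.raise_for_status()
                return response
            except requests.RequestException as err:
                last_error = err   # node unreachable or unhealthy, try the next one
        raise last_error

    # e.g. api_get('/nifi-api/flow/about')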

- Jeff

On Mon, Dec 19, 2016 at 9:25 PM Hart, Greg <greg.h...@thinkbiganalytics.com>
wrote:

> Hi all,
>
> What's the recommended way for communicating with the NiFi REST API in a
> cluster? I see that NiFi uses ZooKeeper so is it possible to get the
> Cluster Coordinator hostname and API port from ZooKeeper, or should I use
> something like haproxy?
>
> Thanks!
> -Greg
>
>


Re: merge flowfiles

2016-12-19 Thread Jeff
Hello Raf,

MergeContent can merge based on a correlation ID (attribute).  However, the
merging currently operates in two modes: Defragment or Bin-Packing
Algorithm.  Defragment is completed by defragmenting based on the
correlation ID and a known number of fragments.  Bin-Packing Algorithm is
completed based on a min or max age of a "bin", and/or after a certain
number of flowfiles have been received.

Based on your question, I'm assuming you will not know how many flowfiles
you'd be merging per attribute, so I'm not sure that MergeContent will work
for your use case.  Depending on how quickly you want those files merged
and sent downstream, a max bin age might work for you, though.  There is a
JIRA for implementing a more general-case aggregation processor [1].

With some more details around your scenario we might be able to figure out
how to get it to work for you with the standard processors.

[1] https://issues.apache.org/jira/browse/NIFI-1926
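
If you do experiment with the bin-packing route, the knobs involved are
roughly the following. This is a sketch from memory, so double-check the
property names against the MergeContent documentation for your release:

    Merge Strategy             : Bin-Packing Algorithm
    Correlation Attribute Name : <the attribute you key the window on>
    Minimum Number of Entries  : 1
    Max Bin Age                : e.g. 30 sec, to flush a bin that stops growing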

On Mon, Dec 19, 2016 at 10:23 AM Raf Huys  wrote:

> I want to batch incoming flowfiles based on an attribute. As soon as this
> attributes' value changes, the current batch should be transferred
> downstream and be reset. So basically I'm looking for a tumbling window.
>
> Can this be done with the MergeContent processor (which strategy?) or
> should I write my own processor?
>
>
> --
> tx
>
>


Re: Data Provenance is not available

2016-11-21 Thread Jeff
Pablo,

Also, keep in mind that policies are inherited by child components
(processors, process groups, etc), so you can enable viewing of Data
Provenance for the whole flow for a given user by adding the user to the
root group's "view the data" policy.  Unless that policy is explicitly
overridden for a child component, that user will be able to see data
provenance for all components.

On Mon, Nov 21, 2016 at 6:07 PM Jeff <jtsw...@gmail.com> wrote:

> Hi Pablo
>
> From the global policies menu, you can add the users that need to be able to
> query Data Provenance to the "query provenance" policy.
>
> For any component, such as the root group, or a particular processor,
> you'll also want to add the user to "view the data" policies for the
> respective components.
>
> On Mon, Nov 21, 2016 at 5:52 PM Pablo Lopez <pablo.lo...@integrado.com.au>
> wrote:
>
> Ok. Thanks for that.
> Would you know how I can enable this again?
>
> Pablo.
>
> On Tue, Nov 22, 2016 at 6:44 AM, Andrew Grande <apere...@gmail.com> wrote:
>
> Yes.
>
> It's a combination of a generic view data permission in the global menu
> and specific access in the processing group.
>
> Andrew
>
> On Mon, Nov 21, 2016, 12:28 PM Pablo Lopez <pablo.lo...@integrado.com.au>
> wrote:
>
> Hi,
>
> Any ideas as to why the Data Provenance option is grayed out (not available)
> and none of the processors show the option in the context menu? Is this
> something to do with security?
>
> Thanks,
> Pablo.
>
>
>
>
> --
> Pablo Lopez.
> Integration Architect
> Integrado Pty Ltd
> M: 044 84 52 479
> pablo.lo...@integrado.com.au
>
>


Re: Data Provenance is not available

2016-11-21 Thread Jeff
Hi Pablo

From the global policies menu, you can add the users that need to be able to
query Data Provenance to the "query provenance" policy.

For any component, such as the root group, or a particular processor,
you'll also want to add the user to "view the data" policies for the
respective components.

On Mon, Nov 21, 2016 at 5:52 PM Pablo Lopez 
wrote:

> Ok. Thanks for that.
> Would you know how I can enable this again?
>
> Pablo.
>
> On Tue, Nov 22, 2016 at 6:44 AM, Andrew Grande  wrote:
>
> Yes.
>
> It's a combination of a generic view data permission in the global menu
> and specific access in the processing group.
>
> Andrew
>
> On Mon, Nov 21, 2016, 12:28 PM Pablo Lopez 
> wrote:
>
> Hi,
>
> Any ideas as to why the Data Provenance option is grayed out (not available)
> and none of the processors show the option in the context menu? Is this
> something to do with security?
>
> Thanks,
> Pablo.
>
>
>
>
> --
> Pablo Lopez.
> Integration Architect
> Integrado Pty Ltd
> M: 044 84 52 479
> pablo.lo...@integrado.com.au
>


Re: Nifi- PutEmail processor issue

2016-11-14 Thread Jeff
Hello Sravani,

Could it be possible that the SMTP server you're using is denying
connections due to the volume of emails your flow might be sending?  How
many emails are sent per flow file, and how many emails do you estimate are
sent per minute?

If this is the case, you can modify your flow to aggregate flowfiles with a
processor like MergeContent so that you can send emails that resemble a
digest, rather than a separate email for each flowfile that moves through
your flow.

On Mon, Nov 14, 2016 at 11:59 PM Gadiputi, Sravani <
sravani.gadip...@capgemini.com> wrote:

>
>
> Hi,
>
>
>
> I have used the PutEmail processor in my project to send email notifications
> for successful/failed copying of files.
>
> Each file flow has a corresponding PutEmail to send an email notification
> to the respective recipients.
>
>
>
> The issue is that sometimes the email notification is sent to the respective
> recipients successfully for a successful/failed job.
>
> But sometimes, for one specific job, the email notification is not sent to
> the recipients even though the job is successful, due to the error below.
>
>
>
> Error:
>
>
>
> Could not connect to SMTP host
>
> Java.net.ConnectException: Connection timed out
>
>
>
> Could you please suggest how we can overcome this error?
>
>
>
>
>
> Thanks,
>
> Sravani
>
>


Re: How do you recover a workflow ?

2016-11-11 Thread Jeff
From your description, it sounds like your additional GenerateFlowFile
processor to manually add a flowfile is the best way to do what you're
trying to do.  However, you can submit a JIRA [1] if you have ideas on how
your workflow could be made easier to perform.

[1] https://issues.apache.org/jira/browse/NIFI

On Fri, Nov 11, 2016 at 2:16 AM Alessio Palma <
alessio.pa...@docomodigital.com> wrote:

> The point is that I have a workflow but sometimes things go wrong and I
> need to manually restart it; this action requires:
>
> 1) Change some parameters ( UpdateAttribute processor )
>
> 2) Fire a new flowfile which will start the workflow again.  Perhaps this
> is the most obscure point. We are using NiFi to execute some old cron jobs
> and I'm using the GenerateFlowFile ( crontab scheduling strategy )
> processor to start the flow.
> When the workflow does not complete, I use another GenerateFlowFile
> processor to fire a new flowfile, which allows me to execute the flow again
> outside of the schedule.
>
> All of this could be done faster if I could enter the values into some kind
> of form on screen and fire a new flowfile by clicking a button, instead of
> starting/stopping an additional GenerateFlowFile processor.
>
> Perhaps I'm doing it the wrong way. So how do you restart a workflow?
> Maybe this feature could help others with the same task.
> I don't know... I'm just asking.
>
>
>
>
> --
> *From:* Jeff <jtsw...@gmail.com>
> *Sent:* Friday, November 11, 2016 2:36:02 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: How do you recover a workflow ?
>
> Hello Alessio,
>
> Could you provide some more details about your NiFi flow?
>
> One of the triggers I used to manually be able to start processing in my
> flow was to have a GetFile processor (configured with "Keep Source File"
> set to false) watching for files in a directory, and when I wanted to test
> the flow, I would just run the touch command to create a file that the
> GetFile processor would detect and emit a flowfile for it.
>
> Depending on your use case, there might be a better source processor for
> flowfiles that you can use in your flow.
>
> On Thu, Nov 10, 2016 at 6:55 AM Alessio Palma <
> alessio.pa...@docomodigital.com> wrote:
>
> Hello all,
>
> What is the best practice to recover a workflow gone bad?
>
> Currently I use a GenerateFlowFile processor attached to some entry point,
> which allows me to restart something. I start then stop it and a flowfile is
> created, but this is not the best option.
>
> I really miss the option to inject a flowfile with a mouse click. Some way to
> display a basic form where I can insert/modify the values used in an
> UpdateAttribute processor would also help a lot.
>
> What do you think ?
>
>
> AP
>
>
>


Re: How do you recover a workflow ?

2016-11-10 Thread Jeff
Hello Alessio,

Could you provide some more details about your NiFi flow?

One of the triggers I used to manually be able to start processing in my
flow was to have a GetFile processor (configured with "Keep Source File"
set to false) watching for files in a directory, and when I wanted to test
the flow, I would just run the touch command to create a file that the
GetFile processor would detect and emit a flowfile for it.

Depending on your use case, there might be a better source processor for
flowfiles that you can use in your flow.

On Thu, Nov 10, 2016 at 6:55 AM Alessio Palma <
alessio.pa...@docomodigital.com> wrote:

> Hello all,
>
> What is the best practice to recover a workflow gone bad?
>
> Currently I use a GenerateFlowFile processor attached to some entry point,
> which allows me to restart something. I start then stop it and a flowfile is
> created, but this is not the best option.
>
> I really miss the option to inject a flowfile with a mouse click. Some way to
> display a basic form where I can insert/modify the values used in an
> UpdateAttribute processor would also help a lot.
>
> What do you think ?
>
>
> AP
>
>
>


Re: Push x Pull ETL

2016-10-13 Thread Jeff
Great to hear, Marcio!

On Thu, Oct 13, 2016 at 9:26 PM Márcio Faria <faria.mar...@ymail.com> wrote:

> Jeff,
>
> Many thanks. I'm now more confident NiFi could be a good fit for us.
>
> Marcio
>
>
> On Wednesday, October 12, 2016 9:06 PM, Jeff <jtsw...@gmail.com> wrote:
>
>
> Hello Marcio,
>
> You're asking on the right list!
>
> Based on the scenario you described, I think NiFi would suit your needs.
> To address your 3 major steps of your workflow:
>
> 1) Processors can run based on a timer-based or cron-based schedule.
> GenerateTableFetch is a processor that can be used to create SQL SELECT
> statements from a table based on increasing values in one or more columns,
> and can be partitioned depending on your batching needs.  These SQL SELECT
> statements can then be executed against the destination database by use of
> the PutSQL processor.
>
> 2) With the more recent data, which I'm assuming is queried from the
> destination database, you can use QueryDatabaseTable to retrieve the new
> rows in Avro format and then transform as needed, which may include
> processors that encapsulate any custom logic you might have written for
> your homemade ETL solution
>
> 3) The PostHTTP processor can be used to send files over HTTPS to the
> external server.
>
> Processors have failure relationships when processing for a flow file
> fails, and can be routed as appropriate, such as retrying failed flow
> files.  For errors that require human intervention, there are a number of
> options.  Most likely, the way your homemade solution currently handles
> errors that require human intervention can be done by NiFi as well.
>
> Personally, I have used NiFi in similar ways to what you have described.
> There are some examples on the Apache NiFi site [1] that you can check
> out.  What you ask about stopping and restarting processing when errors
> occur is possible, though much of that is in how you design your
> flow.
>
> Feel free to ask any questions!  Much of the information above is fairly
> high-level, and NiFi offers a lot of processors to meet your data flow
> needs.
>
> - Jeff
>
> On Tue, Oct 11, 2016 at 5:18 PM Márcio Faria <faria.mar...@ymail.com>
> wrote:
>
> Hi,
>
> Potential NiFi user here.
>
> I'm trying to figure out if NiFi could be a good choice to replace our
> existing homemade ETL system, which roughly works like this:
>
> 1) Either on demand or at periodic instants, fetch fresh rows from one or
> more tables in the source database and insert or update them into the
> destination database;
>
> 2) Run the jobs which depend on the more recent data, and generate files
> based on those;
>
> 3) Upload the generated files to an external server using HTTPS.
>
> Since our use cases are more of a "pull" style (Ex: It's time to run the
> report -> get the required data updated -> run the processing job and
> submit the results) than "push" (Ex: Get the latest data available -> when
> some condition is met, run the processing job and submit the results), I'm
> wondering if NiFi, or any other flow-based toolset for that matter, would
> be a good option for us to try or not. Your opinion? Suggestions?
>
> Besides, what is the recommended way to handle errors in a ETL scenario
> like that? For example, we submit a "page" of rows to a remote server and
> its response tells us which of those rows were accepted and which ones had
> a validation error. What would be the recommended approach to handle such
> errors if the fix requires some human intervention? Is there a way of
> stopping the whole flow until the correction is done? How to restart it
> when part of the data were already processed by some of the processors? The
> server won't accept a transaction B if it depends on a transaction A that
> wasn't successfully submitted before.
>
> As you see, our processing is very batch-oriented. I know NiFi can fetch
> data in chunks from a relational database, but I'm not sure how to approach
> the conversion from our current style to a more "stream"-oriented one. I'm
> afraid I could try to use the "right tool for the wrong problem", if you
> know what I mean.
>
> Apologies if this is not the proper venue to ask. I checked all the posts
> in this mailing list and also tried to search for information elsewhere,
> but I wasn't able to find the answers myself.
>
> Any guidance, like examples or links to further reading, would be very
> much appreciated. I'm just starting to learn the ropes.
>
> Thank you,
> Marcio
>
>
>
>


Re: Push x Pull ETL

2016-10-12 Thread Jeff
Hello Marcio,

You're asking on the right list!

Based on the scenario you described, I think NiFi would suit your needs.
To address your 3 major steps of your workflow:

1) Processors can run based on a timer-based or cron-based schedule.
GenerateTableFetch is a processor that can be used to create SQL SELECT
statements from a table based on increasing values in one or more columns,
and can be partitioned depending on your batching needs.  These SQL SELECT
statements can then be executed against the destination database by use of
the PutSQL processor.

2) With the more recent data, which I'm assuming is queried from the
destination database, you can use QueryDatabaseTable to retrieve the new
rows in Avro format and then transform as needed, which may include
processors that encapsulate any custom logic you might have written for
your homemade ETL solution

3) The PostHTTP processor can be used to send files over HTTPS to the
external server.

Processors have failure relationships when processing for a flow file
fails, and can be routed as appropriate, such as retrying failed flow
files.  For errors that require human intervention, there are a number of
options.  Most likely, the way your homemade solution currently handles
errors that require human intervention can be done by NiFi as well.

Personally, I have used NiFi in similar ways to what you have described.
There are some examples on the Apache NiFi site [1] that you can check
out.  What you ask about stopping and restarting processing when errors
occur is possible, though much of that is in how you design your
flow.

Feel free to ask any questions!  Much of the information above is fairly
high-level, and NiFi offers a lot of processors to meet your data flow
needs.

- Jeff

On Tue, Oct 11, 2016 at 5:18 PM Márcio Faria <faria.mar...@ymail.com> wrote:

> Hi,
>
> Potential NiFi user here.
>
> I'm trying to figure out if NiFi could be a good choice to replace our
> existing homemade ETL system, which roughly works like this:
>
> 1) Either on demand or at periodic instants, fetch fresh rows from one or
> more tables in the source database and insert or update them into the
> destination database;
>
> 2) Run the jobs which depend on the more recent data, and generate files
> based on those;
>
> 3) Upload the generated files to an external server using HTTPS.
>
> Since our use cases are more of a "pull" style (Ex: It's time to run the
> report -> get the required data updated -> run the processing job and
> submit the results) than "push" (Ex: Get the latest data available -> when
> some condition is met, run the processing job and submit the results), I'm
> wondering if NiFi, or any other flow-based toolset for that matter, would
> be a good option for us to try or not. Your opinion? Suggestions?
>
> Besides, what is the recommended way to handle errors in a ETL scenario
> like that? For example, we submit a "page" of rows to a remote server and
> its response tells us which of those rows were accepted and which ones had
> a validation error. What would be the recommended approach to handle such
> errors if the fix requires some human intervention? Is there a way of
> stopping the whole flow until the correction is done? How to restart it
> when part of the data were already processed by some of the processors? The
> server won't accept a transaction B if it depends on a transaction A that
> wasn't successfully submitted before.
>
> As you see, our processing is very batch-oriented. I know NiFi can fetch
> data in chunks from a relational database, but I'm not sure how to approach
> the conversion from our current style to a more "stream"-oriented one. I'm
> afraid I could try to use the "right tool for the wrong problem", if you
> know what I mean.
>
> Apologies if this is not the proper venue to ask. I checked all the posts
> in this mailing list and also tried to search for information elsewhere,
> but I wasn't able to find the answers myself.
>
> Any guidance, like examples or links to further reading, would be very
> much appreciated. I'm just starting to learn the ropes.
>
> Thank you,
> Marcio
>


Re: UI: feedback on the processor 'color' in NiFi 1.0

2016-09-19 Thread Jeff
I was thinking, in addition to changing the color of the icon on the
processor, that the color of the drop shadow could be changed as well.
That would provide more contrast, but preserve readability, in my opinion.

On Mon, Sep 19, 2016 at 6:39 PM Andrew Grande  wrote:

> Hi All,
>
> Rolling with UI feedback threads. This time I'd like to discuss how NiFi
> 'lost' its ability to change a processor box's color. I.e. as you can see
> from a screenshot attached, it does change color for the processor in the
> flow overview panel, but the processor itself only changes the icon in the
> top-left of the box. I came across a few users who definitely miss the old
> way. I personally think changing the icon color for the processor doesn't
> go far enough, especially when one is dealing with a flow of several dozen
> processors, zooms in and out often. The overview helps, but it's not the
> same.
>
> Proposal - can we restore how color selection for the processor changed
> the actual background of the processor box on the canvas? Let the user go
> wild with colors and deal with readability, but at least it's easy to spot
> 'important' things this way. And with multi-tenant authorization it becomes
> a poor-man's doc between teams, to an extent.
>
> Thanks for any feedback,
> Andrew
>


Re: PermissionBasedStatusMergerSpec is failing

2016-09-16 Thread Jeff
I looked into the merger code and found a possible issue.  I haven't been
able to reproduce the test failure you're getting by setting my locale to
en_IN, but I have a sneaking suspicion that FormatUtils might be the
culprit here when it's formatting the value for TasksDuration in the target
DTO.  Can you please change line 86 in FormatUtils to use
Locale.getDefault() instead of Locale.US and rerun the tests?

On Fri, Sep 16, 2016 at 1:12 PM Tijo Thomas <tijopara...@gmail.com> wrote:

> Hi Jeff,
>
> Yes, I took a fresh clone from GitHub. I cleaned my Maven repo as well
> before building.
>
>  Attached StatusMerger.java and ProcessorStatusSnapshotDTO.java
>
> All,
>
> It would be great if others could also check whether it happens when you
> build. I am worried whether I am doing something really stupid.
>
> Thanks & Regards
>
> Tijo Thomas
>
> On 16-Sep-2016 7:47 pm, "Jeff" <jtsw...@gmail.com> wrote:
>
>> Thank you for the information.  Did you try running the tests on a fresh
>> clone of the github repo?
>>
>> Could you please link me to or include the contents of StatusMerger.java
>> and ProcessorStatusSnapshotDTO.java?
>>
>> On Fri, Sep 16, 2016 at 4:17 AM Tijo Thomas <tijopara...@gmail.com>
>> wrote:
>>
>>>
>>> Output when I ran through IDE,
>>>
>>>
>>> Condition not satisfied:
>>>
>>> returnedJson == expectedJson
>>> ||  |
>>> ||
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:00:00.000","activeThreadCount":0}
>>> |false
>>> |1 difference (99% similarity)
>>> |
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(3)0:00.000","activeThreadCount":0}
>>> |
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(0)0:00.000","activeThreadCount":0}
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:30:00.000","activeThreadCount":0}
>>>
>>> Expected
>>> :{"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)"

Re: PermissionBasedStatusMergerSpec is failing

2016-09-14 Thread Jeff
Ok, sounds good!  Please let us know!

On Wed, Sep 14, 2016 at 12:04 AM Tijo Thomas <tijopara...@gmail.com> wrote:

> Sorry for the late reply. I was on vacation for the last 4 days.
>
> I have not modified any files.
>
> I think there is some problem with my repo.  I will make a new repo and
> try again.  If the problem still exists, I will post it again in the group.
>
> Thank you very much for your support.
>
> Tijo
>
> On 10-Sep-2016 6:44 pm, "Jeff" <jtsw...@gmail.com> wrote:
>
>> Tijo,
>>
>> Have you modified ProcessorStatusSnapshotDTO.java or
>> PermissionBasedStatusMergerSpec.groovy?
>>
>> On Sat, Sep 10, 2016 at 7:48 AM Tijo Thomas <tijopara...@gmail.com>
>> wrote:
>>
>>> Hi Jeff
>>>
>>> I recently rebased from master.
>>> Then I cloned again and ran mvn package
>>>
>>> Tijo
>>>
>>> On 09-Sep-2016 9:12 pm, "Jeff" <jtsw...@gmail.com> wrote:
>>>
>>>> Tijo,
>>>>
>>>> I just ran this test on master and it's passing for me.  Can you
>>>> provide some details about the branch you're on when running the tests?  I
>>>> see that tasksDuration is 00:30:00.000 when it's expecting 00:00:00.000,
>>>> and that's why the JSON isn't matching.
>>>>
>>>> On Thu, Sep 8, 2016 at 4:58 PM Tijo Thomas <tijopara...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>> NiFi test case is failing (PermissionBasedStatusMergerSpec).
>>>>> It is written in Groovy, and I am not comfortable with Groovy.
>>>>>
>>>>> Running org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec
>>>>> Tests run: 20, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.922
>>>>> sec <<< FAILURE! - in
>>>>> org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec
>>>>> Merge
>>>>> ProcessorStatusSnapshotDTO[0](org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec)
>>>>> Time elapsed: 0.144 sec  <<< FAILURE!
>>>>> org.spockframework.runtime.SpockComparisonFailure: Condition not
>>>>> satisfied:
>>>>>
>>>>> returnedJson == expectedJson
>>>>> ||  |
>>>>> ||
>>>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:00:00.000","activeThreadCount":0}
>>>>> |false
>>>>> |1 difference (99% similarity)
>>>>> |
>>>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(3)0:00.000","activeThreadCount":0}
>>>>> |
>>>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(0)0:00.000","activeThreadCount":0}
>>>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:30:00.000","activeThreadCount":0}
>>>>>
>>>>> at
>>>>> org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec.Merge
>>>>> ProcessorStatusSnapshotDTO(PermissionBasedStatusMergerSpec.groovy:257)
>>>>>
>>>>> Merge
>>>>> ProcessorStatusSnapshotDTO[1](org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec)
>>>>> Time elapsed: 0.01 sec  <<< FAILURE!
>>>>> org.spockframework.runtime.SpockComparisonFailure: Condition not
>>>>> satisfied:
>>>>>
>>>>> Tijo
>>>>>
>>>>>
>>>>>
>>>>>


Re: PermissionBasedStatusMergerSpec is failing

2016-09-10 Thread Jeff
Tijo,

Have you modified ProcessorStatusSnapshotDTO.java or
PermissionBasedStatusMergerSpec.groovy?

On Sat, Sep 10, 2016 at 7:48 AM Tijo Thomas <tijopara...@gmail.com> wrote:

> Hi Jeff
>
> I recently rebased from master.
> Then I cloned again and ran mvn package
>
> Tijo
>
> On 09-Sep-2016 9:12 pm, "Jeff" <jtsw...@gmail.com> wrote:
>
>> Tijo,
>>
>> I just ran this test on master and it's passing for me.  Can you provide
>> some details about the branch you're on when running the tests?  I see that
>> tasksDuration is 00:30:00.000 when it's expecting 00:00:00.000, and that's
>> why the JSON isn't matching.
>>
>> On Thu, Sep 8, 2016 at 4:58 PM Tijo Thomas <tijopara...@gmail.com> wrote:
>>
>>> Hi
>>> NiFi test case is failing (PermissionBasedStatusMergerSpec).
>>> It is written in Groovy, and I am not comfortable with Groovy.
>>>
>>> Running org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec
>>> Tests run: 20, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.922
>>> sec <<< FAILURE! - in
>>> org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec
>>> Merge
>>> ProcessorStatusSnapshotDTO[0](org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec)
>>> Time elapsed: 0.144 sec  <<< FAILURE!
>>> org.spockframework.runtime.SpockComparisonFailure: Condition not
>>> satisfied:
>>>
>>> returnedJson == expectedJson
>>> ||  |
>>> ||
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:00:00.000","activeThreadCount":0}
>>> |false
>>> |1 difference (99% similarity)
>>> |
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(3)0:00.000","activeThreadCount":0}
>>> |
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(0)0:00.000","activeThreadCount":0}
>>> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
>>> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
>>> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
>>> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:30:00.000","activeThreadCount":0}
>>>
>>> at
>>> org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec.Merge
>>> ProcessorStatusSnapshotDTO(PermissionBasedStatusMergerSpec.groovy:257)
>>>
>>> Merge
>>> ProcessorStatusSnapshotDTO[1](org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec)
>>> Time elapsed: 0.01 sec  <<< FAILURE!
>>> org.spockframework.runtime.SpockComparisonFailure: Condition not
>>> satisfied:
>>>
>>> Tijo
>>>
>>>
>>>
>>>


Re: PermissionBasedStatusMergerSpec is failing

2016-09-09 Thread Jeff
Tijo,

I just ran this test on master and it's passing for me.  Can you provide
some details about the branch you're on when running the tests?  I see that
tasksDuration is 00:30:00.000 when it's expecting 00:00:00.000, and that's
why the JSON isn't matching.

On Thu, Sep 8, 2016 at 4:58 PM Tijo Thomas  wrote:

> Hi
> NiFi test case is failing (PermissionBasedStatusMergerSpec).
> It is written in Groovy, and I am not comfortable with Groovy.
>
> Running org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec
> Tests run: 20, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.922 sec
> <<< FAILURE! - in
> org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec
> Merge
> ProcessorStatusSnapshotDTO[0](org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec)
> Time elapsed: 0.144 sec  <<< FAILURE!
> org.spockframework.runtime.SpockComparisonFailure: Condition not satisfied:
>
> returnedJson == expectedJson
> ||  |
> ||
> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:00:00.000","activeThreadCount":0}
> |false
> |1 difference (99% similarity)
> |
> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(3)0:00.000","activeThreadCount":0}
> |
> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:(0)0:00.000","activeThreadCount":0}
> {"id":"hidden","groupId":"hidden","name":"hidden","type":"hidden","bytesRead":0,"bytesWritten":0,"read":"0
> bytes","written":"0 bytes","flowFilesIn":0,"bytesIn":0,"input":"0 (0
> bytes)","flowFilesOut":0,"bytesOut":0,"output":"0 (0
> bytes)","taskCount":0,"tasksDurationNanos":0,"tasks":"0","tasksDuration":"00:30:00.000","activeThreadCount":0}
>
> at
> org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec.Merge
> ProcessorStatusSnapshotDTO(PermissionBasedStatusMergerSpec.groovy:257)
>
> Merge
> ProcessorStatusSnapshotDTO[1](org.apache.nifi.cluster.manager.PermissionBasedStatusMergerSpec)
> Time elapsed: 0.01 sec  <<< FAILURE!
> org.spockframework.runtime.SpockComparisonFailure: Condition not satisfied:
>
> Tijo
>
>
>
>


RE: Need to read a small local file into a flow file property

2016-08-25 Thread Oxenberg, Jeff
Yeah, that would work. Here's a quick example in Python that works on my local
machine.

https://gist.github.com/jeffoxenberg/327b0dfeaa6bb63882279dd290222582
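
For readers who cannot reach the gist, the general idea in ExecuteScript with
Jython looks something like the sketch below. This is not necessarily the
gist's exact content; the file path and attribute name are placeholders:

    # ExecuteScript (Jython); 'session' and REL_SUCCESS are bound by the processor
    flowFile = session.get()
    if flowFile is not None:
        f = open('/tmp/small-local-file.txt', 'r')   # placeholder path
        try:
            content = f.read()
        finally:
            f.close()
        # attributes are held in memory, so keep the file small
        flowFile = session.putAttribute(flowFile, 'local.file.content', content)
        session.transfer(flowFile, REL_SUCCESS)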

Thanks,


Jeff Oxenberg

From: Andre [mailto:andre-li...@fucs.org]
Sent: Thursday, August 25, 2016 8:41 AM
To: users@nifi.apache.org
Subject: Re: Need to read a small local file into a flow file property



Wouldn't a scripted task using ExecuteScript solve this issue?

You could simply use jython, groovy, jruby, luaj or javascript to read the 
contents and add to the attributes. Just be mindful that if I recall correctly 
attributes are size constrained.

Cheers

On Thu, Aug 25, 2016 at 11:17 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) 
<chris.mcderm...@hpe.com> wrote:
Sorry, I should have been more clear.  I have a flow file with content.  To that
flow file, I need to add the content of a disk file as an attribute without 
losing the original content.

Does that better explain things?

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315


From: Matt Burgess <mattyb...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, August 24, 2016 at 5:13 PM
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Need to read a small local file into a flow file property

Chris,

Are you looking to have a flow file that has its own content also as an 
attribute? With EvaluateJsonPath, are you taking in the entire document? If so, 
you could use ExtractText with a regex that captures all text and puts it in an 
attribute, I believe the content of the flow file is untouched.

Please let me know if I've misunderstood your use case, I'm a little confused 
as to why you have two paths and step 3. Wouldn't #1 and #2 (with 
"flowfile-attribute" as the Destination) read the file into an attribute and 
also keep it in the content?

Regards,
Matt

On Wed, Aug 24, 2016 at 4:33 PM, McDermott, Chris Kevin (MSDU - 
STaTS/StorefrontRemote) 
<chris.mcderm...@hpe.com> wrote:
Hi folks,

I’m looking for some ideas here.  I need to read the content of a small local
file into a flow file attribute.  I can’t find a processor that does this.  Did
I miss one that does?

So without one of these I’ve been trying to do this using a MergeContent 
processor.

First, I assign a correlation UUID and store it in an attribute

I split my file down two processing paths.  The left hand path goes straight to
the MergeContent processor.

In the right hand path I

1.   Read the content of the local file using FetchFile

2.   Pull the content of the FlowFile into an attribute using 
EvaluateJSONPath

3.   Clear the content of the FlowFile using ReplaceText


Then I combine the left and right legs using MergeContent using the assigned 
correlation UUID to merge the files.

This generally works, except when it doesn’t. ☺

The problem seems to be that the left hand side of the stream flows relatively
faster than the right hand path, which makes sense.  This can lead to the
“bins” in the MergeContent processor being reused before the file in the bin
can be merged with the file traveling down the right hand path, causing
uncorrelated files to be sent to the merged output.

Does it sound like I am using the MergeContent processor in the right way?

Any other ideas?


Thanks in advance,

Chris McDermott

Remote Business Analytics
STaTS/StoreFront Remote
HPE Storage
Hewlett Packard Enterprise
Mobile: +1 978-697-5315





RE: Shell script execution through Nifi

2016-07-11 Thread Oxenberg, Jeff
Hey Sravani,

The output of whatever you run using ExecuteProcess is logged to the flowfile 
contents.  I tested briefly with command=/usr/bin/ssh and 
arguments=root@IP.address ‘hostname’ and it returned the remote machine’s 
hostname to the flowfile contents.  This would in my opinion be a little more 
portable/easier to manage than using a separate shell script.
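
Spelled out as processor properties, that is roughly the following (the host,
user and script path are placeholders):

    ExecuteProcess
      Command           : /usr/bin/ssh
      Command Arguments : user@remote.host /path/to/script.sh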

You’d first need to set up passwordless ssh; on the nifi machine, run ssh-copy-id
ip.of.remote.machine.  That assumes you don’t have a passphrase set up
for your key.  If you do, you can use an expect script to enter it.

Thanks,


Jeff Oxenberg

From: Gadiputi, Sravani [mailto:sravani.gadip...@capgemini.com]
Sent: Monday, July 11, 2016 9:04 AM
To: users@nifi.apache.org
Subject: RE: Shell script execution through Nifi

Hi Bryan,

Thank you for the solution.
Here I am adding a few more points to my question.

1) Can we capture any output from the execution of the shell script in a flowfile?
2) Is there any way to validate logging in to the remote machine through NiFi?
3) We have to log in to the remote server using ssh, but it should be
passwordless. How can we achieve that?
After connecting to the server, how do we execute a shell script on the remote
server through NiFi?


Please give me suggestions/inputs on the above points.

Thanks a lot for your assistance. Looking forward to your reply.


Regards,
Sravani


From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: Monday, July 11, 2016 6:25 PM
To: users@nifi.apache.org
Subject: Re: Shell script execution through Nifi

Hello,

Can you use ExecuteProcess to call a local shell script that SSH's to the 
remote machine and executes whatever you need there?

-Bryan

On Mon, Jul 11, 2016 at 8:49 AM, Gadiputi, Sravani 
<sravani.gadip...@capgemini.com> wrote:
Hi Team,

Need your assistance/inputs for the requirement below.
I want to execute shell scripts/Spark jobs on remote machines through NiFi.
How can we achieve this?

Could you please suggest any solutions/workarounds? That would be a great help.


Thanks in advance!!


Regards,
Sravani




RE: PutCassandraQL failing on ISO-8601-formatted timestamp

2016-06-30 Thread Oxenberg, Jeff
Hey Matt,

Thanks for responding, I just had the time to update Nifi and retry.  I updated 
to 1.0 as of today.  I'm still unable to insert timestamps - if I omit the 
timestamp column (and rework the query/table to just be key and value), I can 
insert just fine.

Here's the error I'm getting now:

com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for 
requested operation: [timestamp <-> java.lang.String]
Here's the relevant attributes of the flowfile:

FlowFile Attribute Map Content
Key: 'cql.args.1.type'
Value: 'text'
Key: 'cql.args.1.value'
Value: 'temp3'
Key: 'cql.args.2.type'
Value: 'timestamp'
Key: 'cql.args.2.value'
Value: '2016-06-30T20:04:36Z'
Key: 'cql.args.3.type'
Value: 'float'
Key: 'cql.args.3.value'
Value: '6.7'
Key: 'j.id'
Value: 'temp3'
Key: 'j.ts'
Value: '2016-06-30T20:04:36Z'
Key: 'j.value'
Value: '6.7'
--
INSERT INTO test.test2 (sensor, ts, value) VALUES(?,?,?)

Thanks,


Jeff Oxenberg
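
One experiment that might be worth trying while this is open (purely a
suggestion, not verified against the driver): convert the ISO-8601 string to
epoch milliseconds before PutCassandraQL, for example in an ExecuteScript
step, so the bound value is numeric rather than a string. A rough Jython
sketch, assuming the attribute names shown above:

    # ExecuteScript (Jython): rewrite cql.args.2.value as epoch milliseconds
    from datetime import datetime
    import calendar

    flowFile = session.get()
    if flowFile is not None:
        iso = flowFile.getAttribute('cql.args.2.value')   # e.g. 2016-06-30T20:04:36Z
        dt = datetime.strptime(iso, '%Y-%m-%dT%H:%M:%SZ')
        millis = calendar.timegm(dt.timetuple()) * 1000
        flowFile = session.putAttribute(flowFile, 'cql.args.2.value', str(millis))
        session.transfer(flowFile, REL_SUCCESS)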

-Original Message-
From: Matt Burgess [mailto:mattyb...@gmail.com] 
Sent: Tuesday, June 21, 2016 8:52 PM
To: users@nifi.apache.org
Subject: Re: PutCassandraQL failing on ISO-8601-formatted timestamp

Jeff,

That appears to be a correct ISO-8601 date, so I'm not sure what's going on 
there. I checked the NiFi code and the Cassandra Java driver Jira and didn't 
see anything related (that wasn't already fixed, in the latter case). The 
upcoming 0.7.0 release has an updated Cassandra driver, perhaps that will solve 
your problem; if it doesn't, please feel free to file a Jira about this.

Regards,
Matt

On Tue, Jun 21, 2016 at 9:32 PM, Oxenberg, Jeff <jeff.oxenb...@hpe.com> wrote:
> Hey,
>
>
>
> As a learning exercise, I’ve created a flow that parses a kafka topic 
> of json messages and inserts them into Cassandra.  It’s a three column table:
> id (text), ts (timestamp), and value (float).  I set cql.args.x.type 
> to the proper data types for each column.
>
>
>
> PutCassandraQL is failing with the following:
>
>
>
> org.apache.nifi.processor.exception.ProcessException: The value of the 
> cql.args.2.value is '2016-06-21T20:23:41Z', which cannot be converted 
> into the necessary data type: timestamp
>
>
>
> This happens for each timestamp, but I’m pretty sure the format is correct.
> It’s in ISO-8601, and I’m able to insert them manually into the c* table.
> I’m on Nifi 0.6.1.
>
>
>
> Any help would be appreciated, thanks!


PutCassandraQL failing on ISO-8601-formatted timestamp

2016-06-21 Thread Oxenberg, Jeff
Hey,

As a learning exercise, I've created a flow that parses a kafka topic of json 
messages and inserts them into Cassandra.  It's a three column table: id 
(text), ts (timestamp), and value (float).  I set cql.args.x.type to the proper 
data types for each column.

PutCassandraQL is failing with the following:

org.apache.nifi.processor.exception.ProcessException: The value of the 
cql.args.2.value is '2016-06-21T20:23:41Z', which cannot be converted into the 
necessary data type: timestamp

This happens for each timestamp, but I'm pretty sure the format is correct.  
It's in ISO-8601, and I'm able to insert them manually into the c* table.  I'm 
on Nifi 0.6.1.

Any help would be appreciated, thanks!


Re: Can't connect to Secure HBase cluster

2016-03-31 Thread Jeff Lord
Do you have a core-site.xml in your config?
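
(For reference: on a kerberized cluster the HBase client service usually needs
core-site.xml, and typically hbase-site.xml as well, listed in its Hadoop
Configuration Files property, with authentication switched to Kerberos. A
minimal, non-exhaustive illustration of the core-site.xml piece:)

    <configuration>
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
    </configuration>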

On Thu, Mar 31, 2016 at 4:27 AM, Guillaume Pool  wrote:

> Hi,
>
>
>
> I am trying to make a connection to a secured cluster that has phoenix
> installed.
>
>
>
> I am running HDP 2.3.2 and NiFi 0.6.0
>
>
>
> Getting the following error on trying to enable HBase_1_1_2_ClientService
>
>
>
> 2016-03-31 13:24:23,916 INFO [StandardProcessScheduler Thread-5]
> o.a.nifi.hbase.HBase_1_1_2_ClientService
> HBase_1_1_2_ClientService[id=e7e9b2ed-d336-34be-acb4-6c8b60c735c2] HBase
> Security Enabled, logging in as principal n...@hdp.supergrp.net with
> keytab /app/env/nifi.keytab
>
> 2016-03-31 13:24:23,984 WARN [StandardProcessScheduler Thread-5]
> org.apache.hadoop.util.NativeCodeLoader Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> 2016-03-31 13:24:24,101 INFO [StandardProcessScheduler Thread-5]
> o.a.nifi.hbase.HBase_1_1_2_ClientService
> HBase_1_1_2_ClientService[id=e7e9b2ed-d336-34be-acb4-6c8b60c735c2]
> Successfully logged in as principal n...@hdp.supergrp.net with keytab
> /app/env/nifi.keytab
>
> 2016-03-31 13:24:24,177 ERROR [StandardProcessScheduler Thread-5]
> o.a.n.c.s.StandardControllerServiceNode
> HBase_1_1_2_ClientService[id=e7e9b2ed-d336-34be-acb4-6c8b60c735c2] Failed
> to invoke @OnEnabled method due to java.io.IOException:
> java.lang.reflect.InvocationTargetException
>
> 2016-03-31 13:24:24,182 ERROR [StandardProcessScheduler Thread-5]
> o.a.n.c.s.StandardControllerServiceNode
>
> java.io.IOException: java.lang.reflect.InvocationTargetException
>
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
> ~[hbase-client-1.1.2.jar:1.1.2]
>
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
> ~[hbase-client-1.1.2.jar:1.1.2]
>
> at
> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
> ~[hbase-client-1.1.2.jar:1.1.2]
>
> at
> org.apache.nifi.hbase.HBase_1_1_2_ClientService$1.run(HBase_1_1_2_ClientService.java:215)
> ~[nifi-hbase_1_1_2-client-service-0.6.0.jar:0.6.0]
>
> at
> org.apache.nifi.hbase.HBase_1_1_2_ClientService$1.run(HBase_1_1_2_ClientService.java:212)
> ~[nifi-hbase_1_1_2-client-service-0.6.0.jar:0.6.0]
>
> at java.security.AccessController.doPrivileged(Native Method)
> ~[na:1.8.0_71]
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
> ~[na:1.8.0_71]
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
> ~[hadoop-common-2.6.2.jar:na]
>
> at
> org.apache.nifi.hbase.HBase_1_1_2_ClientService.createConnection(HBase_1_1_2_ClientService.java:212)
> ~[nifi-hbase_1_1_2-client-service-0.6.0.jar:0.6.0]
>
> at
> org.apache.nifi.hbase.HBase_1_1_2_ClientService.onEnabled(HBase_1_1_2_ClientService.java:161)
> ~[nifi-hbase_1_1_2-client-service-0.6.0.jar:0.6.0]
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[na:1.8.0_71]
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[na:1.8.0_71]
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[na:1.8.0_71]
>
> at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_71]
>
> at
> org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137)
> ~[na:na]
>
> at
> org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125)
> ~[na:na]
>
> at
> org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70)
> ~[na:na]
>
> at
> org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47)
> ~[na:na]
>
> at
> org.apache.nifi.controller.service.StandardControllerServiceNode$1.run(StandardControllerServiceNode.java:285)
> ~[na:na]
>
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_71]
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_71]
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> [na:1.8.0_71]
>
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> [na:1.8.0_71]
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_71]
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_71]
>
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_71]
>
> Caused by: java.lang.reflect.InvocationTargetException: null
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method) ~[na:1.8.0_71]
>
> at
> 

Re: splitText output appears to be getting dropped

2016-02-18 Thread Jeff Lord
Matt,

Thanks a bunch!
That did the trick.
Out of curiosity, is there a better way to handle this than writing out a
single line into multiple files?
Each file contains a single string that will be used to build a URL.

-Jeff

On Thu, Feb 18, 2016 at 6:00 PM, Matthew Clarke <matt.clarke@gmail.com>
wrote:

> Jeff,
>   It appears your files are being dropped because you are
> auto-terminating the failure relationship on your putFile processor. When
> the splitText processor splits the file by lines, every new file has the
> same filename as the original it came from. My guess is the first file is
> being written to disk and all the others are failing because a file of the
> same name already exists in the target dir. Try adding an UpdateAttribute
> processor after the splitText to rename all the files. The easiest way is to
> append the file's uuid to its filename.  I also do not recommend
> auto-terminating failure relationships except in rare cases.
>
> Matt
> On Feb 18, 2016 8:36 PM, "Jeff Lord" <jeffrey.l...@gmail.com> wrote:
>
>> I have a pretty simple flow where I query for a list of ids using
>> executeProcess and then pass that list along to splitText where I am trying
>> to split on each line and then dynamically build a URL further down the line
>> using updateAttribute and so on.
>>
>> executeProcess -> splitText -> putFile
>>
>> For some reason I am only getting one file written with one line.
>> I would expect something more like 100 files each with one line.
>> Using the provenance reporter it appears that some of my items are being
>> dropped.
>>
>> Time: 02/18/2016 17:13:46.145 PST
>> Event Duration: No value set
>> Lineage Duration: 00:00:12.187
>> Type: DROP
>> FlowFile Uuid: 7fa42367-490d-4b54-a32f-d062a885474a
>> File Size: 14 bytes
>> Component Id: 3b37a828-ba2c-4047-ba7a-578fd0684ce6
>> Component Name: PutFile
>> Component Type: PutFile
>> Details: Auto-Terminated by failure Relationship
>>
>> Any ideas on what I need to change here?
>>
>> Thanks in advance,
>>
>> Jeff
>>
>
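
A minimal sketch of the renaming step Matt suggests, assuming an
UpdateAttribute processor dropped between SplitText and PutFile with a single
property (filename and uuid are standard core attributes; this expression is
just one way to make each name unique, not the only one):

filename = ${filename}-${uuid}

With that in place each split FlowFile reaches PutFile under its own name, so
the writes no longer collide on an existing file in the target directory.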


splitText output appears to be getting dropped

2016-02-18 Thread Jeff Lord
I have a pretty simple flow where I query for a list of ids using
executeProcess and then pass that list along to splitText where I am trying
to split on each line and then dynamically build a URL further down the line
using updateAttribute and so on.

executeProcess -> splitText -> putFile

For some reason I am only getting one file written with one line.
I would expect something more like 100 files each with one line.
Using the provenance reporter it appears that some of my items are being
dropped.

Time: 02/18/2016 17:13:46.145 PST
Event Duration: No value set
Lineage Duration: 00:00:12.187
Type: DROP
FlowFile Uuid: 7fa42367-490d-4b54-a32f-d062a885474a
File Size: 14 bytes
Component Id: 3b37a828-ba2c-4047-ba7a-578fd0684ce6
Component Name: PutFile
Component Type: PutFile
Details: Auto-Terminated by failure Relationship

Any ideas on what I need to change here?

Thanks in advance,

Jeff


Re: Version Control on NiFi flow.xml

2016-02-17 Thread Jeff - Data Bean Australia
Thanks Joe for pointing out the order issue. Given that, I need to
reconsider my approach, because the original idea was to make use of existing
version control tools, such as Git, and compare different versions on the
fly. With the ordering not guaranteed, this approach makes no more sense than
simply storing the gz file.

In this case, do we have some tool to compare two flow.xml.gz files for
subtle changes? I am sure the UI-based auditing is helpful though.
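
For what it's worth, a rough way to eyeball the differences between two
snapshots (assuming bash, gzip and xmllint are available, and keeping in mind
that the unguaranteed element ordering Joe mentions can make the diff noisy;
the old/ and new/ paths below are just placeholders):

diff <(gzip -dc old/flow.xml.gz | xmllint --format -) \
     <(gzip -dc new/flow.xml.gz | xmllint --format -)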

On Thu, Feb 18, 2016 at 11:07 AM, Joe Witt <joe.w...@gmail.com> wrote:

> Jeff
>
> I think what you're doing is just fine for now.  To Oleg's point we
> should make it better.
>
> We do also have a database where each flow change is being written to
> from an audit perspective and so we can show in the UI who made what
> changes last.  That is less about true CM and more about providing a
> meaningful user experience.
>
> The biggest knock for CM of our current flow.xml.gz and for the
> templates is that the order in which their components are serialized
> is not presently guaranteed so it means diff won't be meaningful.  But
> as far as capturing at specific intervals and storing the flow you
> should be in good shape with your approach.
>
> Thanks
> Joe
>
> On Wed, Feb 17, 2016 at 4:52 PM, Jeff - Data Bean Australia
> <databean...@gmail.com> wrote:
> > Thanks Oleg for sharing this. They are definitely useful.
> >
> > But my question focused more on keeping the data flow definition files'
> > versions, so that Data Flow Developers, or NiFi Cluster Manager in NiFi's
> > term can keep track of our work.
> >
> > Currently I am using the following command line to generate a formatted
> XML
> > to put it into our Git repository:
> >
> > cat conf/flow.xml.gz | gzip -dc | xmllint --format -
> >
> >
> >
> >
> > On Thu, Feb 18, 2016 at 10:01 AM, Oleg Zhurakousky
> > <ozhurakou...@hortonworks.com> wrote:
> >>
> >> Jeff, what you are describing is in the works and actively discussed
> >> https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry
> >> and
> >>
> >>
> https://cwiki.apache.org/confluence/display/NIFI/Component+documentation+improvements
> >>
> >> The last one may not directly speak to the “ExtensionRegistry”, but if
> >> you look through the comments there is a whole lot about it since it is
> >> dependent.
> >> Feel free to participate, but I can say for now that it is slated for
> 1.0
> >> release.
> >>
> >> Cheers
> >> Oleg
> >>
> >> On Feb 17, 2016, at 3:08 PM, Jeff - Data Bean Australia
> >> <databean...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> As my NiFi data flow becomes more and more serious, I need to put it
> >> under version control. Since flow.xml.gz is generated automatically and
> >> it is saved in a compressed file, I am wondering what would be the best
> >> practice regarding version control?
> >>
> >> Thanks,
> >> Jeff
> >>
> >> --
> >> Data Bean - A Big Data Solution Provider in Australia.
> >>
> >>
> >
> >
> >
> > --
> > Data Bean - A Big Data Solution Provider in Australia.
>



-- 
Data Bean - A Big Data Solution Provider in Australia.


Re: Version Control on NiFi flow.xml

2016-02-17 Thread Jeff - Data Bean Australia
Thanks Oleg for sharing this. They are definitely useful.

But my question focused more on keeping the data flow definition files'
versions, so that Data Flow Developers, or NiFi Cluster Manager in NiFi's
term can keep track of our work.

Currently I am using the following command line to generate a formatted XML
to put it into our Git repository:

cat conf/flow.xml.gz | gzip -dc | xmllint --format -




On Thu, Feb 18, 2016 at 10:01 AM, Oleg Zhurakousky <
ozhurakou...@hortonworks.com> wrote:

> Jeff, what you are describing is in the works and actively discussed
> https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry
> and
>
> https://cwiki.apache.org/confluence/display/NIFI/Component+documentation+improvements
>
> The last one may not directly speak to the “ExtensionRegistry”, but if
> you look through the comments there is a whole lot about it since it is
> dependent.
> Feel free to participate, but I can say for now that it is slated for 1.0
> release.
>
> Cheers
> Oleg
>
> On Feb 17, 2016, at 3:08 PM, Jeff - Data Bean Australia <
> databean...@gmail.com> wrote:
>
> Hi,
>
> As my NiFi data flow becomes more and more serious, I need to put it under
> version control. Since flow.xml.gz is generated automatically and it is
> saved in a compressed file, I am wondering what would be the best practice
> regarding version control?
>
> Thanks,
> Jeff
>
> --
> Data Bean - A Big Data Solution Provider in Australia.
>
>
>


-- 
Data Bean - A Big Data Solution Provider in Australia.


Version Control on NiFi flow.xml

2016-02-17 Thread Jeff - Data Bean Australia
Hi,

As my NiFi data flow becomes more and more serious, I need to put it under
version control. Since flow.xml.gz is generated automatically and it is
saved in a compressed file, I am wondering what would be the best practice
regarding version control?

Thanks,
Jeff

-- 
Data Bean - A Big Data Solution Provider in Australia.


Re: Generate URL based on different conditions

2016-02-17 Thread Jeff - Data Bean Australia
Thank you Matt and Joe for your help.

On Wed, Feb 17, 2016 at 4:22 PM, Matt Burgess <mattyb...@gmail.com> wrote:

> Here's a Gist template that uses Joe's approach of RouteOnAttribute then
> UpdateAttribute to generate URLs with the use case you described:
> https://gist.github.com/mattyb149/8fd87efa1338a70c
>
> On Tue, Feb 16, 2016 at 9:51 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
>> Jeff,
>>
>> For each of the input files could it be that you would pull data from
>> multiple URLs?
>>
>> Have you had a chance to learn about the NiFi Expression language?
>> That will come in quite handy for constructing the URL used in
>> InvokeHTTP.
>>
>> The general pattern I think makes sense here is:
>> - Gather Data
>> - Extract Features from data to construct URL
>> - Fetch document/response from URL
>>
>> During 'Gather Data' you acquire the files.
>>
>> During 'Extract features' you pull out elements of the content of the
>> file into flow file attributes.  You can use RouteOnAttribute to send
>> to an UpdateAttribute processor which constructs a new attribute of
>> URL pattern A or URL pattern B respectively.  You can also collapse
>> that into a single UpdateAttribute possibly using the advanced UI and
>> set specific URLs based on patterns of attributes.  Lots of ways to
>> slice that.
>>
>> During Fetch document you should be able to just have a single
>> InvokeHTTP potentially which looks at some attribute you've defined
>> say 'the-url' and specify in InvokeHTTP the remote URL value to be
>> "${the-url}"
>>
>> We should publish a template for this pattern/approach if we've not
>> already but let's see how you progress and decide what would be most
>> useful for others.
>>
>> Thanks
>> Joe
>>
>> On Tue, Feb 16, 2016 at 9:36 PM, Jeff - Data Bean Australia
>> <databean...@gmail.com> wrote:
>> > Hi,
>> >
>> > I got a use case like this:
>> >
>> > There are two files, say fileA and fileB; both of them contain multiple
>> > lines of items and are used to generate URLs. However, the algorithms for
>> > generating URLs are different. If items come from fileA, the URL
>> > template looks like this:
>> >
>> > foo--foo
>> >
>> > If items come from fileB, the template looks like this:
>> >
>> > bar--foo--whatever
>> >
>> > I am going to create a NiFi template for the Data Flow from reading the
>> > list file up to downloading data using InvokeHTTP, and place an
>> > UpdateAttribute processor in front of the template to feed in different
>> > file names (I have only two files).
>> >
>> > The problem I have so far is how to generate the URLs based on different
>> > input, so that I can make a general NiFi template for reusability.
>> >
>> > Thanks,
>> > Jeff
>> >
>> >
>> >
>> > --
>> > Data Bean - A Big Data Solution Provider in Australia.
>>
>
>


-- 
Data Bean - A Big Data Solution Provider in Australia.
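
A concrete sketch of the pattern Joe outlines above, with purely illustrative
names (the item attribute and the URL templates below are placeholders, not
taken from the thread): RouteOnAttribute gets one dynamic property per source
file, for example

source.fileA = ${filename:equals('fileA')}
source.fileB = ${filename:equals('fileB')}

each of which becomes its own relationship under the default routing strategy.
The fileA branch then passes through an UpdateAttribute that sets something
like

the-url = http://example.com/${item}--foo

while the fileB branch sets

the-url = http://example.com/bar--${item}--whatever

and a single InvokeHTTP downstream reads its Remote URL property as
${the-url}.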


Re: Does NiFi support Hot Deploy?

2016-02-15 Thread Jeff - Data Bean Australia
Thanks Joe. I thought NiFi Clusters, or Clustered NiFis, are used for HA,
Load Balancing, and Scalability, but it seems they can also be used for
modularization.



On Tue, Feb 16, 2016 at 3:45 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Jeff
>
> Clustered NiFis (multiple nodes) is quite common.  You can have
> hundreds of processors representing a quite large number of distinct
> flows on there.  Templates are the mechanism by which flows can be
> shared among clusters and between dev/prod environments.  We have some
> important issues to tackle to make templates as powerful as they
> should be such as having environment variable mapping and
> consistent/repeatable serialization which lends to better version
> control of templates.
>
> Thanks
> Joe
>
> On Mon, Feb 15, 2016 at 9:39 PM, Jeff - Data Bean Australia
> <databean...@gmail.com> wrote:
> > Thanks Joe for the clarification. And it does make sense to me with
> > runtime reliability and change requirements in an enterprise environment.
> >
> > I know that there are solutions that put multiple NiFi instances working
> > > together. How common is a solution like this in the real world? Should we,
> > generally speaking, prefer simple flow.xml configuration and combine NiFi
> > instances together for more complex scenarios?
> >
> > On Tue, Feb 16, 2016 at 3:16 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>
> >> Jeff,
> >>
> >> Regarding hot deploy of new dataflows:
> >> Absolutely.  The ability to have interactive command and control is a
> >> key feature/concept of NiFi so you can certainly create, alter, remove
> >> dataflows while the systems is running by design.  This isn't just for
> >> a person controlling the flow through  UI but also for external
> >> systems to have automated interactions with NiFi through its REST API
> >> which can do things like change the flow, alter priorities, etc..
> >>
> >> Regarding hot deploy of new code/extensions:
> >> We've avoided live deploy of new extensions to this point.  Largely
> >> due to the understanding that while adding new extensions at runtime
> >> is pretty doable it is less reliable/clear to change/update.  That
> >> said, we're trending toward this registry and it would back versions
> >> of extensions at which point maybe this becomes more reasonable.
> >>
> >> Anyway, in the meantime one option may be the fact that the
> >> ExecuteScript/InvokeScriptedProcessor processors do support live
> >> alteration of the code behind them.  While clearly not a complete
> >> solution this may help with some of your cases.
> >>
> >> Thanks
> >> Joe
> >>
> >>
> >>
> >> On Mon, Feb 15, 2016 at 9:10 PM, Jeff - Data Bean Australia
> >> <databean...@gmail.com> wrote:
> >> > Here is my use case: I have a quite complex Data Flow system
> implemented
> >> > by
> >> > NiFi and keep streaming data to another system. There is part of the
> >> > system
> >> > I might want to update on-the-fly without turning down the whole NiFi
> >> > platform. And occasionally I might want to add a new processor, or
> >> > upgrade
> >> > an existing one, then bring up this processor right away.
> >> >
> >> > Can I do it in NiFi now?
> >> >
> >> > Thanks,
> >> > Jeff
> >> >
> >> > --
> >> > Data Bean - A Big Data Solution Provider in Australia.
> >
> >
> >
> >
> > --
> > Data Bean - A Big Data Solution Provider in Australia.
>



-- 
Data Bean - A Big Data Solution Provider in Australia.


Re: Does NiFi support Hot Deploy?

2016-02-15 Thread Jeff - Data Bean Australia
Thanks Joe for the clarification. And it does make sense to me with runtime
reliability and change requirements in an enterprise environment.

I know that there are solutions that put multiple NiFi instances working
together. How common is a solution like this in the real world? Should we,
generally speaking, prefer simple flow.xml configuration and combine NiFi
instances together for more complex scenarios?

On Tue, Feb 16, 2016 at 3:16 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Jeff,
>
> Regarding hot deploy of new dataflows:
> Absolutely.  The ability to have interactive command and control is a
> key feature/concept of NiFi so you can certainly create, alter, remove
> dataflows while the system is running by design.  This isn't just for
> a person controlling the flow through the UI but also for external
> systems to have automated interactions with NiFi through its REST API
> which can do things like change the flow, alter priorities, etc..
>
> Regarding hot deploy of new code/extensions:
> We've avoided live deploy of new extensions to this point.  Largely
> due to the understanding that while adding new extensions at runtime
> is pretty doable it is less reliable/clear to change/update.  That
> said, we're trending toward this registry and it would back versions
> of extensions at which point maybe this becomes more reasonable.
>
> Anyway, in the meantime one option may be the fact that the
> ExecuteScript/InvokeScriptedProcessor processors do support live
> alteration of the code behind them.  While clearly not a complete
> solution this may help with some of your cases.
>
> Thanks
> Joe
>
>
>
> On Mon, Feb 15, 2016 at 9:10 PM, Jeff - Data Bean Australia
> <databean...@gmail.com> wrote:
> > Here is my use case: I have a quite complex Data Flow system implemented
> by
> > NiFi and keep streaming data to another system. There is part of the
> system
> > I might want to update on-the-fly without turning down the whole NiFi
> > platform. And occasionally I might want to add a new processor, or
> upgrade
> > an existing one, then bring up this processor right away.
> >
> > Can I do it in NiFi now?
> >
> > Thanks,
> > Jeff
> >
> > --
> > Data Bean - A Big Data Solution Provider in Australia.
>



-- 
Data Bean - A Big Data Solution Provider in Australia.

