Re: [VOTE] Release Apache NiFi 1.10.0 (rc3)

2019-11-01 Thread Koji Kawamura
+1 (binding)

Ran through the release helper guide. Built successfully, and the
example flows ran as expected.

Note for others using an old macOS:
I encountered a test failure due to an incompatibility between the old
macOS Sierra (10.12.6) and RocksDB 6.2.2, as follows. Recent macOS versions
should be fine.
https://github.com/facebook/rocksdb/issues/4862

[ERROR] Tests run: 10, Failures: 0, Errors: 9, Skipped: 0, Time
elapsed: 0.167 s <<< FAILURE! - in
org.apache.nifi.rocksdb.TestRocksDBMetronome
[ERROR] testColumnFamilies(org.apache.nifi.rocksdb.TestRocksDBMetronome)
 Time elapsed: 0.08 s  <<< ERROR!
java.lang.UnsatisfiedLinkError:
/private/var/folders/wd/mgqn9pqx07x43hqcg9s4t0fhgn/T/librocksdbjni6955104638219729855.jnilib:
dlopen(/private/var/folders/wd/mgqn9pqx07x43hqcg9s4t0fhgn/T/librocksdbjni6955104638219729855.jnilib,
1): Symbol not found: __ZdlPvSt11align_val_t
  Referenced from:
/private/var/folders/wd/mgqn9pqx07x43hqcg9s4t0fhgn/T/librocksdbjni6955104638219729855.jnilib
  Expected in: /usr/lib/libc++.1.dylib
in 
/private/var/folders/wd/mgqn9pqx07x43hqcg9s4t0fhgn/T/librocksdbjni6955104638219729855.jnilib
at 
org.apache.nifi.rocksdb.TestRocksDBMetronome.testColumnFamilies(TestRocksDBMetronome.java:163)

Thanks Joe for RMing!

Koji

On Sat, Nov 2, 2019 at 5:26 AM Kevin Doran  wrote:
>
> +1 (binding)
>
> Ran through the release helper steps, ran the resulting assembly,
> secured it, and tested integration with secure registry. All working
> well.
>
> Nice work all! Thanks Joe for RM'ing!
>
> On Fri, Nov 1, 2019 at 1:11 PM Andrew Lim  wrote:
> >
> > +1 (non-binding)
> >
> > -Ran full clean install on OS X (10.14.2)
> > -Tested secure NiFi with secure NiFi Registry
> > -Ran basic flows successfully
> > -Tested parameters/parameter contexts functionality and policies
> > -Reviewed core UI and documentation fixes/updates
> >
> > Drew
> >
> > > On Oct 29, 2019, at 1:32 PM, Joe Witt  wrote:
> > >
> > > Hello,
> > >
> > > I am pleased to be calling this vote for the source release of Apache NiFi
> > > nifi-1.10.0.
> > >
> > > As they say 'third time's a charm'.
> > >
> > > The source zip, including signatures, digests, etc. can be found at:
> > > https://repository.apache.org/content/repositories/orgapachenifi-1151
> > >
> > > The source being voted upon and the convenience binaries can be found at:
> > > https://dist.apache.org/repos/dist/dev/nifi/nifi-1.10.0/
> > >
> > > The Git tag is nifi-1.10.0-RC3
> > > The Git commit ID is b217ae20ad6a04cac874b2b00d93b7f7514c0b88
> > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=b217ae20ad6a04cac874b2b00d93b7f7514c0b88
> > >
> > > Checksums of nifi-1.10.0-source-release.zip:
> > > SHA256: e9b0a14b3029acd69c6693781b6b6487c14dda12676db8b4a015bce23b1029c1
> > > SHA512:
> > > b07258cbc21d2e529a1aa3098449917e2d059e6b45ffcfcb6df094931cf16caa8970576555164d3f2290cfe064b5780ba1a8bf63dad04d20100ed559a1cfe133
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/joewitt.asc
> > >
> > > KEYS file available here:
> > > https://dist.apache.org/repos/dist/release/nifi/KEYS
> > >
> > > 384 issues were closed/resolved for this release:
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12344993
> > >
> > > Release note highlights can be found here:
> > > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.10.0
> > >
> > > The vote will be open for 72 hours.
> > > Please download the release candidate and evaluate the necessary items
> > > including checking hashes, signatures, build
> > > from source, and test. Then please vote:
> > >
> > > [ ] +1 Release this package as nifi-1.10.0
> > > [ ] +0 no opinion
> > > [ ] -1 Do not release this package because...
> >


Re: MergeRecord can not guarantee the ordering of the input sequence?

2019-10-15 Thread Koji Kawamura
Hi Lei,

How about setting the FIFO prioritizer on all the preceding connections
before the MergeRecord?
Without any prioritizer set, FlowFile ordering is nondeterministic.

Thanks,
Koji

On Tue, Oct 15, 2019 at 8:56 PM wangl...@geekplus.com.cn
 wrote:
>
>
> If FlowFile A, B, C enter MergeRecord sequentially, the output should be
> one FlowFile {A, B, C}.
> However, when testing with a large data volume, sometimes the output order
> is not the same as the order they entered in, and the result is
> nondeterministic.
>
> This really confuses me a lot.
> Does anybody have any insight on this?
>
> Thanks,
> Lei
>
> 
> wangl...@geekplus.com.cn


Re: Data inconsistency happens when using CDC to replicate my database

2019-10-15 Thread Koji Kawamura
Hi Lei,

To address the FlowFile ordering issue related to CaptureChangeMySQL, I'd
recommend using the EnforceOrder processor and the FIFO prioritizer before
any processor that requires precise ordering. EnforceOrder can use the
"cdc.sequence.id" attribute.

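A minimal EnforceOrder configuration sketch for this case (the property
names come from the processor's documentation; the values are assumptions
for a single-table CDC flow):
- Group Identifier: a constant such as "binlog" (or a per-table value via
  Expression Language)
- Order Attribute: cdc.sequence.id
- Initial Order: the first expected sequence number (e.g. 0)
- Wait Timeout / Inactive Timeout: tuned to the expected event rate
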
Thanks,
Koji

On Tue, Oct 15, 2019 at 1:14 PM wangl...@geekplus.com.cn
 wrote:
>
>
> Seems it is related to which prioritizer is used.
> The inconsistency occurs when the OldestFlowFileFirst prioritizer is used,
> but does not occur when the FirstInFirstOut prioritizer is used.
> But I have no idea why.
> Any insight on this?
>
> Thanks,
> Lei
>
>
> 
> wangl...@geekplus.com.cn
>
>
> From: wangl...@geekplus.com.cn
> Sent: 2019-10-15 08:08
> To: users
> Cc: dev
> Subject: Data inconsistency happens when using CDC to replicate my database
> Using CaptureChangeMySQL to extract the binlog, do some translation, and
> then put it to another database with the PutDatabaseRecord processor.
> But there's always data inconsistency between the destination database and
> the source database. To debug this, I have done the following setup:
>
> CaptureChangeMySQL only outputs one table. There's a field called order_no
> that is unique in the table.
> All the processors are scheduled with only one concurrent task.
> No data balancing between nodes; all run on the primary node.
> After CaptureChangeMySQL, I added a LogAttribute processor called log1.
> Before PutDatabaseRecord, I also added a LogAttribute, called log2.
>
> For the inconsistent data, I can grep the order_no in log1 and log2.
> For one specific order_no, there are 5 binlog messages in total. But in
> log1, there's only one message. In log2, there are 5, but the order is changed.
>
> position   type
> 201721167  insert (appeared in log1 and log2)
> 201926490  update (appeared only in log2)
> 202728760  update (appeared only in log2)
> 203162806  update (appeared only in log2)
> 203135127  update (appeared only in log2; the position number is smaller
> than the previous msg)
>
> This really confused me a lot.
> Any insight on this?  Thanks very much.
>
> Lei
>
> 
> wangl...@geekplus.com.cn


Re: Nifi Database repository backup & restore

2019-10-14 Thread Koji Kawamura
Hi Ganesh,

Thanks for the update. But the following points are still unclear to me:
1. What files did you back up and restore, exactly? If you can share how
you took the backup and restored it, with granular details such as the
exact command list, then I can try reproducing the issue on my end. I
have successfully taken backups and restored them.
2. Did you see any error messages or logs indicating the NiFi flow didn't
run successfully? Please share the error messages you faced.

Thanks,
Koji

On Fri, Oct 11, 2019 at 3:36 PM Ganesh, B (Nokia - IN/Bangalore)
 wrote:
>
> Hi Koji ,
>
> My scenario is like below:
>
> 1. Install NiFi and run the flow: GetHTTP CSV -> convert to JSON -> put file
> 2. Take a backup of the data (using one of the backup and restore techniques)
> 3. Log in to the NiFi pod and delete the flow_file_repository &
> database_repository folders
> 4. Restore the same
>
> But the NiFi flow failed to process data.
>
> Have you ever tried this scenario? If you have, please let me know.
>
> Thanks & Regards,
> Ganesh.B
>
>
> -Original Message-
> From: Koji Kawamura 
> Sent: Friday, October 11, 2019 11:13 AM
> To: dev 
> Subject: Re: Nifi Database repository backup & restore
>
> Hi Ganesh,
>
> What did you mean by the following statement? Would you elaborate on what
> is expected and how it actually behaved?
> > NiFi is not processing the flow from the point where it stopped or crashed.
>
> Some processors need their "state" restored in addition to FlowFiles.
> State is stored in ZooKeeper or in the local file system. Does your test
> recover these files, too?
>
> Thanks,
> Koji
>
> On Thu, Oct 10, 2019 at 3:10 PM Ganesh, B (Nokia - IN/Bangalore) 
>  wrote:
> >
> > Hi ,
> >
> > We are trying to test disaster recovery with the backup/restore feature.
> > We could see that the FlowFile and database volumes are restored, but NiFi
> > is not processing the flow from the point where it stopped or crashed.
> >
> > Has anyone tested the above scenario?
> >
> > Thanks & Regards,
> > Ganesh.B


Re: Nifi Database repository backup & restore

2019-10-10 Thread Koji Kawamura
Hi Ganesh,

What did you mean by the following statement? Would you elaborate on what is
expected and how it actually behaved?
> NiFi is not processing the flow from the point where it stopped or crashed.

Some processors need their "state" restored in addition to FlowFiles.
State is stored in ZooKeeper or in the local file system. Does your test
recover these files, too?

Thanks,
Koji

On Thu, Oct 10, 2019 at 3:10 PM Ganesh, B (Nokia - IN/Bangalore)
 wrote:
>
> Hi ,
>
> We are trying to test disaster recovery with the backup/restore feature.
> We could see that the FlowFile and database volumes are restored, but NiFi
> is not processing the flow from the point where it stopped or crashed.
>
> Has anyone tested the above scenario?
>
> Thanks & Regards,
> Ganesh.B


Re: Jira contributor access

2019-10-10 Thread Koji Kawamura
Hi Seokwon,

I've added the contributor role for you. Looking forward to seeing your contributions!

Thanks,
Koji

On Thu, Oct 10, 2019 at 7:51 AM Seokwon Yang  wrote:
>
> Hello,
>
> I would like to contribute to the nifi codebase. Please add me (Jira username 
> : sjyang18) as a contributor.
>
> Thanks
>
> Seokwon
>


Re: Can CaptureChangeMySQL be scheduled to all nodes instead of primary node?

2019-10-10 Thread Koji Kawamura
Hi Lei,

I don't know of any NiFi built-in feature to achieve that.
To distribute the CaptureChangeMySQL load among nodes, I'd deploy separate
standalone NiFi instances (or even MiNiFi Java) in addition to the main
NiFi cluster that runs the main data flow.

For example, if there are 5 databases and 3 NiFi nodes, deploy a 3-node
NiFi cluster with an InputPort.
Also run standalone NiFi/MiNiFi processes on each node: node-a
(datasources 1 and 2), node-b (datasources 3 and 4), node-c (datasource 5).
Then use a RemoteProcessGroup to send the captured data to the main NiFi
cluster.

This approach may be harder to maintain, but it is feasible.
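
A rough topology sketch of that idea (host names are illustrative):

  node-a: MiNiFi (datasources 1, 2) --\
  node-b: MiNiFi (datasources 3, 4) ---> RemoteProcessGroup --> InputPort on
  node-c: MiNiFi (datasource 5)     --/  the 3-node main NiFi cluster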

Thanks,
Koji

On Wed, Oct 9, 2019 at 3:06 PM wangl...@geekplus.com.cn
 wrote:
>
> I am using CaptureChangeMySQL to replicate the database.
> There are many data sources, and so there are many CaptureChangeMySQL
> processors.
> CaptureChangeMySQL throws a 'same slave id' error if scheduled on all nodes,
> so it can only be scheduled on the primary node. This causes a very heavy
> load on the primary node.
>
> Is there any method by which I can distribute the CaptureChangeMySQL
> processors across all nodes instead of only the primary node?
>
> Thanks,
> Lei
>
> 
> wangl...@geekplus.com.cn


Re: [discuss] approaching a NiFi 1.10.0 release

2019-10-02 Thread Koji Kawamura
Hi Pierre,

PR 3394 looks good, but it is hard to merge to master without manually
resolving conflicts, at least with the commands I know. Please see my
comment on the PR.

Thanks,
Koji

On Thu, Oct 3, 2019 at 1:59 AM Pierre Villard
 wrote:
>
> Someone willing to merge https://github.com/apache/nifi/pull/3394? It's
> been reviewed, just wanted to avoid merging my own PR.
>
> On Wed, Oct 2, 2019 at 18:44, Jeff  wrote:
>
> > Joe,
> >
> > NIFI-6275 [1] has been resolved, and PR 3483 [2] has been merged to master.
> >
> > [1] https://issues.apache.org/jira/browse/NIFI-6275
> > [2] https://github.com/apache/nifi/pull/3483
> >
> > On Tue, Oct 1, 2019 at 12:30 AM Jeff  wrote:
> >
> > > Thanks Joe!  I should have an update/resolution for NIFI-6275 tomorrow.
> > >
> > > On Mon, Sep 30, 2019 at 11:48 PM Joe Witt  wrote:
> > >
> > >> Team,
> > >>
> > >> We are pretty much there now it looks like for a 1.10.0.  In scanning
> > >> through this thread it looks like the ListHDFS ask from Jeff is perhaps
> > >> the
> > >> only remaining bit.
> > >>
> > >> I'll probably create the RC branch soon so folks can keep moving on
> > master
> > >> and I'll try to pull in other things that look ready/reviewed/etc.. that
> > >> might work.
> > >>
> > >> If you look here
> > >> https://issues.apache.org/jira/projects/NIFI/versions/12344993 we
> > already
> > >> have a *TON* of features/fixes/improvements loaded up on this release.
> > >>
> > >> My goal is to initiate the RC processes this week.
> > >>
> > >> Huge thanks to the many many folks that contributed to getting us this
> > far
> > >> in this release.
> > >>
> > >> Thanks!
> > >> Joe
> > >>
> > >>
> > >>
> > >> On Fri, Aug 30, 2019 at 4:21 AM Pierre Villard <pierre.villard...@gmail.com> wrote:
> > >>
> > >> > +1 for getting a 1.10.0 release soon - a lot of great things have been
> > >> > added.
> > >> >
> > >> > If I may ask, it would be great to have a +1 from a committer on
> > >> NIFI-6159
> > >> > [1].
> > >> > The PR has been reviewed by turcsanyip but it would need a final go
> > >> from a
> > >> > committer.
> > >> > However, I do understand that other PRs mentioned in this thread are
> > >> more
> > >> > important (bug related).
> > >> >
> > >> > [1] https://github.com/apache/nifi/pull/3394
> > >> >
> > >> > Thanks,
> > >> > Pierre
> > >> >
> > >> > On Fri, Aug 30, 2019 at 08:21, Jeff  wrote:
> > >> >
> > >> > > I'd also like to get the PR [1] for NIFI-6275 merged before 1.10.0
> > is
> > >> > > released.  It's an update to ListHDFS so that the scheme and
> > authority
> > >> > are
> > >> > > ignored when using the "Full Path" filter mode.
> > >> > >
> > >> > > [1] https://github.com/apache/nifi/pull/3483
> > >> > >
> > >> > > > On Thu, Aug 29, 2019 at 10:04 PM Aldrin Piri  wrote:
> > >> > >
> > >> > > > I created NIFI-6604 [1] to reduce assembly size and listed it as a
> > >> > > Blocker
> > >> > > > for this release.  The issue has a link to the associated
> > discussion
> > >> > > thread
> > >> > > > and an initial PR.
> > >> > > >
> > >> > > > --aldrin
> > >> > > >
> > >> > > > [1] https://issues.apache.org/jira/browse/NIFI-6604
> > >> > > >
> > >> > > > > On Thu, Aug 29, 2019 at 4:08 PM Matt Burgess  wrote:
> > >> > > >
> > >> > > > > Mike,
> > >> > > > >
> > >> > > > > I’ll review those two graph PRs tonight or tomorrow (if they’re
> > >> still
> > >> > > > open
> > >> > > > > by then)
> > >> > > > >
> > >> > > > > > On Aug 29, 2019, at 3:43 PM, Mike Thomsen <mikerthom...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > I have two open graph-related PR's that are really small and
> > are
> > >> > > needed
> > >> > > > > to
> > >> > > > > > close some bugs that will be bad for early adopters:
> > >> > > > > >
> > >> > > > > > https://github.com/apache/nifi/pull/3571
> > >> > > > > > https://github.com/apache/nifi/pull/3572
> > >> > > > > >
> > >> > > > > > If someone wants to review, that'd be great. Otherwise, I can
> > >> merge
> > >> > > > them
> > >> > > > > in
> > >> > > > > > because I'm doing daily work on some graph stuff using a
> > branch
> > >> > based
> > >> > > > on
> > >> > > > > > both patches.
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > >
> > >> > > > > > Mike
> > >> > > > > >
> > >> > > > > >> On Thu, Aug 29, 2019 at 2:12 PM Joe Witt  wrote:
> > >> > > > > >>
> > >> > > > > >> We had another discuss on that recently and the intent is to
> > >> drop
> > >> > a
> > >> > > > few
> > >> > > > > >> toys off the raft and update migration guidance.
> > >> > > > > >>
> > >> > > > > >> Thanks
> > >> > > > > >>
> > >> > > > >>> On Thu, Aug 29, 2019 at 2:02 PM Jeremy Dyer <jdy...@gmail.com> wrote:
> > >> > > > > >>>
> > >> > > > > >>> +1 looking forward to this.
> > >> > > > > >>>
> > >> > > > >>> I recall seeing some issues about Apache Infra and the binary size.
> > >> > > > >>> Were all of those appropriate modules removed as discussed and the
> > >> > > > >>> build size will be small enough now?

Re: [VOTE] Create NiFi Standard Libraries sub-project

2019-09-03 Thread Koji Kawamura
+1 Create NiFi Standard Libraries (binding)

On Wed, Sep 4, 2019 at 7:25 AM Mike Thomsen  wrote:
>
> +1 binding
>
> On Tue, Sep 3, 2019 at 5:33 PM Andy LoPresto  wrote:
>
> > +1, create NiFi Standard Libraries (binding)
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > > On Sep 3, 2019, at 2:16 PM, Bryan Bende  wrote:
> > >
> > > All,
> > >
> > > In a previous thread there was a plan discussed to restructure some of
> > > the repositories in order to address several different issues, such as
> > > build time, reusability of code, and eventually separating how the
> > > framework and extensions are released [1][2].
> > >
> > > The overall plan requires many steps to get there, so I'd like to
> > > propose starting with a small actionable step - the creation of a new
> > > sub-project called NiFi Standard Libraries (formerly referred to as
> > > nifi-commons).
> > >
> > > Project Name: Apache NiFi Standard Libraries
> > > Git Repository: nifi-standard-libraries
> > > JIRA: NIFILIBS
> > >
> > > Description:
> > >
> > > A collection of standard implementations used across the NiFi ecosystem.
> > >
> > > Candidate Libraries:
> > >
> > > In general, each library may consist of multiple Maven modules, and
> > > should be independent from the rest of the ecosystem, and from other
> > > libraries within NiFi Standard Libraries.
> > >
> > > In addition, each library may make its own decision about whether it
> > > is considered a public facing extension point/API, or an internal
> > > library that may be changed at any time. This should be documented in
> > > a README at the root of each library, such as
> > > nifi-standard-libraries/nifi-xyz/README.
> > >
> > > An initial library that has been discussed was referred to as
> > > 'nifi-security' and would centralize much of the security related code
> > > shared by NiFi and NiFi Registry, such as shared security APIs, and
> > > implementations for various providers, such as LDAP/Kerberos/etc.
> > >
> > > A second candidate library would be an optimistic-locking library
> > > based on NiFi's revision concept. Currently this has been created
> > > inside nifi-registry for now [3], but could be moved as soon as
> > > nifi-standard-libraries exists.
> > >
> > > (This list does not have to be final in order to decide if we are
> > > creating NiFi Standard Libraries or not)
> > >
> > > Integration & Usage:
> > >
> > > Once NiFi Standard Libraries is created, the community can start
> > > creating and/or moving code there and perform releases as necessary. A
> > > release will consist of the standard Apache source release, plus
> > > artifacts released to Maven central. The community can then decide
> > > when it is appropriate to integrate these released libraries into one
> > > of our downstream projects.
> > >
> > > For example, if we create a nifi-security library in
> > > nifi-standard-libraries, we can release that whenever we decide, but
> > > we may not integrate it into NiFi or NiFi Registry until it makes
> > > sense for a given release of those projects.
> > >
> > > This vote will be open for 48 hours, please vote:
> > >
> > > [ ] +1 Create NiFi Standard Libraries
> > > [ ] +0 no opinion
> > > [ ] -1 Do not create NiFi Standard Libraries because...
> > >
> > > [1]
> > http://apache-nifi.1125220.n5.nabble.com/discuss-Splitting-NiFi-framework-and-extension-repos-and-releases-td27499.html
> > > [2]
> > https://cwiki.apache.org/confluence/display/NIFIREG/NiFi+Project+and+Repository+Restructuring
> > > [3]
> > https://github.com/apache/nifi-registry/tree/master/nifi-registry-core/nifi-registry-revision
> >
> >


Re: [discuss] approaching a NiFi 1.10.0 release

2019-08-29 Thread Koji Kawamura
There is a critical issue with the RAW Site-to-Site server-side code on Java 11.
RAW Site-to-Site currently cannot be used due to an illegal blocking
mode error.
https://issues.apache.org/jira/browse/NIFI-5952

There is a PR that makes RAW S2S stop using blocking mode. This addresses
the issue, and RAW S2S works fine on Java 11.
https://github.com/apache/nifi/pull/3265

At the same time, I've been working on implementing a completely
non-blocking RAW S2S. This will support more concurrent client
connections, but it isn't finished, as it doesn't work in a clustered
environment yet. I'm still debugging, but not sure how it can be done.
https://github.com/apache/nifi/pull/3578

My suggestion is merging #3265 before releasing 1.10.0.
The non-blocking S2S work can continue after that; it's an
improvement. But we can't release 1.10.0 with Java 11 support without
resolving NIFI-5952.

On Fri, Aug 30, 2019 at 5:08 AM Matt Burgess  wrote:
>
> Mike,
>
> I’ll review those two graph PRs tonight or tomorrow (if they’re still open by 
> then)
>
> > On Aug 29, 2019, at 3:43 PM, Mike Thomsen  wrote:
> >
> > I have two open graph-related PR's that are really small and are needed to
> > close some bugs that will be bad for early adopters:
> >
> > https://github.com/apache/nifi/pull/3571
> > https://github.com/apache/nifi/pull/3572
> >
> > If someone wants to review, that'd be great. Otherwise, I can merge them in
> > because I'm doing daily work on some graph stuff using a branch based on
> > both patches.
> >
> > Thanks,
> >
> > Mike
> >
> >> On Thu, Aug 29, 2019 at 2:12 PM Joe Witt  wrote:
> >>
> >> We had another discuss on that recently and the intent is to drop a few
> >> toys off the raft and update migration guidance.
> >>
> >> Thanks
> >>
> >>> On Thu, Aug 29, 2019 at 2:02 PM Jeremy Dyer  wrote:
> >>>
> >>> +1 looking forward to this.
> >>>
> >>> I recall seeing some issues about Apache Infra and the binary size. Were
> >>> all of those appropriate modules removed as discussed and the build size
> >>> will be small enough now?
> >>>
>  On Thu, Aug 29, 2019 at 1:35 PM Bryan Bende  wrote:
> 
>  +1 Looking forward to getting parameters and Java 11 support out there
>  in a release.
> 
> > On Thu, Aug 29, 2019 at 1:02 PM Joe Witt  wrote:
> >
> > Team,
> >
> > It looks like we're reaching a point in which it is time to close in
> >> on
> > 1.10.
> >
> >
> 
> >>>
> >> https://issues.apache.org/jira/browse/NIFI-6595?jql=project%20%3D%20NIFI%20AND%20fixVersion%20%3D%201.10.0
> >
> > There are 250+ issues in there nearly 240 of which are already
>  resolved.  I
> > haven't gone through yet for fully analyzing but the awesome Java 11
> >>> work
> > Jeff Storck and others has done means we can now build on Java 11 as
> >>> well
> > as run on it.  There is a ton of great work on parameters which will
> >>> make
> > managing flows at scale much easier.  We will need to drop some nars
>  which
> > means we have some migration guidance to update.
> >
> > I'd like to volunteer as RM but if there are any other takers please
> >>> let
>  me
> > know.
> >
> > As we close in on releases this tends to create a lot of urgency to
>  quickly
> > try to get new things in.  I will try to manage this well.  As always
> >>> we
> > can do another release.  There is no timeline pushing us to have such
>  gaps
> > between releases...we can always do another feature release.  That
> >>> said,
>  PR
> > review bandwidth is at a super premium.  We receive a lot more PRs
> >> than
>  we
> > do receive reviews.  We'll have to work this as the PR depth grows.
> >
> > Thanks
> > Joe
> 
> >>>
> >>


Re: Wait/Notify Question

2019-08-19 Thread Koji Kawamura
Hi Chris,

You are correct, the Wait processor has to rely on an attribute within a
FlowFile to determine the target signal count.
I think the idea of making Wait able to fetch the target signal count
from the DistributedMapCache is a nice improvement.

Please create a JIRA for further discussion. I guess we will need to
add a property such as "Fetch Target Signal Count from Cache Service",
boolean, defaulting to false. If enabled, the Wait processor treats the
configured "Target Signal Count" value as a key in the
DistributedMapCache, then fetches the value to use as the target count. In
case the key is not found, the Wait processor transfers the FlowFile to
the wait relationship.
https://issues.apache.org/jira/projects/NIFI

Adding FetchDistributedMapCache right before Wait provides the same
result. But if the Wait processor can fetch it, we can reduce the number
of fetch operations required to process multiple FlowFiles at Wait.

To avoid the race condition where Wait processes FlowFiles before the
counting part finishes, I'd use two keys in the counting part: a
temporary one to accumulate the count, and the final one (the signal
identifier) written once the counting has finished.
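
A sketch of that two-key approach (the key names are illustrative):
1. The counting part accumulates into a temporary key such as
   "${batch.id}.counting" via FetchDistributedMapCache -> ReplaceText ->
   PutDistributedMapCache.
2. When the last FlowFile has been counted, copy the final value to the
   signal identifier key, "${batch.id}".
3. FetchDistributedMapCache before Wait reads "${batch.id}"; while that key
   is absent, the counting hasn't finished, so FlowFiles simply keep waiting.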

Thanks,
Koji

On Tue, Aug 20, 2019 at 1:08 AM Chris Lundeberg  wrote:
>
> Hi all,
>
> I wanted to throw out a question to the larger community before I went down
> a different path.  I might be looking at this wrong or making assumptions I
> shouldn't.
>
> Recently I started working with the Wait and Notify processors a bit more.
> I have a new flow which is a bit more batch in nature and these processors
> seem to work nicely for being able to intelligently wait for chunks or
> files to be processed, before moving on to the next step.  I have one
> specific pattern that I haven't solved with the inbuilt functionality,
> which is:
>
> 1. I have an incoming zip file from SFTP.  That zip contains n-number of
> files within, and each of those files needs to be split in some way.  I won't
> know the number of files within the zip.
>
> 2.  After they have been split correctly, a few transformations run on each
> of the files.
>
> 3.  At the end of the transformation process, these various files will be
> merged into 5 specific outbound file formats, to be sent to an outbound
> SFTP server.  *Note*: I am not splitting and merging the same files back
> together (I have looked at the fragment index stuff).
>
> I found a nice solution for being able to count the number of flowfiles
> after the split, so I know exactly how many files should be transformed and
> thus I know what my "Target Signal Count" should be within the Wait
> processor.  At the moment I have a counting process to (1) Fetch
> Distributed MapCache, (2) Replace text (incrementing the count number from
> the fetch, if a number is found), and (3) Put Distributed MapCache.  This
> process works as expected and I have a valid key/value pair in the MapCache
> for that particular process (I create a BatchID so it's very specific for
> each pull from the SFTP processor).  The only way I know how to
> intelligently provide that information back to the Wait processor is to
> pull that value with a Fetch Distributed MapCache right before the flowfile
> enters the Wait processor.  In theory each flowfile waiting would have the
> same attribute from the Fetch process and each attribute would be the same
> count.  However this doesn't always work because there could exist a
> condition where the transformations happen before the counting has been
> done and published to the MapCache Server.  So in this scenario you end up
> with some flowfiles having a lower count than others or just not having the
> "true" count.  Now, I can put additional gates in place such as trying to
> to be done first, but it's not a perfect science.
> to be done first, but its not a perfect science.
>
> I thought ideally it would be good to allow the Wait processor to pull
> directly from the MapCache if I could provide the key it would need for a
> lookup, within the "Target Signal Count" field.  It could use the signal
> coming from Notify to say "I have X number of Notify, for this signal" and
> use the count value I have set in the MapCache to say "This is the total
> number of files I need to see from Notify, for that same signal". This way,
> I could run the Wait processor every few seconds and the chances of running
> into a miscount condition would be far less.  Is there any way currently
> where this processor could pull directly from the cache, or does it have to
> rely on an attribute within the flowfile itself?  I think it's the latter,
> but I want to make sure someone doesn't have a better idea.
>
> Sorry for the long message. Thanks!
>
>
> Chris Lundeberg


Re: Enquire about JMSConnectonFactoryProvider

2019-07-07 Thread Koji Kawamura
Hello,

Sorry to hear that you are having trouble with upgrading NiFi.
What complaints or error messages do you get?

Thanks,
Koji

On Fri, Jul 5, 2019 at 8:18 PM Chaganti Suresh Naidu (NCS)
 wrote:
>
> Dear Sir/Madam,
> Greetings,
>
> I have been using NiFi for quite some time, with version 1.4.0,
> and I didn't update the NiFi version till now. I want to upgrade NiFi to
> version 1.9.2, in which I got a few conflicts that I managed to fix,
> but JMSConnectionFactoryProvider complains, whereas in version 1.4.0 it
> doesn't.
>
> How can I fix the issue? I re-created the existing ConsumeJMS processor for
> my requirement, and it uses JMSConnectionFactoryProvider.
>
> Please help me out!
>
> Thanks
> Suresh
>


Re: Message sending to JMS through MQ gives error-Transport scheme NOT recognized:

2019-06-13 Thread Koji Kawamura
Hi,

There is another NiFi JMS connection ControllerService:
JndiJmsConnectionFactoryProvider.
With PublishJMS and JndiJmsConnectionFactoryProvider, I was able to
send a message to a HornetQ queue.

JndiJmsConnectionFactoryProvider config example:
- Initial Naming Factory Class: org.jnp.interfaces.NamingContextFactory
- Naming Provider URL: jnp://localhost:1099
- Connection Factory Name: /ConnectionFactory
- Naming Factory Libraries: /Users/koji/Downloads/hornetq-2.4.0.Final/lib

Hope this helps.
Koji

On Fri, Jun 14, 2019 at 10:23 AM Koji Kawamura  wrote:
>
> Hello,
>
> PutJMS is deprecated, PublishJMS is recommended instead.
> PublishJMS uses JMSConnectionFactoryProvider Controller Service, in
> which you can specify "MQ ConnectionFactory Implementation" and "MQ
> Client Libraries path (i.e., /usr/jms/lib)".
> You will need to download HornetQ, extract it, and point the
> lib dir at JMSConnectionFactoryProvider's "MQ Client Libraries path".
>
> I don't have experience with HornetQ, but I downloaded it and tried
> connecting PublishJMS to it.
> However, unfortunately it seems that NiFi's JMSConnectionFactoryProvider
> doesn't support HornetQ currently,
> because HornetQConnectionFactory needs a bit of application code to
> initialize the connection factory with a service locator; an example is
> available in the link below.
> NiFi's JMSConnectionFactoryProvider uses the default constructor, which
> can't initialize the service locator.
> https://gist.github.com/caandradeduarte/a527712241c1e1c6d86b171362b58b78
>
> I think you need a custom JMSConnectionFactoryProviderDefinition
> implementation to connect NiFi to HornetQ.
> I may be wrong, since this is the first time I've used HornetQ.
>
> Thanks,
> Koji
>
> On Thu, Jun 13, 2019 at 9:21 PM Puspak  wrote:
> >
> > Hi ,
> >
> > I am relatively new to NiFi. I have a requirement where I have to push
> > some messages to JMS (HornetQ).
> >
> > When I push the message with the below configuration in PutJMS of NiFi,
> > I am getting the below error.
> >
> > 2019-06-13 09:30:31,701 ERROR [Timer-Driven Process Thread-7]
> > o.apache.nifi.processors.standard.PutJMS
> > PutJMS[id=4a3d7305-016b-1000-c244-a62fe67418b0] Failed to connect to JMS
> > Server due to javax.jms.JMSException: Could not create Transport. Reason:
> > java.io.IOException: Transport scheme NOT recognized: [http]:
> > javax.jms.JMSException: Could not create Transport. Reason:
> > java.io.IOException: Transport scheme NOT recognized: [http]
> > javax.jms.JMSException: Could not create Transport. Reason:
> > java.io.IOException: Transport scheme NOT recognized: [http]
> > at
> > org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:36)
> > at
> > org.apache.activemq.ActiveMQConnectionFactory.createTransport(ActiveMQConnectionFactory.java:333)
> > at
> > org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:346)
> > at
> > org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:304)
> > at
> > org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:244)
> > <http://apache-nifi-developer-list.39713.n7.nabble.com/file/t1140/putJMS-Config.jpg>
> >
> >
> >
> > --
> > Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Message sending to JMS through MQ gives error-Transport scheme NOT recognized:

2019-06-13 Thread Koji Kawamura
Hello,

PutJMS is deprecated, PublishJMS is recommended instead.
PublishJMS uses JMSConnectionFactoryProvider Controller Service, in
which you can specify "MQ ConnectionFactory Implementation" and "MQ
Client Libraries path (i.e., /usr/jms/lib)".
You will need to download HornetQ, extract it, and point the
lib dir at JMSConnectionFactoryProvider's "MQ Client Libraries path".

I don't have experience with HornetQ, but I downloaded it and tried
connecting PublishJMS to it.
However, unfortunately it seems that NiFi's JMSConnectionFactoryProvider
doesn't support HornetQ currently,
because HornetQConnectionFactory needs a bit of application code to
initialize the connection factory with a service locator; an example is
available in the link below.
NiFi's JMSConnectionFactoryProvider uses the default constructor, which
can't initialize the service locator.
https://gist.github.com/caandradeduarte/a527712241c1e1c6d86b171362b58b78
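
For illustration, a minimal sketch of that kind of bootstrap code against
the HornetQ 2.4 client API (the host and port values are assumptions for a
default local broker):

import java.util.HashMap;
import java.util.Map;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.jms.HornetQJMSClient;
import org.hornetq.api.jms.JMSFactoryType;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

public class HornetQFactoryExample {
    public static void main(String[] args) throws Exception {
        // Locate the broker via a Netty connector; a plain default
        // constructor (which is what JMSConnectionFactoryProvider calls)
        // cannot supply this information.
        Map<String, Object> params = new HashMap<>();
        params.put("host", "localhost"); // assumption: local broker
        params.put("port", 5445);        // assumption: default Netty acceptor port
        TransportConfiguration transport = new TransportConfiguration(
                NettyConnectorFactory.class.getName(), params);
        ConnectionFactory cf = HornetQJMSClient
                .createConnectionFactoryWithoutHA(JMSFactoryType.CF, transport);
        Connection connection = cf.createConnection();
        connection.start();
        // ... create a session and producer, send messages ...
        connection.close();
    }
}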

I think you need a custom JMSConnectionFactoryProviderDefinition
implementation to connect NiFi to HornetQ.
I may be wrong, since this is the first time I've used HornetQ.

Thanks,
Koji

On Thu, Jun 13, 2019 at 9:21 PM Puspak  wrote:
>
> Hi ,
>
> I am relatively new to NiFi. I have a requirement where I have to push
> some messages to JMS (HornetQ).
>
> When I push the message with the below configuration in PutJMS of NiFi,
> I am getting the below error.
>
> 2019-06-13 09:30:31,701 ERROR [Timer-Driven Process Thread-7]
> o.apache.nifi.processors.standard.PutJMS
> PutJMS[id=4a3d7305-016b-1000-c244-a62fe67418b0] Failed to connect to JMS
> Server due to javax.jms.JMSException: Could not create Transport. Reason:
> java.io.IOException: Transport scheme NOT recognized: [http]:
> javax.jms.JMSException: Could not create Transport. Reason:
> java.io.IOException: Transport scheme NOT recognized: [http]
> javax.jms.JMSException: Could not create Transport. Reason:
> java.io.IOException: Transport scheme NOT recognized: [http]
> at
> org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:36)
> at
> org.apache.activemq.ActiveMQConnectionFactory.createTransport(ActiveMQConnectionFactory.java:333)
> at
> org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:346)
> at
> org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:304)
> at
> org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:244)
> 
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: [EXT] Re: GitHub Stuff

2019-06-12 Thread Koji Kawamura
Thanks Bryan for the heads up.

My GPG key had expired. I've renewed my key by extending its expiration.
Now I've confirmed that my commits are marked as 'verified' on GitHub.
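
For anyone else who hits this, the rough steps I used (the key id is a
placeholder):

gpg --list-secret-keys --keyid-format LONG   # find the expired key id
gpg --edit-key <KEYID>                       # run 'expire', pick a new date, then 'save'
gpg --send-keys <KEYID>                      # re-publish the updated public key
git config --global user.signingkey <KEYID>
git config --global commit.gpgsign true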

Koji

On Wed, Jun 12, 2019 at 5:43 AM Andy LoPresto  wrote:
>
> Peter,
>
> If you have specific issues setting it up, I’m happy to help debug. I haven’t 
> done it recently but am willing to investigate with you.
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> > On Jun 11, 2019, at 12:55 PM, Bryan Bende  wrote:
> >
> > I will admit I've never set up GPG signing on Linux. I'm sure there are
> > some additional challenges there.
> >
> > Not sure if it is helpful, but there are a few things related to Linux
> > that are mentioned on this Github page:
> >
> > https://help.github.com/en/articles/telling-git-about-your-signing-key
> >
> >
> > On Tue, Jun 11, 2019 at 3:45 PM Kevin Doran  wrote:
> >>
> >> Yep, I support these suggestions.
> >>
> >> Setting up GPG does have a learning curve for folks that haven't done
> >> it before, but I think our community would be helpful in assisting
> >> folks on the mailing list and Apache NiFi Slack where they run into
> >> trouble. It's a good practice to learn and once setup there's not much
> >> more to do to get the benefits of it.
> >>
> >> Setting up GPG is also required when acting as release manager in
> >> order to sign convenience binaries (and soon, as Andy brought up,
> >> maven release artifacts as well - I think that is also a good idea),
> >> so the effort required to get setup for GPG has lots of benefits for
> >> folks that are interested in RM'ing as well.
> >>
> >> Kevin
> >>
> >> On Tue, Jun 11, 2019 at 3:30 PM Peter Wicks (pwicks)  
> >> wrote:
> >>>
> >>> I like having signed commits. I develop on both Windows and Linux, but 
> >>> have only had success getting signing working on Windows (which was a bit 
> >>> complicated as it was). You can see when I switched from mostly Windows 
> >>> to mostly Linux by when I stopped signing commits...
> >>>
> >>> Thanks,
> >>>  Peter
> >>>
> >>> -Original Message-
> >>> From: Andy LoPresto 
> >>> Sent: Tuesday, June 11, 2019 1:25 PM
> >>> To: dev@nifi.apache.org
> >>> Subject: [EXT] Re: GitHub Stuff
> >>>
> >>> I strongly support both of these suggestions. Thanks for starting the 
> >>> conversation Bryan. GPG signing is very important for security and for 
> >>> encouraging the rest of the community to adopt these practices as well.
> >>>
> >>>
> >>> Andy LoPresto
> >>> alopre...@apache.org
> >>> alopresto.apa...@gmail.com
> >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >>>
>  On Jun 11, 2019, at 11:42 AM, Bryan Bende  wrote:
> 
>  I had two thoughts related to our GitHub usage that I wanted to throw
>  out there for PMC members and committers...
> 
>  1) I think it would be helpful if everyone setup the link between
>  their Apache id and github [1]. Setting up this link puts you into the
>  nifi-committers group in Apache (currently 17 of us are in there), and
>  I believe this is what controls the list of users that can be selected
>  as a reviewer on a pull request. Since PRs are the primary form of
>  contribution, it would be nice if all of the PMC/committers were in
>  the reviewer list, but of course you can continue to commit against
>  Gitbox without doing this.
> 
>  2) I also think it would be nice if most of the commits in the repo
>  were signed commits that show up as "Verified" in GitHub [2]. Right
>  now I think we lose the verification if the user reviewing the commit
>  doesn't have signing setup, because when you amend the commit to add
>  "This closes ...", it technically produces a new commit hash, thus
>  making the original signature no longer apply (at least this is what I
>  think is happening, but other may know more).
> 
>  These are obviously just my opinions and no one has to do these
>  things, but just thought I would throw it out there for discussion in
>  case anyone wasn't aware.
> 
>  -Bryan
> 
>  [1] https://gitbox.apache.org/setup/
>  [2] https://help.github.com/en/articles/signing-commits
> >>>
>


Re: [VOTE] Release Apache NiFi NAR Maven Plugin 1.3.1

2019-05-09 Thread Koji Kawamura
+1 (binding)

Went through the Release Helper Guide.
- On OS X
- Build nifi-nar-maven-plugin with contrib-check was successful
- Removed .m2 dir before building NiFi
- Full NiFi build was successful
- Tested standalone and secure clustered NiFi, worked as expected
- Confirmed extension-manifest.xml files were generated and contained in nars

Thanks Bryan for RMing!

Koji

On Fri, May 10, 2019 at 3:52 AM Bryan Bende  wrote:
>
> Hello,
>
> I am pleased to be calling this vote for the source release of Apache
> NiFi NAR Maven Plugin 1.3.1.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1144
>
> The Git tag is nifi-nar-maven-plugin-1.3.1-RC1
> The Git commit ID is 51deb8a070ef2b9f0041c2b6448b72de67f91822
> https://gitbox.apache.org/repos/asf?p=nifi-maven.git;a=commit;h=51deb8a070ef2b9f0041c2b6448b72de67f91822
>
> Checksums of nifi-nar-maven-plugin-1.3.1-source-release.zip:
> SHA256: 48f2b3e361d7e45c79a659bfcf7a56a88278411f3686d2da800c58858c742637
> SHA512: 
> c0ace13e7d7f7dd4468c7ac71f1dddc098393481cd1096c7479a48fb215ef47197969da50edb987b182592b988368104d58ec26a21ba4beea2a4db46b6dca4eb
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/bbende.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 1 issue was closed/resolved for this release:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345484&styleName=&projectId=12316020
>
> Release note highlights can be found here:
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-NiFiNARMavenPluginVersion1.3.1
>
> The vote will be open for 72 hours.
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build from source, and test.
> Then please vote:
>
> [ ] +1 Release this package as nifi-nar-maven-plugin-1.3.1
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...


Re: NiFi - How to Wait until all Inserts into Table are done, then Create/Analyze Indexes on the Table afterwards

2019-04-02 Thread Koji Kawamura
Hi Matt,

I posted my answer to your Stackoverflow question.
https://stackoverflow.com/questions/55483317/nifi-create-indexes-after-inserting-records-into-table/55486259#55486259

Thanks,
Koji

On Wed, Apr 3, 2019 at 8:04 AM matthewmich...@northwesternmutual.com
 wrote:
>
> NiFi Developers,
>
> I tried reaching out to the Stack Overflow apache-nifi tag, but am getting
> no replies - so normally I wouldn't ask this email list this type of
> question, but I am not sure where else to turn.
>
> So to keep it simple, my 1st Process Group (PG) truncates the table, then
> drops its indexes. That output port routes to an input port in the 2nd
> process group, which does the 500K inserts into the table. After
> successfully inserting the 500K rows, I want to create the indexes on the
> table and analyze it via the 3rd process group. This is typical Data
> Warehouse methodology. Can anyone please give advice on how to do this?
> I'd like to not even start the 3rd process group until the first 2 complete
> successfully and the insert count matches the select count.
>
> I've tried setting counters, then comparing the total inserted count to the
> initial select count, but it appears that I cannot reference counters in
> the Expression Language, because the "${InsertCounter}" syntax always
> returns null?? So maybe instead I should be using the Wait & Notify
> processors, so I could Wait until all the inserts are done, then Notify the
> next process (the create & analyze indexes process) to start after all the
> inserts are done? Here's example code on that from Stack Overflow:
>
> In the wait processor set the Target Signal Count to ${fragment.count}.
>
> Set the Release Signal Identifier in both the notify and wait processor to 
> ${fragment.identifier}
> Thanks for any help anyone can provide!
>
> Thanks & Regards,
>
> Matt Michala
> Senior Engineer
> Digital Workplace & Corp. Solutions
> Northwestern Mutual (T07-150)
> 414-661-4668
>
>


Re: Change Flowfile Name

2019-03-28 Thread Koji Kawamura
Hi Rajesh,

To process FlowFiles in the order of their arrival, you need to use the
FirstInFirstOutPrioritizer on the outgoing connection from the ConsumeMQTT
processor, and on all connections after that where first-in-first-out
ordering is required.
Please refer to these docs for details.
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#prioritization

Hope this helps,
Koji


On Wed, Mar 27, 2019 at 5:55 PM Rajesh Biswas 
wrote:

> Hello NiFi Dev Team,
>
> We are using the ConsumeMQTT processor to read messages from a RabbitMQ
> server.
>
> We have experienced that the order in which data is passed through the
> ConsumeMQTT processor is different from the data order in the RabbitMQ queue.
>
> We are working on a real-time data processing pipeline, hence order is a
> crucial parameter for our case.
>
> We tried altering different parameters in the processor, but the issue is
> still not fixed.
>
> Please suggest and give me some direction.
>
> Below were screenshots of the processor configuration (images not
> preserved in the archive).
> Thanks and Regards,
>
> *Rajesh Biswas* | +91 9886433461 | www.bridgera.com
>
>
>
>


Re: [VOTE] Release Apache NiFi MiNiFi C++ 0.6.0 (RC2)

2019-03-22 Thread Koji Kawamura
+1 binding

- Went through the release helper guide
- Ran a simple flow to send data from MiNiFi to NiFi using S2S worked fine

JNI processors are exciting! Thanks for managing release Marc!

Koji

On Fri, Mar 22, 2019 at 1:12 PM Kevin Doran  wrote:
>
> +1, binding
>
> - verified build on Mac OS X 10.14 and Alpine Linux (using docker build)
> - exercised various new features including JNI processors and CoAP support
>
> I did observe a minor issue using the bootstrap script on Mac [1] for
> which I submitted [2].
>
> All in all, really great release -- amazing how far the C++ agent has
> come. Great work everyone and thanks for RM'ing Marc!
>
> [1] https://issues.apache.org/jira/browse/MINIFICPP-783
> [2] https://github.com/apache/nifi-minifi-cpp/pull/519
>
> Thanks,
> Kevin
>
> On Thu, Mar 21, 2019 at 10:15 PM Jeremy Dyer  wrote:
> >
> > +1, binding
> >
> >- Verified all checksums
> >- Configured via CMake without bootstrap.sh
> >- Configured and built with bootstrap.sh
> >- Validated base build
> >- Validated build with all extensions enabled on CentOs 7.6 and Ubuntu
> >18.04
> >- Validated base build on OS X 10.14.3
> >- Validated base build on Ubuntu 16.04
> >- Validated base build on Ubuntu 18.04
> >- Verified LICENSE and README look good
> >
> > Thank you for getting this RC together. I for one have been following
> > this release closely and am very excited for the new JNI integrations,
> > native Python processors, and CoAP support.
> >
> > On Wed, Mar 20, 2019 at 8:10 AM Arpad Boda  wrote:
> >
> > > +1
> > >
> > >
> > >
> > > -Verified checksums, signature, commit ID
> > >
> > > -Built on Ubuntu 16 and OS X 10.13 (executed all tests on both systems)
> > >
> > > -Verified that MiNiFi and some NanoFi examples start and work as expected
> > >
> > > -LICENSE found and looks good to me
> > >
> > > -NOTICE found, but should be updated:
> > >
> > > Apache NiFi MiNiFi
> > >
> > > Copyright 2017 The Apache Software Foundation
> > >
> > > -README.md found and looks good to me; contains a minor typo for Bionic:
> > >
> > > | Ubuntu 16  | make u18 | nifi-minifi-cpp-bionic-$VERSION-bin.tar.gz
> > >
> > >
> > >
> > >
> > >
> > > On 19/03/2019, 20:59, "Aldrin Piri"  wrote:
> > >
> > >
> > >
> > > +1, binding
> > >
> > >
> > >
> > > So much great stuff in this release.  Thanks for RMing, Marc!
> > >
> > >
> > >
> > > comments:
> > >
> > > Signature and hashes looked good
> > >
> > > Verified build, tests, and linter looked good on Ubuntu 18 and OS X
> > > 10.13
> > >
> > > Performed builds using the make targets {u16, u18, centos, debian,
> > > fedora}
> > >
> > > to generate successful assemblies
> > >
> > > Verified a sampling of the assemblies on different platforms and
> > > worked as
> > >
> > > expected for a variety of flows and functionalities
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Mar 19, 2019 at 12:52 PM Marc Parisi 
> > > wrote:
> > >
> > >
> > >
> > > > Hello Apache NiFi community,
> > >
> > > >
> > >
> > > > I am pleased to call this vote for the source release of Apache NiFi
> > > MiNiFi
> > >
> > > > C++ 0.6.0
> > >
> > > >
> > >
> > > > The source tar.gz, including signatures, digests, and convenience
> > > binaries.
> > >
> > > > can be found at:
> > >
> > > >
> > > https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi-cpp/0.6.0-rc2/
> > >
> > > >
> > >
> > > > The Git tag is minifi-cpp-0.6.0-RC2
> > >
> > > > The Git commit ID is 28ade6cf75e8a0ccce78699f147f4ad9baaf70c2
> > >
> > > >
> > > https://git-wip-us.apache.org/repos/asf?p=nifi-minifi-cpp.git;a=commit;h=
> > >
> > > > <
> > >
> > > >
> > > https://git-wip-us.apache.org/repos/asf?p=nifi-minifi-cpp.git;a=commit;h=28ade6cf75e8a0ccce78699f147f4ad9baaf70c2
> > >
> > > > >
> > >
> > > > 28ade6cf75e8a0ccce78699f147f4ad9baaf70c2
> > >
> > > >
> > >
> > > > Checksum of nifi-minifi-cpp-0.6.0-source.tar.gz:
> > >
> > > > SHA256:
> > > 65c5ecf4b8ce807e982ed4dcb11472f574f62e66bfeeaa3348b0852c926175c3
> > >
> > > > SHA512:
> > >
> > > >
> > >
> > > >
> > > 4c53275b6b595fe1a17f6ad54a4aaa831b45044f3f8a5ad204475d3beb8fde441fc28d4a385d8804ed28f7934e37d45a01ab130b5241baa414a7c788f23e4451
> > >
> > > >
> > >
> > > > Release artifacts are signed with the following key:
> > >
> > > > https://people.apache.org/keys/committer/phrocker.asc
> > >
> > > >
> > >
> > > > KEYS file available here:
> > >
> > > > https://dist.apache.org/repos/dist/release/nifi/KEYS
> > >
> > > >
> > >
> > > > 142 issues were closed/resolved for this release:
> > >
> > > >
> > >
> > > >
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12321520&version=12343363
> > >
> > > >
> > >
> > > > Release note highlights can be found here:
> > >
> > > >
> > >
> > > >
> > > 

Re: [VOTE] Release Apache NiFi 1.9.1 (rc1)

2019-03-14 Thread Koji Kawamura
+1 binding

Went through release helper guide.
Thanks Joe for managing this release!

On Fri, Mar 15, 2019 at 9:34 AM Aldrin Piri  wrote:
>
> +1, binding
>
> comments:
> hashes and signature looked good
> build, tests, and contrib check good on Ubuntu and MacOS
>
>
> On Thu, Mar 14, 2019 at 6:58 PM James Wing  wrote:
>
> > +1 (binding) - Ran through the release helper, checked the signatures,
> > license/readme, and ran the full build.  Ran a simple test flow.
> >
> > Thanks, Joe, for putting this release together!
> >
> > On Tue, Mar 12, 2019 at 10:49 PM Joe Witt  wrote:
> >
> > > Hello,
> > >
> > > I am pleased to be calling this vote for the source release of Apache
> > NiFi
> > > 1.9.1.
> > >
> > > The source zip, including signatures, digests, etc. can be found at:
> > > https://repository.apache.org/content/repositories/orgapachenifi-1138
> > > https://dist.apache.org/repos/dist/dev/nifi/nifi-1.9.1-rc1/
> > >
> > > The Git tag is nifi-1.9.1-RC1
> > > The Git commit ID is a5cedc4ad39b17bee97303b63b620f9ac3dddc79
> > >
> > >
> > https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=a5cedc4ad39b17bee97303b63b620f9ac3dddc79
> > >
> > > Checksums of nifi-1.9.1-source-release.zip:
> > > SHA256: 7099abb33e26445788630b69f38b8788117cdd787b7001752b4893d8b6c16f38
> > > SHA512:
> > >
> > >
> > 678c2ee32f7db8c73393178f329c574315b1b892084b822f9b7a6dc5bc159d5e7e1169812d9676a72f738d03fd2f4366f2b67ddee152b56c8a77751fd5cbb218
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/joewitt.asc
> > >
> > > KEYS file available here:
> > > https://dist.apache.org/repos/dist/release/nifi/KEYS
> > >
> > > 19 issues were closed/resolved for this release:
> > >
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12345163
> > >
> > > Release note highlights can be found here:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.9.1
> > >
> > > The vote will be open for 72 hours.
> > > Please download the release candidate and evaluate the necessary items
> > > including checking hashes, signatures, build
> > > from source, and test. Then please vote:
> > >
> > > [ ] +1 Release this package as nifi-1.9.1
> > > [ ] +0 no opinion
> > > [ ] -1 Do not release this package because...
> > >
> >


Re: SSLHandshake Exception from Site-to-Site

2019-03-06 Thread Koji Kawamura
Hi Nadeem,

> nifi.remote.input.host=
This property controls how the S2S server introduces itself to S2S
clients for further network communication.
For example, let's say the server has 2 IP addresses, private and
public, and the public IP is bound to an FQDN. The hostnames for the
server would be:
- private: ip-10-200-46-112.us-west-2.compute.internal
- public: nifi1.example.com
In that case, the property should be set as
nifi.remote.input.host=nifi1.example.com

I don't have much experience with Kubernetes (K8s), but usually some
naming and port-mapping configuration should be used in order to
expose such a public API endpoint in an environment like K8s or
containers.
Or use a service name to make communication between containers successful.
https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/

Furthermore, if you need to expose your NiFi S2S running on K8s so
that S2S clients running outside of the K8s cluster can communicate,
then I think you will need to deploy a load balancer.
https://kubernetes.io/docs/tutorials/stateless-application/expose-external-ip-address/
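
As a sketch (the service name, label selector, and port are assumptions;
the port matches the nifi.remote.input.socket.port value below), a Service
of type LoadBalancer exposing the RAW S2S port could look like:

apiVersion: v1
kind: Service
metadata:
  name: nifi-s2s
spec:
  type: LoadBalancer
  selector:
    app: nifi
  ports:
    - name: s2s
      port: 9443
      targetPort: 9443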

> [Site-to-Site Worker Thread-235] o.a.nifi.remote.SocketRemoteSiteListener
> org.apache.nifi.remote.SocketRemoteSiteListener$1$1@74dd1923 Connection URL
> is nifi://ip-10-200-46-112.us-west-2.compute.internal:22343*

This log is written by an S2S worker thread, which runs on the S2S
server accepting incoming connections.
The Connection URL here represents the associated peer, which is an S2S
client connecting to the server.
That's why it shows an internal hostname and a random port.

Thanks,
Koji

On Thu, Mar 7, 2019 at 12:14 AM Mohammed Nadeem  wrote:
>
> Thank you so much Koji for replying,
>
> This SSL handshake issue we see is on a single-node cluster instance,
> where our NiFi application has been deployed in a Kubernetes container.
> Here is the configuration we did for site-to-site in the nifi.properties
> file for the single cluster node.
>
> # Site to Site properties
> nifi.remote.input.host=
> nifi.remote.input.secure=true
> nifi.remote.input.socket.port=9443
> nifi.remote.input.http.enabled=false
>
> I was trying to understand how site-to-site works internally by going
> through the source code and also debugging its communication in parallel.
> I found a couple of observations from my analysis:
>
> 1. First off, I believe when you give the same hostname as the NiFi
> application running in a container for site-to-site in nifi.properties for
> a single cluster node, the internal site-to-site Java code doesn't get the
> hostname of the self node when asked for cluster node information
> (NodeInformant); instead it gives some other private IP hostname. In the
> logs we see - *DEBUG
> [Site-to-Site Worker Thread-235] o.a.nifi.remote.SocketRemoteSiteListener
> org.apache.nifi.remote.SocketRemoteSiteListener$1$1@74dd1923 Connection URL
> is nifi://ip-10-200-46-112.us-west-2.compute.internal:22343*
> From the above debug log, I see the internal Java code is not recognizing
> that it's a docker container; instead it's trying to connect to an unknown
> hostname with a random port. I believe that, being incapable of recognizing
> it's a container and instead returning some Kubernetes node IP address,
> it's throwing the SSL handshake error.
>
> Interestingly, when the port 'nifi.remote.input.socket.port' (9443)
> was reachable at the container level, we saw the above SSLHandshake
> error, with the site-to-site worker thread trying to hit a different
> hostname as described in point 1 above. When we blocked this port at
> the container, the SSLHandshake error went away. I'm not sure if this
> makes sense, but I want to understand in detail how site-to-site works
> internally.
>
> If my observations above are incorrect or something needs to be done,
> please help me understand. Bryan, Pierre, Mark, Koji, or any other NiFi
> experts, please help me understand this. I have gone through almost
> all the blogs, etc.
>
> Please suggest the solution,
>
> Thanks,
> Nadeem
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: SSLHandshake Exception from Site-to-Site

2019-03-04 Thread Koji Kawamura
Hi Nadeem,

How many S2S clients are connecting to your NiFi? And how many NiFi
nodes does your remote NiFi have?

I've encountered the same error message when I conducted a test using
hundreds of S2S clients connecting to a single NiFi node.
It happened in a situation like the following:
1. An S2S client connects to the NiFi node
2. The NiFi node accepts the connection and spawns a new thread to process
further communication [Site-to-Site Worker Thread-N]
3. But the NiFi node is not able to process incoming connections fast
enough, and by the time the node starts the SSL handshake process, the
client has already disconnected.

In my case, setting a longer timeout on the S2S clients helped accept
more concurrent connections. But this can also be an indication that
more nodes are needed (if the message is logged from a situation
similar to mine).

Another possibility is, as the message says, that a malicious user is
actually performing an SSL truncation attack.

Thanks,
Koji

On Fri, Mar 1, 2019 at 1:19 AM Mohammed Nadeem  wrote:
>
> Hi,
>
> Can someone please help me resolving SSLHandshake issue (Site-to-Site) which
> I'm getting in logs. This ERROR doesn't impact us from accessing the NiFi
> canvas or any calls we make from Nifi components (like SSL Context Service).
> This is something which keeps on throwing every now and then in
> nifi-app.logs
>
> Below, is the error we get in the logs
>
> ERROR [Site-to-Site Worker Thread-138]
> o.a.n.r.io.socket.ssl.SSLSocketChannel
> org.apache.nifi.remote.io.socket.ssl.SSLSocketChannel@938965a Failed to
> connect due to {}
> javax.net.ssl.SSLHandshakeException: Reached End-of-File marker while
> performing handshake
> at
> org.apache.nifi.remote.io.socket.ssl.SSLSocketChannel.performHandshake(SSLSocketChannel.java:248)
> at
> org.apache.nifi.remote.io.socket.ssl.SSLSocketChannel.connect(SSLSocketChannel.java:163)
> at
> org.apache.nifi.remote.SocketRemoteSiteListener$1$1.run(SocketRemoteSiteListener.java:168)
> at java.lang.Thread.run(Thread.java:748)
>
> ERROR [Site-to-Site Worker Thread-138]
> o.a.nifi.remote.SocketRemoteSiteListener RemoteSiteListener Unable to accept
> connection from Socket[unconnected] due to javax.net.ssl.SSLException:
> Inbound closed before receiving peer's close_notify: possible truncation
> attack?
>
> Setup,
> CA Server is running on separate host ( eg, ca_server_host ) which generates
> self-signed certificates
> Each Nifi instance calls CA to get the keystore, trustore etc like the
> necessary certs
>
> Please help me understand the issue, I have gone through many resources
> online but I wasn't able to resolve,
>
> Thanks,
> Nadeem
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Not able to connect through site-to-site remote processor group to the last NiFi in a NiFi chain

2019-03-03 Thread Koji Kawamura
Hello,

The error message indicates that the URL is not in a valid format.
Is there a trailing white space in this configuration?
nifi.remote.input.host=FQDN of Nifi3
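
If it helps, one quick way to reveal trailing whitespace (a sketch; cat -A
is the GNU coreutils option that marks each line end with '$'):

grep -n 'nifi.remote.input.host' conf/nifi.properties | cat -A

Any space showing up between the value and the trailing '$' would confirm it.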

Thanks,
Koji

On Sat, Mar 2, 2019 at 12:00 PM Puspak  wrote:
>
> # Site to Site properties-for Nifi1
> # nifi.remote.input.host=
> # nifi.remote.input.secure=false
> # nifi.remote.input.socket.port=8443
> # nifi.remote.input.http.enabled=true
> # nifi.remote.input.http.transaction.ttl=30 sec
> # nifi.remote.contents.cache.expiration=30 secs
> ===
> # Site to Site properties- for Nifi2
> nifi.remote.input.host=FQDN of Nifi2
> nifi.remote.input.secure=false
> nifi.remote.input.socket.port=8443
> nifi.remote.input.http.enabled=true
> nifi.remote.input.http.transaction.ttl=30 sec
> nifi.remote.contents.cache.expiration=30 secs
> 
> # Site to Site properties- for Nifi3
> nifi.remote.input.host=FQDN of Nifi3
> nifi.remote.input.secure=false
> nifi.remote.input.socket.port=
> nifi.remote.input.http.enabled=true
> nifi.remote.input.http.transaction.ttl=30 sec
> =
> Basically I have 3 NiFi instances in a chain, Nifi1 --> Nifi2 --> Nifi3,
> connected via remote processor groups, and I have the site-to-site
> properties configured as above.
> I can connect and push messages successfully from Nifi1 -> Nifi2, but
> from Nifi2 -> Nifi3 I am not able to connect and push messages.
>
> I am getting the below error when trying to push messages from Nifi2 -> Nifi3
>
> 2019-03-02 07:41:06,181 WARN [Timer-Driven Process Thread-4]
> o.a.n.controller.tasks.ConnectableTask Administratively Yielding
> RemoteGroupPort[name=from ind23 Nifi,targets=http://FQDN-OF-NIFI3:8080/nifi]
> due to uncaught Exception: java.lang.RuntimeException:
> java.lang.IllegalArgumentException: Invalid URL: http://FQDN-OF-NIFI3
> :8080/nifi-api
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Invalid URL:
> http://FQDN-OF-NIFI3 :8080/nifi-api
> at 
> org.apache.nifi.controller.AbstractPort.onTrigger(AbstractPort.java:257)
>
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: [VOTE] Release Apache NiFi 1.9.0 (rc2)

2019-02-19 Thread Koji Kawamura
+1 Release this package as nifi-1.9.0 (binding)

- Verified signature and hashes
- Clean build & test passed
- Tested flows using standalone and secure cluster environments

Thanks,
Koji

On Mon, Feb 18, 2019 at 10:30 PM Denes Arvay  wrote:
>
> +1 Release this package as nifi-1.9.0 (non-binding)
>
> - Verified signature
> - Verified hashes
> - Verified that the RC was branched off the correct git commit ID
> - Build successful & tests pass (using mvn clean install
> -Pcontrib-check,include-grpc)
> - Started and tested with a simple flow
>
> (nit: the NOTICE files still contain copyright ...-2018)
>
> Best,
> Denes
>
> On Sun, Feb 17, 2019 at 4:50 AM Joe Witt  wrote:
>
> > Hello,
> >
> > I am pleased to be calling this vote for the source release of Apache NiFi
> > nifi-1.9.0.
> >
> > The source zip, including signatures, digests, etc. can be found at:
> > https://repository.apache.org/content/repositories/orgapachenifi-1136
> >
> > The Git tag is nifi-1.9.0-RC2
> > The Git commit ID is 45bb53d2aafd6ec5cb6bb794b3f7f8fc8300a04b
> >
> > https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=45bb53d2aafd6ec5cb6bb794b3f7f8fc8300a04b
> >
> > Checksums of nifi-1.9.0-source-release.zip:
> > SHA256: f8d2987a98903f0c00c50677f3a6ad361e417c6021f5179280cbe9ca838695da
> > SHA512:
> >
> > 2e77c420f932514417693584b4708a534df398e344dac7c1471f55cc382b7493d73b10ebc0d9e58562eb989c1f0b72980d6d18a2555883267f0bc08f092f30fe
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/joewitt.asc
> >
> > KEYS file available here:
> > https://dist.apache.org/repos/dist/release/nifi/KEYS
> >
> > 160 issues were closed/resolved for this release:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12344357
> >
> > Release note highlights can be found here:
> >
> > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.9.0
> > https://dist.apache.org/repos/dist/dev/nifi/nifi-1.9.0-rc2/
> >
> > The vote will be open for 72 hours.
> > Please download the release candidate and evaluate the necessary items
> > including checking hashes, signatures, build
> > from source, and test. Then please vote:
> >
> > [ ] +1 Release this package as nifi-1.9.0
> > [ ] +0 no opinion
> > [ ] -1 Do not release this package because...
> >


Re: Advanced UI button on Update Attributes

2019-01-06 Thread Koji Kawamura
Probably you've already found a solution, but just in case, did you
update the nifi-processor-configuration file, too?
nifi-update-attribute-ui/src/main/webapp/META-INF/nifi-processor-configuration

Thanks,
Koji

On Thu, Jan 3, 2019 at 7:29 PM DAVID SMITH
 wrote:
>
> Hi
> I am looking to create a bespoke processor based very loosely on the Update
> Attribute processor set. I have taken a copy of the update attribute bundle
> and changed the name of the bundle and its components to
> nifi-test-build-processor, nifi-test-build-model etc, and I have also
> amended the pom files to reference these new names. However, when I build
> the new test processor set, it builds and loads in NiFi 1.8.0, but the
> Advanced button has disappeared in the UpdateAttribute processor
> configuration. Can anyone tell me what I have missed? I have done this once
> before but I can't remember what the answer is.
> Many thanks
> Dave


Change to ANTLR Lexer

2018-12-06 Thread Koji Kawamura
Hi team,

I'm trying to fix a part of the ANTLR Lexer that hasn't been updated
since NiFi was first released, so I would like to have more reviewers
and comments if possible.

Ed, Otto and I have been working on
NIFI-5826 UpdateRecord processor throwing PatternSyntaxException
https://issues.apache.org/jira/browse/NIFI-5826

We identified that the cause is that RecordPathLexer.g converts
 an escaped square-bracket '\['
 to an escaped back-slash and square-bracket '\\['
In other words, a pattern written as 'a\[b' can reach the downstream
regex engine as 'a\\[b', where the bracket is no longer escaped, hence
the PatternSyntaxException.

Specifically, the 'ESC' fragment in RecordPathLexer.g does that conversion.
AttributeExpressionLexer and HL7QueryLexer also have the same
definition (and issue).

There is detailed explanation on this PR's description.
https://github.com/apache/nifi/pull/3200

I will hold off merging this PR even if Ed and Otto give +1, to allow
enough time for other reviewers, at least for another week.
Any comment would be appreciated. Something like:
- Suggestions to add more unit tests, input and expectation
- The original intent of converting a single back-slash to doubled (if
anyone knows)
- "We should hold this until NiFi 2.0" (not my preference though)
... etc

Thanks in advance!
Koji


Re: Fetching data from database on cluster.

2018-12-02 Thread Koji Kawamura
Hello,

Instead of implementing another lock within NiFi, I suggest
investigating the reason why the primary node changed.
NiFi uses a Zookeeper election to designate a node as the primary node.
If the primary node's Zookeeper session times out, Zookeeper elects
another node to take over the primary node role.

I suspect there was some issue that caused the ex-primary node to miss
sending a heartbeat to Zookeeper within the configured timeout (4 secs
by default, 2000 tickTime x 2). That can happen due to a network issue,
a Java VM pause caused by GC, a virtual machine pause, etc.

If you are using NiFi's embedded Zookeeper to coordinate your cluster,
I recommend using an external Zookeeper cluster for stability.
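
For reference, these are the nifi.properties entries involved; the values
below are only illustrative and should be tuned for your environment:

nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
nifi.zookeeper.session.timeout=10 secs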

Thanks,
Koji
On Wed, Nov 28, 2018 at 1:04 AM Bronislav Jitnikov  wrote:
>
> I have a small cluster with 3 instances of NiFi, and I have found what I
> think is a problem.
> The QueryDatabaseTable processor is set to run on the primary node only,
> with Concurrent Tasks set to 1. The Run Schedule is set to a large value
> (something like 20 minutes), so I expect only one execution at a time.
> While a query was executing, the primary node changed and a new task
> started on the new primary node. So I see two ways to resolve this
> problem:
> 1. Create some sort of lock on QueryDatabaseTable (maybe a custom
> processor that locks runs across the cluster via the StateManager)
> 2. Add some check in connectableTask.invoke() (better for me because I
> have similar problems with getting data from REST).
>
> Maybe I am missing something, so any help and ideas would be appreciated.
>
> Bronislav Zhitnikov
>
> PS: sorry for my bad English.


Re: Merge multiple xml files into one file

2018-11-28 Thread Koji Kawamura
Hi Bhasker,

MergeRecord processor can do the job.
If your XML files are compressed, you can use CompressContent with
'decompress' mode in front of MergeRecord.
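
A minimal flow sketch (XMLReader/XMLRecordSetWriter are the standard record
services; the merge threshold below is only an example):

CompressContent (Mode: decompress, Compression Format: gzip)
 -> MergeRecord (Record Reader: XMLReader,
                 Record Writer: XMLRecordSetWriter,
                 Minimum Number of Records: 5)
 -> PutFile (or wherever the merged files should go)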

Please refer to this NiFi flow template as an example of MergeRecord
merging multiple XML files.
https://gist.github.com/ijokarumawak/eeaf519a7ceea476fa452f7aa2ee5671

Also, the additional details for XMLRecordSetWriter can be helpful.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.8.0/org.apache.nifi.xml.XMLRecordSetWriter/additionalDetails.html

Hope this helps.

Thanks,
Koji

On Thu, Nov 29, 2018 at 2:40 AM Bhasker gaddam  wrote:
>
> In my use case I want to merge multiple .xml files into a single file
> using Apache NiFi.
>
> My xml files end with the .gz format. Could you please help me with how
> to merge the xml files into a single file, or some number (5 or 10) of
> files?
>
>
>
>
> Thanks,
> Bhasker Gaddam


Re: Globally configure Backpressure count and size for all queues

2018-11-21 Thread Koji Kawamura
Hello,

You can rewrite the back-pressure configuration saved in your
flow.xml.gz file before restarting NiFi.

# For example, use sed command. The command creates updated-flow.xml.gz
gunzip -c conf/flow.xml.gz |sed
's/1<\/backPressureObjectThreshold>/300<\/backPressureObjectThreshold>/g'
| gzip > conf/updated-flow.xml.gz

# Then confirm the changes and replace the original
mv conf/flow.xml.gz conf/flow.xml.gz.org
mv conf/updated-flow.xml.gz conf/flow.xml.gz
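
# Optionally, verify the new file is a valid gzip archive before starting NiFi
gunzip -t conf/flow.xml.gz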

Thanks,
Koji
On Wed, Nov 21, 2018 at 9:26 PM tphan.dat  wrote:
>
> Dear all,
>
> Inside the *nifi.properties* file, there exist two properties,
> *nifi.queue.backpressure.count* and *nifi.queue.backpressure.size*, for
> setting the default count and size when creating a new queue.
>
> However, they are certainly not applied to already-created queues.
>
> My question is: is it possible to configure default count and size
> thresholds for all existing queues when starting up NiFi?
>
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Incremental Fetch for RestAPI

2018-11-12 Thread Koji Kawamura
Hi Manee,

It depends on the Client API how to tell what the next response
data set should be.
That may be an additional query parameter such as a last-fetch
timestamp, or something like the HTTP ETag header in many APIs.
You can pass FlowFiles to InvokeHTTP to supply such parameters.

Also, I recommend using Record processors over splitting the dataset,
because Record processors work more efficiently and the data set unit
will be more meaningful in your case.

I think your flow would be something like:

InvokeHTTP (Assuming the API result contains value to make next
incremental request)
 -> Do some JSON transformation (QueryRecord, UpdateRecord or
JoltTransformRecord)
 -> PutDatabaseRecord
 -> Then connect success back to InvokeHTTP to fetch next dataset
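
For illustration, a hypothetical incremental request could carry the last
fetched timestamp in a FlowFile attribute and reference it with Expression
Language in InvokeHTTP's Remote URL (the attribute name and query parameter
below are made up):

Remote URL: https://api.example.com/records?updated_since=${last.timestamp}

An UpdateAttribute processor in the loop would then store the newest
timestamp from each response into 'last.timestamp' for the next cycle.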

Hope this helps.

Thanks,
Koji

On Mon, Nov 12, 2018 at 9:06 PM Manee  wrote:
>
> Hi Team,
>
> I am new to NiFi. I have a task where we need to fetch data from a
> client API and store it into PostgreSQL.
> My flow:
>
> InvokeHTTP -->>--SplitJson-->>--EvaluateJsonPath
> -->>ConvertJSONTosql--->>PutSql
>
> This flow is working fine, but I need to make it an incremental fetch
> from the API. Whenever the client API makes changes, they should be
> reflected in our database. How can I set up an incremental fetch in the
> API call? Please guide me in fixing this problem.
>
>
> Thanks in Advance,
> Manikandan K
>
>
>
>
>
>
> -
> Thanks,
> Manee
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: ListSFTP is hanging

2018-11-11 Thread Koji Kawamura
Hello Dave,

Although you already mentioned that you haven't migrated to 1.8 yet, I
recommend doing so because:
- 1.8 adds a new 'Listing Strategy' property to the ListSFTP processor,
which may help your use-case not miss any files to list
https://issues.apache.org/jira/browse/NIFI-5406
- 1.8 also introduces a new 'Load Balancing' capability on connections,
which simplifies the List, Distribute, Fetch pattern
https://issues.apache.org/jira/browse/NIFI-5516

It's all about file timestamps and the state tracked by ListSFTP. If
you can share the ListSFTP state and the files with their last-modified
timestamps that are expected to be fetched (but are not), we can find
out how they are missed. But the reason we had to implement NIFI-5406
was that handling every case by timestamps alone was not possible. Even
if we find a reason, it doesn't guarantee that we can find a workaround
without updating NiFi to 1.8.

Thanks,
Koji
On Sat, Nov 10, 2018 at 7:55 AM David Marrow  wrote:
>
> We have been using Nifi for over a year and we just turned up a new
> cluster. We move around 6TB a day of small to large files. We are having
> an issue where ListSFTP misses files. I know this can happen if a file
> with an older date is moved into the directory, because the lister
> maintains state. However, it also seems to hang when there are 10k-plus
> files. I am running Nifi 1.6 on Ubuntu 18. The cluster has plenty of
> memory, CPU, and disk space. I am also using the distributed cache
> because we haven't migrated to 1.8 yet.
>
> We have 20 different data flows all with their own logic.  We connect the
> Lister to a remote port that is connected to a remote process group and then
> distributed across the cluster to a FetchSFTP that deletes the files after
> they are loaded.
>
> We move files into the input directory so we have permission to delete them
> from the Nifi Fetch.  We are doing a find which orders the files to make
> sure that we don't grab old files.  This could still be an issue and cause
> us to miss a few files but it still doesn't explain why when the lister is
> running and there are files to pull nothing gets pulled.
>
> Any suggestions or ideas would be appreciated.
>
> Dave
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Load Balancing

2018-11-01 Thread Koji Kawamura
Hi Mark,

> In this scenario, should the nifi.cluster.load.balance.comms.timeout have 
> caused the balancing operation to terminate (unsuccessful)?

I agree with that. Weren't there any WARN log messages written?
Currently the NiFi UI doesn't have the capability to show load-balancing
related errors on the canvas, other than whether balancing is active or not.

> Another question: the usage of nifi.cluster.load.balance.host (and .port) 
> values is not clear to me. If Node A set's this value for Node B's FQDN, 
> would this allow Node A to "spoof" Node B and accept load balanced items 
> intended for Node B?

The two properties specify how a NiFi node opens its socket to
receive load-balanced data from other nodes. They can be useful when a
node has multiple NICs and you want to use a specific one for
load-balancing. If you specify a hostname or ip-address that does not
belong to the node, then you'll get an exception when the node tries to
open a socket, since it can't bind to the specified address.
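
For example (the host value is illustrative; 6342 is the default
load-balance port):

nifi.cluster.load.balance.host=10.0.1.5
nifi.cluster.load.balance.port=6342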

Thanks,
Koji
On Thu, Nov 1, 2018 at 3:04 AM mark.o.b...@gmail.com
 wrote:
>
> I found the problem: iptables was blocking the load balancing port. Once the 
> port was opened, the balance completed and all files were visible via List 
> queue.
>
> In this scenario, should the nifi.cluster.load.balance.comms.timeout have 
> caused the balancing operation to terminate (unsuccessful)?
>
> Another question: the usage of nifi.cluster.load.balance.host (and .port)
> values is not clear to me. If Node A sets this value to Node B's FQDN,
> would this allow Node A to "spoof" Node B and accept load balanced items
> intended for Node B?
>
>
> On 2018/10/31 17:45:29, Mark Bean  wrote:
> > I am trying to understand how the load balancing works in NiFi 1.8.0.
> >
> > I have a 2-node Cluster. I set an UpdateAttribute to set the value of a
> > property, "balancer", to either 0 or 1. I am using stateful EL for this:
> > ${getStateValue('balancer'):plus(1):mod(2)}.
> >
> > The connection for the output of the UpdateAttribute processor is load
> > balanced.
> > Load Balance Strategy: Partition by attribute
> > Attribute name: balancer
> >
> > The queue in the connection contains 5 objects, but when I perform a "List
> > queue", I only see 3 flowfiles. All of the flowfiles are on the same Node,
> > and as expected have the same "balancer" attribute value.
> >
> > Presumably, the other 2 flowfiles were load-balanced to the other Node.
> > However, they should still be visible in List queue, correct?
> >
> > Perhaps related, the load balance icon on the connection indicates
> > "Actively balancing...". There are only two 10 byte files, but the
> > balancing never seems to complete.
> >


Re: Status of MetricsReportingTask?

2018-11-01 Thread Koji Kawamura
Hi Jon,

About reporting counter values, there is an existing JIRA with an
improvement idea to expose counters to the reporting task context. That
requires NiFi framework-level improvements. I'd suggest taking a look
at it, and resuming the discussion there if needed.
https://issues.apache.org/jira/browse/NIFI-3293

Although MetricsReportingTask currently only supports Graphite, the
component is well designed for generic reporting use-cases. I don't
think it is a legacy component. The underlying dropwizard metrics
project seems active, too.
https://github.com/dropwizard/metrics

I'm interested in which service implementation you're going to write.
Is it going to use one of the already-available reporters?
https://metrics.dropwizard.io/3.1.0/manual/third-party/

Thanks,
Koji
On Thu, Nov 1, 2018 at 3:17 AM Jon Logan  wrote:
>
> Hi All,
>
> I was looking at utilizing the MetricsReportingTask service, but I was
> wondering what the status of it is -- it seems to be lacking some features
> that I thought it'd have (ex. reporting counters), and I'm not sure there's
> an ability to extend the metrics being produced. Is this something that is
> still being worked on, or is a legacy component? We are going to have to
> write our own Service implementation, as the only supported one seems to be
> Graphite, and wanted to make sure we're not going down a legacy path.
>
> Thanks!


Re: Controller Service not loading ERROR: The service APIs should not be bundled with the implementations.

2018-10-29 Thread Koji Kawamura
Hi Milan,

I assume you put both the ControllerService interface and the
implementation classes into the same NAR file.
You need to separate those into different NARs.
Please refer to nifi-standard-services-api-nar (interfaces) and
nifi-distributed-cache-services-nar (implementations).
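
As a rough sketch of the usual module layout (the module names below are
placeholders, not real artifacts):

my-service-api/      interface only, extending ControllerService
my-service-api-nar/  NAR bundling just the api jar
my-service/          implementation; depends on the api jar with
                     'provided' scope
my-service-nar/      NAR bundling the implementation jar, declaring the
                     api NAR as its NAR dependency in the pom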

Thanks,
Koji
On Tue, Oct 30, 2018 at 10:51 AM Milan Das  wrote:
>
> Hello NIFI Dev,
>
> I am trying to add two new controller services. I am getting an error with
> one of the controller services, and I am not sure what went wrong.
>
> I am guessing it is possibly the scope of the NiFi jar maven dependencies I
> have added, like nifi-hbase-client-service-api and
> nifi-distributed-cache-client-service-api.
>
>
>
>
>
> 2018-10-29 21:35:17,614 WARN [main] org.apache.nifi.nar.ExtensionManager 
> Controller Service com.interset.nifi.hbase.CDHHBase_ClientService is bundled 
> with its supporting APIs com.interset.nifi.hbase.CDHHBaseClientService. The 
> service APIs should not be bundled with the implementations.
>
> 2018-10-29 21:35:17,614 ERROR [main] org.apache.nifi.nar.ExtensionManager 
> Skipping Controller Service com.interset.nifi.hbase.CDHHBase_ClientService 
> because it is bundled with its supporting APIs and requires instance class 
> loading.
>
> 2018-10-29 21:35:21,523 WARN [main] o.a.n.d.html.HtmlDocumentationWriter 
> Could not link to com.interset.nifi.hbase.CDHHBase_ClientService because no 
> bundles were found
>
>
>
>
>
> Thanks,
>
> Milan Das
>
>
>


Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-10-26 Thread Koji Kawamura
Hi all,

I'd like to add another option to Matt's list of solutions:

4) Add a processor property, 'Enable detailed error handling'
(defaulting to false), then toggle the available list of relationships.
This way, existing flows such as Peter's don't have to change, while he
can opt in to the new relationships. RouteOnAttribute can be a reference
implementation.

I like the idea of thinking of relationships as potential exceptions. It
could be even better if relationships had a hierarchy.
Some users need more granular relationships while others don't.
For NiFi 2.0 or later, supporting a relationship hierarchy at the
framework level could avoid adding such a property to each processor.

Thanks,
Koji
On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess  wrote:
>
> Peter,
>
> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
> seems like we need to pick from a set of less-than-ideal solutions:
>
> 1) Add new relationships, but then your (possibly hundreds of)
> processors are invalid
> 2) Add new auto-terminated relationships, but then your
> previously-handled errors are "lost"
> 3) Add an attribute, but then each NiFi instance/release/flow is
> responsible for parsing the error and handling it as desired.
>
> We could mitigate 1-2 with a tool that updates your flow/template by
> sending all new failure relationships to the same target as the
> existing one, but then the tool itself suffers from maintainability
> issues (as does option #3). If we could recognize that the new
> relationships are self-terminated and then send the errors out to the
> original failure relationship, that could be quite confusing to the
> user, especially as time goes on (how to suppress the "new" errors,
> e.g.).
>
> IMHO I think we're between a rock and a hard place here, I guess with
> great entropy comes great responsibility :P
>
> P.S. For your use case, is the workaround to just keep retrying? Or
> are there other constraints at play?
>
> Regards,
> Matt
>
> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks)  
> wrote:
> >
> > Matt,
> >
> > If I were to split an existing failure relationship into several 
> > relationships, I do not think I would want to auto-terminate in most cases. 
> > Specifically, I'm interested in a failure relationship for a database 
> > disconnect during SQL execution (database was online when the connection 
> > was verified in the DBCP pool, but went down during execution). If I were 
> > to find a way to separate this into its own relationship, I do not think 
> > most users would appreciate it being a condition silently not handled by 
> > the normal failure path.
> >
> > Thanks,
> >   Peter
> >
> > -Original Message-
> > From: Matt Burgess [mailto:mattyb...@apache.org]
> > Sent: Friday, October 26, 2018 10:18 AM
> > To: dev@nifi.apache.org
> > Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused 
> > failure in an attribute
> >
> > NiFi (as of the last couple releases I think) has the ability to set 
> > auto-terminating relationships; this IMO is one of those use cases (for 
> > NiFi 1.x). If new relationships are added, they could default to 
> > auto-terminate; then the existing processors should remain valid.
> > However we might want an "omnibus Jira" to capture those relationships we'd 
> > like to remove the auto-termination from in NiFi 2.0.
> >
> > Regards,
> > Matt
> > On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks)  
> > wrote:
> > >
> > > Mark,
> > >
> > > I agree with you that this is the best option in general terms. After 
> > > thinking about it some more I think the biggest use case is for 
> > > troubleshooting. If a file routes to failure, you need to be watching the 
> > > UI to see what the exception was. An admin may have access to the NiFi 
> > > log files and could grep the error out, but a normal user who checks in 
> > > on the flow and sees a FlowFile in the error queue will not know what the 
> > > cause was; this is especially frustrating if retrying the file works 
> > > without failure the second time... Capturing the error message in an 
> > > attribute makes this easy to find.
> > >
> > > One thing I worry about too is adding new relationships to core 
> > > processors. After an upgrade, won't users need to go to each instance of 
> > > that processor and handle the new relationship? Right now I'd swagger we 
> > > have at least five thousand ExecuteSQL processors in our environment; and 
> > > while we have strong scripting skills in my NiFi team, I would not want 
> > > to encounter this without that.
> > >
> > > Thanks,
> > >   Peter
> > >
> > > -Original Message-
> > > From: Mark Payne [mailto:marka...@hotmail.com]
> > > Sent: Thursday, October 25, 2018 10:38 PM
> > > To: dev@nifi.apache.org
> > > Subject: [EXT] Re: New Standard Pattern - Put Exception that caused
> > > failure in an attribute
> > >
> > > I agree - the notion of adding a "failure.reason" attribute 

Re: [VOTE] Release Apache NiFi 1.8.0 (RC3)

2018-10-24 Thread Koji Kawamura
+1 (binding)

Ran through the release helper.
No issue was found.
Thanks for RM duties, Jeff!

On Wed, Oct 24, 2018 at 1:42 PM James Wing  wrote:
>
> +1 (binding).  Ran through the release helper, tested the resulting binary.
> Thank you for your persistence, Jeff.
>
>
> On Mon, Oct 22, 2018 at 10:56 PM Jeff  wrote:
>
> > Hello,
> >
> > I am pleased to be calling this vote for the source release of Apache NiFi
> > nifi-1.8.0.
> >
> > The source zip, including signatures, digests, etc. can be found at:
> > https://repository.apache.org/content/repositories/orgapachenifi-1135
> >
> > The Git tag is nifi-1.8.0-RC3
> > The Git commit ID is 98aabf2c50f857efc72fd6f2bfdd9965b97fa195
> >
> > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=98aabf2c50f857efc72fd6f2bfdd9965b97fa195
> >
> > Checksums of nifi-1.8.0-source-release.zip:
> > SHA256: 6ec21c36ebb232f344493a4aeb5086eed0c462c576e11a79abed8149bc8b65c3
> > SHA512:
> >
> > 846aecd4eb497a3b7dee7d1911b02453b8162b6c87e39f3df837744a478212e2e3e3615921079d29c2804671f26ecd05b04ce46a4bb69e8911fc185e27be9c24
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/jstorck.asc
> >
> > KEYS file available here:
> > https://dist.apache.org/repos/dist/release/nifi/KEYS
> >
> > 209 issues were closed/resolved for this release:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12343482
> >
> > Release note highlights can be found here:
> >
> > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.8.0
> >
> > The vote will be open for 72 hours.
> > Please download the release candidate and evaluate the necessary items
> > including checking hashes, signatures, build
> > from source, and test. Then please vote:
> >
> > [ ] +1 Release this package as nifi-1.8.0
> > [ ] +0 no opinion
> > [ ] -1 Do not release this package because...
> >


Re: [VOTE] Release Apache NiFi 1.8.0 (RC2)

2018-10-22 Thread Koji Kawamura
+1 (binding).

Build passed; confirmed a few existing flows with a secure cluster.
On Mon, Oct 22, 2018 at 12:01 PM James Wing  wrote:
>
> +1 (binding).  Thanks again, Jeff.
>
> On Sat, Oct 20, 2018 at 8:11 PM Jeff  wrote:
>
> > Hello,
> >
> > I am pleased to be calling this vote for the source release of Apache NiFi
> > nifi-1.8.0.
> >
> > The source zip, including signatures, digests, etc. can be found at:
> > https://repository.apache.org/content/repositories/orgapachenifi-1134
> >
> > The Git tag is nifi-1.8.0-RC2
> > The Git commit ID is 19bdd375c32c97e2b7dfd41e5ffe65f5e1eb2435
> >
> > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=19bdd375c32c97e2b7dfd41e5ffe65f5e1eb2435
> >
> > Checksums of nifi-1.8.0-source-release.zip:
> > SHA256: 72dc2934f70f41e0c62e0aeb2bdc48e9feaa743dc06319cbed42da04bdc0f827
> > SHA512:
> >
> > 012194f79d4bd5060032588e29f5e9c4240aa5e4758946a6cbcc89c0a1499de9db0c46d3f76e5ee694f0f9345c5f1bee3f3e315ef6fcc1194447958cb3f8b003
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/jstorck.asc
> >
> > KEYS file available here:
> > https://dist.apache.org/repos/dist/release/nifi/KEYS
> >
> > 204 issues were closed/resolved for this release:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12343482
> >
> > Release note highlights can be found here:
> >
> > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.8.0
> >
> > The vote will be open for 96 hours.
> > Please download the release candidate and evaluate the necessary items
> > including checking hashes, signatures, build
> > from source, and test. Then please vote:
> >
> > [ ] +1 Release this package as nifi-1.8.0
> > [ ] +0 no opinion
> > [ ] -1 Do not release this package because...
> >


Re: [VOTE] Release Apache NiFi 1.8.0

2018-10-18 Thread Koji Kawamura
+1 binding

Validated signatures and hashes
Confirmed existing flows work, and tested load-balancing and node
offloading with a secured cluster from the UI and CLI.

Thank you for the RM duties, Jeff!

Koji
On Fri, Oct 19, 2018 at 6:32 AM Jeremy Dyer  wrote:
>
> +1, binding
>
> Validated signatures, hashes, and commit
> Validated existing workflows against a NiFi Registry instance
>
> Lots of good stuff!
> Jeff thanks for handling the release!
>
> - Jeremy Dyer
>
> On Thu, Oct 18, 2018 at 5:11 PM Pierre Villard 
> wrote:
>
> > +1, binding
> >
> > Checked signature, hashes, L and commit.
> > Ran multiple workflows in both secured and unsecured clusters.
> > Played with the new load-balancing and offloading features (AWESOME
> > WORK!!!)
> > Confirmed some minor fixes and improvements.
> >
> > A great release!
> > Thanks Jeff for taking care of the RM duties and thanks to all the people
> > contributing in this release!
> >
> > Pierre
> >
> > Le jeu. 18 oct. 2018 à 20:33, Otto Fowler  a
> > écrit :
> >
> > > +1
> > >
> > > Confirmed signatures
> > > Confirmed hashes
> > > Confirmed source matches commit src
> > > ran build withe check
> > > ran nifi with a template from a bug to verify fix
> > >
> > >
> > >
> > >
> > > On October 18, 2018 at 09:45:12, Joe Witt (joe.w...@gmail.com) wrote:
> > >
> > > +1 (binding)
> > >
> > > Confirmed sigs, hashes, source L, and specified commit present
> > > (including Drew's doc update which is the last code change)
> > > Confirmed full clean build with contrib check and grpc on clean
> > repo/etc..
> > > Tested local nifi instance and sample flows. Did not test full secure
> > > setup with clustering.
> > >
> > > This release has quietly turned out to be another beast and the new
> > > load balancing capabilities are awesome!
> > >
> > > Thanks
> > > On Thu, Oct 18, 2018 at 7:57 AM Mike Thomsen 
> > > wrote:
> > > >
> > > > (NOTICE, LICENSE and README all looked good when I looked at them; no
> > > > problems were apparent)
> > > >
> > > > On Thu, Oct 18, 2018 at 7:56 AM Mike Thomsen 
> > > wrote:
> > > >
> > > > > +1 binding.
> > > > >
> > > > > - Checksums and branch id/diff matched up.
> > > > > - Ran binary against a Mongo regression test that I made for the new
> > > > > client service functionality and everything worked cleanly against
> > > > > dockerized Mongo.
> > > > >
> > > > > On Wed, Oct 17, 2018 at 11:59 PM Jeff  wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> I am pleased to be calling this vote for the source release of
> > Apache
> > > NiFi
> > > > >> nifi-1.8.0.
> > > > >>
> > > > >> The source zip, including signatures, digests, etc. can be found at:
> > > > >>
> > https://repository.apache.org/content/repositories/orgapachenifi-1133
> > > > >>
> > > > >> The Git tag is nifi-1.8.0-RC1
> > > > >> The Git commit ID is 9b02d58626ca874ed2ed3e0bbe530512cfa0dbf8
> > > > >>
> > > > >>
> > >
> > >
> > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=9b02d58626ca874ed2ed3e0bbe530512cfa0dbf8
> > > > >>
> > > > >> Checksums of nifi-1.8.0-source-release.zip:
> > > > >> SHA256:
> > > 3ec90a7f153e507d7bba2400d6dafac02641d6f7afc7a954fed959191073ce21
> > > > >> SHA512:
> > > > >>
> > > > >>
> > >
> > >
> > 8b9d944da1833bfb645f502107cab98a555e3b2a7602c5ff438407272c86defdeebe18625c5ad9dfb3f344397314569e97220a35f2438182a79a700caa90721e
> > >
> > > > >>
> > > > >> Release artifacts are signed with the following key:
> > > > >> https://people.apache.org/keys/committer/jstorck.asc
> > > > >>
> > > > >> KEYS file available here:
> > > > >> https://dist.apache.org/repos/dist/release/nifi/KEYS
> > > > >>
> > > > >> 204 issues were closed/resolved for this release:
> > > > >>
> > > > >>
> > >
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12343482
> > > > >>
> > > > >> Release note highlights can be found here:
> > > > >>
> > > > >>
> > >
> > >
> > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.8.0
> > > > >>
> > > > >> The vote will be open for 72 hours.
> > > > >> Please download the release candidate and evaluate the necessary
> > items
> > > > >> including checking hashes, signatures, build
> > > > >> from source, and test. Then please vote:
> > > > >>
> > > > >> [ ] +1 Release this package as nifi-1.8.0
> > > > >> [ ] +0 no opinion
> > > > >> [ ] -1 Do not release this package because...
> > > > >>
> > > > >
> > >
> >


Re: [DISCUSS] Closing in on a release of NiFi 1.8.0?

2018-10-14 Thread Koji Kawamura
Jeff, Sivaprasanna,

NIFI-5698 (PR 3073), fixing the DeleteAzureBlob bug, has been merged.

Thanks,
Koji
On Mon, Oct 15, 2018 at 10:18 AM Koji Kawamura  wrote:
>
> Thank you for the fix Sivaprasanna,
> I have Azure account. Reviewing it now.
>
> Koji
> On Sun, Oct 14, 2018 at 11:21 PM Jeff  wrote:
> >
> > Sivaprasanna,
> >
> > Thanks for submitting a pull request for that issue!  Later today or
> > tomorrow I'll have to check to see if I've already used up my free-tier
> > access to Azure.  If I still have access, I can review your PR and we'll
> > get it into 1.8.0.
> >
> > On Sun, Oct 14, 2018 at 4:30 AM Sivaprasanna 
> > wrote:
> >
> > > All - Just found one bug with DeleteAzureBlobStorage processor. It was
> > > shared by one user on StackOverflow [1] and I later confirmed it. It looks
> > > to be introduced by NIFI-4199. I have created a Jira [2] and made the
> > > necessary changes (not huge, just few lines) and raised a PR [3]. I think,
> > > if we can spend a little time in getting it reviewed, we can mark it for
> > > 1.8.0. Thoughts?
> > >
> > > [1] -
> > >
> > > https://stackoverflow.com/questions/52766991/apache-nifi-deleteazureblobstorage-processor-is-throwing-an-error
> > > [2] - https://issues.apache.org/jira/browse/NIFI-5698
> > > [3] - https://github.com/apache/nifi/pull/3073
> > >
> > > -
> > > Sivaprasanna
> > >
> > > On Fri, Oct 12, 2018 at 9:05 PM Mike Thomsen 
> > > wrote:
> > >
> > > > 4811 should be ready for review now. Rebased and cleaned it up with a
> > > full
> > > > listing of the Spring dependencies.
> > > >
> > > > On Fri, Oct 12, 2018 at 11:23 AM Joe Witt  wrote:
> > > >
> > > > > Jeff,
> > > > >
> > > > > I think for anything not tagged to 1.8.0 we just keep rolling.  For
> > > > > anything tagged 1.8.0 that should not be we should remove it until
> > > > > ready.  For things tagged to 1.8.0 that cannot be moved we should
> > > > > resolve.  For the tagged 1.8.0 section you had.
> > > > >
> > > > >- NIFI-4811 <https://issues.apache.org/jira/browse/NIFI-4811> -
> > > Use a
> > > > >newer version of spring-data-redis
> > > > >- PR 2856 <https://github.com/apache/nifi/pull/2856>
> > > > > *This needs to be resolved by either reverting the commit or ensuring
> > > > > L accurately reflects all.  We have to do this always and for every
> > > > > nar.  The process isnt easy or fun but it is necessary to produce
> > > > > valid ASF releases.  Landing commits which change dependencies
> > > > > requires this due diligence.  Now, we've put a lot of energy into
> > > > > updating Spring dependencies because some older Spring libs had
> > > > > vulnerabilities which while we likely aren't exposed to them we want
> > > > > to fix in due course.  So reverting may require more analysis than if
> > > > > we were just get L fixed with this new change.  I commented on the
> > > > > JIRA.  But this needs to be resolved.
> > > > >
> > > > >
> > > > >- NIFI-5426 <https://issues.apache.org/jira/browse/NIFI-5426> - Use
> > > > >NIO.2 API for ListFile to avoid multiple disk reads
> > > > >   - PR 2889 <https://github.com/apache/nifi/pull/2889>
> > > > > *This just needed to be marked resolved.  The commit went in the day
> > > > > after we cut 1.7.1.  So this one is sorted.
> > > > >
> > > > >- NIFI-5448 <https://issues.apache.org/jira/browse/NIFI-5448> -
> > > > Failed
> > > > >EL date parsing live-locks processors without a failure 
> > > > > relationship
> > > > > * The commit needs to be reverted.  I'm working on that now.  Once the
> > > > > discsusion/concerns are addressed this can get dealt with.
> > > > >
> > > > >- NIFI-5665 <https://issues.apache.org/jira/browse/NIFI-5665> -
> > > > Upgrade
> > > > >io.netty dependencies
> > > > > * This looks important to get resolved if possible as old netty libs
> > > > > are on the list of things with vulnerabilities.
> > > > >
> > > > >- NIFI-5686 <https://issues.apache.org/jira/browse/NIFI-5686> -
> > > Test
> >

Re: [DISCUSS] Closing in on a release of NiFi 1.8.0?

2018-10-14 Thread Koji Kawamura
Thank you for the fix Sivaprasanna,
I have Azure account. Reviewing it now.

Koji
On Sun, Oct 14, 2018 at 11:21 PM Jeff  wrote:
>
> Sivaprasanna,
>
> Thanks for submitting a pull request for that issue!  Later today or
> tomorrow I'll have to check to see if I've already used up my free-tier
> access to Azure.  If I still have access, I can review your PR and we'll
> get it into 1.8.0.
>
> On Sun, Oct 14, 2018 at 4:30 AM Sivaprasanna 
> wrote:
>
> > All - Just found one bug with DeleteAzureBlobStorage processor. It was
> > shared by one user on StackOverflow [1] and I later confirmed it. It looks
> > to be introduced by NIFI-4199. I have created a Jira [2] and made the
> > necessary changes (not huge, just few lines) and raised a PR [3]. I think,
> > if we can spend a little time in getting it reviewed, we can mark it for
> > 1.8.0. Thoughts?
> >
> > [1] -
> >
> > https://stackoverflow.com/questions/52766991/apache-nifi-deleteazureblobstorage-processor-is-throwing-an-error
> > [2] - https://issues.apache.org/jira/browse/NIFI-5698
> > [3] - https://github.com/apache/nifi/pull/3073
> >
> > -
> > Sivaprasanna
> >
> > On Fri, Oct 12, 2018 at 9:05 PM Mike Thomsen 
> > wrote:
> >
> > > 4811 should be ready for review now. Rebased and cleaned it up with a
> > full
> > > listing of the Spring dependencies.
> > >
> > > On Fri, Oct 12, 2018 at 11:23 AM Joe Witt  wrote:
> > >
> > > > Jeff,
> > > >
> > > > I think for anything not tagged to 1.8.0 we just keep rolling.  For
> > > > anything tagged 1.8.0 that should not be we should remove it until
> > > > ready.  For things tagged to 1.8.0 that cannot be moved we should
> > > > resolve.  For the tagged 1.8.0 section you had.
> > > >
> > > >- NIFI-4811 <https://issues.apache.org/jira/browse/NIFI-4811> - Use a
> > > >newer version of spring-data-redis
> > > >- PR 2856 <https://github.com/apache/nifi/pull/2856>
> > > > *This needs to be resolved by either reverting the commit or ensuring
> > > > L accurately reflects all.  We have to do this always and for every
> > > > nar.  The process isnt easy or fun but it is necessary to produce
> > > > valid ASF releases.  Landing commits which change dependencies
> > > > requires this due diligence.  Now, we've put a lot of energy into
> > > > updating Spring dependencies because some older Spring libs had
> > > > vulnerabilities which while we likely aren't exposed to them we want
> > > > to fix in due course.  So reverting may require more analysis than if
> > > > we were just get L fixed with this new change.  I commented on the
> > > > JIRA.  But this needs to be resolved.
> > > >
> > > >
> > > >- NIFI-5426 <https://issues.apache.org/jira/browse/NIFI-5426> - Use
> > > >NIO.2 API for ListFile to avoid multiple disk reads
> > > >   - PR 2889 <https://github.com/apache/nifi/pull/2889>
> > > > *This just needed to be marked resolved.  The commit went in the day
> > > > after we cut 1.7.1.  So this one is sorted.
> > > >
> > > >- NIFI-5448 <https://issues.apache.org/jira/browse/NIFI-5448> - Failed
> > > >EL date parsing live-locks processors without a failure relationship
> > > > * The commit needs to be reverted.  I'm working on that now.  Once the
> > > > discsusion/concerns are addressed this can get dealt with.
> > > >
> > > >- NIFI-5665 <https://issues.apache.org/jira/browse/NIFI-5665> - Upgrade
> > > >io.netty dependencies
> > > > * This looks important to get resolved if possible as old netty libs
> > > > are on the list of things with vulnerabilities.
> > > >
> > > >- NIFI-5686 <https://issues.apache.org/jira/browse/NIFI-5686> - Test
> > > >failure in TestStandardProcessScheduler
> > > >- PR 3062 <https://github.com/apache/nifi/pull/3062>
> > > > * This has a PR but a test, possibly two, failed in one of the travis
> > > > runs and it is clearly related.  I ignored one of those tests in a
> > > > previous run.  We must deal with brittle tests.  But the underlying
> > > > problem is important to solve here so either the tests needs improved
> > > > or we still have an issue.  Not clear but worth some focus.
> > > >
> > > > note: I intend to reference updates to libraries that have known
> > > > vulnerabilities and do so in a far less subtle manner than we had.  We
> > > > aren't acknowledging that NiFi is or exposes vulnerabilities but we
> > > > are and should be clear when we're updating dependencies that do have
> > > > them (even if we're not exposed to them) so that some of these commits
> > > > aren't so mysterious.  It creates far more confusion than is worth.
> > > > We still will follow the ASF/NiFi security handling policy but I no
> > > > longer intend to treat due course dependency updates as if they need
> > > > to be a secret.
> > > >
> > > > Thanks
> > > > Joe
> > > >
> > > >
> > > > On Fri, Oct 12, 2018 at 3:32 AM Jeff  wrote:
> > > > >
> > > > > Hello everyone!  Next week is probably a good timeframe to aim for a
> > > > > release 

Re: NiFi remote connections

2018-09-27 Thread Koji Kawamura
Hi Clay,

Excuse me for the confusing response.
I looked at the source code again and did some testing to see how RPG
(HTTP S2S) manages connections.
It uses both sync and async HTTP clients simultaneously, and it
establishes multiple connections, which are not persistent.
I now remember that in the case of RPG, it was difficult to share a
single connection among multiple HTTP requests, especially with the
async HTTP client, while getting along with the existing S2S mechanism.
Currently the HTTP S2S protocol consists of a series of requests and
uses multiple connections.

Thanks,
Koji

On Wed, Sep 26, 2018 at 1:16 PM Clay Teahouse  wrote:
>
> Thanks for the reply, Koji.
>
> In case of RPG, are there circumstances where the connections are not
> persistent?
>
>
> On Tue, Sep 25, 2018 at 12:14 AM Koji Kawamura 
> wrote:
>
> > Hi Clay,
> >
> > RPG (Site-to-Site) is a peer-to-peer communication protocol. There's
> > no distinction between how the primary node and how the other nodes
> > communicate with the remote cluster.
> > E.g. with Cluster A (nodes a1, a2 and a3) and Cluster B (nodes b1, b2 and b3),
> > each node must be able to communicate with every remote node. Node a1
> > will communicate with all of nodes b1, b2 and b3. So will nodes a2 and a3.
> > Those connections are persistent.
> > S2S RAW uses socket based connection. S2S HTTP uses
> > PoolingHttpClientConnectionManager internally to reuse connection.
> >
> > PostHTTP uses PoolingHttpClientConnectionManager, too.
> > InvokeHTTP uses a different HTTP client library, okhttp. I
> > didn't check it, but I assume it supports keep-alive.
> >
> > Do you have any specific concern about keep-alive?
> > The keep-alive technology can be used to improve performance. However,
> > we should not depend on that for any load-balancing rule.
> > If you are looking for a solution to distribute FlowFiles based on
> > some rules, NIFI-5516 will be useful (under development).
> > https://issues.apache.org/jira/browse/NIFI-5516
> >
> > Hope this helps.
> >
> > Thanks,
> > Koji
> > On Tue, Sep 25, 2018 at 12:07 PM Clay Teahouse 
> > wrote:
> > >
> > > Hi All,
> > >
> > > Are the connections between the primary node and RPG persistent, and
> > > if not, is there a way to make them persistent?
> > >
> > > Similarly, are the http connection from PostHTTP and InvokeHTTP to the
> > > destination persistent, meaning keep-alive is set to true?
> > >
> > > thanks a lot
> > >
> > > Clay
> >


Re: NiFi remote connections

2018-09-24 Thread Koji Kawamura
Hi Clay,

RPG (Site-to-Site) is a peer-to-peer communication protocol. There's
no distinction between how the primary node and how the other nodes
communicate with the remote cluster.
E.g. with Cluster A (nodes a1, a2 and a3) and Cluster B (nodes b1, b2 and b3),
each node must be able to communicate with every remote node. Node a1
will communicate with all of nodes b1, b2 and b3. So will nodes a2 and a3.
Those connections are persistent.
S2S RAW uses socket based connection. S2S HTTP uses
PoolingHttpClientConnectionManager internally to reuse connection.

PostHTTP uses PoolingHttpClientConnectionManager, too.
InvokeHTTP uses a different HTTP client library, okhttp. I
didn't check it, but I assume it supports keep-alive.

Do you have any specific concern about keep-alive?
The keep-alive technology can be used to improve performance. However,
we should not depend on that for any load-balancing rule.
If you are looking for a solution to distribute FlowFiles based on
some rules, NIFI-5516 will be useful (under development).
https://issues.apache.org/jira/browse/NIFI-5516

Hope this helps.

Thanks,
Koji
On Tue, Sep 25, 2018 at 12:07 PM Clay Teahouse  wrote:
>
> Hi All,
>
> Are the connections between the primary node and RPG persistent, and if
> not, is there a way to make them persistent?
>
> Similarly, are the http connection from PostHTTP and InvokeHTTP to the
> destination persistent, meaning keep-alive is set to true?
>
> thanks a lot
>
> Clay


Re: [VOTE] Release Apache NiFi Registry 0.3.0

2018-09-24 Thread Koji Kawamura
+1 (binding)

Verified building and testing with Ranger auth.
$ mvn clean install -Pcontrib-check -Pinclude-ranger

The Apache release distribution guideline has been updated and discourages
providing SHA-1 checksums. We should update the release process template.
"SHOULD NOT supply a MD5 or SHA-1 checksum file (because these are deprecated)"
http://www.apache.org/dev/release-distribution#sigs-and-sums
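
For instance, one way to produce only the recommended checksums (assuming
the standard shasum tool is available):

shasum -a 256 nifi-registry-0.3.0-source-release.zip
shasum -a 512 nifi-registry-0.3.0-source-release.zip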

On Tue, Sep 25, 2018 at 11:46 AM Aldrin Piri  wrote:
>
> +1, binding
>
> comments
> hashes and signature look good
> build and tests were good
> verified integration with NiFi and versioning of some process groups
>
> On Mon, Sep 24, 2018 at 7:36 PM Marc Parisi  wrote:
>
> > +1 -- binding
> >
> >Validated sigs, checksums, built, and used within current integration
> > testing techniques in secure mode.
> >
> > On Mon, Sep 24, 2018 at 9:16 AM Mark Payne  wrote:
> >
> > > +1 (binding)
> > >
> > > Validated hashes, build with contrib-check. Started and ensured that
> > > registry is
> > > able to store and retrieve flow with load balancing information.
> > >
> > > Thanks for handling the RM duties this time around, Kevin!
> > >
> > > -Mark
> > >
> > >
> > > > On Sep 22, 2018, at 9:54 AM, Kevin Doran  wrote:
> > > >
> > > > Hello,
> > > >
> > > > I am pleased to call this vote for the source release of Apache NiFi
> > > > Registry 0.3.0.
> > > >
> > > > The source zip, including signatures, digests, etc. can be found at:
> > > >
> > >
> > https://dist.apache.org/repos/dist/dev/nifi/nifi-registry/nifi-registry-0.3.0/
> > > >
> > > > The Git tag is nifi-registry-0.3.0-RC1
> > > > The Git commit ID is 2fef9fac2b627ee1f3428b07019b333c68f65f2d
> > > >
> > >
> > https://git-wip-us.apache.org/repos/asf?p=nifi-registry.git;a=commit;h=2fef9fac2b627ee1f3428b07019b333c68f65f2d
> > > >
> > > > Checksums of nifi-registry-0.3.0-source-release.zip:
> > > > SHA1:   4340068ef7e3ba099f614b86be4d2d77016845d9
> > > > SHA256:
> > c0a51f6b5855202993daa808015ddf26466843f3bc5727333f44661099c2ad5b
> > > > SHA512:
> > >
> > efc9882511eaccca4cc83117f5a870859f80b3667d8d3d9bf6e1fec9a1e4d1f37abee4476aaa28245468eca0eec30a3e84eba8e547a0fc1a97fdf6c289ab8b96
> > > >
> > > > Release artifacts are signed with the following key:
> > > > Fingerprint = C09B A891 AED4 5B8C 2C23  1AFE 1FB6 6A91 F71B 6207
> > > > Available at:
> > > > https://people.apache.org/keys/committer/kdoran.asc
> > > > https://pgp.mit.edu/pks/lookup?op=get&search=0x1FB66A91F71B6207
> > > >
> > > > KEYS file available here:
> > > > https://dist.apache.org/repos/dist/dev/nifi/KEYS
> > > >
> > > > 15 issues were closed/resolved for this release:
> > > >
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343483&styleName=Html&projectId=12320920
> > > >
> > > > Release note highlights can be found here:
> > > > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes
> > > >
> > > > The vote will be open for 72 hours.
> > > > Please download the release candidate and evaluate the necessary items
> > > > including checking hashes, signatures, build
> > > > from source, and test. Then please vote:
> > > >
> > > > [ ] +1 Release this package as nifi-registry-${VERSION}
> > > > [ ] +0 no opinion
> > > > [ ] -1 Do not release this package because...
> > >
> > >
> >


Re: [VOTE] Release Apache NiFi 1.7.0

2018-06-21 Thread Koji Kawamura
+1 (binding)

Verified the items in the release helper guide.
Tested that a NiFi flow created in an older version can be used with 1.7.0.

Some component properties have been renamed.
I listed the ones I'm aware of on the Migration Guidance page.
https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance

Thanks Andy for volunteering for the Release Manager duties!

Koji


On Thu, Jun 21, 2018 at 11:44 PM, Otto Fowler  wrote:
> +1
> builds, tests, contrib
> checksum, signing
> tag checkout by commit and diff
>
>
>
> On June 20, 2018 at 03:16:47, Andy LoPresto (alopre...@apache.org) wrote:
>
> Hello,
>
> I am pleased to be calling this vote for the source release of Apache NiFi
> nifi-1.7.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1127
>
> and
>
> https://dist.apache.org/repos/dist/dev/nifi/nifi-1.7.0
>
> The Git tag is nifi-1.7.0-RC1
> The Git commit ID is 99bcd1f88dc826f857ae4ab33e842110bfc6ce21
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=99bcd1f88dc826f857ae4ab33e842110bfc6ce21
>
> Checksums of nifi-1.7.0-source-release.zip:
> SHA1: 11086ef532bb51462d7e1ac818f6308d4ac62f03
> SHA256: b616f985d486af3d05c04e375f952a4a5678f486017a2211657d5ba03aaaf563
> SHA512:
> d81e9c6eb7fc51905d6f6629b25151fc3d8af7a3cd7cbc3aa03be390c0561858d614b62d8379a90fdb736fcf5c1b4832f4e050fdcfcd786e9615a0b5cc1d563d
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/alopresto.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 194 issues were closed/resolved for this release:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342979&projectId=12316020
>
> Release note highlights can be found here:
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.7.0
>
> The vote will be open for 72 hours.
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build
> from source, and test. Then please vote:
>
> [ ] +1 Release this package as nifi-1.7.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because…
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


How do we want to add DeepLearning capability?

2018-06-18 Thread Koji Kawamura
Hi all,

A PR has been submitted to add the ability to do classification or
prediction using a pre-built model with the DeepLearning4J library
(thanks @mans2singh!).

I found the following things can/should be improved so that people can
use it more easily from a NiFi flow:
- Utilizing evaluation results in the downstream flow. An example flow
template is provided; however, it requires an external database.
(Would a RecordLookup, enrichment pattern be more appropriate?)
- Vectorizing input data. (Record reader/writer, conversion?)

To me, the key point is "How easy and naturally NiFi user can use
DeepLearning within their NiFi flow?" A sophisticated DeepLearning
support may not be able to accomplish with just a single Processor.

I will keep reviewing PR2686, but anyone who is interested in, or has
some knowledge in Machine Learning or Deep Learning stuffs, please
join!

Thanks,
Koji


Re: [VOTE] Release Apache NiFi Registry 0.2.0

2018-06-18 Thread Koji Kawamura
+1 (binding)

- Ran through the Release Helper Guide
- Tested databases other than H2
- Tested the Git persistence provider

A few pieces of minor feedback:

1. A database user whose password is blank cannot be used

When I used HSQLDB, the default 'sa' user does not have a password. If I
configure a blank password property, I get the following error.
Work-around: create a user with a password.
Caused by: java.lang.IllegalStateException: nifi.registry.db.password
is required
at 
org.apache.nifi.registry.db.DataSourceFactory.createDataSource(DataSourceFactory.java:78)
~[na:na]

2. V2__Initial.sql uses the 'TEXT' data type, which is not supported by some DBMSs

While many databases such as MySQL, PostgreSQL, SQL Server, Oracle,
etc. support TEXT, it is not a standard data type and some other
database engines do not support it (e.g. HSQLDB).
Also, the Microsoft docs mention that SQL Server may remove the TEXT
data type in a future release.
https://docs.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql?view=sql-server-2017

V1__Initial.sql uses 'VARCHAR(4096)' instead, which supports more databases.
After changing V2__Initial.sql to use VARCHAR, I was able to use HSQLDB.


Thanks,
Koji

On Sun, Jun 17, 2018 at 9:13 PM, Jeff Zemerick  wrote:
> +1 non-binding
>
> Ran through the release helper guide with no issues. Good stuff!
>
> On Sat, Jun 16, 2018 at 5:47 PM Andy LoPresto 
> wrote:
>
>> Sorry everyone,
>>
>> Committers can provide binding votes on technical discussions (such as to
>> call for a release), but only PMC members can provide binding votes on
>> releases. This is an Apache policy.
>>
>> I’ll update the page on Monday to be a bit clearer on the distinction.
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> > On Jun 16, 2018, at 14:34, Andy LoPresto 
>> wrote:
>> >
>> > Abdelkrim,
>> >
>> > Thanks for validating the release and voting. Just to clarify, only PMC
>> members and committers [1] can cast a “binding” vote. All other community
>> members are welcome to cast a +1, 0, or -1 vote as well, but these are
>> “non-binding”. Thanks again.
>> >
>> > [1] http://nifi.apache.org/people.html
>> >
>> > Andy LoPresto
>> > alopre...@apache.org
>> > alopresto.apa...@gmail.com
>> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> >
>> >> On Jun 16, 2018, at 14:23, Abdelkrim Hadjidj 
>> wrote:
>> >>
>> >> +1 (binding)
>> >>
>> >> - Tested everything from the release helper and everything worked fine
>> >> - Tested the GitFlowPersistenceProvider with Github, works fine
>> >> - The Switching from other Persistence Provider section of the Admin
>> Guide asks to move the H2 DB specified as nifi.registry.db.directory. In
>> the Registry 0.2, this property is set to "". So it doesn't apply when you
>> start a Registry 0.2 with FilePersistenceProvider and migrate to Git
>> provider.
>> >> - This is not an issue, but it seems that there is no way to migrate
>> previous flow files from the File provider to the Git provider, or from an
>> existing Git provider with a Git clone.
>> >>
>> >> Thanks
>> >>
>> >> On 6/16/18, 3:22 PM, "Bryan Bende"  wrote:
>> >>
>> >>+1 (binding) Release this package as nifi-registry-0.2.0
>> >>
>> >>- Ran through everything in the release helper and looked good, few
>> minor
>> >>things Andy mentioned
>> >>- Tested upgrading an existing registry to 0.2.0 to test database
>> migration
>> >>- Tested basic event hook logging
>> >>- Ran secure NiFi with secure registry
>> >>
>> >>I also never saw the strange Maven output that Andy reported.
>> >>
>> >>Thanks for RM'ing!!
>> >>
>> >>
>> >>>On Sat, Jun 16, 2018 at 1:29 AM, Andy LoPresto <
>> alopre...@apache.org> wrote:
>> >>>
>> >>> +1, binding
>> >>>
>> >>> I:
>> >>> * verified all checksums and signatures
>> >>> * ran through the normal build process (tests and contrib-check)
>> >>> * verified the LICENSE, NOTICE, and README.md documents
>> >>> * deployed the application
>> >>> * verified interaction with NiFi
>> >>>
>> >>> I also deployed a secured NiFi Registry and NiFi instance and was able
>> to
>> >>> perform expected behavior. I then encrypted the Registry configs and
>> >>> verified the application still worked.
>> >>>
>> >>> There were a lot of tiny issues, but none that I believe rise to the
>> >>> level of blocking the release. I’ve included brief notes here but will
>> open
>> >>> Jiras for these at a later date. Thanks Kevin, great work.
>> >>>
>> >>> * Maven build has weird output at the end (see below; seems like text
>> >>> interpolation from race condition? may only be my machine, but I ran
>> >>> single-threaded)
>> >>> * source NOTICE includes copyright date 2014-2018 (who has the
>> DeLorean?)
>> >>> * source NOTICE says MiNiFi copyright date 2015-2018?
>> >>> * Probably should have spaces around hyphens in "Registry-a
>> sub-project of
>> >>> Apache NiFi-is" in source 

Re: LookupService + flowfile attributes

2018-06-12 Thread Koji Kawamura
Hi Mike,

I'm still not sure which is better: separating the variables, or having
one map contain all the values.
I wrote a comment so that we can keep the discussion there.
https://github.com/apache/nifi/pull/2777#issuecomment-396512384

Thanks,
Koji

On Tue, Jun 12, 2018 at 1:56 AM, Mike Thomsen  wrote:
> Koji,
>
> After reading Mark's comments on GitHub, it occurred to me that the MongoDB
> lookup service and the ES one I have as a PR would be screwed up if we take
> the original approach because they blindly build a query from the total
> coordinates set. So they'd add flowfile attributes as criteria by default.
> I'll update the PR accordingly and make the new method default to the
> existing one in all of the lookup services that are already there.
>
> On Sat, Jun 9, 2018 at 8:44 AM Mike Thomsen  wrote:
>
>> https://issues.apache.org/jira/browse/NIFI-5287
>>
>> On Sat, Jun 9, 2018 at 1:20 AM Koji Kawamura 
>> wrote:
>>
>>> Thanks Mike for starting the discussion.
>>>
>>> Yes, I believe that will make LookupService and the Schema access
>>> strategy much easier to use, more reusable, and more useful.
>>>
>>> What I imagined was not adding a new method signature, but simply
>>> copying certain FlowFile attributes into the coordinates map.
>>> We can add that at LookupRecord.
>>> Currently LookupAttribute only uses one coordinate value and can be
>>> left as it is.
>>>
>>> Specifically, this could be done by adding a new processor property,
>>> 'Copy FlowFile Attributes into Coordinates', where the user can define
>>> a regular expression to select which attributes to copy.
>>> I think it's fine to mix FlowFile attributes and values defined as
>>> dynamic properties into the same coordinates map.
>>> The put order should be FlowFile attributes, then dynamic properties,
>>> so that the user can overwrite attribute values when necessary.
>>>
>>> Koji
>>>
>>>
>>> On Sat, Jun 9, 2018 at 1:06 AM, Mike Thomsen 
>>> wrote:
>>> > On the RestLookupService PR I think Koji mentioned the idea of expanding
>>> > the lookup capability to include flowfile attributes. That sort of thing
>>> > would be immensely useful on two PRs I have already open for lookup
>>> service
>>> > changes for ES and Mongo. Koji, add your thoughts, but what I'm thinking
>>> > would be a new PR that adds:
>>> >
>>> > T lookup(Map<String, String> flowfileAttributes, Map<String, Object> coordinates);
>>> >
>>> > to the LookupService interface and has the related processors pass in
>>> the
>>> > flowfile attribute map. Specifically, it would help make the schema
>>> access
>>> > capabilities really usable with lookup services (see
>>> MongoDBLookupService
>>> > PR for example; I added a new SchemaRegistryService type for JSON
>>> sources)
>>>
>>


Re: LookupService + flowfile attributes

2018-06-08 Thread Koji Kawamura
Thanks Mike for starting the discussion.

Yes, I believe that will make LookupService and the Schema access
strategy much easier to use, more reusable, and more useful.

What I imagined was not adding a new method signature, but simply
copying certain FlowFile attributes into the coordinates map.
We can add that at LookupRecord.
Currently LookupAttribute only uses one coordinate value and can be
left as it is.

Specifically, this could be done by adding a new processor property,
'Copy FlowFile Attributes into Coordinates', where the user can define
a regular expression to select which attributes to copy.
I think it's fine to mix FlowFile attributes and values defined as
dynamic properties into the same coordinates map.
The put order should be FlowFile attributes, then dynamic properties,
so that the user can overwrite attribute values when necessary.
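To illustrate the intended merge behavior, here is a minimal sketch in
plain Java (the helper name, parameters, and regex filtering are
illustrative assumptions, not the actual LookupRecord code):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// FlowFile attributes go in first, dynamic properties second, so a
// dynamic property overwrites an attribute value on a key collision.
static Map<String, Object> buildCoordinates(final Map<String, String> attributes,
        final Map<String, Object> dynamicProperties, final Pattern attributesToCopy) {
    final Map<String, Object> coordinates = new LinkedHashMap<>();
    attributes.forEach((key, value) -> {
        if (attributesToCopy.matcher(key).matches()) {
            coordinates.put(key, value);
        }
    });
    coordinates.putAll(dynamicProperties);
    return coordinates;
}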

Koji


On Sat, Jun 9, 2018 at 1:06 AM, Mike Thomsen  wrote:
> On the RestLookupService PR I think Koji mentioned the idea of expanding
> the lookup capability to include flowfile attributes. That sort of thing
> would be immensely useful on two PRs I have already open for lookup service
> changes for ES and Mongo. Koji, add your thoughts, but what I'm thinking
> would be a new PR that adds:
>
> T lookup(Map<String, String> flowfileAttributes, Map<String, Object> coordinates);
>
> to the LookupService interface and has the related processors pass in the
> flowfile attribute map. Specifically, it would help make the schema access
> capabilities really usable with lookup services (see MongoDBLookupService
> PR for example; I added a new SchemaRegistryService type for JSON sources)


Re: [EXT] Re: Primary Only Content Migration

2018-06-07 Thread Koji Kawamura
There is an existing JIRA submitted by Pierre.
I think its goal is the same as what Joe mentioned above.
https://issues.apache.org/jira/browse/NIFI-4026

As for hashing and routing data with affinity/correlation, I think
'Consistent Hashing' is the most popular approach to minimize the
impact of node additions/deletions.
Applying Consistent Hashing to the S2S client may not be difficult. The
challenging part is how to support a cluster topology change in the
middle of transferring data that needs correlation.

A simple challenging scenario:
Let's say there is a group of 4 FlowFiles having correlation id 'rel-A':
1. Client sends rel-A, data-1of4 to Node1
2. Client sends rel-A, data-2of4 to Node1
3. NodeN is added and it takes over part of the hash key space that
Node1 was assigned to
4. Client sends rel-A, data-3of4 to NodeN
5. Client sends rel-A, data-4of4 to NodeN

Then, a Merge processor running on Node1 and NodeN cannot complete,
because neither node will have the whole dataset to merge.
This situation can be handled manually if we document it well,
or we can add a resending loop, so that:

6. Client on Node1 resends rel-A, data-1of4 to NodeN
7. Client on Node1 resends rel-A, data-2of4 to NodeN
8. Merge processor on NodeN merges the FlowFiles.
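For reference, a minimal consistent-hash ring sketch in plain Java
(illustrative only; a real implementation would use a stronger hash
function and virtual nodes to even out the distribution):

import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(final String nodeId) { ring.put(hash(nodeId), nodeId); }
    void removeNode(final String nodeId) { ring.remove(hash(nodeId)); }

    // Route a correlation id to the first node clockwise on the ring,
    // wrapping around to the first entry when we run off the end.
    String nodeFor(final String correlationId) {
        final SortedMap<Integer, String> tail = ring.tailMap(hash(correlationId));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private int hash(final String s) { return s.hashCode() & 0x7fffffff; }
}

Adding NodeN to such a ring only remaps the keys between NodeN and its
predecessor, which is exactly why 'rel-A' can change owners in the
middle of a transfer, as in the scenario above.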

I'm interested in working on this improvement, too.

Thanks,
Koji


On Fri, Jun 8, 2018 at 8:19 AM, Joe Witt  wrote:
> Peter
>
> I'm not sure there is a good way for a processor to drive such a thing
> with existing infrastructure.  A processor having the ability to know
> about the structure of a cluster is not something we have wanted to
> expose for good reasons.  There would likely need to be a more
> fundamental point of support for this.
>
> I'm not sure what that design would look like just yet - but agreeing
> this is an important step to take soon.  If you want to start
> sketching out design ideas that would be awesome.
>
> Thanks
> On Thu, Jun 7, 2018 at 6:11 PM Peter Wicks (pwicks)  wrote:
>>
>> Joe,
>>
>> I agree it is a lot of work, which is why I was thinking of starting with a
>> processor that could do some of these operations before looking further. If
>> the processor could move flowfiles between nodes in the cluster it would be
>> a good step. Data comes in from a queue on any node, but gets written out to
>> a queue on only the desired node; or gets round-robin outputted for a
>> distribute scenario.
>>
>> I want to work on it, and was trying to figure out if it could be done using 
>> only a processor, or if larger changes would be needed for sure.
>>
>> --Peter
>>
>> -Original Message-
>> From: Joe Witt [mailto:joe.w...@gmail.com]
>> Sent: Thursday, June 7, 2018 3:34 PM
>> To: dev@nifi.apache.org
>> Subject: Re: [EXT] Re: Primary Only Content Migration
>>
>> Peter,
>>
>> It isn't a pattern that is well supported now in a cluster context.
>>
>> What is needed are automatically load balanced connections with
>> partitioning.  This would mean a user could select a given relationship and
>> indicate that data should be automatically distributed, and they should be
>> able to express, optionally, whether there is a correlation attribute that is
>> used for ensuring data which belongs together stays together or comes back
>> together.  We could use this to automatically have a connection result in
>> data being distributed across the cluster for load balancing purposes and
>> also ensure that data is brought back to a single node whenever necessary,
>> which is the case in certain scenarios like fork/distribute/process/join/send
>> and things like distributed receipt then join for merging (like defragmenting
>> data which has been split).  To join them together we need affinity/correlation
>> and this could work based on some sort of hashing mechanism where there are
>> as many buckets as there are nodes in a cluster at a given time.  It needs a
>> lot of thought/design/testing/etc.
>>
>> I was just having a conversation about this yesterday.  It is definitely a 
>> thing and will be a major effort.  Will make a JIRA for this soon.
>>
>> Thanks
>>
>> On Thu, Jun 7, 2018 at 5:21 PM, Peter Wicks (pwicks)  
>> wrote:
>> > Bryan,
>> >
>> > We see this with large files that we have split up into smaller files and 
>> > distributed across the cluster using site-to-site. We then want to merge 
>> > them back together, so we send them to the primary node before continuing 
>> > processing.
>> >
>> > --Peter
>> >
>> > -Original Message-
>> > From: Bryan Bende [mailto:bbe...@gmail.com]
>> > Sent: Thursday, June 7, 2018 12:47 PM
>> > To: dev@nifi.apache.org
>> > Subject: [EXT] Re: Primary Only Content Migration
>> >
>> > Peter,
>> >
>> > There really shouldn't be any non-source processors scheduled for primary 
>> > node only. We may even want to consider preventing that option when the 
>> > processor has an incoming connection to avoid creating any confusion.
>> >
>> > As long as you set source processors to primary node only then everything 
>> > should be 

Re: Restrict WebUI Access based on IP

2018-06-06 Thread Koji Kawamura
Hi Ruben,

I am not aware of any configuration to do that on the NiFi side; I
believe NiFi doesn't have one.
I usually do access control based on client IP addresses with a firewall.

'iptables' is the standard one for Linux. You can find many examples
on the internet for configuring iptables.
If you are using IaaS cloud services such as AWS EC2 or Azure VM
instances, then you can apply such access control in the 'Security Group'
configuration.
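
For example, iptables rules like the following would allow only your Dev
team range to reach port 8080 (illustrative only; adapt the port and
ranges to your environment, and remember to persist the rules):

iptables -A INPUT -p tcp --dport 8080 -m iprange --src-range 172.0.1.5-172.0.1.10 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j DROP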

Thanks,
Koji

On Wed, Jun 6, 2018 at 7:46 AM, Ruben Barrios  wrote:
> Hello NiFi team,
>
> My name is Ruben, I'm working with NiFi 1.6.0 in Stand Alone mode.
>
> I have a question about WebUI access: is it possible to block incoming
> connections to the 8080 port based on specific IPs or a subnet?
>
> For Example:
>   Dev team is on IPs 172.0.1.5 to 172.0.1.10,
>   Testing team is on 172.0.1.11 to 172.0.1.20
>
> Is there any option to allow access only to IPs from the Dev team?
>
> Thank you!
>
> Rubén Barrios


Re: [VOTE] Release Apache NiFi MiNiFi C++ 0.5.0

2018-06-04 Thread Koji Kawamura
+1 (binding)

- Verified hashes
- Build and unit tests succeeded
- Ran simple flows to send data from MiNiFi CPP to NiFi
- on Mac OS 10.13.4

I have one piece of feedback on the release procedure.
The Apache release distribution policy has the following:
"SHOULD NOT supply a MD5 checksum file (because MD5 is too broken)."
http://www.apache.org/dev/release-distribution#sigs-and-sums
NiFi releases stopped supplying MD5 checksums as of 1.6.0, and MiNiFi
should do the same.

I appreciate all the improvements and bug fixes for this release. Thanks
to Jeremy for taking the release manager role!

Thanks,
Koji

On Mon, Jun 4, 2018 at 10:10 AM, Kevin Doran  wrote:
> +1 (non-binding)
>
> I followed the steps in the helper guide and was able to verify the agent 
> works as expected with a couple test flows that send data to NiFi over s2s. 
> Most of my RC verification was done with the agent on Mac OS 10.12. A full 
> build with tests passed on Ubuntu 16.04 for me.
>
> One very minor issue I ran into was a faulty Y/N prompt in bootstrap.sh when 
> selecting "N" to bail at the cmake confirmation. It doesn't really do any 
> harm as you can easily blow away the build dir and start over, and is an easy 
> fix. I filed a JIRA [1] and opened a PR that corrects it [2]. And overall, 
> the bootstrap stuff really makes this a lot easier to build and deploy on new 
> platforms, so I'm really enjoying using that.
>
> Great work everyone who contributed to the agent since the last release. This 
> is a nice step forward! And thanks Jeremy for managing the release!
>
> [1] https://issues.apache.org/jira/browse/MINIFICPP-523
> [2] https://github.com/apache/nifi-minifi-cpp/pull/351
>
> On 5/31/18, 20:39, "Jeremy Dyer"  wrote:
>
> Hello Apache NiFi Community,
>
> I am pleased to call this vote for the source release of Apache NiFi 
> MiNiFi
> C++, nifi-minifi-cpp-0.5.0.
>
> The source archive, signature, and digests can be located at:
>
> Source Archive:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi-cpp/
> 0.5.0/nifi-minifi-cpp-0.5.0-source.tar.gz
>
> GPG armored signature:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi-cpp/
> 0.5.0/nifi-minifi-cpp-0.5.0-source.tar.gz.asc
>
> Source MD5:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi-cpp/
> 0.5.0/nifi-minifi-cpp-0.5.0-source.tar.gz.md5
>
> Source SHA1:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi-cpp/
> 0.5.0/nifi-minifi-cpp-0.5.0-source.tar.gz.sha1
>
> Source SHA256:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-minifi-cpp/
> 0.5.0/nifi-minifi-cpp-0.5.0-source.tar.gz.sha256
>
> The Git tag is minifi-cpp-0.5.0-RC1
> The Git commit hash is 5f3b3973e37def4d8ed2753837986d121fd58322
> https://git-wip-us.apache.org/repos/asf?p=nifi-minifi-cpp.git;a=commit;h=5f3b3973e37def4d8ed2753837986d121fd58322
> https://github.com/apache/nifi-minifi-cpp/commit/5f3b3973e37def4d8ed2753837986d121fd58322
>
> Checksums of nifi-minifi-cpp-0.5.0-source.tar.gz:
> MD5: 9ec230b9ac3004981000276015860c52
> SHA1: a9e3fe34ed25f9f1a840cd318845bcdb6fb622f1
> SHA256: 78b5bbd65d1e3484efafc02a882f99063e06b88e1694daff6c24aaa3066037dc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/nifi/KEYS
>
> 87 issues were closed/resolved for this release:
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12321520&version=12342659
> 
>
> Release note highlights can be found here:
> 
> https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Versioncpp-0.5.0
> 
>
> The vote will close 3 June at 9PM EST.
>
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build from source, and test. Then
> please vote:
>
> [ ] +1 Release this package as nifi-minifi-cpp-0.5.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...
>
> Thanks!
>
>
>


Re: Using EL classes with just java Map objects

2018-06-03 Thread Koji Kawamura
Hi Mike,

In order to evaluate an Expression Language query with a Map containing
variables, I used Query.prepare to parse a query String into a
PreparedQuery.
The following code snippet works without issue. Is this something like
what you want to do?

import java.util.Collections;
import java.util.Map;
import org.apache.nifi.attribute.expression.language.PreparedQuery;
import org.apache.nifi.attribute.expression.language.Query;

final Map<String, String> map = Collections.singletonMap("name", "John Smith");
final PreparedQuery query = Query.prepare("${name}-${name:length()}");
final String result = query.evaluateExpressions(map, null);
System.out.println(result);

The code prints:
John Smith-10

Thanks,
Koji

On Sun, Jun 3, 2018 at 9:56 AM, Mike Thomsen  wrote:
> Point of clarification, the templated URL would itself be part of the
> coordinate Map, not a descriptor on the service so users would have total
> freedom there to send different variations with each record depending on
> their needs per record being enriched.
>
> On Sat, Jun 2, 2018 at 8:55 PM Mike Thomsen  wrote:
>
>> Ok. That makes sense. The idea was that the RestLookupService would
>> provide a templated URL option so you could specify roughly this as an
>> example:
>>
>> GET "https://something.com:${port}/service/${username}/something/${related
>> }"
>>
>> And have the EL engine take the Map and fill in the blanks.
>>
>> On Sat, Jun 2, 2018 at 6:53 PM Matt Burgess  wrote:
>>
>>> Mike,
>>>
>>> IIRC the "top-level" EL evaluator will go through a string finding EL
>>> constructs and pass them into Query I think.  Also ReplaceText (for
>>> some reason) is the only place I know of where you can quote something
>>> and (if EL is present), the result is treated as a string literal.
>>> Otherwise in NiFi Expression Language I believe a quoted construct on
>>> its own is an attribute to be evaluated. You might want the following:
>>>
>>> literal('${name}-${name:length()}')
>>>
>>> or if that doesn't work, it might be because the Query has to be a
>>> full EL construct so maybe you'd have to put the whole thing together
>>> yourself:
>>>
>>> Query.compile("${name}").evaluate(coordinates) + "-" +
>>> Query.compile("${name:length()}")
>>>
>>> I didn't try this out, and it's very possible my assumptions are not
>>> spot-on, so if these don't work let me know and I'll take a closer
>>> look.
>>>
>>> Regards,
>>> Matt
>>>
>>>
>>>
>>>
>>> On Sat, Jun 2, 2018 at 6:30 PM, Mike Thomsen 
>>> wrote:
>>> > I tried working with the EL package's Query object to try building
>>> > something like this:
>>> >
>>> > def evaluate(String query, Map coordinates) {
>>> > def compiled = Query.compile(query)
>>> > compiled.evaluate(coordinates)
>>> > }
>>> >
>>> > Which for [ name: "John Smith" ] and query '${name}-${name:length()}' I
>>> > expected would return a string with both bracketed operations executed.
>>> It
>>> > threw an exception saying unexpected token '-' at column 7.
>>> >
>>> > Am I missing something here?
>>> >
>>> > Thanks,
>>> >
>>> > Mike
>>>
>>


Re: Unable to bring up NiFi user interface

2018-05-29 Thread Koji Kawamura
Hello,

> A black command window screen pops up for a brief second and then closes.

Instead of double-clicking run-nifi.bat, you can run the bat file
from a command prompt. That way, the output of run-nifi.bat will
stay in the command prompt and can help with debugging what went wrong.
1. Open command prompt
2. Change directory to NIFI home, e.g. "cd C:\nifi"
3. Run the bat file from command prompt, e.g. "bin\run-nifi.bat"

Please share what is shown at step 3 above, in addition to what Joe asked.

Thanks,
Koji

On Wed, May 30, 2018 at 8:42 AM, Joe Witt  wrote:
> Hello
>
> Please share info about os version, java version, etc..
>
> run
>
> java -version
>
> and share that.
>
> what version of nifi?  did you change any settings?
>
> how much ram/cpu does your system have?
> is nifi in a directory it has write perms to?
>
>
> please share the contents of nifi log dir.
>
> thanks
>
>
> On Tue, May 29, 2018, 4:35 PM Call, Schuyler 
> wrote:
>
>> Hello,
>>
>> I am trying to use NiFi for a project I'm working on. I tried following
>> the steps in the "Getting Started with Apache NiFi" guide to create my own
>> interface, but I got stuck after trying to run the run-nifi.bat file. A
>> black command window screen pops up for a brief second and then closes. I
>> am unable to go to the localhost:8080 page because the run-nifi program
>> isn't running. I have tried looking up steps to troubleshoot this issue but
>> none of them have solved my problem. Is there some way you could help?
>>
>> Thank you,
>>
>> Schuyler Call
>>


Re: sFTP Question

2018-05-27 Thread Koji Kawamura
Hi Anil,

1. I'd use MonitorActivity, too.
Assuming you want to do something when there are no new files listed
by ListSFTP at a scheduled time, you can add MonitorActivity between
ListSFTP and FetchSFTP:
ListSFTP -> MonitorActivity --success--> FetchSFTP
  +--inactive--> LogAttribute

Let's say the desired listing interval is 1 hour, so ListSFTP's run
schedule is 1 hour. Then MonitorActivity can be configured to emit an
'inactive' signal if it does not see any FlowFile passed by ListSFTP
for longer than 1 hour, for example:
"Threshold Duration": "61 min"

The additional 1 minute delays firing the 'inactive' signal, to
allow for ListSFTP being delayed a bit in certain situations.

Then MonitorActivity will create a FlowFile and send it to the
'inactive' relationship if ListSFTP didn't find any file when it ran,
triggering an alternative flow path, which is LogAttribute in the
above example.

2. FetchSFTP does not support wild cards for a remote file path; it
needs incoming FlowFiles to pass the complete absolute remote file
path.

Thanks,
Koji

On Sat, May 26, 2018 at 7:07 PM, Anil Rai  wrote:
> Below is my scenario
>
> On scheduled time, pick files from a sFTP location and process it further.
> I have two questions regarding this.
>
>1. To do this, I am using ListSFTP -> routeOnAttribute to filter
>filename with known prefix-> FetchSFTP. I want to know what is the best way
>to figure out or get a message  when there is no file in the location at
>the scheduled time. I am thinking of using MonitorActivity to keep track of
>any flowfiles generated in ListSFTP, but I am not sure how to configure it.
>2. Also, I already know the prefix of the files I will need. So is there
>a way  to know put a wild card search in the remote path property of
>FetchSFTP, so that I can remove ListSFTP altogether?
>
>
> Thanks
> Anil


Re: Put data to Elastic with static settings or index template

2018-05-22 Thread Koji Kawamura
Hi Bobby,

Elasticsearch creates an index if it doesn't exist.

I haven't tried it myself yet, but Elasticsearch's index templates
might be useful to tweak the default settings for indices that are
created automatically.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html

Thanks,
Koji

On Tue, May 22, 2018 at 3:41 PM, Bobby  wrote:
> Siva,
>
> In my putElastic processor I only state the below properties:
>
> 
>
> Given the index name is using expression language, I assume it will be
> created if it does not exist; in my example, I tend to create a new index per
> day. My teammate also said he didn't create the index first; the processor
> takes care of it.
>
> Thanks
>
>
>
> -
>
> -
> Bobby
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Proposal: standard record metadata attributes for data sources

2018-05-15 Thread Koji Kawamura
Hi Mike,

I agree with the approach of enriching provenance events. In order to
do so, we can use several places to embed meta-data:

- FlowFile attributes: automatically mapped to a provenance event, but
as Andy mentioned, we need to be careful not to put sensitive data.
- Transit URI: when I developed the NiFi Atlas integration, I used this as
the primary source of what data a processor interacts with. E.g. remote
address, database, table, etc.
- The 'details' string. It might not be an ideal solution, but
ProvenanceReporter accepts an additional 'details' string. We can embed
whatever we want here.

I'd map the meta-data you mentioned as follows:
1. Source system. => Transit URI
2. Database/table/index/collection/etc. => Transit URI or FlowFile
attribute. I think it's fine to put these into attributes.
3. The lookup criteria that was used (similar to the "query attribute"
some already have). => 'details' string

What I learned from the Atlas integration is that it's really hard to
design a complete standard set of attributes. I'd suggest using what the
NiFi framework provides currently.
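
For example, a Get/Fetch style processor could embed all three when
emitting an event. A rough sketch (the attribute names/values and the
stopWatch variable are illustrative, and I'm assuming the
fetch(flowFile, transitUri, details, millis) overload here; please
double-check the exact ProvenanceReporter signatures):

// Inside onTrigger, after fetching the document:
flowFile = session.putAttribute(flowFile, "source.collection", "test_collection");
session.getProvenanceReporter().fetch(
        flowFile,
        "mongodb://localhost:27017/testdb.test_collection", // 1. source, 2. db/collection
        "query={\"username\": \"john.smith\"}",             // 3. criteria via 'details'
        stopWatch.getElapsed(TimeUnit.MILLISECONDS));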

Thanks,

Koji

On Tue, May 15, 2018 at 8:15 AM, Andy LoPresto  wrote:
> Maybe an ADDINFO event or FORK event could be used and a new flowfile with
> the relevant attributes/content could be created. The flowfiles would be
> linked, but the “sensitive” information wouldn’t travel with the original.
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 14, 2018, at 3:32 PM, Mike Thomsen  wrote:
>
> Does the provenance system have the ability to add user-defined key/value
> pairs to a flowfile's provenance record at a particular processor?
>
> On Mon, May 14, 2018 at 6:11 PM Andy LoPresto  wrote:
>
> I would actually propose that this is added to the provenance but not
> always put into the flowfile attributes. There are many scenarios in which
> the data retrieval should be separated from the analysis/follow-on, both
> for visibility, responsibility, and security concerns. While I understand a
> separate UpdateAttribute processor could be put in the downstream flow to
> remove these attributes, I would push for not adding them by default as a
> more secure approach. Perhaps this could be configurable on the Get*
> processor via a boolean property, but I think doing it automatically by
> default introduces some serious concerns.
>
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 13, 2018, at 11:48 AM, Mike Thomsen  wrote:
>
> @Joe @Matt
>
> This is kinda related to the point that Joe made in the graph DB thread
> about provenance. My thought here was that we need some standards on
> enriching the metadata about what was fetched so that no matter how you
> store the provenance, you can find some way to query it for questions like
> when a data set was loaded into NiFi, how many records went through a
> terminating processor, etc. IMO this could help batch-oriented
> organizations feel more at ease with something stream-oriented like NiFi.
>
> On Fri, Apr 13, 2018 at 4:01 PM Mike Thomsen 
> wrote:
>
> I'd like to propose that all non-deprecated (or likely to be deprecated)
> Get/Fetch/Query processors get a standard convention for attributes that
> describe things like:
>
> 1. Source system.
> 2. Database/table/index/collection/etc.
> 3. The lookup criteria that was used (similar to the "query attribute"
> some already have).
>
> Using GetMongo as an example, it would add something like this:
>
> source.url=mongodb://localhost:27017
> source.database=testdb
> source.collection=test_collection
> source.query={ "username": "john.smith" }
> source.criteria.username=john.smith //GetMongo would parse the query and
> add this.
>
> We have a use case where a team is coming from an extremely batch-oriented
> view and really wants to know when "dataset X" was run. Our solution was to
> extract that from the result set because the dataset name is one of the
> fields in the JSON body.
>
> I think this would help expand what you can do out of the box with
> provenance tracking because it would provide a lot of useful information
> that could be stored in Solr or ES and then queried against terminating
> processors' DROP events to get a solid window into when jobs were run
> historically.
>
> Thoughts?
>
>
>
>


Re: [VOTE] Release Apache NiFi 1.6.0 (RC3)

2018-04-06 Thread Koji Kawamura
+1 (binding)

Ran through the release helper steps.
Confirmed example flows including Atlas integration work with a secure
NiFi cluster.

Thanks for the release efforts!

On Fri, Apr 6, 2018 at 1:01 PM, James Wing  wrote:
> +1 (binding) - Ran through the release helper checksums and build steps
> (Amazon Linux). Upgraded a working NiFi out of unbounded optimism for RC3,
> and it's looking good.  Thanks again for the release efforts!
>
> On Tue, Apr 3, 2018 at 4:49 PM, Joe Witt  wrote:
>
>> Hello,
>>
>> I am pleased to be calling this vote for the source release of Apache
>> NiFi nifi-1.6.0.
>>
>> The source zip, including signatures, digests, etc. can be found at:
>> https://repository.apache.org/content/repositories/orgapachenifi-1124
>>
>> The Git tag is nifi-1.6.0-RC3
>> The Git commit ID is f8466cb16d6723ddc3bf5f0e7f8ce8a47d27cbe5
>> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
>> f8466cb16d6723ddc3bf5f0e7f8ce8a47d27cbe5
>>
>> Checksums of nifi-1.6.0-source-release.zip:
>> SHA1: d1e1c24f9af809bf812982962b61d07df4f1131e
>> SHA256: 1e5028d594bb402aa36460f1b826d4e8a664ad6f0538deed20286cbf3c621fb8
>> SHA512: 8cb10cbafa6feeed712dbc0cf076496d6bc014276aab71383ff3481d8ea7
>> 19cf1f39766abc76c75ba58ffca747df3bd6d9bac82e410de1c78673dcd16a5ddfee
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/joewitt.asc
>>
>> KEYS file available here:
>> https://dist.apache.org/repos/dist/release/nifi/KEYS
>>
>> 162 issues were closed/resolved for this release:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> projectId=12316020&version=12342422
>>
>> Release note highlights can be found here:
>> https://cwiki.apache.org/confluence/display/NIFI/
>> Release+Notes#ReleaseNotes-Version1.6.0
>>
>> The vote will be open for 72 hours.
>> Please download the release candidate and evaluate the necessary items
>> including checking hashes, signatures, build
>> from source, and test.  The please vote:
>>
>> [ ] +1 Release this package as nifi-1.6.0
>> [ ] +0 no opinion
>> [ ] -1 Do not release this package because...
>>


Re: [VOTE] Release Apache NiFi 1.6.0 (RC2)

2018-03-28 Thread Koji Kawamura
+1 (binding)

- Confirmed hashes
- Built with include-atlas profile
- Confirmed various flows with 3 node secured cluster on Ubuntu
- Tested integration with Hadoop environment and NiFi Registry

Koji

On Wed, Mar 28, 2018 at 12:27 PM, Andrew Lim  wrote:
> +1 (non-binding)
>
> -Ran full clean install on OS X (10.11.6)
> -Tested integration with Secure NiFi Registry (1.5.0)
> -Tested fine grained restricted component policies.  Verified two issues 
> discovered while testing RC1 have been fixed in RC2 [1, 2]
> -Ran basic flows successfully
> -Reviewed documentation
>
> Drew
>
> [1] https://issues.apache.org/jira/browse/NIFI-5008
> [2] https://issues.apache.org/jira/browse/NIFI-5009
>
>
>> On Mar 26, 2018, at 11:34 PM, Joe Witt  wrote:
>>
>> Hello,
>>
>> I am pleased to be calling this vote for the source release of Apache
>> NiFi nifi-1.6.0.
>>
>> The source zip, including signatures, digests, etc. can be found at:
>> https://repository.apache.org/content/repositories/orgapachenifi-1123
>>
>> The Git tag is nifi-1.6.0-RC2
>> The Git commit ID is b5935ec81a7cbc048820781ac62cd96bbea5b232
>> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=b5935ec81a7cbc048820781ac62cd96bbea5b232
>>
>> Checksums of nifi-1.6.0-source-release.zip:
>> SHA1: 009f1e2e3c17e38f21f27170b9c06228d11653c0
>> SHA256: 39941a5b25427e2b4cc5ba8206084ff92df58863f29ddd097d4ac1e85424beb9
>> SHA512: 
>> 1773417a48665e3cda22180ea7f401bc8190ebddbf3f7bc29831e46e7ab0a07694c6e478d252fa573209d4a3c8132a522a8507b6a8784669ab7364847a07e234
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/joewitt.asc
>>
>> KEYS file available here:
>> https://dist.apache.org/repos/dist/release/nifi/KEYS
>>
>> 146 issues were closed/resolved for this release:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12342422
>>
>> Release note highlights can be found here:
>> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.6.0
>>
>> The vote will be open for 72 hours.
>> Please download the release candidate and evaluate the necessary items
>> including checking hashes, signatures, build
>> from source, and test.  The please vote:
>>
>> [ ] +1 Release this package as nifi-1.6.0
>> [ ] +0 no opinion
>> [ ] -1 Do not release this package because...
>


Re: [VOTE] Establish Fluid Design System, a sub-project of Apache NiFi

2018-03-11 Thread Koji Kawamura
+1

On Mon, Mar 12, 2018 at 3:10 AM, Matt Burgess  wrote:
> +1
>
> On Sun, Mar 11, 2018 at 1:00 PM, Jeff  wrote:
>> +1
>>
>> On Sat, Mar 10, 2018 at 8:42 PM Joe Skora  wrote:
>>
>>> +1
>>>
>>>
>>> On Fri, Mar 9, 2018, 3:10 PM Scott Aslan  wrote:
>>>
>>> > All,
>>> >
>>> > Following a solid discussion for the past couple of weeks [1] regarding
>>> the
>>> > establishment of Fluid Design System as a sub-project of Apache NiFi, I'd
>>> > like to
>>> > call a formal vote to record this important community decision and
>>> > establish consensus.
>>> >
>>> > The scope of this project is to define a theme-able set of high quality
>>> UI
>>> > components and utilities for use across the various Apache NiFi web
>>> > applications in order to provide a more consistent user experience.
>>> >
>>> > I am a +1 and looking forward to the future work in this area.
>>> >
>>> > The vote will be open for 72 hours and be a majority rule vote.
>>> >
>>> > [ ] +1 Establish Fluid Design System, a subproject of Apache NiFi
>>> > [ ]   0 Do not care
>>> > [ ]  -1 Do not establish Fluid Design System, a subproject of Apache NiFi
>>> >
>>> > Thanks,
>>> >
>>> > ScottyA
>>> >
>>> > [1] *
>>> >
>>> http://mail-archives.apache.org/mod_mbox/nifi-dev/201802.mbox/%3CCAKeSr4ibXX9xzGN1GhdVv5uTmWvfB3QULXF9orzw4FYD0n7taQ%40mail.gmail.com%3E
>>> > <
>>> >
>>> http://mail-archives.apache.org/mod_mbox/nifi-dev/201802.mbox/%3CCAKeSr4ibXX9xzGN1GhdVv5uTmWvfB3QULXF9orzw4FYD0n7taQ%40mail.gmail.com%3E
>>> > >*
>>> >
>>>


Re: Facing issue in Site to Site Https Communication

2018-02-22 Thread Koji Kawamura
Hi,

A common mistake with the tls-toolkit is generating the keystore and
truststore for each node using a DIFFERENT NiFi CA cert.
If tls-toolkit standalone is executed against different output
directories, it may produce a different NiFi CA in each directory.

Please check that both the s2s client and server truststores have the
same NiFi CA cert.
To do so, use the keytool command:
keytool -list -keystore truststore.jks
nifi-cert, Feb 9, 2018, trustedCertEntry,
Certificate fingerprint (SHA1):
FE:0D:FE:0D:72:40:0A:7E:49:45:1B:78:D9:F5:F4:6E:A2:3C:92:E5

If that's not the case, then I'd recommend adding the
-Djavax.net.debug=all Java option to debug further.
You can add Java options in ${NIFI_HOME}/conf/bootstrap.conf.
https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
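
For example, you can append a line like this to bootstrap.conf
(assuming java.arg.20 is an index not already used in your file; pick
any unused one):

java.arg.20=-Djavax.net.debug=all

Then restart NiFi and the SSL/TLS handshake details will appear in the logs.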

Thanks,
Koji

On Fri, Feb 23, 2018 at 9:01 AM, yi  wrote:
> Apologies, I should clarify that I still do not have communication working
> site to site. Please assist. Thank you.
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Facing issue in Site to Site Https Communication

2018-02-21 Thread Koji Kawamura
Hi,

If tls-toolkit was used to generate certificates, then there should be
server-1 and server-2 directories created and each contains
keystore.jks and truststore.jks.

```
sudo bash ./tls-toolkit.sh standalone -n 'server-1,server-2' -C 'CN=demo,
OU=nifi' -O -o ../security_output
```

Please check following configurations in nifi.properties file to see
if the generated keystore and truststore are specified correctly:

nifi.security.keystore
nifi.security.keystoreType
nifi.security.keystorePasswd
nifi.security.keyPasswd
nifi.security.truststore
nifi.security.truststoreType
nifi.security.truststorePasswd

Thanks,
Koji

On Thu, Feb 22, 2018 at 4:13 PM, yi  wrote:
> Hi there,
>
> sticking my nose in as I have the same issue!
>
> slightly different to Nishant, but here's my settings:
>
> On the RPG instance side
>
> # Site to Site properties
> nifi.remote.input.host=
> nifi.remote.input.secure=true
> nifi.remote.input.socket.port=8899
> nifi.remote.input.http.enabled=true
> nifi.remote.input.http.transaction.ttl=30 sec
>
> # web properties #
> nifi.web.war.directory=./lib
> nifi.web.http.host=
> nifi.web.http.port=
> nifi.web.http.network.interface.default=
> nifi.web.https.host=localhost
> nifi.web.https.port=8443
> nifi.web.https.network.interface.default=
> nifi.web.jetty.working.directory=./work/jetty
> nifi.web.jetty.threads=200
>
> On the "server" side
> # Site to Site properties
> nifi.remote.input.host=
> nifi.remote.input.secure=true
> nifi.remote.input.socket.port=8899
> nifi.remote.input.http.enabled=false
> nifi.remote.input.http.transaction.ttl=30 sec
>
> # web properties #
> nifi.web.war.directory=./lib
> nifi.web.http.host=
> nifi.web.http.port=
> nifi.web.http.network.interface.default=
> nifi.web.https.host=
> nifi.web.https.port=8443
> nifi.web.https.network.interface.default=
> nifi.web.jetty.working.directory=./work/jetty
> nifi.web.jetty.threads=200
>
>
> 
>
> Any guidance is appreciated!
>
> Thank you
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: CSV record parsing with custom date formats

2018-02-15 Thread Koji Kawamura
Hi Derek,

Thanks for sharing the files and the detailed README. I was able to
reproduce the issue.
It seems there are two different points that can be improved in this scenario.
I've created two JIRAs:

CSVRecordReader should utilize specified date/time/timestamp format at
its convertSimpleIfPossible method
https://issues.apache.org/jira/browse/NIFI-4882

ValidateRecord processor should be able to use different schema for
valid and invalid records
https://issues.apache.org/jira/browse/NIFI-4883

Derek, would you check those JIRA descriptions to see if I mapped the
issues correctly?
Also, are you interested in working on NIFI-4882? It should be
similar to the patch you included in the zip file.

Thanks,
Koji

On Thu, Feb 15, 2018 at 5:22 AM, Derek Straka  wrote:
> Thanks for the suggestion.  I want to use the field as a date, and my current
> work around is to preprocess the field with a Jython script.  If it is a
> bug, I'll make a report and take a stab at a fix.
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: CSV record parsing with custom date formats

2018-02-13 Thread Koji Kawamura
Hi Derek,

By looking at the code briefly, I guess you are using the ValidateRecord
processor with CSVReader and AvroWriter.
As you pointed out, it seems DataTypeUtils.isCompatibleDataType does
not use the date format the user defined at CSVReader.

Is it possible for you to share the following for us to reproduce and
understand it better?
- Sample input CSV file
- NiFi flow template using CSVReader and AvroWriter

Thanks,
Koji

On Wed, Feb 14, 2018 at 7:11 AM, Derek Straka  wrote:
> I have a question about the expected behavior of convertSimpleIfPossible in
> CSVRecordReader.java (NiFi 1.5.0).
>
> I have a custom CSV file that I am taking to an avro schema using
> ValidateRecord.  The schema contains a logical date type and the CSV has
> the date in the format MM/DD/YYYY.  I expected to provide the date string
> in the controller element for the CSV reader and have everything parse
> happily, but it ends up throwing an exception when it tries to parse things
> in the avro writer (String->Date).  I don't think I should be blaming the
> avro writer because I expected the CSV reader to parse the date for me.
>
> I did a little digging in the CSVRecordReader.java, and I see everything
> flows through convertSimpleIfPossible when parsing the data, and each data
> type is checked with DataTypeUtils.isCompatibleDataType prior to actually
> trying to perform the conversion.
>
> The date string doesn't use the user provided format in the call to
> DataTypeUtils.isCompatibleDataType, but instead uses the default for date
> types.  The validation ends up failing when it uses the default date string
> (YYYY-MM-DD), so it won't use LAZY_DATE_FORMAT as I expected.  Am I totally
> off base, or is this unexpected behavior?
>
> Thanks.
>
> -Derek


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Koji Kawamura
Hi,

The PR is ready for review. I confirmed that the performance issue is addressed.
https://github.com/apache/nifi/pull/2464

I was also testing to see if the
nifi-hbase_1_1_2-client-service-nar-1.6.0-SNAPSHOT.nar can be used in a
NiFi 1.5.0 env. But unfortunately it doesn't seem we can drop it in as
it is.
A validation error occurs saying, 'HBase_1_1_2_ClientService
-1.6.0-SNAPSHOT from org.apache.nifi -
nifi-hbase_1_1_2-client-service-nar is not compatible with
HBaseClientService -1.5.0 from org.apache.nifi -
nifi-standard-services-api-nar'.
It looks like nifi-standard-services needs to be updated, too, but I
think that's a bit risky; it may affect other services.

So, I've written a Gist to work around this, with a
nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar built from the
1.5.0 release commit with the performance fix cherry-picked.
https://gist.github.com/ijokarumawak/85db60ca71f1825f543c18c62bf7c3fd

Thanks,
Koji



On Sat, Feb 10, 2018 at 10:37 AM, Koji Kawamura <ijokaruma...@gmail.com> wrote:
> Hi Adam,
>
> Thank you very much for reporting the performance issue.
> I created NIFI-4866 and started fixing the issue by moving the
> problematic code block to createConnection.
> After confirming that addresses performance issue, I will send a PR to
> get it merged.
>
> Koji
>
>
> On Sat, Feb 10, 2018 at 9:25 AM, Joe Witt <joe.w...@gmail.com> wrote:
>> adam
>>
>> you should also be able to put the old hbase nar in and switch to that
>> version.
>>
>> we now support multiple versions of the same component.
>>
>> thanks
>>
>> On Feb 9, 2018 7:10 PM, "Mike Thomsen" <mikerthom...@gmail.com> wrote:
>>
>>> Adam,
>>>
>>> If you're doing bulk ingestion of JSON, I would recommend using
>>> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
>>> limitations doing genomic data ingestion (several 10s of billions of Puts
>>> from the 1000 genomes project). If you run into problems with it, just post
>>> them and poke me.
>>>
>>> Mike
>>>
>>> On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt <joe.w...@gmail.com> wrote:
>>>
>>> > adam
>>> >
>>> > thanks for reporting and if you can do a contrib that would be great!
>>> >
>>> > thanks
>>> > joe
>>> >
>>> > On Feb 9, 2018 6:56 PM, "Martini, Adam" <adam.mart...@nike.com> wrote:
>>> >
>>> > > Hello NiFi Dev Community,
>>> > >
>>> > > This commit hash (part of the NiFi 1.5.0 release) created serious
>>> > > performance issues for HBase Put operations: "
>>> > > 116c8463428c1fb51bfb7a8adfcf23c32fded964".
>>> > >
>>> > > The override of the “toTransitUri” method makes a call to
>>> > > “connection.getAdmin().getClusterStatus().getMaster()
>>> .getHostAndPort()”
>>> > > upon every flow file transfer, which essentially doubles the traffic
>>> > > through the HBase connector.  The performance of our PutHBaseJSON
>>> > processor
>>> > > dropped to 1/3 after deploying NiFi 1.5.0.
>>> > >
>>> > > Please let us know a timeline for a fix.  We are building and testing
>>> our
>>> > > own tar ball in the interim to fix the issue and are happy to
>>> contribute
>>> > > our code back to the project if you would like.
>>> > >
>>> > > All the best and thank you.
>>> > >
>>> > > Adam Martini
>>> > > Senior Developer, Nike Digital
>>> > >
>>> > >
>>> > >
>>> >
>>>


Re: NiFi 1.5.0 HBase_1_1_2_ClientService performance bug

2018-02-09 Thread Koji Kawamura
Hi Adam,

Thank you very much for reporting the performance issue.
I created NIFI-4866 and started fixing the issue by moving the
problematic code block to createConnection.
After confirming that it addresses the performance issue, I will send a PR to
get it merged.

Koji


On Sat, Feb 10, 2018 at 9:25 AM, Joe Witt  wrote:
> adam
>
> you should also be able to put the old hbase nar in and switch to that
> version.
>
> we now support multiple versions of the same component.
>
> thanks
>
> On Feb 9, 2018 7:10 PM, "Mike Thomsen"  wrote:
>
>> Adam,
>>
>> If you're doing bulk ingestion of JSON, I would recommend using
>> PutHBaseRecord. I wrote it/contributed it when my team ran into similar
>> limitations doing genomic data ingestion (several 10s of billions of Puts
>> from the 1000 genomes project). If you run into problems with it, just post
>> them and poke me.
>>
>> Mike
>>
>> On Fri, Feb 9, 2018 at 6:56 PM, Joe Witt  wrote:
>>
>> > adam
>> >
>> > thanks for reporting and if you can do a contrib that would be great!
>> >
>> > thanks
>> > joe
>> >
>> > On Feb 9, 2018 6:56 PM, "Martini, Adam"  wrote:
>> >
>> > > Hello NiFi Dev Community,
>> > >
>> > > This commit hash (part of the NiFi 1.5.0 release) created serious
>> > > performance issues for HBase Put operations: "
>> > > 116c8463428c1fb51bfb7a8adfcf23c32fded964".
>> > >
>> > > The override of the “toTransitUri” method makes a call to
>> > > “connection.getAdmin().getClusterStatus().getMaster()
>> .getHostAndPort()”
>> > > upon every flow file transfer, which essentially doubles the traffic
>> > > through the HBase connector.  The performance of our PutHBaseJSON
>> > processor
>> > > dropped to 1/3 after deploying NiFi 1.5.0.
>> > >
>> > > Please let us know a timeline for a fix.  We are building and testing
>> our
>> > > own tar ball in the interim to fix the issue and are happy to
>> contribute
>> > > our code back to the project if you would like.
>> > >
>> > > All the best and thank you.
>> > >
>> > > Adam Martini
>> > > Senior Developer, Nike Digital
>> > >
>> > >
>> > >
>> >
>>


Re: Reg: AzureBlobStorage Sensitive Property

2018-01-30 Thread Koji Kawamura
Hi Sivaprasanna,

That's a good point.

I am not aware of any background reason for ACCOUNT_NAME to be a
sensitive property.
It seems that it has been a sensitive property since the beginning,
when the Azure blob processors were contributed.
https://github.com/apache/nifi/pull/1636/files#diff-ad53d357304781182a9f427bab9e6215R29

No recommendation suggesting that an account name be protected can be
found in the Azure Blob storage security guide:
https://docs.microsoft.com/en-us/azure/storage/common/storage-security-guide

I think it's reasonable to make ACCOUNT_NAME a normal property.
If you are interested in doing that, please file a JIRA.
https://issues.apache.org/jira/browse/NIFI

Thanks,
Koji


On Mon, Jan 29, 2018 at 4:53 PM, Sivaprasanna  wrote:
> Hi,
>
> I was going through the Azure Blob Storage code-base. If you look at
> AzureStorageUtils,
> ACCOUNT_NAME is defined as a sensitive property. I would like to know the
> rationale behind that. The reason I'm asking is regardless of making the
> storage account name sensitive, the azure.primaryUri flowfile attribute
> will have the storage account name in it since the primary URI is going to
> be like this:
> https://mystorageaccountname.blob.core.windows.net/container/blob
>
> -
> Sivaprasanna


Re: Help & advice needed on application.js in Processors

2018-01-30 Thread Koji Kawamura
Hi Dave,

If you can confirm the updated application.js is included in the war
file, then it sounds like a matter of Web browser caching. The old
application.js cached by the client Web browser may be being used. A
hard reload (Ctrl + Shift + R for Chrome) may help if that's the case.

Thanks,
Koji

On Wed, Jan 31, 2018 at 6:48 AM, DAVID SMITH  wrote:
> Hi
> I am trying to create a processor that is partially based on the
> UpdateAttribute processor 0.7.3. I have cloned the UpdateAttribute source,
> renamed the processor, and I have started by trying to amend the Advanced
> configuration UI. I have found that I can change labels for fields in the
> Advanced configuration UI in WEB-INF/jsp/worksheet.jsp, and these are
> reflected in the Advanced UI after a reload.
> However, any changes that I make in webapp/js/application.js, such as changing
> the label on the Rule Filter button, never get picked up and displayed in the
> UI. I have unpacked the war file and the application.js looks exactly as I
> have edited it.
> When I build the new processor nar I am using mvn clean install and I am
> seeing no errors; also, when NiFi loads there are no errors or warnings. Is
> there a developers guide for creating this type of UI in processors, or can
> someone help and tell me why my changes are not being picked up?
> Many thanks
> Dave


Re: Confused by two classes

2018-01-25 Thread Koji Kawamura
Hello,

ComplexRecordField is used to represent a child record which can have
multiple fields in it, i.e. embedded objects.
ComplexRecordField corresponds to the Records type in Avro terminology [1].
UnionRecordField represents a field whose data type can be one of the
defined types. It is often used to represent an optional field, e.g.
["null", "string"].
UnionRecordField likewise corresponds to Unions [2].
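
As a concrete illustration of the union-as-optional-field idea in Avro
terms, a sketch using Avro's SchemaBuilder (not the NiFi repository
schema classes):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// "nickname" is the union ["null", "string"], i.e. an optional field.
final Schema person = SchemaBuilder.record("Person").fields()
        .requiredString("name")
        .name("nickname").type().unionOf().nullType().and().stringType()
            .endUnion().nullDefault()
        .endRecord();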

RepositoryRecordSchema [3] may be a good example of using the Complex and
Union types to represent a top-level field that can be either a
"createOrUpdate", "delete", "swapOut" or "swapIn" child record.

Hope this helps and my understanding is correct. If not, Mark Payne
will describe it better :)

As for the demand for more javadocs, I totally agree with you. Please
feel free to submit a JIRA for improvements.
https://issues.apache.org/jira/projects/NIFI

[1] http://avro.apache.org/docs/current/spec.html#schema_record
[2] http://avro.apache.org/docs/current/spec.html#Unions
[3] 
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-flowfile-repo-serialization/src/main/java/org/apache/nifi/controller/repository/schema/RepositoryRecordSchema.java#L97

Thanks,
Koji

On Thu, Jan 25, 2018 at 6:23 PM, sunfishyc  wrote:
> Hi there!
> I'm reading NiFi's source code recently, and I got confused by these two 
> classes:
> org.apache.nifi.repository.schema.ComplexRecordField and
> org.apache.nifi.repository.schema.UnionRecordField
>
> It seems to me that these two classes are almost the same, except that
> ComplexRecordField has custom hashCode and equals methods.
> Are they designed to be this similar, or is it accidental duplication? Can
> they be simplified into one?
>
> In addition, I think there should be more javadocs in this project, to
> improve readability.
>
> 2018-01-25
>
>
> sunfishyc


Re: [VOTE] Release Apache NiFi 1.5.0 (RC1)

2018-01-10 Thread Koji Kawamura
+1 (binding) Release this package as nifi-1.5.0

Verified signature and hashes. Built with include-atlas profile.
mvn clean install -Pcontrib-check,include-grpc,include-atlas
Confirmed that flows using NiFi Registry and the ReportLineageToAtlas
reporting task worked as expected.

On Thu, Jan 11, 2018 at 4:35 AM, Matt Gilman  wrote:
> +1 (binding) Release this package as nifi-1.5.0
>
> Verified signature, hashes, build, etc. Ran through a number of scenarios
> with Apache NiFi Registry 0.1.0 and everything is working as expected.
>
> Thanks Joe for RMing this release!
>
> Matt
>
> On Wed, Jan 10, 2018 at 2:22 PM, Rob Moran  wrote:
>
>> +1, non-binding
>>
>> * All looks good/in place following the release helper
>> * Reviewed help docs related to new Registry integration
>> * Connected a registry client and did some quick testing of basic version
>> control related actions
>>
>>
>> On Wed, Jan 10, 2018 at 1:24 PM Andrew Lim 
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> > -Ran full clean install on OS X (10.11.6)
>> > -Tested integration with NiFi Registry
>> > -Ran record reader/writer flows
>> > -Reviewed resolved “Core UI” component Jiras and spot checked inclusion
>> in
>> > build
>> > -Reviewed documentation
>> >
>> > Drew
>> >
>> >
>> > > On Jan 9, 2018, at 5:19 AM, Joe Witt  wrote:
>> > >
>> > > Hello,
>> > >
>> > > I am pleased to be calling this vote for the source release of Apache
>> > > NiFi nifi-1.5.0.
>> > >
>> > > The source zip, including signatures, digests, etc. can be found at:
>> > > https://repository.apache.org/content/repositories/orgapachenifi-1116
>> > >
>> > > The Git tag is nifi-1.5.0-RC1
>> > > The Git commit ID is 46d30c7e92f0ad034d9b35bf1d05c350ab5547ed
>> > >
>> > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
>> 46d30c7e92f0ad034d9b35bf1d05c350ab5547ed
>> > >
>> > > Checksums of nifi-1.5.0-source-release.zip:
>> > > MD5: 046f2dde4af592dd8c05e55c2bbb3c4f
>> > > SHA1: 63b9a68b9f89200fd31f5561956a15b45b1b9c8c
>> > > SHA256: 40b155c4911414907835f2eb0d5a4da798935f27f1e5134218d904fe6c94
>> 2d13
>> > >
>> > > Release artifacts are signed with the following key:
>> > > https://people.apache.org/keys/committer/joewitt.asc
>> > >
>> > > KEYS file available here:
>> > > https://dist.apache.org/repos/dist/release/nifi/KEYS
>> > >
>> > > 195 issues were closed/resolved for this release:
>> > >
>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> projectId=12316020&version=12341668
>> > >
>> > > Release note highlights can be found here:
>> > >
>> > https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.5.0
>> > >
>> > > The vote will be open for 72 hours.
>> > > Please download the release candidate and evaluate the necessary items
>> > > including checking hashes, signatures, build
>> > > from source, and test. Then please vote:
>> > >
>> > > [ ] +1 Release this package as nifi-1.5.0
>> > > [ ] +0 no opinion
>> > > [ ] -1 Do not release this package because...
>> >
>> >
>>
>> --
>> Rob
>>


Re: [VOTE] Release Apache NiFi Registry 0.1.0

2017-12-29 Thread Koji Kawamura
+1 (binding)

This is really awesome!
I confirmed hashes and basic usage. Looks great for the 1st release.
Found a couple of minor possible improvements on the NiFi side, and
posted comments on NiFi PR 2219.
https://github.com/apache/nifi/pull/2219

Thanks for your work and effort; looking forward to using it in my daily
NiFi life!

Koji

On Sat, Dec 30, 2017 at 7:21 AM, Ben Qiu  wrote:
> +1
>
> On 2017-12-28 10:09, Bryan Bende  wrote:
>> Hello,
>>
>> I am pleased to be calling this vote for the source release of Apache
>> NiFi Registry 0.1.0.
>>
>> The source zip, including signatures, digests, etc. can be found at:
>> https://repository.apache.org/content/repositories/orgapachenifi-1115/
>>
>> The Git tag is nifi-registry-0.1.0-RC1
>> The Git commit ID is 81b99e7b04491eabb72ddf30754053ca12d0fcca
>> https://git-wip-us.apache.org/repos/asf?p=nifi-registry.git;a=commit;h=81b99e7b04491eabb72ddf30754053ca12d0fcca
>>
>> Checksums of nifi-registry-0.1.0-source-release.zip:
>> MD5: 56244c3c296cdc9c3fcc6d22590b80d1
>> SHA1: 6354e91f868f40d6656ec2467bde307260ad63ca
>> SHA256: 2c680e441e6c4bfa2381bf004e9b19a6a79401a6a83e04597d0a714a95efd301
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/bbende.asc
>>
>> KEYS file available here:
>> https://dist.apache.org/repos/dist/release/nifi/KEYS
>>
>> 65 issues were closed/resolved for this release:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320920&version=12340217
>>
>> Release note highlights can be found here:
>> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-NiFiRegistry0.1.0
>>
>> The vote will be open for 96 hours.
>>
>> Please download the release candidate and evaluate the necessary items
>> including checking hashes, signatures, build from source, and test.
>>
>> Then please vote:
>>
>> [ ] +1 Release this package as nifi-registry-0.1.0
>> [ ] +0 no opinion
>> [ ] -1 Do not release this package because...
>>


Re: DBCP Connection Pooling using multiple OJDBC Drivers

2017-12-27 Thread Koji Kawamura
Hi Nadeem,

Did you try specifying an external directory instead of an exact jar
location, and then putting multiple versions of the jar there? That way,
DBCPConnectionPool can utilize multiple jars, which provides a similar
effect to putting one in the NiFi lib dir.

If that doesn't work, an alternative approach would be to set up a different
DBCPConnectionPool instance per required JDBC driver version, then
switch between them by some condition.

I haven't tested the above myself, so no guarantees, but I just wanted to
share my thoughts.

Thanks,
Koji

On Thu, Dec 28, 2017 at 12:14 AM, Mohammed Nadeem  wrote:
> Thanks Mark, that really helps. But I'm facing the below issue.
>
> As you said, I loaded the drivers externally, and it was successful for Oracle 8i,
> where I executed a simple stored procedure and it worked fine. But when I
> try to execute a stored procedure which has an array type on Oracle 11g, I
> get this error (can't wrap the connection to an Oracle connection):
>
> java.lang.AbstractMethodError: null
> at
> org.apache.commons.dbcp.DelegatingConnection.unwrap(DelegatingConnection.java:553)
> ~[na:na]
> at
> org.apache.commons.dbcp.DelegatingConnection.unwrap(DelegatingConnection.java:553)
> ~[na:na]
> at
> ExecuteProcedure.executeStoredProcedure(GE_Scon_ExecuteProcedure.java:584)
> ~[na:na]
> at ExecuteProcedure.onTrigger(GE_Scon_ExecuteProcedure.java:382) 
> ~[na:na]
>
> It can't unwrap the connection to an Oracle connection. Earlier I had resolved
> this issue by placing ojdbc7.jar in my NiFi lib folder, and then it started to
> work. But as you said, we should not place any jars in the NiFi lib folder,
> so now it's giving this error. Please help here.
>
> Thanks,
> Nadeem
> Software Engineering Specialist
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Compile avro to RecordSchema

2017-12-27 Thread Koji Kawamura
Hi Mike,

You might already have found it, but AvroTypeUtil.createSchema is
probably what you are looking for.
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L341
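
For example, a minimal sketch (assuming the Avro schema text is available
as a String named avroSchemaText; AvroTypeUtil does the field-by-field
conversion):

import org.apache.avro.Schema;
import org.apache.nifi.avro.AvroTypeUtil;
import org.apache.nifi.serialization.record.RecordSchema;

// Parse the Avro schema text, then convert it to NiFi's RecordSchema
final Schema avroSchema = new Schema.Parser().parse(avroSchemaText);
final RecordSchema recordSchema = AvroTypeUtil.createSchema(avroSchema);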

Thanks,
Koji

On Wed, Dec 27, 2017 at 9:15 PM, Mike Thomsen  wrote:
> Is there an API for compiling an avro schema to a RecordSchema object, or
> is it more involved than that?
>
> Thanks,
>
> Mike


Re: proper way in nifi to sync status between custom processors

2017-12-27 Thread Koji Kawamura
Hi Ben, you can filter events by timestamp as well.
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#searching-for-events

On Wed, Dec 27, 2017 at 6:28 PM, 尹文才 <batman...@gmail.com> wrote:
> Hi Koji, I saw it was only showing the 1000 events so I couldn't see the
> event when the FlowFile was created.
>
> Regards,
> Ben
>
> 2017-12-27 17:21 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> I see, thanks. The easiest way to look at provenance events would be
>> by right-clicking a processor instance you are interested in, then
>> selecting the 'View data provenance' context menu item. This way, NiFi displays
>> provenance events for the selected processor.
>>
>> Koji
>>
>> On Wed, Dec 27, 2017 at 6:17 PM, 尹文才 <batman...@gmail.com> wrote:
>> > Hi Koji, sorry about the provenance exception; it was because there was no
>> > space left on the machine (filled up with logs).
>> >
>> > Regards,
>> > Ben
>> >
>> > 2017-12-27 17:11 GMT+08:00 尹文才 <batman...@gmail.com>:
>> >
>> >> Hi Koji, thanks, the names of the temp tables are created with the format
>> >> "yyyyMMddHHmmssSSS-<random>"; the first part indicates the time and the
>> >> second part is a random number of length 4.
>> >> So I think it's not possible to have 2 duplicate table names; the only
>> >> possibility I could think of is that the flowfile is passed into the
>> >> processor twice.
>> >>
>> >> About the provenance, I had updated to use the
>> >> WriteAheadProvenanceRepository implementation, but when I tried to check
>> >> the data provenance, it showed me the following exception message:
>> >> HTTP ERROR 500
>> >>
>> >> Problem accessing /nifi/provenance. Reason:
>> >>
>> >> Server Error
>> >>
>> >> Caused by:
>> >>
>> >> javax.servlet.ServletException: org.eclipse.jetty.servlet.ServletHolder$1: java.lang.NullPointerException
>> >>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:138)
>> >>   at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:561)
>> >>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>> >>   at org.eclipse.jetty.server.Server.handle(Server.java:564)
>> >>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>> >>   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>> >>   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
>> >>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
>> >>   at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:258)
>> >>   at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:147)
>> >>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
>> >>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
>> >>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)
>> >>   at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)
>> >>   at java.lang.Thread.run(Thread.java:745)
>> >> Caused by: org.eclipse.jetty.servlet.ServletHolder$1: java.lang.NullPointerException
>> >>   at org.eclipse.jetty.servlet.ServletHolder.makeUnavailable(ServletHolder.java:596)
>> >>   at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:655)
>> >>   at org.eclipse.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:498)
>> >>   at org.eclipse.jetty.servlet.ServletHolder.ensureInstance(ServletHolder.java:785)
>> >>   at org.eclipse.jetty.servlet.ServletHolder.prepare(ServletHolder.java:770)
>> >>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:538)
>> >>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> >>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>> >>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>> >

Re: proper way in nifi to sync status between custom processors

2017-12-27 Thread Koji Kawamura
va:258)
>>   at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:147)
>>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
>>   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
>>   at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:122)
>>   at org.eclipse.jetty.util.thread.strategy.ExecutingExecutionStrategy.invoke(ExecutingExecutionStrategy.java:58)
>>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:201)
>>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:133)
>>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)
>>   at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> My configuration inside nifi.properties is as below:
>> # Provenance Repository Properties
>> nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
>> nifi.provenance.repository.debug.frequency=1_000_000
>> nifi.provenance.repository.encryption.key.provider.implementation=
>> nifi.provenance.repository.encryption.key.provider.location=
>> nifi.provenance.repository.encryption.key.id=
>> nifi.provenance.repository.encryption.key=
>>
>> # Persistent Provenance Repository Properties
>> nifi.provenance.repository.directory.default=../provenance_repository
>> nifi.provenance.repository.max.storage.time=24 hours
>> nifi.provenance.repository.max.storage.size=1 GB
>> nifi.provenance.repository.rollover.time=30 secs
>> nifi.provenance.repository.rollover.size=100 MB
>> nifi.provenance.repository.query.threads=2
>> nifi.provenance.repository.index.threads=1
>> nifi.provenance.repository.compress.on.rollover=true
>> nifi.provenance.repository.always.sync=false
>> nifi.provenance.repository.index.shard.size=4 GB
>>
>>
>> By the way, does this Data Provenance list all FlowFiles ever created or
>> only part of it? Should I try to find the FlowFile with the exception time
>> in the log? Thanks.
>>
>> Regards,
>> Ben
>>
>> 2017-12-27 16:57 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>>
>>> Hi Ben,
>>>
>>> The ExecuteSqlCommand retry logic does not execute the same query
>>> multiple times if it succeeds.
>>> So, input FlowFiles containing the same query must have been
>>> passed in more than once.
>>> It could be the same FlowFile, or different FlowFiles generated by the
>>> first processor for some reason.
>>> To investigate that kind of FlowFile-level information, NiFi
>>> provenance data and FlowFile lineage will be very useful.
>>> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#
>>> viewing-flowfile-lineage
>>>
>>> I didn't mention it earlier because you were having a Provenance
>>> repository performance issue, but I hope you can use it now with the
>>> WriteAheadProvenanceRepository.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Wed, Dec 27, 2017 at 5:44 PM, 尹文才 <batman...@gmail.com> wrote:
>>> > Thanks Koji, for the ExecuteSqlCommand issue, I was trying to re-execute
>>> > the sql query if the connection is lost(connection could be unstable),
>>> my
>>> > idea is to only transfer the FlowFile to the success relationship
>>> > after successfully executing the sql query. You could see the do while
>>> loop
>>> > in the code, the transaction will be rollbacked if the execution
>>> failed; if
>>> > the connection is lost, it will retry to execute the sql.
>>> > Will this logic cause my sql to be executed twice?
>>> >
>>> > For the WaitBatch processor, I will take your approach to test
>>> individually
>>> > to see if the WaitBatch processor could cause the FlowFile repository
>>> > checkpointing failure.
>>> >
>>> > Regards,
>>> > Ben
>>> >
>>> > 2017-12-27 16:10 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>>> >
>>> >> Hi Ben,
>>> >>
>>> >> Excuse me, I'm trying, but probably I don't fully understand what you
>>> >> want to achieve with the flow.
>>> >>
>>> >> It looks weird that WaitBatch is failing with such

Re: proper way in nifi to sync status between custom processors

2017-12-27 Thread Koji Kawamura
Hi Ben,

The ExecuteSqlCommand retry logic does not execute the same query
multiple times if it succeeds.
So, input FlowFiles containing the same query must have been
passed in more than once.
It could be the same FlowFile, or different FlowFiles generated by the
first processor for some reason.
To investigate that kind of FlowFile-level information, NiFi
provenance data and FlowFile lineage will be very useful.
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#viewing-flowfile-lineage

I didn't mention it earlier because you were having a Provenance
repository performance issue, but I hope you can use it now with the
WriteAheadProvenanceRepository.

Thanks,
Koji

On Wed, Dec 27, 2017 at 5:44 PM, 尹文才 <batman...@gmail.com> wrote:
> Thanks Koji, for the ExecuteSqlCommand issue, I was trying to re-execute
> the sql query if the connection is lost(connection could be unstable), my
> idea is to only transfer the FlowFile to the success relationship
> after successfully executing the sql query. You could see the do-while loop
> in the code; the transaction will be rolled back if the execution failed; if
> the connection is lost, it will retry to execute the sql.
> Will this logic cause my sql to be executed twice?
>
> For the WaitBatch processor, I will take your approach to test individually
> to see if the WaitBatch processor could cause the FlowFile repository
> checkpointing failure.
>
> Regards,
> Ben
>
> 2017-12-27 16:10 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> Hi Ben,
>>
>> Excuse me, I'm trying, but probably I don't fully understand what you
>> want to achieve with the flow.
>>
>> It looks weird that WaitBatch is failing with such a FlowFile repository
>> error, while other processors such as ReplaceText succeed.
>> I recommend testing WaitBatch alone first, without combining the
>> database-related processors, by feeding it a test FlowFile having the
>> expected FlowFile attributes.
>> Such input FlowFiles can be created by a GenerateFlowFile processor.
>> If the same error happens with only the WaitBatch processor, then it
>> should be easier to debug.
>>
>> Thanks,
>> Koji
>>
>> On Wed, Dec 27, 2017 at 4:49 PM, Koji Kawamura <ijokaruma...@gmail.com>
>> wrote:
>> > Hi Ben,
>> >
>> > The one thing that looks strange in the screenshot is the
>> > ExecuteSqlCommand having FlowFiles queued in its incoming connection.
>> > Those should be transferred to 'failure' relationship.
>> >
>> > In the following executeSql() method, shouldn't it re-throw the caught
>> > exception?
>> >
>> >
>> > try (Connection con = dbcpService.getConnection()) {
>> >     logger.debug("Setting autoCommit to false");
>> >     con.setAutoCommit(false);
>> >
>> >     try (Statement stmt = con.createStatement()) {
>> >         logger.info("Executing SQL statement: {}", new Object[]{sql});
>> >         stmt.execute(sql);
>> >
>> >         // All SQL statements are executed within a single transaction
>> >         logger.debug("Committing transaction");
>> >         con.commit();
>> >     } catch (Exception ex) {
>> >         logger.error("Failed to execute SQL statement: {}", new Object[]{sql, ex});
>> >         con.rollback();
>> >         // Re-throw the exception to the outer handler
>> >         throw ex;
>> >     } finally {
>> >         logger.debug("Resetting autoCommit to true");
>> >         con.setAutoCommit(true);
>> >     }
>> > } catch (Exception ex) {
>> >     // HERE, the exception is swallowed; that's why the FlowFiles stay in
>> >     // the incoming connection.
>> >     logger.error("Retrying SQL statement: {}", new Object[]{sql, ex});
>> >     retryOnFail = true;
>> > }
>> >
>> > Thanks,
>> > Koji
>> >
>> > On Wed, Dec 27, 2017 at 2:38 PM, 尹文才 <batman...@gmail.com> wrote:
>> >> Hi Koji, no problem. You could check the code of processor WaitBatch at
>> the
>> >> link:
>> >> https://drive.google.com/open?id=1DMpW5GMiXpyZQdui989Rr3D9rlchQfWQ
>> >>
>> >> I also uploaded a snapshot of part of NiFi flow which includes the
>> >> ExecuteSqlCommand and WaitBatch, you could check the picture at the
>> link:
>> >> https://drive.google.com/file/d/1vdxlWj8ANHQH0CMrXnydLni5o-3IVi2h/view
>> >>

Re: proper way in nifi to sync status between custom processors

2017-12-27 Thread Koji Kawamura
Hi Ben,

Excuse me, I'm trying, but probably I don't fully understand what you
want to achieve with the flow.

It looks weird that WaitBatch is failing with such a FlowFile repository
error, while other processors such as ReplaceText succeed.
I recommend testing WaitBatch alone first, without combining the
database-related processors, by feeding it a test FlowFile having the
expected FlowFile attributes.
Such input FlowFiles can be created by a GenerateFlowFile processor.
If the same error happens with only the WaitBatch processor, then it
should be easier to debug.

Thanks,
Koji

On Wed, Dec 27, 2017 at 4:49 PM, Koji Kawamura <ijokaruma...@gmail.com> wrote:
> Hi Ben,
>
> The one thing that looks strange in the screenshot is the
> ExecuteSqlCommand having FlowFiles queued in its incoming connection.
> Those should be transferred to 'failure' relationship.
>
> In the following executeSql() method, shouldn't it re-throw the caught exception?
>
>
> try (Connection con = dbcpService.getConnection()) {
>     logger.debug("Setting autoCommit to false");
>     con.setAutoCommit(false);
>
>     try (Statement stmt = con.createStatement()) {
>         logger.info("Executing SQL statement: {}", new Object[]{sql});
>         stmt.execute(sql);
>
>         // All SQL statements are executed within a single transaction
>         logger.debug("Committing transaction");
>         con.commit();
>     } catch (Exception ex) {
>         logger.error("Failed to execute SQL statement: {}", new Object[]{sql, ex});
>         con.rollback();
>         // Re-throw the exception to the outer handler
>         throw ex;
>     } finally {
>         logger.debug("Resetting autoCommit to true");
>         con.setAutoCommit(true);
>     }
> } catch (Exception ex) {
>     // HERE, the exception is swallowed; that's why the FlowFiles stay in
>     // the incoming connection.
>     logger.error("Retrying SQL statement: {}", new Object[]{sql, ex});
>     retryOnFail = true;
> }
>
> Thanks,
> Koji
>
> On Wed, Dec 27, 2017 at 2:38 PM, 尹文才 <batman...@gmail.com> wrote:
>> Hi Koji, no problem. You could check the code of processor WaitBatch at the
>> link:
>> https://drive.google.com/open?id=1DMpW5GMiXpyZQdui989Rr3D9rlchQfWQ
>>
>> I also uploaded a snapshot of part of NiFi flow which includes the
>> ExecuteSqlCommand and WaitBatch, you could check the picture at the link:
>> https://drive.google.com/file/d/1vdxlWj8ANHQH0CMrXnydLni5o-3IVi2h/view
>>
>> You mentioned above that a FlowFile repository checkpointing failure will
>> cause other processors to process the same FlowFile again, but as you could
>> see from my snapshot image, the ExecuteSqlCommand is the second processor,
>> placed before the WaitBatch processor. Even if the FlowFile repository
>> checkpointing failure is caused by WaitBatch, could it lead to the
>> processors before it processing a FlowFile multiple times? Thanks.
>>
>> Regards,
>> Ben
>>
>> 2017-12-27 12:36 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>>
>>> Hi Ben,
>>>
>>> I was referring these two log messages in your previous email.
>>> These two messages are both written by ExecuteSqlCommand, it does not
>>> mean 'it was executed again'.
>>>
>>> ```
>>> 2017-12-26 07:00:01,312 INFO [Timer-Driven Process Thread-1]
>>> c.z.nifi.processors.ExecuteSqlCommand
>>> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Executing SQL statement: SELECT
>>> TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
>>> dbo.ods_extractDataDebug;
>>> alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column
>>> _id;
>>>
>>> and it was executed again later:
>>>
>>> 2017-12-26 07:00:01,315 ERROR [Timer-Driven Process Thread-1]
>>> c.z.nifi.processors.ExecuteSqlCommand
>>> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14]
>>> Failed to execute SQL statement: SELECT
>>> ```
>>>
>>> As you wrote, a case where the FlowFile repository fails checkpointing
>>> will cause other processors to process the same FlowFiles again. However,
>>> there won't be a simple solution for every processor to roll back its
>>> job, as different processors do different things. Creating the temp table
>>> only if it does not exist seems the right approach to me.
>>>
>>> At the same time, the root cause of the FlowFile repository failure
>>> should be investigated. Is it possible to share the WaitBatch code?
>>> 

Re: proper way in nifi to sync status between custom processors

2017-12-26 Thread Koji Kawamura
Hi Ben,

The one thing that looks strange in the screenshot is the
ExecuteSqlCommand having FlowFiles queued in its incoming connection.
Those should be transferred to 'failure' relationship.

In the following executeSql() method, shouldn't it re-throw the caught exception?


try (Connection con = dbcpService.getConnection()) {
    logger.debug("Setting autoCommit to false");
    con.setAutoCommit(false);

    try (Statement stmt = con.createStatement()) {
        logger.info("Executing SQL statement: {}", new Object[]{sql});
        stmt.execute(sql);

        // All SQL statements are executed within a single transaction
        logger.debug("Committing transaction");
        con.commit();
    } catch (Exception ex) {
        logger.error("Failed to execute SQL statement: {}", new Object[]{sql, ex});
        con.rollback();
        // Re-throw the exception to the outer handler
        throw ex;
    } finally {
        logger.debug("Resetting autoCommit to true");
        con.setAutoCommit(true);
    }
} catch (Exception ex) {
    // HERE, the exception is swallowed; that's why the FlowFiles stay in
    // the incoming connection.
    logger.error("Retrying SQL statement: {}", new Object[]{sql, ex});
    retryOnFail = true;
}
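
A sketch of one way to restructure that outer catch (isConnectionLoss() is a
hypothetical helper; the idea is to re-throw anything that is not a retryable
connection problem, so onTrigger can route the FlowFile to 'failure'):

} catch (Exception ex) {
    if (!isConnectionLoss(ex)) { // hypothetical helper: retry only when the connection was lost
        throw ex;                // let onTrigger catch this and transfer the FlowFile to 'failure'
    }
    logger.error("Retrying SQL statement: {}", new Object[]{sql, ex});
    retryOnFail = true;
}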

Thanks,
Koji

On Wed, Dec 27, 2017 at 2:38 PM, 尹文才 <batman...@gmail.com> wrote:
> Hi Koji, no problem. You could check the code of processor WaitBatch at the
> link:
> https://drive.google.com/open?id=1DMpW5GMiXpyZQdui989Rr3D9rlchQfWQ
>
> I also uploaded a snapshot of part of NiFi flow which includes the
> ExecuteSqlCommand and WaitBatch, you could check the picture at the link:
> https://drive.google.com/file/d/1vdxlWj8ANHQH0CMrXnydLni5o-3IVi2h/view
>
> You mentioned above that a FlowFile repository checkpointing failure will
> cause other processors to process the same FlowFile again, but as you could
> see from my snapshot image, the ExecuteSqlCommand is the second processor,
> placed before the WaitBatch processor. Even if the FlowFile repository
> checkpointing failure is caused by WaitBatch, could it lead to the
> processors before it processing a FlowFile multiple times? Thanks.
>
> Regards,
> Ben
>
> 2017-12-27 12:36 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> Hi Ben,
>>
>> I was referring these two log messages in your previous email.
>> These two messages are both written by ExecuteSqlCommand, it does not
>> mean 'it was executed again'.
>>
>> ```
>> 2017-12-26 07:00:01,312 INFO [Timer-Driven Process Thread-1]
>> c.z.nifi.processors.ExecuteSqlCommand
>> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Executing SQL statement: SELECT
>> TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
>> dbo.ods_extractDataDebug;
>> alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column
>> _id;
>>
>> and it was executed again later:
>>
>> 2017-12-26 07:00:01,315 ERROR [Timer-Driven Process Thread-1]
>> c.z.nifi.processors.ExecuteSqlCommand
>> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14]
>> Failed to execute SQL statement: SELECT
>> ```
>>
>> As you wrote, a case where the FlowFile repository fails checkpointing
>> will cause other processors to process the same FlowFiles again. However,
>> there won't be a simple solution for every processor to roll back its
>> job, as different processors do different things. Creating the temp table
>> only if it does not exist seems the right approach to me.
>>
>> At the same time, the root cause of the FlowFile repository failure
>> should be investigated. Is it possible to share the WaitBatch code?
>> The reason I ask is that every 'FlowFile Repository failed to update'
>> error is related to the WaitBatch processor in the log you shared earlier.
>>
>> Thanks,
>> Koji
>>
>> On Wed, Dec 27, 2017 at 1:19 PM, 尹文才 <batman...@gmail.com> wrote:
>> > Hi Koji, I will print the sql before actually executing it, but I checked
>> > the error log line you mentioned in your reply, this error was thrown by
>> > NiFi from within another processor called WaitBatch.
>> > I didn't find similar errors as the one from the ExecuteSqlCommand
>> > processor, I think it's because only the ExecuteSqlCommand is used to
>> > create temp database tables.
>> > You could check my ExecuteSqlCommand code via the link:
>> > https://drive.google.com/open?id=1NnjBihyKpmUPEH7X28Mh2hgOrhjSk_5P
>> >
>> > If the error is really caused by FlowFile repository checkpoint failure
>> and
>> > the flowfile was executed twice, I may have to

Re: proper way in nifi to sync status between custom processors

2017-12-26 Thread Koji Kawamura
Hi Ben,

I was referring these two log messages in your previous email.
These two messages are both written by ExecuteSqlCommand, it does not
mean 'it was executed again'.

```
2017-12-26 07:00:01,312 INFO [Timer-Driven Process Thread-1]
c.z.nifi.processors.ExecuteSqlCommand
ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Executing SQL statement: SELECT
TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
dbo.ods_extractDataDebug;
alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column _id;

and it was executed again later:

2017-12-26 07:00:01,315 ERROR [Timer-Driven Process Thread-1]
c.z.nifi.processors.ExecuteSqlCommand
ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to execute SQL statement: SELECT
```

As you wrote, a case where the FlowFile repository fails checkpointing
will cause other processors to process the same FlowFiles again. However,
there won't be a simple solution for every processor to roll back its
job, as different processors do different things. Creating the temp table
only if it does not exist seems the right approach to me.
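
For example, a sketch of what the guarded DDL could look like (SQL Server
syntax; tableName and sourceTable are placeholders for the values your
processor already derives from the FlowFile):

// Only create the temp table if a previous (replayed) run has not created it yet
final String guardedSql =
        "IF OBJECT_ID('tmp." + tableName + "') IS NULL "
      + "SELECT TOP 0 * INTO tmp." + tableName + " FROM dbo." + sourceTable + ";";
stmt.execute(guardedSql);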

At the same time, the root cause of the FlowFile repository failure
should be investigated. Is it possible to share the WaitBatch code?
The reason I ask is that every 'FlowFile Repository failed to update'
error is related to the WaitBatch processor in the log you shared earlier.

Thanks,
Koji

On Wed, Dec 27, 2017 at 1:19 PM, 尹文才 <batman...@gmail.com> wrote:
> Hi Koji, I will print the sql before actually executing it, but I checked
> the error log line you mentioned in your reply, this error was thrown by
> NiFi from within another processor called WaitBatch.
> I didn't find similar errors as the one from the ExecuteSqlCommand
> processor, I think it's because only the ExecuteSqlCommand is used to
> create temp database tables.
> You could check my ExecuteSqlCommand code via the link:
> https://drive.google.com/open?id=1NnjBihyKpmUPEH7X28Mh2hgOrhjSk_5P
>
> If the error is really caused by a FlowFile repository checkpoint failure and
> the flowfile was executed twice, I may have to create the temp table only
> if it doesn't exist. The reason I didn't fix this bug that way
> right away is because I was afraid the fix could mask some other problems.
>
> Thanks.
>
> Regards,
> Ben
>
> 2017-12-27 11:38 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> Hi Ben,
>>
>> The following two log messages are very close in terms of written
>> timestamp, but have different log levels.
>> 2017-12-26 07:00:01,312 INFO
>> 2017-12-26 07:00:01,315 ERROR
>>
>> I guess those are logged within a single onTrigger of your
>> ExecuteSqlCommand custom processor: one is before executing, the other
>> is when it caught an exception. Just guessing, as I don't have access
>> to the code.
>>
>> Does the same issue happen with other processors bundled with Apache
>> NiFi without your custom processor running?
>>
>> If NiFi fails to update/checkpoint FlowFile repository, then the same
>> FlowFile can be processed again after restarting NiFi.
>>
>> Thanks,
>> Koji
>>
>>
>>
>> On Wed, Dec 27, 2017 at 12:21 PM, 尹文才 <batman...@gmail.com> wrote:
>> > Thanks Koji, I will look into this article about the record model.
>> >
>> > By the way, that error I previously mentioned to you occurred again, I
>> > could see the sql query was executed twice in the log, this time I had
>> > turned on the verbose NiFi logging, the sql query is as below:
>> >
>> > 2017-12-26 07:00:01,312 INFO [Timer-Driven Process Thread-1]
>> > c.z.nifi.processors.ExecuteSqlCommand
>> > ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Executing SQL statement: SELECT
>> > TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
>> > dbo.ods_extractDataDebug;
>> > alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column
>> _id;
>> >
>> > and it was executed again later:
>> >
>> > 2017-12-26 07:00:01,315 ERROR [Timer-Driven Process Thread-1]
>> > c.z.nifi.processors.ExecuteSqlCommand
>> > ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to execute SQL statement: SELECT
>> > TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
>> > dbo.ods_extractDataDebug;
>> > alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column
>> > _id;: com.microsoft.sqlserver.jdbc.SQLServerException: There is already an
>> > object named 'ods_extractDataDebug_20171226031801926_9195' in the database.
>> > com.microsoft.sqlserver.jdbc.SQLServerException: There is already an
>> > object named 'ods_extractDataDebug_20171226031801926_9195' in the database.
>> > at
>> > com.microsoft.sqlserver.jdbc.

Re: proper way in nifi to sync status between custom processors

2017-12-26 Thread Koji Kawamura
Hi Ben,

The following two log messages are very close in terms of written
timestamp, but have different log levels.
2017-12-26 07:00:01,312 INFO
2017-12-26 07:00:01,315 ERROR

I guess those are logged within a single onTrigger of your
ExecuteSqlCommand custom processor: one is before executing, the other
is when it caught an exception. Just guessing, as I don't have access
to the code.

Does the same issue happen with other processors bundled with Apache
NiFi without your custom processor running?

If NiFi fails to update/checkpoint FlowFile repository, then the same
FlowFile can be processed again after restarting NiFi.

Thanks,
Koji



On Wed, Dec 27, 2017 at 12:21 PM, 尹文才 <batman...@gmail.com> wrote:
> Thanks Koji, I will look into this article about the record model.
>
> By the way, that error I previously mentioned to you occurred again, I
> could see the sql query was executed twice in the log, this time I had
> turned on the verbose NiFi logging, the sql query is as below:
>
> 2017-12-26 07:00:01,312 INFO [Timer-Driven Process Thread-1]
> c.z.nifi.processors.ExecuteSqlCommand
> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Executing SQL statement: SELECT
> TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
> dbo.ods_extractDataDebug;
> alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column _id;
>
> and it was executed again later:
>
> 2017-12-26 07:00:01,315 ERROR [Timer-Driven Process Thread-1]
> c.z.nifi.processors.ExecuteSqlCommand
> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to execute SQL statement: SELECT
> TOP 0 * INTO tmp.ods_extractDataDebug_20171226031801926_9195 FROM
> dbo.ods_extractDataDebug;
> alter table tmp.ods_extractDataDebug_20171226031801926_9195 drop column
> _id;: com.microsoft.sqlserver.jdbc.SQLServerException: There is already an
> object named 'ods_extractDataDebug_20171226031801926_9195' in the database.
> com.microsoft.sqlserver.jdbc.SQLServerException: There is already an
> object named 'ods_extractDataDebug_20171226031801926_9195' in the database.
> at
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
> at
> com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1655)
> at
> com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:885)
> at
> com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:778)
> at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7505)
> at
> com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2445)
> at
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:191)
> at
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:166)
> at
> com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:751)
> at
> org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
> at
> org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
> at
> com.zjrealtech.nifi.processors.ExecuteSqlCommand.executeSql(ExecuteSqlCommand.java:194)
> at
> com.zjrealtech.nifi.processors.ExecuteSqlCommand.onTrigger(ExecuteSqlCommand.java:164)
> at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> I also saw a lot of NiFi exceptions like "ProcessException: FlowFile
> Repository failed to update"; not sure if this is the reason the FlowFile
> got processed twice. Could you help take a look at my log file? Thanks.
> You could get the log file via the link:
> https://drive.google.com/file/d/1uVgtAVNEHxAbAPEpNTOWq_N9Xu6zMEi3/view
>
> Best Regards,
> Ben
>
> 2017-12-

Re: proper way in nifi to sync status between custom processors

2017-12-26 Thread Koji Kawamura
Hi Ben,

This blog post, written by Mark, is a good starting point for getting
familiar with the NiFi Record model.
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi

HA for the DistributedMapCacheClientService and DistributedMapCacheServer
pair is not supported at the moment. If you need high availability,
RedisDistributedMapCacheClientService with Redis replication will
provide that, though I haven't tried it myself.
https://redis.io/topics/replication

Thanks,
Koji

On Tue, Dec 26, 2017 at 7:58 PM, 尹文才 <batman...@gmail.com> wrote:
> Thanks for your quick response, Koji. I hadn't heard or seen anything
> about the NiFi record data model when I was reading the NiFi
> documentation; could you tell me where this model is documented? Thanks.
>
> By the way, to my knowledge, when you need to use the DistributedMapCacheServer
> from DistributedMapCacheClientService, you need to specify the host URL for
> the server. This means that inside a NiFi cluster,
> when I specify the cache server and the node suddenly goes down, I can't
> use it until the node comes up again, right? Is there currently such
> a cache server in NiFi that supports HA? Thanks.
>
> Regards,
> Ben
>
> 2017-12-26 18:34 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> Hi Ben,
>>
>> As you found from existing code, DistributedMapCache is used to share
>> state among different processors, and it can be used by your custom
>> processors, too.
>> However, I'd recommend avoiding such tight dependencies between
>> FlowFiles if possible, or at least minimizing the part of the flow that
>> requires that constraint, for better performance and simplicity.
>> For example, since a FlowFile can hold a fairly large amount of data,
>> you could merge all FlowFiles into a single FlowFile, instead of batches
>> of FlowFiles. If you need logical boundaries, you can use the NiFi Record
>> data model to embed multiple records within a FlowFile; Record should
>> perform better.
>>
>> Hope this helps.
>>
>> Thanks,
>> Koji
>>
>>
>> On Tue, Dec 26, 2017 at 5:55 PM, 尹文才 <batman...@gmail.com> wrote:
>> > Hi guys, I'm currently trying to find a proper way in nifi which could
>> sync
>> > status between my custom processors.
>> > our requirement is like this, we're doing some ETL work using nifi and
>> I'm
>> > extracting the data from DB into batches of FlowFiles(each batch of
>> > FlowFile has a flag FlowFile indicating the end of the batch).
>> > There're some groups of custom processors downstream that need to process
>> > these FlowFiles to do some business logic work. And we expect these
>> > processors to process one batch of FlowFiles at a time.
>> > Therefore we need to implement a custom Wait processor(let's just call it
>> > WaitBatch here) to hold all the other batches of FlowFiles while the
>> > business processors were handling the batch of FlowFiles whose creation
>> > time is earlier.
>> >
>> > In order to implement this, all the WaitBatch processors placed in the
>> flow
>> > need to read/update records in a shared map so that each set of
>> > business-logic processors process one batch at a time.
>> > The entries are keyed using the batch number of the FlowFiles and the
>> value
>> > of each entry is a batch release counter number which counts the number
>> of
>> > times the batch of FlowFiles has passed through
>> > a WaitBatch processor.
>> > When a batch is released by WaitBatch, it will try to increment the batch
>> > number entry's value by 1 and then the released batch number and counter
>> > number will also be saved locally at the WaitBatch with StateManager;
>> > when the next batch reaches the WaitBatch, it will check if the counter
>> > value of the previous released batch number in the shared map is greater
>> > than the one saved locally, if the entry for the batch number does't
>> > exist(already removed) or the value in the shared map is greater, the
>> next
>> > batch will be released and the local state and the entry on the shared
>> map
>> > will be updated similarly.
>> > In the end of the flow, a custom processor will get the batch number from
>> > each batch and remove the entry from the shared map .
>> >
>> > So this implementation requires a shared map that could read/update
>> > frequently and atomically. I checked the Wait/Notify processors in NIFI
>> and
>> > saw it is using the DistributedMapCacheClientService and
>> > DistributedMapCacheServer to sync status, so I'm wondering if I could use
>> > the DistributedMapCacheClientService to implement my logic. I also saw
>> > another implementation called RedisDistributedMapCacheClientService
>> > which seems to require Redis(I haven't used Redis).  Thanks in advance
>> for
>> > any suggestions.
>> >
>> > Regards,
>> > Ben
>>


Re: proper way in nifi to sync status between custom processors

2017-12-26 Thread Koji Kawamura
Hi Ben,

As you found from existing code, DistributedMapCache is used to share
state among different processors, and it can be used by your custom
processors, too.
However, I'd recommend avoiding such tight dependencies between
FlowFiles if possible, or at least minimizing the part of the flow that
requires that constraint, for better performance and simplicity.
For example, since a FlowFile can hold a fairly large amount of data,
you could merge all FlowFiles into a single FlowFile, instead of batches
of FlowFiles. If you need logical boundaries, you can use the NiFi Record
data model to embed multiple records within a FlowFile; Record should
perform better.
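
If you do end up using DistributedMapCacheClientService for the shared
counter, here is a minimal sketch of reading and updating an entry from a
processor (CACHE_SERVICE is a hypothetical property descriptor, and the
serializers are assumptions). Note that a plain get-then-put like this is
not atomic; the Wait/Notify processors rely on the atomic fetch/replace
variant (AtomicDistributedMapCacheClient) for exactly that reason:

final DistributedMapCacheClient cache = context.getProperty(CACHE_SERVICE)
        .asControllerService(DistributedMapCacheClient.class);

// Keys and values travel as UTF-8 bytes; pick whatever encoding suits your flow
final Serializer<String> stringSerializer =
        (value, output) -> output.write(value.getBytes(StandardCharsets.UTF_8));
final Serializer<Long> longSerializer =
        (value, output) -> output.write(String.valueOf(value).getBytes(StandardCharsets.UTF_8));
final Deserializer<Long> longDeserializer =
        input -> (input == null || input.length == 0)
                ? null : Long.valueOf(new String(input, StandardCharsets.UTF_8));

// Read the release counter for this batch key and bump it (not atomic!)
final Long counter = cache.get(batchKey, stringSerializer, longDeserializer);
cache.put(batchKey, counter == null ? 1L : counter + 1L, stringSerializer, longSerializer);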

Hope this helps.

Thanks,
Koji


On Tue, Dec 26, 2017 at 5:55 PM, 尹文才  wrote:
> Hi guys, I'm currently trying to find a proper way in nifi which could sync
> status between my custom processors.
> our requirement is like this, we're doing some ETL work using nifi and I'm
> extracting the data from DB into batches of FlowFiles(each batch of
> FlowFile has a flag FlowFile indicating the end of the batch).
> There're some groups of custom processors downstream that need to process
> these FlowFiles to do some business logic work. And we expect these
> processors to process one batch of FlowFiles at a time.
> Therefore we need to implement a custom Wait processor(let's just call it
> WaitBatch here) to hold all the other batches of FlowFiles while the
> business processors were handling the batch of FlowFiles whose creation
> time is earlier.
>
> In order to implement this, all the WaitBatch processors placed in the flow
> need to read/update records in a shared map so that each set of
> business-logic processors process one batch at a time.
> The entries are keyed using the batch number of the FlowFiles and the value
> of each entry is a batch release counter number which counts the number of
> times the batch of FlowFiles has passed through
> a WaitBatch processor.
> When a batch is released by WaitBatch, it will try to increment the batch
> number entry's value by 1 and then the released batch number and counter
> number will also be saved locally at the WaitBatch with StateManager;
> when the next batch reaches the WaitBatch, it will check if the counter
> value of the previous released batch number in the shared map is greater
> than the one saved locally; if the entry for the batch number doesn't
> exist (already removed) or the value in the shared map is greater, the next
> batch will be released and the local state and the entry on the shared map
> will be updated similarly.
> In the end of the flow, a custom processor will get the batch number from
> each batch and remove the entry from the shared map .
>
> So this implementation requires a shared map that could read/update
> frequently and atomically. I checked the Wait/Notify processors in NIFI and
> saw it is using the DistributedMapCacheClientService and
> DistributedMapCacheServer to sync status, so I'm wondering if I could use
> the DistributedMapCacheClientService to implement my logic. I also saw
> another implementation called RedisDistributedMapCacheClientService
> which seems to require Redis (I haven't used Redis). Thanks in advance for
> any suggestions.
>
> Regards,
> Ben


Re: NIFI-4715 : ListS3 list duplicate files when incoming file throughput to S3 is high

2017-12-26 Thread Koji Kawamura
Hi Milan,

Thanks for your contribution! I reviewed the PR and posted a comment there.
Would you check that?

Koji

On Sat, Dec 23, 2017 at 7:15 AM, Milan Das  wrote:

> I have logged a defect in NiFi: ListS3 generates duplicate FlowFiles when
> S3 throughput is high.
>
>
>
> Root cause:
> A file gets uploaded to S3 at the same time a ListS3 run is in
> progress.
> In onTrigger, maxTimestamp is initialized to 0L,
> which clears the tracked keys (per the code below).
>
> When the lastModifiedTime on an S3 object is the same as currentTimestamp for a
> listed key, it should be skipped. But as the key has been cleared, the
> same file is listed again.
> I think the fix should be to initialize maxTimestamp with currentTimestamp,
> not 0L.
>
>
>
>
>
>
>
> https://issues.apache.org/jira/browse/NIFI-4715
>
>
>
> The fix I made already seems OK and is working for us:
>
> long maxTimestamp = currentTimestamp;
>
>
>
> Wanted to check thoughts from other experts, or whether there is any other
> known fix.
>
>
>
>
>
> Regards,
>
>
>
> Milan Das
> Sr. System Architect
>
> email: m...@interset.com
> mobile: +1 678 216 5660
>
> www.interset.com
>


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-25 Thread Koji Kawamura
Thanks for the updates, Ben. Glad to hear that!
Koji

On Tue, Dec 26, 2017 at 4:21 PM, 尹文才 <batman...@gmail.com> wrote:
> Thanks Koji,  I have already updated the logback configuration to produce
> more verbose logs.
> I was trying to reply to you with the verbose nifi logs, but since I
> switched to the WriteAheadProvenanceRepository implementation, I haven't
> seen the error again so far.
> I will continue to check when the error might occur and post the logs here
> if needed. Once again thanks very much for your help.
>
> Regards,
> Ben
>
> 2017-12-25 15:37 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> Hi Ben,
>>
>> You can make NiFi log more verbose by editing:
>> NIFI_HOME/conf/logback.xml
>>
>> For example, adding the following entries will reveal how NiFi repositories run:
>>
>> <logger name="org.apache.nifi.controller.repository" level="DEBUG"/>
>> <logger name="org.wali" level="DEBUG"/>
>>
>> Thanks,
>> Koji
>>
>> On Mon, Dec 25, 2017 at 4:30 PM, 尹文才 <batman...@gmail.com> wrote:
>> > Hi Koji, I also didn't find anything related to the unexpected shutdown in
>> > my logs. Is there anything I could do to make NiFi log more verbose
>> > information?
>> >
>> > Regards,
>> > Ben
>> >
>> > 2017-12-25 14:56 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>> >
>> >> Hi Ben,
>> >>
>> >> I looked at the log and expected to see some indication of the
>> >> cause of the shutdown, but couldn't find any.
>> >> The PersistentProvenanceRepository rate warning is just a warning, and
>> >> it shouldn't be the trigger of an unexpected shutdown. I suspect other
>> >> reasons such as OOM killer, but I can't do any further investigation
>> >> with only these logs.
>> >>
>> >> Thanks,
>> >> Koji
>> >>
>> >> On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 <batman...@gmail.com> wrote:
>> >> > Hi Koji, one more thing: do you have any idea why my first issue leads to
>> >> > the unexpected shutdown of NiFi? According to the message, it should just
>> >> > slow down the flow. Thanks.
>> >> >
>> >> > Regards,
>> >> > Ben
>> >> >
>> >> > 2017-12-25 14:31 GMT+08:00 尹文才 <batman...@gmail.com>:
>> >> >
>> >> >> Hi Koji, thanks for your help. For the first issue, I will switch to use
>> >> >> the WriteAheadProvenanceRepository implementation.
>> >> >>
>> >> >> For the second issue, I have uploaded the relevant part of my log
>> file
>> >> >> onto my google drive, the link is:
>> >> >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj
>> >> >>
>> >> >> You mean a custom processor could possibly process a flowfile twice only
>> >> >> when it's trying to commit the session but is interrupted, so the flowfile
>> >> >> still remains inside the original queue (like when NiFi went down)?
>> >> >>
>> >> >> If you need to see the full log file, please let me know, thanks.
>> >> >>
>> >> >> Regards,
>> >> >> Ben
>> >> >>
>> >> >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>> >> >>
>> >> >>> Hi Ben,
>> >> >>>
>> >> >>> For your 2nd issue, NiFi commits a process session in Processor
>> >> >>> onTrigger when it's executed by NiFi flow engine by calling
>> >> >>> session.commit().
>> >> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main
>> >> >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28
>> >> >>> Once a process session is committed, the FlowFile state (including
>> >> >>> which queue it is in) is persisted to disk.
>> >> >>>
>> >> >>> It's possible for a Processor to process the same FlowFile more than
>> >> >>> once, if it has done its job, but failed to commit the session.
>> >> >>> For example, if your custom processor created a temp table from a
>> >> >>> FlowFile. Then before the process session is committed, something
>> >>> happened and the NiFi process session was rolled back. In this case, the
>

Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

You can make NiFi log more verbose by editing:
NIFI_HOME/conf/logback.xml

For example, adding the following entries will reveal how NiFi repositories run:

<logger name="org.apache.nifi.controller.repository" level="DEBUG"/>
<logger name="org.wali" level="DEBUG"/>

Thanks,
Koji

On Mon, Dec 25, 2017 at 4:30 PM, 尹文才 <batman...@gmail.com> wrote:
> Hi Koji, I also didn't find anything related to the unexpected shutdown in
> my logs. Is there anything I could do to make NiFi log more verbose
> information?
>
> Regards,
> Ben
>
> 2017-12-25 14:56 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>
>> Hi Ben,
>>
>> I looked at the log and expected to see some indication of the
>> cause of the shutdown, but couldn't find any.
>> The PersistentProvenanceRepository rate warning is just a warning, and
>> it shouldn't be the trigger of an unexpected shutdown. I suspect other
>> reasons such as OOM killer, but I can't do any further investigation
>> with only these logs.
>>
>> Thanks,
>> Koji
>>
>> On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 <batman...@gmail.com> wrote:
>> > Hi Koji, one more thing: do you have any idea why my first issue leads to
>> > the unexpected shutdown of NiFi? According to the message, it should just
>> > slow down the flow. Thanks.
>> >
>> > Regards,
>> > Ben
>> >
>> > 2017-12-25 14:31 GMT+08:00 尹文才 <batman...@gmail.com>:
>> >
>> >> Hi Koji, thanks for your help. For the first issue, I will switch to use
>> >> the WriteAheadProvenanceRepository implementation.
>> >>
>> >> For the second issue, I have uploaded the relevant part of my log file
>> >> onto my google drive, the link is:
>> >> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj
>> >>
>> >> You mean a custom processor could possibly process a flowfile twice only
>> >> when it's trying to commit the session but is interrupted, so the flowfile
>> >> still remains inside the original queue (like when NiFi went down)?
>> >>
>> >> If you need to see the full log file, please let me know, thanks.
>> >>
>> >> Regards,
>> >> Ben
>> >>
>> >> 2017-12-25 13:51 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>> >>
>> >>> Hi Ben,
>> >>>
>> >>> For your 2nd issue, NiFi commits a process session in Processor
>> >>> onTrigger when it's executed by NiFi flow engine by calling
>> >>> session.commit().
>> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main
>> >>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28
>> >>> Once a process session is committed, the FlowFile state (including
>> >>> which queue it is in) is persisted to disk.
>> >>>
>> >>> It's possible for a Processor to process the same FlowFile more than
>> >>> once, if it has done its job, but failed to commit the session.
>> >>> For example, if your custom processor created a temp table from a
>> >>> FlowFile. Then before the process session is committed, something
>> >>> happened and the NiFi process session was rolled back. In this case, the
>> >>> target database is already updated (the temp table is created), but
>> >>> NiFi FlowFile stays in the incoming queue. If the FlowFile is
>> >>> processed again, the processor will get an error indicating the table
>> >>> already exists.
>> >>>
>> >>> I tried to look at the logs you attached, but attachments do not seem
>> >>> to be delivered to this ML. I don't see anything attached.
>> >>>
>> >>> Thanks,
>> >>> Koji
>> >>>
>> >>>
>> >>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura <ijokaruma...@gmail.com
>> >
>> >>> wrote:
>> >>> > Hi Ben,
>> >>> >
>> >>> > Just a quick recommendation for your first issue, 'The rate of the
>> >>> > dataflow is exceeding the provenance recording rate' warning message.
>> >>> > I'd recommend using WriteAheadProvenanceRepository instead of
>> >>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository
>> >>> > provides better performance.
>> >>> > Please take a look at the documentation here.
>> >>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g
>> >>> uide.html#provenance-repository

Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

I looked at the log and expected to see some indication of the
cause of the shutdown, but couldn't find any.
The PersistentProvenanceRepository rate warning is just a warning, and
it shouldn't be the trigger of an unexpected shutdown. I suspect other
reasons such as OOM killer, but I can't do any further investigation
with only these logs.

Thanks,
Koji

On Mon, Dec 25, 2017 at 3:46 PM, 尹文才 <batman...@gmail.com> wrote:
> Hi Koji, one more thing: do you have any idea why my first issue leads to
> the unexpected shutdown of NiFi? According to the message, it should just slow
> down the flow. Thanks.
>
> Regards,
> Ben
>
> 2017-12-25 14:31 GMT+08:00 尹文才 <batman...@gmail.com>:
>
>> Hi Koji, thanks for your help. For the first issue, I will switch to use
>> the WriteAheadProvenanceRepository implementation.
>>
>> For the second issue, I have uploaded the relevant part of my log file
>> onto my google drive, the link is:
>> https://drive.google.com/open?id=1oxAkSUyYZFy6IWZSeWqHI8e9Utnw1XAj
>>
>> You mean a custom processor could possibly process a flowfile twice only
>> when it's trying to commit the session but is interrupted, so the flowfile
>> still remains inside the original queue (like when NiFi went down)?
>>
>> If you need to see the full log file, please let me know, thanks.
>>
>> Regards,
>> Ben
>>
>> 2017-12-25 13:51 GMT+08:00 Koji Kawamura <ijokaruma...@gmail.com>:
>>
>>> Hi Ben,
>>>
>>> For your 2nd issue, NiFi commits a process session in Processor
>>> onTrigger when it's executed by NiFi flow engine by calling
>>> session.commit().
>>> https://github.com/apache/nifi/blob/master/nifi-api/src/main
>>> /java/org/apache/nifi/processor/AbstractProcessor.java#L28
>>> Once a process session is committed, the FlowFile state (including
>>> which queue it is in) is persisted to disk.
>>>
>>> It's possible for a Processor to process the same FlowFile more than
>>> once, if it has done its job, but failed to commit the session.
>>> For example, if your custom processor created a temp table from a
>>> FlowFile. Then before the process session is committed, something
>>> happened and the NiFi process session was rolled back. In this case, the
>>> target database is already updated (the temp table is created), but
>>> NiFi FlowFile stays in the incoming queue. If the FlowFile is
>>> processed again, the processor will get an error indicating the table
>>> already exists.
>>>
>>> I tried to look at the logs you attached, but attachments do not seem
>>> to be delivered to this ML. I don't see anything attached.
>>>
>>> Thanks,
>>> Koji
>>>
>>>
>>> On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura <ijokaruma...@gmail.com>
>>> wrote:
>>> > Hi Ben,
>>> >
>>> > Just a quick recommendation for your first issue, 'The rate of the
>>> > dataflow is exceeding the provenance recording rate' warning message.
>>> > I'd recommend using WriteAheadProvenanceRepository instead of
>>> > PersistentProvenanceRepository. WriteAheadProvenanceRepository
>>> > provides better performance.
>>> > Please take a look at the documentation here.
>>> > https://nifi.apache.org/docs/nifi-docs/html/administration-g
>>> uide.html#provenance-repository
>>> >
>>> > Thanks,
>>> > Koji
>>> >
>>> > On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 <batman...@gmail.com> wrote:
>>> >> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have
>>> >> encountered 2 problems during my testing.
>>> >>
>>> >> The first problem is I found the nifi bulletin board was showing the
>>> >> following warning to me:
>>> >>
>>> >> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
>>> >> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is
>>> exceeding
>>> >> the provenance recording rate. Slowing down flow to accommodate.
>>> Currently,
>>> >> there are 96 journal files (158278228 bytes) and threshold for
>>> blocking is
>>> >> 80 (1181116006 bytes)
>>> >>
>>> >> I don't quite understand what this means, and I found also inside the
>>> >> bootstrap log that nifi restarted itself:
>>> >>
>>> >> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi
>>> Apache
>

Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

For your 2nd issue, NiFi commits a process session when a Processor's
onTrigger has been executed by the NiFi flow engine, by calling
session.commit().
https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/AbstractProcessor.java#L28
Once a process session is committed, the FlowFile state (including
which queue it is in) is persisted to disk.

It's possible for a Processor to process the same FlowFile more than
once if it has done its job but failed to commit the session.
For example, suppose your custom processor created a temp table from a
FlowFile. Then, before the process session was committed, something
happened and the NiFi process session was rolled back. In this case, the
target database is already updated (the temp table is created), but the
NiFi FlowFile stays in the incoming queue. If the FlowFile is
processed again, the processor will get an error indicating the table
already exists.
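
To make this concrete, here is a minimal sketch (a hypothetical processor,
not the actual code from this thread) of where the gap between the external
side effect and the session commit lives. The createTempTable() helper and
the attribute name are made up for illustration:

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class CreateTempTableProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS =
            new Relationship.Builder().name("success").build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // External side effect: the database is updated here, outside of
        // NiFi's session transaction.
        createTempTable(flowFile.getAttribute("temp.table.name"));
        // This transfer only takes effect when the session commits, which
        // AbstractProcessor does after onTrigger() returns. If NiFi dies
        // before that commit, the FlowFile is restored to its incoming
        // queue on restart and the CREATE TABLE above runs a second time.
        session.transfer(flowFile, REL_SUCCESS);
    }

    private void createTempTable(String tableName) {
        // hypothetical helper: issues SELECT TOP 0 * INTO <tableName> ...
    }
}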

I tried to look at the logs you attached, but attachments do not seem
to be delivered to this ML. I don't see anything attached.

Thanks,
Koji


On Mon, Dec 25, 2017 at 1:43 PM, Koji Kawamura <ijokaruma...@gmail.com> wrote:
> Hi Ben,
>
> Just a quick recommendation for your first issue, 'The rate of the
> dataflow is exceeding the provenance recording rate' warning message.
> I'd recommend using WriteAheadProvenanceRepository instead of
> PersistentProvenanceRepository. WriteAheadProvenanceRepository
> provides better performance.
> Please take a look at the documentation here.
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository
>
> Thanks,
> Koji
>
> On Mon, Dec 25, 2017 at 12:56 PM, 尹文才 <batman...@gmail.com> wrote:
>> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have
>> encountered 2 problems during my testing.
>>
>> The first problem is I found the nifi bulletin board was showing the
>> following warning to me:
>>
>> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
>> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding
>> the provenance recording rate. Slowing down flow to accommodate. Currently,
>> there are 96 journal files (158278228 bytes) and threshold for blocking is
>> 80 (1181116006 bytes)
>>
>> I don't quite understand what this means, and I also found in the
>> bootstrap log that NiFi restarted itself:
>>
>> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi Apache
>> NiFi appears to have died. Restarting...
>>
>> Is there anything I could do to solve this problem?
>>
>> The second problem is about the FlowFiles inside my flow. I actually
>> implemented a few custom processors to do the ETL work: one extracts
>> multiple tables from SQL Server, and each FlowFile out of it contains an
>> attribute specifying the name of the temp ODS table to create; the second
>> processor gets all FlowFiles from the first processor and creates all
>> the temp ODS tables specified in the FlowFiles' attribute.
>> I found in the app log that one of the temp table names already existed
>> when trying to create the temp table, and it caused a SQL exception.
>> After taking some time investigating the log, I found the SQL query was
>> executed twice in the second processor, once before the NiFi restart and a
>> second time right after the restart:
>>
>> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7]
>> c.z.nifi.processors.ExecuteSqlCommand
>> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to
>> execute the SQL statement (执行sql语句失败): SELECT
>> TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM
>> dbo.ods_bd_e_reason;
>>
>>
>> I have read the NiFi documentation in depth but I'm still not very
>> familiar with NiFi's internal mechanisms. My suspicion is that NiFi
>> didn't manage to checkpoint the FlowFile's in-memory state (which queue
>> it was in) into the FlowFile repository before it died, and that after
>> restarting it recovered the FlowFile's state from the FlowFile repository,
>> so the FlowFile went through the second processor again and thus the SQL
>> was executed twice. Is this correct?
>>
>> I've attached the relevant part of the app log, thanks.
>>
>> Regards,
>> Ben


Re: The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate

2017-12-24 Thread Koji Kawamura
Hi Ben,

Just a quick recommendation for your first issue, 'The rate of the
dataflow is exceeding the provenance recording rate' warning message.
I'd recommend using WriteAheadProvenanceRepository instead of
PersistentProvenanceRepository. WriteAheadProvenanceRepository
provides better performance.
Please take a look at the documentation here.
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#provenance-repository
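
If you want to try it, a minimal sketch of the change in conf/nifi.properties
(property name per the admin guide above; the default implementation is
PersistentProvenanceRepository):

nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository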

Thanks,
Koji

On Mon, Dec 25, 2017 at 12:56 PM, 尹文才  wrote:
> Hi guys, I'm using nifi 1.4.0 to do some ETL work in my team and I have
> encountered 2 problems during my testing.
>
> The first problem is I found the nifi bulletin board was showing the
> following warning to me:
>
> 2017-12-25 01:31:00,460 WARN [Provenance Maintenance Thread-1]
> o.a.n.p.PersistentProvenanceRepository The rate of the dataflow is exceeding
> the provenance recording rate. Slowing down flow to accommodate. Currently,
> there are 96 journal files (158278228 bytes) and threshold for blocking is
> 80 (1181116006 bytes)
>
> I don't quite understand what this means, and I also found in the
> bootstrap log that NiFi restarted itself:
>
> 2017-12-25 01:31:19,249 WARN [main] org.apache.nifi.bootstrap.RunNiFi Apache
> NiFi appears to have died. Restarting...
>
> Is there anything I could do to solve this problem?
>
> The second problem is about the FlowFiles inside my flow. I actually
> implemented a few custom processors to do the ETL work: one extracts
> multiple tables from SQL Server, and each FlowFile out of it contains an
> attribute specifying the name of the temp ODS table to create; the second
> processor gets all FlowFiles from the first processor and creates all
> the temp ODS tables specified in the FlowFiles' attribute.
> I found in the app log that one of the temp table names already existed
> when trying to create the temp table, and it caused a SQL exception.
> After taking some time investigating the log, I found the SQL query was
> executed twice in the second processor, once before the NiFi restart and a
> second time right after the restart:
>
> 2017-12-25 01:32:35,639 ERROR [Timer-Driven Process Thread-7]
> c.z.nifi.processors.ExecuteSqlCommand
> ExecuteSqlCommand[id=3c97dfd8-aaa4-3a37-626e-fed5a4822d14] Failed to
> execute the SQL statement (执行sql语句失败): SELECT
> TOP 0 * INTO tmp.ods_bd_e_reason_20171225013007005_5567 FROM
> dbo.ods_bd_e_reason;
>
>
> I have read the NiFi documentation in depth but I'm still not very
> familiar with NiFi's internal mechanisms. My suspicion is that NiFi
> didn't manage to checkpoint the FlowFile's in-memory state (which queue
> it was in) into the FlowFile repository before it died, and that after
> restarting it recovered the FlowFile's state from the FlowFile repository,
> so the FlowFile went through the second processor again and thus the SQL
> was executed twice. Is this correct?
>
> I've attached the relevant part of the app log, thanks.
>
> Regards,
> Ben


Re: Bug in NiFi XML processor

2017-12-21 Thread Koji Kawamura
Hi Sreejith,

Do you still have the issue? Unfortunately the attached screenshot was
dropped, so I couldn't see what error you got.
I tried to reproduce the issue, but EvaluateXPath runs fine with your
example data regardless of whether it has the whitespace or not.
Here is a flow template that I used to confirm:
https://gist.github.com/ijokarumawak/6951783fea0e02db08fc97c2d551de2b
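
Since the attachment was dropped, here is a hypothetical reconstruction of
the kind of input I tested with (the element name is made up; note the CDATA
section ending with [JM], with and without a trailing space):

<message><![CDATA[some text [JM]]]></message>
<message><![CDATA[some text [JM] ]]></message>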

If the issue still persists, please feel free to submit a NiFi JIRA issue.
https://issues.apache.org/jira/projects/NIFI

Thanks,
Koji

On Mon, Nov 27, 2017 at 6:16 PM, Sadasivan, Sreejith
 wrote:
> Hi,
>
>
>
> I have been using NiFi for a while without any issues. Recently I found a
> bug in the EvaluateXPath processor (I was using the 1.2.0 version of it,
> recently using 1.4.0). The problem is described below:
>
>
>
> I have a CDATA section in my xml. All the CDATA are processed properly but
> the processor fails if it has [JM] at the end of the CDATA section. Please
> find the screen shot attached.
>
>
>
>  this will fail.
>
>  providing a space after [JM] will
> work.
>
>
>
> So I think this is a bug in the NiFi XML processors, which considers the
> closing bracket of [JM] as the end of CDATA.
>
>
>
> Thanks,
>
> Sreejith


Re: Duplicate Hits From invokehttp

2017-12-21 Thread Koji Kawamura
Hi V,

Would you elaborate what you mean by duplicate response?
Does it mean when a failed FlowFile at the 1st request is routed back
to the same InvokeHTTP, sent as the 2nd request, and if the 2nd
request succeeds, you get TWO duplicated output FlowFiles for the
Response relationship?

If your InvokeHTTP "Always Output Response" is set to "true", then it
always writes a FlowFile regardless of the HTTP response status code.
So you would get two FlowFiles in above scenario.

Thanks,
Koji

On Thu, Dec 21, 2017 at 2:50 PM, vin  wrote:
> Hi,
>
>
>  How to handle the duplicate responses from invokehttp.
>
> If I did not get the response from the remote server, it initiates the same
> request to the remote server and gives a duplicate response. How do I handle
> this scenario?
>
> thanks
>
> V
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: [VOTE] Release Apache NiFi MiNiFi 0.3.0 - RC3

2017-12-18 Thread Koji Kawamura
Thanks Aldrin for updating the RC again, and to all devs who contributed
to the MiNiFi 0.3.0 release!

I confirmed:
- Hashes are correct
- MiNiFi Windows Service works
- MiNiFi Toolset works, NiFi template -> MiNiFi config yml
- NiFi template -> C2 server -> MiNiFi PullHttpChangeIngestor works nicely

+1 Release this package as minifi-0.3.0

Koji

On Tue, Dec 19, 2017 at 7:09 AM, Aldrin Piri  wrote:
> Hello,
>
> I am pleased to call this vote for the source release of Apache NiFi MiNiFi,
> minifi-0.3.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1114/
>
> The Git tag is minifi-0.3.0-RC3
> The Git commit ID is f06a7190ac07dbf02e5b0f9ee2859dea08acf3b0
> *
> https://git-wip-us.apache.org/repos/asf?p=nifi-minifi.git;a=commit;h=f06a7190ac07dbf02e5b0f9ee2859dea08acf3b0
> *
> https://github.com/apache/nifi-minifi/commit/f06a7190ac07dbf02e5b0f9ee2859dea08acf3b0
>
> Checksums of minifi-0.3.0-source-release.zip:
> MD5: 685e890486dbde4fd75db86128adf140
> SHA1: 9dcf7d1b440b1247ca0c55e2ffdc75574467fed6
> SHA256: a1c9be5ca4824fe98620d7df68469c8b9ea9461fb8dbecd1bdd6596e0b5dea89
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/aldrin.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 23 issues were closed/resolved for this release:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319921=12340598
>
> Release note highlights can be found here:
> https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Version0.3.0
>
> The vote will be open until 7:00PM EST, 21 December 2017 [1].
>
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build
> from source, and test.  Then please vote:
>
> [ ] +1 Release this package as minifi-0.3.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...
>
> Thanks!
>
> [1] You can determine this time for your local time zone at
> https://s.apache.org/minifi-0.3.0-rc3-close


Re: [VOTE] Release Apache NiFi MiNiFi 0.3.0 - RC2

2017-12-18 Thread Koji Kawamura
Hi Aldrin,

I'm verifying the updated RC now. It's working nicely.

Just a question before casting my vote.
How was the source zip file created? I see minifi.exe and
minifiw.exe in the
minifi-0.3.0/minifi-nar-bundles/minifi-framework-bundle/minifi-framework/minifi-resources/src/main/resources/bin
dir in the RC source zip.

I think those should not be there. To make things worse, the
minifi.exe is a 32-bit one, and the build process skips downloading Commons
Daemon if an exe is already in the bin dir.
I had to remove the exe files before running the mvn command to confirm
MINIFI-418.

Thanks,
Koji

On Sat, Dec 16, 2017 at 4:39 AM, Aldrin Piri  wrote:
> Hello,
>
> I am pleased to call this vote for the source release of Apache NiFi MiNiFi,
> minifi-0.3.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1113/
>
> The Git tag is minifi-0.3.0-RC2
> The Git commit ID is bc7a7948d16a68e5bc058e59cb0c31c6615d8290
> *
> https://git-wip-us.apache.org/repos/asf?p=nifi-minifi.git;a=commit;h=bc7a7948d16a68e5bc058e59cb0c31c6615d8290
> *
> https://github.com/apache/nifi-minifi/commit/bc7a7948d16a68e5bc058e59cb0c31c6615d8290
>
> Checksums of minifi-0.3.0-source-release.zip:
> MD5:  acd075dccee368ecd79c2fe385b5bd5d
> SHA1:  ace80e47217ed764d8dec16b30113d46d57cb739
> SHA256:  048712bf697fd49517f3f24a75828f51041c8bfb2cc1cc451e6f334b76d595aa
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/aldrin.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 22 issues were closed/resolved for this release:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319921=12340598
>
> Release note highlights can be found here:
> https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes#ReleaseNotes-Version0.3.0
>
> As we are heading into the weekend, the vote will be open until 3:00PM EST,
> 20 December 2017 [1].
>
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build
> from source, and test.  Then please vote:
>
> [ ] +1 Release this package as minifi-0.3.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...
>
> Thanks!
>
> [1] You can determine this time for your local time zone at
> https://s.apache.org/minifi-0.3.0-rc2-close


Re: Run 1 instance of ExecuteStreamCommand constantly

2017-11-19 Thread Koji Kawamura
Hi,

If the script enters a while(1) loop when it is called from NiFi,
then NiFi cannot do anything until the loop ends.

To achieve what you described (keep using the same instance of a
script), I'd recommend implementing an API endpoint in that script,
e.g. a simple REST endpoint to receive new input, and then letting NiFi
make HTTP requests using the InvokeHTTP processor.
This way, your Python script can receive new input and also keep
updating the ML model.

Thanks,
Koji

On Fri, Nov 17, 2017 at 4:24 AM, moe2017  wrote:
> Hey,
>
> I have a Python machine learning model being executed by an
> ExecuteStreamCommand processor. The problem right now is that I need this
> processor to be executed once and then loop continuously so that the model
> can update itself when new data is passed to it from stdin.
>
> I tried putting a while(1) in my script and then sending the output to
> stdout, but NiFi just hangs and the processor won't take in the data from the
> queue. Is NiFi not equipped to handle while loops? Or is there a NiFi way
> that I can constantly run only one instance of a script during the entire
> lifecycle of the pipeline?
>
> Thanks
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: NiFi 1.3 - PublishKafka_0_10 - A message in the stream exceeds the maximum allowed message size of 1048576 bytes

2017-11-19 Thread Koji Kawamura
Hi Mayank,

I've tried to reproduce the issue, but to no avail so far.
PublishKafka_0_10 uses the specified Max Request Size as expected, and
I got the exception when the incoming message size exceeded the configured
size.
I was also able to publish messages of 2.08 MB with a 10 MB Max
Request Size.

The stack trace you reported is created within NiFi's AbstractDemarcator
(StreamDemarcator) when it tries to read bytes from the incoming FlowFile
content and the read size exceeds maxDataSize.
StreamDemarcator.maxDataSize is set to the specified PublishKafka_0_10
'Max Request Size'.
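
For reference, a minimal sketch of that code path (assuming the
StreamDemarcator API in nifi-utils; the delimiter and sizes are made up):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.nifi.stream.io.util.StreamDemarcator;

public class DemarcatorSketch {
    public static void main(String[] args) throws IOException {
        byte[] content = "msg1\nmsg2".getBytes(StandardCharsets.UTF_8);
        // Mirrors the 'Max Request Size' property; a single message larger
        // than this should make nextToken() throw TokenTooLargeException
        // before anything is sent to Kafka.
        int maxRequestSize = 10 * 1024 * 1024;
        try (StreamDemarcator demarcator = new StreamDemarcator(
                new ByteArrayInputStream(content),
                "\n".getBytes(StandardCharsets.UTF_8),
                maxRequestSize)) {
            byte[] token;
            while ((token = demarcator.nextToken()) != null) {
                // each token becomes one Kafka record in PublisherLease
                System.out.println(new String(token, StandardCharsets.UTF_8));
            }
        }
    }
}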

Does this issue still happen? If so, do you mind sharing your
processor configuration by exporting as a template?

Thanks,
Koji


On Sat, Nov 18, 2017 at 1:54 AM, mayank rathi  wrote:
> Hello All,
>
> I am getting this error in PublishKafka_0_10 processor for a message of
> size 2.08 MB. I have updated Max Request Size to 10 MB in processor
> properties and max.request.size to 10 MB in Kafka's server.properties.
> After rebooting the Kafka broker I can see that max.request.size = 10 MB in
> the Kafka logs, but I am still getting the error below.
>
> What am I missing here?
>
> 2017-11-17 11:07:47,966 ERROR [Timer-Driven Process Thread-4]
> o.a.n.p.kafka.pubsub.PublishKafka_0_10
> PublishKafka_0_10[id=e6d932d9-97ae-1647-aa8f-86d07791ce25]
> Failed to send all message for StandardFlowFileRecord[uuid=fa2399e5-bea5-4113-b58b-6cdef228733c,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1510934860019-132, container=default, section=132], offset=0, length=2160613],offset=0,name=12337127439954063,size=2160613] to Kafka; routing to failure due to org.apache.nifi.stream.io.exception.TokenTooLargeException: A message in the stream exceeds the maximum allowed message size of 1048576 bytes.: {}
> org.apache.nifi.stream.io.exception.TokenTooLargeException: A message in the stream exceeds the maximum allowed message size of 1048576 bytes.
> at org.apache.nifi.stream.io.util.AbstractDemarcator.extractDataToken(AbstractDemarcator.java:157)
> at org.apache.nifi.stream.io.util.StreamDemarcator.nextToken(StreamDemarcator.java:129)
> at org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(PublisherLease.java:78)
> at org.apache.nifi.processors.kafka.pubsub.PublishKafka_0_10$1.process(PublishKafka_0_10.java:334)
> at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
> at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
> at org.apache.nifi.processors.kafka.pubsub.PublishKafka_0_10.onTrigger(PublishKafka_0_10.java:330)
> at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>
> Thanks and Regards
> Mayank
>


Re: [EXT] Re: Please refresh my memory on NAR dependencies

2017-10-15 Thread Koji Kawamura
Peter, Matt,

If the goal is sharing org.apache.nifi.csv.CSVUtils among modules, an
alternative approach is moving CSVUtils to nifi-standard-record-util
and adding an ordinary JAR dependency from nifi-poi-processors. What do
you think?
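
A minimal sketch, assuming the class lands in a plain (non-NAR) module named
nifi-standard-record-util as above; nifi-poi-processors would then depend on
it like any ordinary JAR:

<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-standard-record-util</artifactId>
</dependency>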

Thanks,
Koji

On Mon, Oct 16, 2017 at 12:17 PM, Peter Wicks (pwicks)
 wrote:
> Matt,
>
> I am trying to re-use most of CSVUtils, including most of the property 
> descriptors and CSVUtils.createCSVFormat.
>
> It seemed like a waste to duplicate the entire class. I can try making it
> the parent; what are the implications if I do that?
>
> Thanks,
>   Peter
>
> -Original Message-
> From: Matt Burgess [mailto:mattyb...@apache.org]
> Sent: Monday, October 16, 2017 10:58 AM
> To: dev@nifi.apache.org
> Subject: [EXT] Re: Please refresh my memory on NAR dependencies
>
> Do you have a hard requirement on the implementations in 
> nifi-record-serialization-services? Otherwise, the existing examples have the 
> processor POM pointing at the following:
>
> <dependency>
>     <groupId>org.apache.nifi</groupId>
>     <artifactId>nifi-record-serialization-service-api</artifactId>
> </dependency>
>
> which is the API JAR I think. If you need the implementations behind it, you 
> will probably need to declare that as a parent (not a
> dependency) and perhaps still use the API JAR (though I'm guessing about the 
> latter).
>
> Regards,
> Matt
>
>
> On Sun, Oct 15, 2017 at 10:27 PM, Peter Wicks (pwicks)  
> wrote:
>> For NIFI-4465 I want the nifi-poi-bundle to include a Maven dependency on 
>> nifi-record-serialization-services. So I start by adding the dependency to 
>> the pom.xml.
>>
>> <dependency>
>>     <groupId>org.apache.nifi</groupId>
>>     <artifactId>nifi-record-serialization-services</artifactId>
>> </dependency>
>>
>> I've tried several variations on this, with version numbers, putting it at 
>> higher pom levels, including it in the nifi-nar-bundles pom and marking it 
>> as included, etc...
>>
>> Throughout all this, compiling is no problem, and all my unit tests run
>> correctly. But when I try to start NiFi I immediately get Class not found
>> exceptions from the nifi-poi classes related to the 
>> nifi-record-serialization libraries.
>>
>> I feel like I've run into this in the past, and it was due to how NARs
>> work. I can't remember, though.
>>
>> Help would be appreciated!
>>
>> Thanks,
>>   Peter


Re: Request: Add my account to the JIRA contributors list

2017-10-10 Thread Koji Kawamura
Hi Yuri,

I've added you to the JIRA contributor list; you should be able to assign
issues to yourself now.
Thanks for your contributions to enhancing NiFi UX!

Koji

On Wed, Oct 11, 2017 at 3:07 AM, Yuri <1969yuri1...@gmail.com> wrote:
> Hello,
> I'd like to be able to assign JIRA issues to myself.
>
> My JIRA account uses this particular email address.
>
> Thanks in advance.


Re: [VOTE] Release Apache NiFi 1.4.0 (RC2)

2017-09-30 Thread Koji Kawamura
+1 (binding) Release this package as nifi-1.4.0

Verified hashes, local build was successful on OS X, confirmed S2S
communication with older versions.



On Sat, Sep 30, 2017 at 9:27 AM, Andy LoPresto  wrote:
> +1 (binding)
>
> Build environment: Mac OS X 10.11.6, Java 1.8.0_101, Maven 3.3.9, JCE
> Unlimited Strength Cryptographic Jurisdiction Policies installed
>
> * verified GPG signature is valid and SHA512 digest
> * verified all checksums
> * verified all tests
> * verified checkstyle
> * verified Knox properties present in default nifi.properties
> * verified normal flow
> * verified ListenHTTP and HandleHTTPRequest only accept restricted SSLCS
> * verified bad authorizers.xml (copied from 1.2.0 -- missing
> managedAuthorizer) causes startup fail
> * verified good authorizers.xml works
> * verified secure instance works with client cert auth
> * verified secure instance works with Knox SSO
> * verified encrypted flow value migration works without Jasypt
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Sep 29, 2017, at 1:54 PM, Andrew Lim  wrote:
>
> +1 (non-binding)
>
> -Ran full clean install on OS X (10.11.4)
> -Tested UI changes including Variable Registry UI
> -Tested flows using Record reader/writer processors and controller services,
> working as expected
>
>
> On Sep 29, 2017, at 4:50 PM, Michael Moser  wrote:
>
> +1 (non-binding)
>
> Verified source package per release helper.
> Built on Ubuntu 14.04, all unit tests and contrib-check pass.
> Built on Windows 10, some unit tests fail and contrib-check fails on
> "nifi-poi-processors: Too many files with unapproved license" but I
> think this was expected.  The build without contrib-check and using
> -DskipTests works.
> Ran some simple flows that worked as expected.
>
> Many thanks to the community for all of the work put into this
> release!  And thanks to Jeff for being RM.
>
>
> On Fri, Sep 29, 2017 at 4:44 PM, Scott Aslan  wrote:
>
> +1 (non-binding) Release this package as nifi-1.4.0
>
> On Fri, Sep 29, 2017 at 11:52 AM, Mark Payne  wrote:
>
> +1 (binding)
>
> Verified hashes and checksum. Built with all unit tests and contrib-check
> on OSX.
>
> Was able to startup and test simple flows worked as expected.
>
> On Sep 29, 2017, at 11:20 AM, Marc P.  wrote:
>
> +1 non-binding
>
> -- verified contrib-check
> -- ran simple flows with MiNiFi
> -- sigs and hashes look good.
>
> Thanks for sending this out Jeff!
>
>
> On Fri, Sep 29, 2017 at 11:18 AM, Matt Gilman 
> wrote:
>
> +1 (binding) Release this package as nifi-1.4.0
>
> On Fri, Sep 29, 2017 at 11:07 AM, Bryan Bende  wrote:
>
> +1 (binding)
>
> - Ran through the release helper and everything checked out.
> - Ran a couple of sample flows with no issues
>
>
> On Fri, Sep 29, 2017 at 9:46 AM, James Wing  wrote:
>
> Jeff, I agree the updated KEYS file has been published.  Thanks.
>
> On Fri, Sep 29, 2017 at 6:00 AM, Jeff  wrote:
>
> James,
>
> I had to do a hard reload of the page in Chrome, since the browser kept
> showing me a cached version without my key.  After the hard reload, I can
> see my key at https://dist.apache.org/repos/dist/dev/nifi/KEYS.  Could you
> try opening the KEYS link in incognito mode and verify that my key is
> there?
>
> Thanks,
> Jeff
>
> On Fri, Sep 29, 2017 at 1:06 AM James Wing  wrote:
>
> +1 (binding). I ran through the release helper including signature, hashes,
> build, and testing the binary.  I checked the LICENSE and NOTICE files.
> Everything looks good to me.
>
> One thing I noted is that Jeff's GPG key is not yet in the public KEYS file
> at https://dist.apache.org/repos/dist/dev/nifi/KEYS, but it is added in the
> master branch KEYS file to be published with the release.  I believe that
> is OK for the signature, we've done this before, and perhaps we should
> consider changing the helper text in the future.
> Thanks, Jeff, for putting this release together.
>
>
> On Thu, Sep 28, 2017 at 12:54 PM, Jeff  wrote:
>
> Hello,
>
> I am pleased to be calling this vote for the source release of Apache NiFi
> nifi-1.4.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-
>
> The Git tag is nifi-1.4.0-RC2
> The Git commit ID is e6508ba7d3da5bba54abd6233a7a8f9dd4c32151
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=
> e6508ba7d3da5bba54abd6233a7a8f9dd4c32151
>
> Checksums of nifi-1.4.0-source-release.zip:
> MD5: 41e4083e602883a3e180032f32913414
> SHA1: 26770625138126f45bed4989adb0a6b65a767aa2
>
> Release artifacts 

Re: Ingest data into SQL database using NiFi

2017-09-27 Thread Koji Kawamura
Hi Tina,

Glad to hear you were able to get schema.

The read size in ExecuteSQL is smaller because the data is serialized
with Avro, which writes data efficiently; it gets bigger after
ConvertJSONToSQL because each FlowFile then carries a SQL statement.

Which version of Apache NiFi are you using? If you can use 1.3.0, I'd
recommend using QueryRecord to transform the data and PutDatabaseRecord
to store the rows in the destination table.

If you're not familiar with the Record data model, Mark's blog would be helpful:
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi

Once you know how Records work, you can do interesting things such as
transforming data by running SQL against the FlowFile:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.QueryRecord/index.html
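
For example, a QueryRecord dynamic property holds a SQL statement that runs
against the FlowFile itself (the table name FLOWFILE is fixed; the column
names here are made up for illustration):

SELECT tx_date, institution
FROM FLOWFILE
WHERE institution IS NOT NULL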

With the record data model, you don't have to split out each row or
convert rows to SQL statements one by one. Instead, a FlowFile containing
multiple records (rows) can be passed between processors and processed
more efficiently.

Thanks,
Koji


On Thu, Sep 28, 2017 at 4:48 AM, tzhu  wrote:
> Hi Koji,
>
> Thank you so much for your help! I didn't specify the 'Catalog Name' and
> 'Schema Name' before, and now the error is fixed.
>
> Now I have another question: after getting converted into a different
> datatype, the data size gets very large. The read size in ExecuteSQL is
> about 200 MB, and the size in ConvertJSONToSQL becomes 1 GB. Is there any
> way to reduce the size? I'm thinking about two solutions: one is to use
> other, more efficient processors; the other is to split the input into
> small pieces, maybe taking 1000 rows at a time for the transformation.
>
> Hope this makes sense to you.
>
> Thank you,
>
> Tina
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Ingest data into SQL database using NiFi

2017-09-26 Thread Koji Kawamura
Hi Tina,

I tested an ExecuteSQL -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL
flow with my SQL Server.
It worked fine; I was able to copy rows from one table to another.
One thing to note is that since you're using two different databases,
you need to specify 'Catalog Name' and 'Schema Name' at
ConvertJSONToSQL properly.
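
On SQL Server, those two properties correspond to the first two parts of the
fully qualified table name used to look up the target table, e.g.
(illustrative names):

-- 'Catalog Name' = DestDb, 'Schema Name' = dbo
INSERT INTO DestDb.dbo.destination_table (col1, col2) VALUES (?, ?)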

I've written a summary at this Gist page, with SQL examples,
screenshot and flow template:
https://gist.github.com/ijokarumawak/42c257afb5e80361e502564085d7999e

I hope you can find it useful.

Thanks,
Koji

On Wed, Sep 27, 2017 at 4:59 AM, tzhu  wrote:
> Hi Koji,
>
> It shows the same error when I change "date" to "tx_date". You can see the
> error message in the image attached.
>
> In the meantime, I'm trying to use ExecuteSQL, ReplaceText, and PutSQL to do
> the same thing. While it doesn't throw an error when processing, the values
> do not get copied from the source table; only the default value for each
> datatype is written (showing "1900-01-01" for date and "0" for int).
>
> My original idea was to combine the templates "Database Extract with NiFi"
> and "Database Ingest with NiFi" so that NiFi selects the data from one table
> and inserts it into another one. Now I'm thinking maybe the idea was wrong.
> If you know of a better way to solve this problem, please let me know.
>
> Thanks,
>
> Tina
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Ingest data into SQL database using NiFi

2017-09-25 Thread Koji Kawamura
The list of reserved keywords:
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/reserved-keywords-transact-sql

On Tue, Sep 26, 2017 at 9:31 AM, Koji Kawamura <ijokaruma...@gmail.com> wrote:
> Hi Tina,
>
> I wonder if the column name is the cause of that issue, because 'date'
> is a reserved keyword.
> I wonder whether ConvertJSONToSQL can wrap those columns with square
> brackets as shown in your example query.
>
> If possible, can you try to change the column name to a different one,
> such as 'tx_date', and see if it works?
>
> Thanks,
> Koji
>
> On Mon, Sep 25, 2017 at 10:13 PM, tzhu <js.tianlu...@gmail.com> wrote:
>> Hi Koji,
>>
>> The "transaction" table only has two columns, "Date" and "Institution". I
>> don't know what else I could change to...
>>
>> Thanks,
>> Tina
>>
>>
>>
>> --
>> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Ingest data into SQL database using NiFi

2017-09-25 Thread Koji Kawamura
Hi Tina,

I wonder if the column name is the cause of that issue, because 'date'
is a reserved keyword.
I wonder whether ConvertJSONToSQL can wrap those columns with square
brackets as shown in your example query.

If possible, can you try to change the column name to a different one,
such as 'tx_date', and see if it works?
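
For reference, this is what bracket-quoting looks like in T-SQL; with it,
reserved words are usable as identifiers (the values are made up):

INSERT INTO [transaction] ([Date], [Institution])
VALUES ('2017-09-25', 'SomeBank');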

Thanks,
Koji

On Mon, Sep 25, 2017 at 10:13 PM, tzhu  wrote:
> Hi Koji,
>
> The "transaction" table only has two columns, "Date" and "Institution". I
> don't know what else I could change to...
>
> Thanks,
> Tina
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/

