Re: Jira contributor access

2022-10-03 Thread Matt Burgess
Mathew,

I have added you as a Contributor to the NiFi projects in Jira.
Looking forward to your contributions!

Regards,
Matt

On Mon, Oct 3, 2022 at 2:25 AM Mathew Kiprop  wrote:
>
> My Jira username is 'mkapkiai'
>
> On Mon, Oct 3, 2022 at 9:21 AM Mathew Kiprop 
> wrote:
>
> > Hello Apache NiFi team,
> >
> > I hope this email finds you well.
> > I am writing to request 'Jira contributor access'.
> >
> > Regards,
> > Mathew Kiprop
> >


Re: [VOTE] Release Apache NiFi 1.18.0 (RC4)

2022-10-03 Thread Matt Burgess
+1 binding

Ran through release helper, verified NIFI-10568 (statefulness added to
runtime component manifests) and NIFI-9042 (DatabaseParameterProvider)
and ran a couple of flows exercising various components such as
record-based processors and controller services.

Thanks for RM'ing, Joe!


On Mon, Oct 3, 2022 at 4:45 PM Joe Witt  wrote:
>
> Hello,
>
> I am pleased to be calling this vote for the source release of Apache
> NiFi 1.18.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1214
>
> The source being voted upon and the convenience binaries can be found at:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-1.18.0/
>
> A helpful reminder on how the release candidate verification process works:
> https://cwiki.apache.org/confluence/display/NIFI/How+to+help+verify+an+Apache+NiFi+release+candidate
>
> The Git tag is nifi-1.18.0-RC4
> The Git commit ID is 109e54cd585902a981d1b370b3dc4d1620be438c
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=109e54cd585902a981d1b370b3dc4d1620be438c
>
> Checksums of nifi-1.18.0-source-release.zip:
> SHA256: 925cbb92c107d0fa3194a349d985cff4933a61b2555eff57b1b81433fe37c139
> SHA512: 
> f143215b1746342e7584f5ad65b546fcc378cd78aa17628fb605dfdbbaf11e897a0173dd67807fc90cb18c17124a4227d5fe07e7ed609d9ed1904503b757c604
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/joewitt.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 184 issues were closed/resolved for this release:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12352150
>
> Release note highlights can be found here:
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.18.0
>
> The vote will be open for 72 hours.
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build
> from source, and test. Then please vote:
>
> [ ] +1 Release this package as nifi-1.18.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...


Re: [VOTE] Release Apache NiFi 1.18.0 (RC4)

2022-10-03 Thread Nathan Gough
Hi,

I reverified the hashes and compiled from source with OpenJDK Runtime
Environment Zulu11.58+15-CA.

Ran a test cluster again and verified my cluster test flow still works as
expected.

+1 binding

Nathan

On Mon, Oct 3, 2022 at 4:45 PM Joe Witt  wrote:

> Hello,
>
> I am pleased to be calling this vote for the source release of Apache
> NiFi 1.18.0.
>
> The source zip, including signatures, digests, etc. can be found at:
> https://repository.apache.org/content/repositories/orgapachenifi-1214
>
> The source being voted upon and the convenience binaries can be found at:
> https://dist.apache.org/repos/dist/dev/nifi/nifi-1.18.0/
>
> A helpful reminder on how the release candidate verification process works:
>
> https://cwiki.apache.org/confluence/display/NIFI/How+to+help+verify+an+Apache+NiFi+release+candidate
>
> The Git tag is nifi-1.18.0-RC4
> The Git commit ID is 109e54cd585902a981d1b370b3dc4d1620be438c
>
> https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=109e54cd585902a981d1b370b3dc4d1620be438c
>
> Checksums of nifi-1.18.0-source-release.zip:
> SHA256: 925cbb92c107d0fa3194a349d985cff4933a61b2555eff57b1b81433fe37c139
> SHA512:
> f143215b1746342e7584f5ad65b546fcc378cd78aa17628fb605dfdbbaf11e897a0173dd67807fc90cb18c17124a4227d5fe07e7ed609d9ed1904503b757c604
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/joewitt.asc
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/release/nifi/KEYS
>
> 184 issues were closed/resolved for this release:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12352150
>
> Release note highlights can be found here:
>
> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.18.0
>
> The vote will be open for 72 hours.
> Please download the release candidate and evaluate the necessary items
> including checking hashes, signatures, build
> from source, and test. Then please vote:
>
> [ ] +1 Release this package as nifi-1.18.0
> [ ] +0 no opinion
> [ ] -1 Do not release this package because...
>


Should updateAttribute Mutate the FlowFile Passed In?

2022-10-03 Thread Eric Secules
Hello NiFi Devs,

I logged this bug not too long ago:
https://issues.apache.org/jira/browse/NIFI-10583

and I would like some validation on whether I took the right direction: the
desired behaviour is that both ProcessSession implementations leave the
FlowFile passed in unmutated, rather than changing MockProcessSession to
mutate the FlowFile passed in in addition to returning a new FlowFile with
the attributes added.

Thanks,
Eric
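
The contract described above can be illustrated with a minimal copy-on-write
sketch. This is plain Python, not the actual NiFi Java API; `FlowFile` and
`Session` here are hypothetical stand-ins for the real classes:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile; not the real API."""
    attributes: dict = field(default_factory=dict)

class Session:
    """Sketch of the desired ProcessSession contract from NIFI-10583."""

    def put_attribute(self, flowfile: FlowFile, key: str, value: str) -> FlowFile:
        # Return a NEW FlowFile carrying the added attribute;
        # the FlowFile passed in is never mutated.
        new_attrs = dict(flowfile.attributes)
        new_attrs[key] = value
        return FlowFile(attributes=new_attrs)

original = FlowFile(attributes={"filename": "a.txt"})
updated = Session().put_attribute(original, "processed", "true")
assert "processed" not in original.attributes  # input left untouched
assert updated.attributes["processed"] == "true"
```

Under this contract a mock session that also mutated its input would diverge
from the standard session's behaviour, which is the inconsistency the ticket
raises.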


[VOTE] Release Apache NiFi 1.18.0 (RC4)

2022-10-03 Thread Joe Witt
Hello,

I am pleased to be calling this vote for the source release of Apache
NiFi 1.18.0.

The source zip, including signatures, digests, etc. can be found at:
https://repository.apache.org/content/repositories/orgapachenifi-1214

The source being voted upon and the convenience binaries can be found at:
https://dist.apache.org/repos/dist/dev/nifi/nifi-1.18.0/

A helpful reminder on how the release candidate verification process works:
https://cwiki.apache.org/confluence/display/NIFI/How+to+help+verify+an+Apache+NiFi+release+candidate

The Git tag is nifi-1.18.0-RC4
The Git commit ID is 109e54cd585902a981d1b370b3dc4d1620be438c
https://gitbox.apache.org/repos/asf?p=nifi.git;a=commit;h=109e54cd585902a981d1b370b3dc4d1620be438c

Checksums of nifi-1.18.0-source-release.zip:
SHA256: 925cbb92c107d0fa3194a349d985cff4933a61b2555eff57b1b81433fe37c139
SHA512: 
f143215b1746342e7584f5ad65b546fcc378cd78aa17628fb605dfdbbaf11e897a0173dd67807fc90cb18c17124a4227d5fe07e7ed609d9ed1904503b757c604

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/joewitt.asc

KEYS file available here:
https://dist.apache.org/repos/dist/release/nifi/KEYS

184 issues were closed/resolved for this release:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020=12352150

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.18.0

The vote will be open for 72 hours.
Please download the release candidate and evaluate the necessary items
including checking hashes, signatures, build
from source, and test. Then please vote:

[ ] +1 Release this package as nifi-1.18.0
[ ] +0 no opinion
[ ] -1 Do not release this package because...
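
For the hash-checking step, one way to compare the published SHA-256/SHA-512
values against locally computed digests is a short script like the sketch
below. The filename and expected value in the comments are taken from the
email above; verifying the GPG signature against the KEYS file is a separate
step:

```python
import hashlib

def file_digests(path, algorithms=("sha256", "sha512"), chunk_size=1 << 20):
    """Stream the file once and return hex digests for each algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

# Compare against the values published in the vote email, e.g.:
# expected_sha256 = "925cbb92c107d0fa3194a349d985cff4933a61b2555eff57b1b81433fe37c139"
# digests = file_digests("nifi-1.18.0-source-release.zip")
# assert digests["sha256"] == expected_sha256
```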


Re: Queries regarding Nifi custom processor for Azure Data Explorer

2022-10-03 Thread Joey Frazee
Hi Tanmaya,

There’s been some community interest in ADX processors in the past. I’m 
curious, is this something your team is interested in contributing back to 
Apache NiFi, or do you plan to release it as a Microsoft open source project?

Either is fine of course.

One thing to consider is that, conceptually, data sources in Spark are usually a 
little different and hide a lot of details. NiFi processors tend to do fewer 
things and get composed into a flow. You could stuff most of what you're doing 
into a single processor, but if you have a good, simple processor for writing 
to ADX, some of the other things you mention might just be logic in the flow.

-joey

> On Oct 3, 2022, at 5:56 AM, Mike Thomsen  wrote:
> 
> I think that is at least partly doable within NiFi (e.g., yes, you can
> restrict processors to the primary node in a cluster), but I would
> recommend you consider a different approach for NiFi. Unlike Spark,
> NiFi is perfectly content to stream in huge amounts of data
> continuously without any temporary storage besides its repositories
> (flowfile, content, etc). Therefore, I think a potentially easier
> solution would be for your team to explore creating a connector
> between NiFi and Azure Data Explorer that allows the latter to
> firehose the former with data as it comes in and let the chips fall
> where they may.
> 
> You might find some useful concepts in the Kafka processors for things
> like managing a continuous flow of record data from a stream and
> converting it to blocks of NiFi record data (see the Record API in our
> documentation for details).
> 
> FWIW, I manage a data set in AWS w/ NiFi that is over 50TB compressed,
> and a fairly generic 5 node NiFi cluster crushes that data on cheap
> EC2 instances without issue. So handling TBs of data is something
> fairly out of the box for NiFi if you're worried about that.
> 
>> On Sat, Oct 1, 2022 at 12:05 AM Tanmaya Panda
>>  wrote:
>> 
>> Hi Team,
>> 
>> We at Microsoft open source are developing a custom Azure Data Explorer (ADX) 
>> sink connector for Apache NiFi. What we want to achieve is transactional 
>> data ingestion. The source of the processor can be TBs of telemetry 
>> data as well as CDC logs. That means in case of any failure while 
>> ingesting the data of a particular partition to ADX, we need to delete/clean 
>> up the ingested data of the other partitions. Since Azure Data Explorer is an 
>> append-only database, unfortunately we can't perform a delete on data already 
>> ingested into the same or another partition. So to achieve this kind of 
>> transactionality for large ingests, we are thinking of implementing 
>> something similar to what we have done for the Apache Spark ADX connector: we 
>> create temporary tables inside Azure Data Explorer before ingesting into the 
>> actual tables. The worker nodes in Apache Spark create these temporary 
>> tables and report the ingestion status to the driver node. On receiving a 
>> success status from all the worker nodes, the driver node performs the 
>> ingestion into the actual table; otherwise the ingestion is aborted and the 
>> temporary tables are cleaned up. So basically we aggregate the worker node 
>> task statuses in the driver node in Spark to decide whether to ingest data 
>> into the ADX table or not.
>> Question 1: Is this possible in Apache NiFi, which follows a zero-master 
>> cluster strategy as opposed to the master-slave architecture of Apache Spark?
>> Question 2: In our custom NiFi processor, is it possible to run custom 
>> code for a particular partition on, say, the cluster coordinator node? Also, 
>> is it possible to get the details of the partition inside the processor?
>> Question 3: Is it possible to get the details of the tasks executed on the 
>> various partitions and take decisions based on the task status? Can all 
>> of this be done inside the same processor?
>> 
>> Thanks,
>> Tanmaya


[CANCEL][VOTE] Release Apache NiFi 1.18.0 (RC3)

2022-10-03 Thread Joe Witt
Team,

In light of Nandor's finding [1] and an issue [2] that Gresock has been
working (from a community report last week [3]) into the weekend, RC3 is
cancelled.  I'll get RC4 up as soon as [2] lands.

[1] https://issues.apache.org/jira/browse/NIFI-10574
[2] https://issues.apache.org/jira/browse/NIFI-10572
[3] https://apachenifi.slack.com/archives/C0L9S92JY/p1664285128489789

Thanks

On Sun, Oct 2, 2022 at 4:28 AM Nandor Soma Abonyi
 wrote:
>
> Hello,
>
> Sorry for ruining the voting party…
>
> -1 (non-binding)
>
> The simplest GenerateFlowFile -> PutAzureDataLakeStorage flow throws an 
> exception and results in failure. Verified that tests in 
> ITPutAzureDataLakeStorage are also failing.
> I think the reason is the Azure SDK upgrade.
>
> Stack trace:
> 2022-10-02 12:14:47,389 ERROR [Timer-Driven Process Thread-9] 
> o.a.n.p.a.s.PutAzureDataLakeStorage 
> PutAzureDataLakeStorage[id=23aee885-b94b-3dc3-367c-5506568d5c16] Failed to 
> create file on Azure Data Lake Storage
> com.azure.storage.file.datalake.models.DataLakeStorageException: Status code 
> 412, "{"error":{"code":"ConditionNotMet","message":"The condition specified 
> using HTTP conditional header(s) is not 
> met.\nRequestId:bc39f465-e01f-001b-1b47-d6810400\nTime:2022-10-02T10:14:47.2730857Z"}}"
> at 
> java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
> at 
> com.azure.core.implementation.http.rest.ResponseExceptionConstructorCache.invoke(ResponseExceptionConstructorCache.java:56)
> at 
> com.azure.core.implementation.http.rest.RestProxyBase.instantiateUnexpectedException(RestProxyBase.java:377)
> at 
> com.azure.core.implementation.http.rest.AsyncRestProxy.lambda$ensureExpectedStatus$1(AsyncRestProxy.java:117)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)
> at 
> reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2398)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:171)
> at 
> reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2194)
> at 
> reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:2068)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:96)
> at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
> at 
> reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
> at 
> reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
> at 
> reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
> at 
> reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)
> at 
> reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
> at 
> reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
> at 
> reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
> at 
> reactor.core.publisher.FluxDelaySubscription$DelaySubscriptionMainSubscriber.onNext(FluxDelaySubscription.java:189)
> at 
> reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
> at 
> reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
> at 
> reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onNext(FluxTimeout.java:180)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
> at 
> reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
> at 
> reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1816)
> at 
> reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:151)
> at 
> reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
> at 
> reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onNext(FluxRetryWhen.java:174)
> at 
> reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
> at 
> reactor.core.publisher.Operators$MonoInnerProducerBase.complete(Operators.java:2664)
> at 
> reactor.core.publisher.MonoSingle$SingleSubscriber.onComplete(MonoSingle.java:180)
> at 
> 

Re: Queries regarding Nifi custom processor for Azure Data Explorer

2022-10-03 Thread Mike Thomsen
I think that is at least partly doable within NiFi (e.g., yes, you can
restrict processors to the primary node in a cluster), but I would
recommend you consider a different approach for NiFi. Unlike Spark,
NiFi is perfectly content to stream in huge amounts of data
continuously without any temporary storage besides its repositories
(flowfile, content, etc). Therefore, I think a potentially easier
solution would be for your team to explore creating a connector
between NiFi and Azure Data Explorer that allows the latter to
firehose the former with data as it comes in and let the chips fall
where they may.

You might find some useful concepts in the Kafka processors for things
like managing a continuous flow of record data from a stream and
converting it to blocks of NiFi record data (see the Record API in our
documentation for details).

FWIW, I manage a data set in AWS w/ NiFi that is over 50TB compressed,
and a fairly generic 5 node NiFi cluster crushes that data on cheap
EC2 instances without issue. So handling TBs of data is something
fairly out of the box for NiFi if you're worried about that.

On Sat, Oct 1, 2022 at 12:05 AM Tanmaya Panda
 wrote:
>
> Hi Team,
>
> We at Microsoft open source are developing a custom Azure Data Explorer (ADX) 
> sink connector for Apache NiFi. What we want to achieve is transactional 
> data ingestion. The source of the processor can be TBs of telemetry data 
> as well as CDC logs. That means in case of any failure while 
> ingesting the data of a particular partition to ADX, we need to delete/clean 
> up the ingested data of the other partitions. Since Azure Data Explorer is an 
> append-only database, unfortunately we can't perform a delete on data already 
> ingested into the same or another partition. So to achieve this kind of 
> transactionality for large ingests, we are thinking of implementing 
> something similar to what we have done for the Apache Spark ADX connector: we 
> create temporary tables inside Azure Data Explorer before ingesting into the 
> actual tables. The worker nodes in Apache Spark create these temporary 
> tables and report the ingestion status to the driver node. On receiving a 
> success status from all the worker nodes, the driver node performs the 
> ingestion into the actual table; otherwise the ingestion is aborted and the 
> temporary tables are cleaned up. So basically we aggregate the worker node 
> task statuses in the driver node in Spark to decide whether to ingest data 
> into the ADX table or not.
> Question 1: Is this possible in Apache NiFi, which follows a zero-master 
> cluster strategy as opposed to the master-slave architecture of Apache Spark?
> Question 2: In our custom NiFi processor, is it possible to run custom 
> code for a particular partition on, say, the cluster coordinator node? Also, 
> is it possible to get the details of the partition inside the processor?
> Question 3: Is it possible to get the details of the tasks executed on the 
> various partitions and take decisions based on the task status? Can all 
> of this be done inside the same processor?
>
> Thanks,
> Tanmaya
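
The temporary-table pattern discussed in this thread (ingest each partition
into a temp table, promote only if every partition succeeds, otherwise clean
up) can be sketched independently of any ADX client. The `ingest_to_temp`,
`promote`, and `cleanup` callbacks below are hypothetical placeholders, not a
real API:

```python
def transactional_ingest(partitions, ingest_to_temp, promote, cleanup):
    """All-or-nothing ingestion via temporary tables (pattern sketch).

    ingest_to_temp(partition) returns a temp-table name or raises on failure;
    promote(temp_tables) moves every temp table into the real table;
    cleanup(temp_tables) drops whatever temp tables were created.
    """
    temp_tables = []
    try:
        for partition in partitions:
            temp_tables.append(ingest_to_temp(partition))
    except Exception:
        cleanup(temp_tables)  # abort: drop partial results, promote nothing
        raise
    promote(temp_tables)      # commit: every partition landed in a temp table
    cleanup(temp_tables)      # temp tables no longer needed after promotion
```

In NiFi terms the aggregation step would need cluster-wide coordination (for
example a primary-node-only task or shared state), since NiFi has no
Spark-style driver to collect worker statuses.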


Re: Jira contributor access

2022-10-03 Thread Mathew Kiprop
My Jira username is 'mkapkiai'

On Mon, Oct 3, 2022 at 9:21 AM Mathew Kiprop 
wrote:

> Hello Apache NiFi team,
>
> I hope this email finds you well.
> I am writing to request 'Jira contributor access'.
>
> Regards,
> Mathew Kiprop
>


Jira contributor access

2022-10-03 Thread Mathew Kiprop
Hello Apache NiFi team,

I hope this email finds you well.
I am writing to request 'Jira contributor access'.

Regards,
Mathew Kiprop