Re: discuss NiFi 1.24 release

2023-10-12 Thread David Handermann
Mike,

Yes that's correct, here's the Jira issue for tracking the migration work:

https://issues.apache.org/jira/browse/NIFI-12219

Regards,
David Handermann

On Thu, Oct 12, 2023, 5:29 PM Mike Thomsen  wrote:

> This Xodus?
>
> https://github.com/JetBrains/xodus
>
> On Thu, Oct 12, 2023 at 1:30 PM David Handermann <
> exceptionfact...@apache.org> wrote:
>
> > Mike,
> >
> > Thanks for raising this question. I am working on an automated
> > migration for the support branch from H2 to Xodus. We previously
> > handled automated migration from H2 1.4 to 2.1, and 2.1 to 2.2, so the
> > basic mechanics are in place to extract content from H2. The general
> > upgrade path will be to migrate to the latest version 1 release, and
> > then upgrade to version 2.
> >
> > Regards,
> > David Handermann
> >
> > On Thu, Oct 12, 2023 at 12:24 PM Mike Thomsen 
> > wrote:
> > >
> > > When H2 goes, what will the upgrade path look like?
> > >
> > > On Tue, Oct 10, 2023 at 7:57 PM David Handermann <
> > > exceptionfact...@apache.org> wrote:
> > >
> > > > Joe,
> > > >
> > > > Thanks for the reply, that sounds good.
> > > >
> > > > For reference, here is the Jira issue for tracking the initial
> > > > implementation:
> > > >
> > > > https://issues.apache.org/jira/browse/NIFI-12206
> > > >
> > > > Regards,
> > > > David Handermann
> > > >
> > > > On Tue, Oct 10, 2023 at 6:53 PM Joe Witt  wrote:
> > > > >
> > > > > David
> > > > >
> > > > > > I think we can hold off for a few weeks - I'll respond to the Slack
> > > > > > message on that.
> > > > > >
> > > > > > Will be sad to see H2 go.  The original nifi flowfile repository ran
> > > > > > on H2.  Surprisingly fast and stable actually.  But happy to hear
> > > > > > there is such progress underway.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Tue, Oct 10, 2023 at 4:27 PM David Handermann <
> > > > > exceptionfact...@apache.org> wrote:
> > > > >
> > > > > > Joe,
> > > > > >
> > > > > > > Thanks for initiating the discussion on a 1.24 release. There are a
> > > > > > > handful of new features, plus additional deprecations and dependency
> > > > > > > upgrades that would be very useful to release sooner rather than
> > > > > > > later.
> > > > > > >
> > > > > > > If we are considering a 1.24 release within the next week, then I
> > > > > > > would expect needing a 1.25 release to incorporate some additional
> > > > > > > deprecations around the same time we are ready for a 2.0 milestone
> > > > > > > release version.
> > > > > > >
> > > > > > > I am in the process of implementing a replacement for H2 to store flow
> > > > > > > configuration history, and I plan to have a pull request ready in the
> > > > > > > next few days. There will be some additional work to support migration
> > > > > > > on version 1 support branch.
> > > > > > >
> > > > > > > If we want to move forward with a 1.24 release within a week or so,
> > > > > > > then these changes could be targeted for 1.25. If we want to wait a
> > > > > > > few weeks, then I could see this being incorporated in 1.24.
> > > > > >
> > > > > > Regards,
> > > > > > David Handermann
> > > > > >
> > > > > > > On Tue, Oct 10, 2023 at 6:14 PM Mark Payne wrote:
> > > > > > > >
> > > > > > > > Thanks Joe. I wouldn’t argue against doing a 1.24 release.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > -Mark
> > > > > > > >
> > > > > > > > > On Oct 10, 2023, at 7:01 PM, Joe Witt wrote:
> > > > > > > > >
> > > > > > > > > Team,
> > > > > > > > >
> > > > > > > > > We have plenty of bits out there to kick a release.  I'm happy to RM
> > > > > > > > > this if nobody else wants to.
> > > > > > > >
> > > > > > > > Had a user on Slack ask for it.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Joe
> > > > > > >
> > > > > >
> > > >
> >
>


Re: JOLTTransformRecord problem

2023-10-12 Thread Matthew Hawkins
I've had similar issues with the different processing done by
JoltTransformJSON and JoltTransformRecord, threw my hands in the air, and
just used ExecuteScript to call out to some Python that transforms the data
correctly.

Try to minimise content transformations so the content repository doesn't
bloat with interim forms.
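
For illustration only, here is a minimal sketch of the kind of Python
clean-up such a script might perform. It assumes the goal is the one discussed
further down this thread (dropping fields whose value is an empty string) and
is not the actual script referred to above. The function name
drop_empty_strings and the sample records are hypothetical, and the NiFi
ExecuteScript session handling is deliberately left out.

```python
import json


def drop_empty_strings(value):
    """Recursively remove dict entries whose value is an empty string."""
    if isinstance(value, dict):
        return {
            key: drop_empty_strings(item)
            for key, item in value.items()
            if item != ""
        }
    if isinstance(value, list):
        return [drop_empty_strings(item) for item in value]
    return value


# Hypothetical sample data: two records, one with empty-string fields.
raw = '[{"id": 1, "eta": "", "dimensions": ""}, {"id": 2, "eta": {"value": "2023-10-12"}}]'
records = json.loads(raw)
cleaned = drop_empty_strings(records)
print(json.dumps(cleaned))
```

Doing the clean-up in a single pass like this keeps the number of content
rewrites down, in line with the advice above.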

On Fri, 13 Oct 2023, 01:26 Mark Woodcock,  wrote:

> Chris,
>
> I guess I was so wrapped up in the ConvertRecord, it didn't occur to me to
> have the JOLT processor just emit a different format; not sure how I missed
> that.
> So, I re-rigged my flow to have  ReplaceText -> JOLTTransform ->
> UpdateAttribute -> etc.   Unfortunately, I got the attached error.
>
> This is running against the set of data that I provided earlier (that is
> the test case for the bug).  Its first 3 records all lack dimension and
> eta fields (which my ReplaceText process has removed all evidence of);
> followed by one record that has record values for both fields.  The error
> seems to be saying that those early records (at the time of writing to
> Avro) don't have the appropriate format, because they have "null" values
> (in the ETA and dimensions fields) which are not valid for the JSON schema
> which has been inferred.
>
> Why is it creating a single JSON schema for the whole file instead of
> managing each record independently?  (and not that it really matters, but
> hey, the short records are coming first, why precompute the schema before
> finishing the processing of any record?)
>
> And, is there something else I can try?
>
> thx,
>
> mew
>
>
>
>
> On Wed, Oct 11, 2023 at 4:35 PM Chris Sampson
>  wrote:
>
>> You could use an AvroWriter to output the results of the
>> JoltTransformRecord - it doesn’t need to be JSON (in or out), that’s one of
>> the great things of NiFi’s Record processors - if there’s a Reader and
>> Writer in the format you want, you can use that data and the Writer doesn’t
>> need to be the same format as the Reader.
>>
>> Good news: I’ve identified the problem in my NIFI-8135 PR [1] by adding a
>> cut-down version of your example as a unit test for the JoltTransformRecord
>> processor.
>>
>> However, I’m not so sure the output is quite what you were expecting -
>> see the
>> nifi-nar-bundles/nifi-jolt-record-bundle/nifi-jolt-record-processors/src/test/resources/TestJoltTransformRecord/flattenedOutput.json
>> file in the linked PR, the “Eta”’s “value” field appears as a Java Map
>> serialised as a String, I imagine you were wanting this to be a nested
>> Object?
>>
>> If the latter, I think we’re then running into NIFI-8134 [2], for which I
>> have a separate PR ready for review [3].
>>
>> [1]: https://github.com/apache/nifi/pull/7746/files
>>
>> [2]: https://issues.apache.org/jira/browse/NIFI-8134
>>
>> [3]: https://github.com/apache/nifi/pull/7745
>>
>>
>> Cheers,
>>
>> ---
>> Chris Sampson
>> IT Consultant
>> chris.samp...@naimuri.com
>>
>>
>> > On 11 Oct 2023, at 19:22, Mark Woodcock wrote:
>> >
>> > Chris,
>> >
>> > 1) well, reassuring to learn that I've found an actual bug; and pleasing to
>> > know that I constructed an effective and illuminating test.  hurrah.
>> >
>> > 2) So, I can certainly use the ReplaceText (is there a better choice?)
>> > processor to ditch any field that looks like "whatever": "", (and I
>> > successfully implemented it), but unfortunately when I pass the resulting
>> > json onto another processor (e.g. a ConvertRecord, so I can spit out AVRO),
>> > the fact that the data now has different schemas causes an error.  Is this
>> > just kicking the can down the road?
>> >
>> > thx,
>> >
>> > mew
>> >
>> >
>> > On Wed, Oct 11, 2023 at 6:00 AM Chris Sampson
>> >  wrote:
>> >
>> >> FYI - original thread in the archives for reference [1].
>> >>
>> >> Thanks for your more complete example, this does indeed fail with the
>> >> error you indicate. I think it’s related to NIFI-8135 [2], which identified
>> >> a deficiency in the way Records are converted to Java Maps, particularly
>> >> where CHOICE types are involved.
>> >>
>> >> The example data you’ve provided does indeed have a mix of String and
>> >> Record (JSON Object) values for the affected fields - this is a little
>> >> unusual, but certainly nothing that’s banned in the world of JSON, so
>> >> should probably be handled better by NiFi.
>> >>
>> >> I’ve had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3]. I’d
>> >> been struggling to re-create the error for the ticket, but I think your
>> >> example does it nicely, so provides a good test for whether the problem is
>> >> fixed - unfortunately, when I run this example data against my branch, it
>> >> still fails albeit with a different error:
>> >>
>> >> java.lang.ClassCastException: class java.lang.String cannot be cast to
>> >> class org.apache.nifi.serialization.record.Record (java.lang.String is in
>> >> module java.base of loader 'bootstrap';
>> >> org.apache.nifi.serialization.record.Record is in unnamed module 

Re: discuss NiFi 1.24 release

2023-10-12 Thread Mike Thomsen
This Xodus?

https://github.com/JetBrains/xodus

On Thu, Oct 12, 2023 at 1:30 PM David Handermann <
exceptionfact...@apache.org> wrote:

> Mike,
>
> Thanks for raising this question. I am working on an automated
> migration for the support branch from H2 to Xodus. We previously
> handled automated migration from H2 1.4 to 2.1, and 2.1 to 2.2, so the
> basic mechanics are in place to extract content from H2. The general
> upgrade path will be to migrate to the latest version 1 release, and
> then upgrade to version 2.
>
> Regards,
> David Handermann
>
> On Thu, Oct 12, 2023 at 12:24 PM Mike Thomsen 
> wrote:
> >
> > When H2 goes, what will the upgrade path look like?
> >
> > On Tue, Oct 10, 2023 at 7:57 PM David Handermann <
> > exceptionfact...@apache.org> wrote:
> >
> > > Joe,
> > >
> > > Thanks for the reply, that sounds good.
> > >
> > > For reference, here is the Jira issue for tracking the initial
> > > implementation:
> > >
> > > https://issues.apache.org/jira/browse/NIFI-12206
> > >
> > > Regards,
> > > David Handermann
> > >
> > > On Tue, Oct 10, 2023 at 6:53 PM Joe Witt  wrote:
> > > >
> > > > David
> > > >
> > > > I think we can hold off for a few weeks - I'll respond to the Slack
> > > > message on that.
> > > >
> > > > Will be sad to see H2 go.  The original nifi flowfile repository ran on
> > > > H2.  Surprisingly fast and stable actually.  But happy to hear there is
> > > > such progress underway.
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Oct 10, 2023 at 4:27 PM David Handermann <
> > > > exceptionfact...@apache.org> wrote:
> > > >
> > > > > Joe,
> > > > >
> > > > > Thanks for initiating the discussion on a 1.24 release. There are a
> > > > > handful of new features, plus additional deprecations and dependency
> > > > > upgrades that would be very useful to release sooner rather than
> > > > > later.
> > > > >
> > > > > If we are considering a 1.24 release within the next week, then I
> > > > > would expect needing a 1.25 release to incorporate some additional
> > > > > deprecations around the same time we are ready for a 2.0 milestone
> > > > > release version.
> > > > >
> > > > > I am in the process of implementing a replacement for H2 to store flow
> > > > > configuration history, and I plan to have a pull request ready in the
> > > > > next few days. There will be some additional work to support migration
> > > > > on version 1 support branch.
> > > > >
> > > > > If we want to move forward with a 1.24 release within a week or so,
> > > > > then these changes could be targeted for 1.25. If we want to wait a
> > > > > few weeks, then I could see this being incorporated in 1.24.
> > > > >
> > > > > Regards,
> > > > > David Handermann
> > > > >
> > > > > On Tue, Oct 10, 2023 at 6:14 PM Mark Payne wrote:
> > > > > >
> > > > > > Thanks Joe. I wouldn’t argue against doing a 1.24 release.
> > > > > >
> > > > > > Thanks
> > > > > > -Mark
> > > > > >
> > > > > > > On Oct 10, 2023, at 7:01 PM, Joe Witt wrote:
> > > > > > >
> > > > > > > Team,
> > > > > > >
> > > > > > > We have plenty of bits out there to kick a release.  I'm happy to RM
> > > > > > > this if nobody else wants to.
> > > > > > >
> > > > > > > Had a user on Slack ask for it.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Joe
> > > > > >
> > > > >
> > >
>


Re: discuss NiFi 1.24 release

2023-10-12 Thread David Handermann
Mike,

Thanks for raising this question. I am working on an automated
migration for the support branch from H2 to Xodus. We previously
handled automated migration from H2 1.4 to 2.1, and 2.1 to 2.2, so the
basic mechanics are in place to extract content from H2. The general
upgrade path will be to migrate to the latest version 1 release, and
then upgrade to version 2.

Regards,
David Handermann

On Thu, Oct 12, 2023 at 12:24 PM Mike Thomsen  wrote:
>
> When H2 goes, what will the upgrade path look like?
>
> On Tue, Oct 10, 2023 at 7:57 PM David Handermann <
> exceptionfact...@apache.org> wrote:
>
> > Joe,
> >
> > Thanks for the reply, that sounds good.
> >
> > For reference, here is the Jira issue for tracking the initial
> > implementation:
> >
> > https://issues.apache.org/jira/browse/NIFI-12206
> >
> > Regards,
> > David Handermann
> >
> > On Tue, Oct 10, 2023 at 6:53 PM Joe Witt  wrote:
> > >
> > > David
> > >
> > > I think we can hold off for a few weeks - I'll respond to the Slack
> > > message on that.
> > >
> > > Will be sad to see H2 go.  The original nifi flowfile repository ran on
> > > H2.  Surprisingly fast and stable actually.  But happy to hear there is
> > > such progress underway.
> > >
> > > Thanks
> > >
> > > On Tue, Oct 10, 2023 at 4:27 PM David Handermann <
> > > exceptionfact...@apache.org> wrote:
> > >
> > > > Joe,
> > > >
> > > > Thanks for initiating the discussion on a 1.24 release. There are a
> > > > handful of new features, plus additional deprecations and dependency
> > > > upgrades that would be very useful to release sooner rather than
> > > > later.
> > > >
> > > > If we are considering a 1.24 release within the next week, then I
> > > > would expect needing a 1.25 release to incorporate some additional
> > > > deprecations around the same time we are ready for a 2.0 milestone
> > > > release version.
> > > >
> > > > I am in the process of implementing a replacement for H2 to store flow
> > > > configuration history, and I plan to have a pull request ready in the
> > > > next few days. There will be some additional work to support migration
> > > > on version 1 support branch.
> > > >
> > > > If we want to move forward with a 1.24 release within a week or so,
> > > > then these changes could be targeted for 1.25. If we want to wait a
> > > > few weeks, then I could see this being incorporated in 1.24.
> > > >
> > > > Regards,
> > > > David Handermann
> > > >
> > > > On Tue, Oct 10, 2023 at 6:14 PM Mark Payne wrote:
> > > > >
> > > > > Thanks Joe. I wouldn’t argue against doing a 1.24 release.
> > > > >
> > > > > Thanks
> > > > > -Mark
> > > > >
> > > > > > On Oct 10, 2023, at 7:01 PM, Joe Witt wrote:
> > > > > >
> > > > > > Team,
> > > > > >
> > > > > > We have plenty of bits out there to kick a release.  I'm happy to RM
> > > > > > this if nobody else wants to.
> > > > > >
> > > > > > Had a user on Slack ask for it.
> > > > > >
> > > > > > Thanks
> > > > > > Joe
> > > > >
> > > >
> >


Re: discuss NiFi 1.24 release

2023-10-12 Thread Mike Thomsen
When H2 goes, what will the upgrade path look like?

On Tue, Oct 10, 2023 at 7:57 PM David Handermann <
exceptionfact...@apache.org> wrote:

> Joe,
>
> Thanks for the reply, that sounds good.
>
> For reference, here is the Jira issue for tracking the initial
> implementation:
>
> https://issues.apache.org/jira/browse/NIFI-12206
>
> Regards,
> David Handermann
>
> On Tue, Oct 10, 2023 at 6:53 PM Joe Witt  wrote:
> >
> > David
> >
> > I think we can hold off for a few weeks - I'll respond to the Slack
> > message on that.
> >
> > Will be sad to see H2 go.  The original nifi flowfile repository ran on
> > H2.  Surprisingly fast and stable actually.  But happy to hear there is
> > such progress underway.
> >
> > Thanks
> >
> > On Tue, Oct 10, 2023 at 4:27 PM David Handermann <
> > exceptionfact...@apache.org> wrote:
> >
> > > Joe,
> > >
> > > Thanks for initiating the discussion on a 1.24 release. There are a
> > > handful of new features, plus additional deprecations and dependency
> > > upgrades that would be very useful to release sooner rather than
> > > later.
> > >
> > > If we are considering a 1.24 release within the next week, then I
> > > would expect needing a 1.25 release to incorporate some additional
> > > deprecations around the same time we are ready for a 2.0 milestone
> > > release version.
> > >
> > > I am in the process of implementing a replacement for H2 to store flow
> > > configuration history, and I plan to have a pull request ready in the
> > > next few days. There will be some additional work to support migration
> > > on version 1 support branch.
> > >
> > > If we want to move forward with a 1.24 release within a week or so,
> > > then these changes could be targeted for 1.25. If we want to wait a
> > > few weeks, then I could see this being incorporated in 1.24.
> > >
> > > Regards,
> > > David Handermann
> > >
> > > On Tue, Oct 10, 2023 at 6:14 PM Mark Payne wrote:
> > > >
> > > > Thanks Joe. I wouldn’t argue against doing a 1.24 release.
> > > >
> > > > Thanks
> > > > -Mark
> > > >
> > > > > On Oct 10, 2023, at 7:01 PM, Joe Witt wrote:
> > > > >
> > > > > Team,
> > > > >
> > > > > We have plenty of bits out there to kick a release.  I'm happy to RM
> > > > > this if nobody else wants to.
> > > > >
> > > > > Had a user on Slack ask for it.
> > > > >
> > > > > Thanks
> > > > > Joe
> > > >
> > >
>


Re: JOLTTransformRecord problem

2023-10-12 Thread Mark Woodcock
Chris,

I guess I was so wrapped up in the ConvertRecord, it didn't occur to me to
have the JOLT processor just emit a different format; not sure how I missed
that.
So, I re-rigged my flow to have  ReplaceText -> JOLTTransform ->
UpdateAttribute -> etc.   Unfortunately, I got the attached error.

This is running against the set of data that I provided earlier (that is
the test case for the bug).  Its first 3 records all lack dimension and
eta fields (which my ReplaceText process has removed all evidence of);
followed by one record that has record values for both fields.  The error
seems to be saying that those early records (at the time of writing to
Avro) don't have the appropriate format, because they have "null" values
(in the ETA and dimensions fields) which are not valid for the JSON schema
which has been inferred.

Why is it creating a single JSON schema for the whole file instead of
managing each record independently?  (and not that it really matters, but
hey, the short records are coming first, why precompute the schema before
finishing the processing of any record?)
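
To make the situation concrete, here is a small plain-Python sketch of what a
single schema covering a whole file has to look like when some records lack a
field. It is only an illustration under stated assumptions: it is not NiFi's
schema inference code, the function name infer_union_schema and the sample
records are hypothetical, and Python type names stand in for record field
types.

```python
def infer_union_schema(records):
    """Map each field name to the set of type names seen across all records.

    A field that is absent (or None) in any record gets "null" added to its
    set, which is what makes the field nullable in a file-wide schema.
    """
    all_fields = set()
    for record in records:
        all_fields.update(record)

    schema = {name: set() for name in all_fields}
    for record in records:
        for name in all_fields:
            value = record.get(name)
            schema[name].add("null" if value is None else type(value).__name__)
    return schema


# Hypothetical data shaped like the test case: three short records first,
# then one record that carries a nested value for "eta".
records = [
    {"id": 1},
    {"id": 2},
    {"id": 3},
    {"id": 4, "eta": {"value": "2023-10-12"}},
]
schema = infer_union_schema(records)
print(sorted(schema["eta"]))
```

In this sketch the "eta" field only fits every record if its type is allowed
to be null as well as a nested object; a schema that lists the nested object
alone would reject the first three records.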

And, is there something else I can try?

thx,

mew




On Wed, Oct 11, 2023 at 4:35 PM Chris Sampson
 wrote:

> You could use an AvroWriter to output the results of the
> JoltTransformRecord - it doesn’t need to be JSON (in or out), that’s one of
> the great things of NiFi’s Record processors - if there’s a Reader and
> Writer in the format you want, you can use that data and the Writer doesn’t
> need to be the same format as the Reader.
>
> Good news: I’ve identified the problem in my NIFI-8135 PR [1] by adding a
> cut-down version of your example as a unit test for the JoltTransformRecord
> processor.
>
> However, I’m not so sure the output is quite what you were expecting - see the
> nifi-nar-bundles/nifi-jolt-record-bundle/nifi-jolt-record-processors/src/test/resources/TestJoltTransformRecord/flattenedOutput.json
> file in the linked PR, the “Eta”’s “value” field appears as a Java Map
> serialised as a String, I imagine you were wanting this to be a nested
> Object?
>
> If the latter, I think we’re then running into NIFI-8134 [2], for which I
> have a separate PR ready for review [3].
>
> [1]: https://github.com/apache/nifi/pull/7746/files
>
> [2]: https://issues.apache.org/jira/browse/NIFI-8134
>
> [3]: https://github.com/apache/nifi/pull/7745
>
>
> Cheers,
>
> ---
> Chris Sampson
> IT Consultant
> chris.samp...@naimuri.com
>
>
> > On 11 Oct 2023, at 19:22, Mark Woodcock wrote:
> >
> > Chris,
> >
> > 1) well, reassuring to learn that I've found an actual bug; and pleasing to
> > know that I constructed an effective and illuminating test.  hurrah.
> >
> > 2) So, I can certainly use the ReplaceText (is there a better choice?)
> > processor to ditch any field that looks like "whatever": "", (and I
> > successfully implemented it), but unfortunately when I pass the resulting
> > json onto another processor (e.g. a ConvertRecord, so I can spit out AVRO),
> > the fact that the data now has different schemas causes an error.  Is this
> > just kicking the can down the road?
> >
> > thx,
> >
> > mew
> >
> >
> > On Wed, Oct 11, 2023 at 6:00 AM Chris Sampson
> >  wrote:
> >
> >> FYI - original thread in the archives for reference [1].
> >>
> >> Thanks for your more complete example, this does indeed fail with the
> >> error you indicate. I think it’s related to NIFI-8135 [2], which identified
> >> a deficiency in the way Records are converted to Java Maps, particularly
> >> where CHOICE types are involved.
> >>
> >> The example data you’ve provided does indeed have a mix of String and
> >> Record (JSON Object) values for the affected fields - this is a little
> >> unusual, but certainly nothing that’s banned in the world of JSON, so
> >> should probably be handled better by NiFi.
> >>
> >> I’ve had a go at providing a PR for NIFI-8135 (as yet unreviewed) [3]. I’d
> >> been struggling to re-create the error for the ticket, but I think your
> >> example does it nicely, so provides a good test for whether the problem is
> >> fixed - unfortunately, when I run this example data against my branch, it
> >> still fails albeit with a different error:
> >>
> >> java.lang.ClassCastException: class java.lang.String cannot be cast to
> >> class org.apache.nifi.serialization.record.Record (java.lang.String is in
> >> module java.base of loader 'bootstrap';
> >> org.apache.nifi.serialization.record.Record is in unnamed module of loader
> >> org.apache.nifi.nar.NarClassLoader @4b3ad7ca)
> >>    at org.apache.nifi.serialization.record.util.DataTypeUtils.convertRecordFieldtoObject(DataTypeUtils.java:893)
> >>    at org.apache.nifi.processors.jolt.record.JoltTransformRecord.transform(JoltTransformRecord.java:425)
> >>    ...
> >>
> >> So it seems there’s a little more debugging and work to do for NIFI-8135
> >> yet.
> >>
> >> One way of you working around this in your example would be to remove
> >> empty