Re: "Flatten" JSON

2017-09-15 Thread Kevin Doran
+1 for adding a FlattenRecord processor. I can think of a few scenarios in 
which it would be quite useful, and it would be convenient if it could be 
accomplished without JOLT.

Thanks,
Kevin

On 9/15/17, 09:16, "Nicholas Hughes"  wrote:

Mark,

I'm definitely for making the processor as generic as possible. I don't
mind chaining together a few simple processors to get a job done (such as
convert JSON to Avro > infer schema > flatten records)... I just don't want
steps to get super complex... and the Jolt Transform processor does seem very
powerful and very complex.

If there's some support for a "FlattenRecord" processor, I can submit the
Jira containing the meat of this thread.

-Nick







Re: "Flatten" JSON

2017-09-15 Thread Nicholas Hughes
Mark,

I'm definitely for making the processor as generic as possible. I don't
mind chaining together a few simple processors to get a job done (such as
convert JSON to Avro > infer schema > flatten records)... I just don't want
steps to get super complex... and the Jolt Transform processor does seem very
powerful and very complex.

If there's some support for a "FlattenRecord" processor, I can submit the
Jira containing the meat of this thread.

-Nick




Re: "Flatten" JSON

2017-09-15 Thread Mark Payne
Nick,

I do believe that there's a way to do what you're asking with Jolt, without
knowing any kind of schema. That said, Jolt can get complex pretty quickly and
I don't know it well :) Personally, I have no problem with having a
FlattenRecord processor. I guess the question here, though, is: are you using
Record-oriented processors, or are you using JSON-specific processors?

Personally, I'd like to see a FlattenRecord processor, rather than FlattenJSON,
because that would allow the transformation to apply to Avro as well (and as
soon as we get an XML reader built, XML also). However, the Record-oriented
processors would expect that a schema be given (though it could also be
inferred using another existing processor).
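
To illustrate what "inferring" a schema from a record could mean, here is a
rough Python sketch. It is not how any NiFi processor does it; the function
name infer_schema and the Avro-like type names it returns are assumptions
chosen only for this example.

```python
def infer_schema(value):
    """Derive a nested type description from one sample value.

    Hypothetical helper: the returned type names loosely follow Avro's
    primitive names, but this is not NiFi's schema inference.
    """
    if isinstance(value, dict):
        # A nested object becomes a nested record description.
        return {name: infer_schema(child) for name, child in value.items()}
    if isinstance(value, bool):  # must be checked before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


sample = {"query": {"count": 1, "lang": "en-US"}}
schema = infer_schema(sample)
print(schema)
```

A single sample only describes the fields it happens to contain, so a real
inference step would presumably need to merge several records.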

-Mark






"Flatten" JSON

2017-09-15 Thread Nicholas Hughes
Is there an easy way to "flatten" arbitrary JSON within NiFi?

For input data like that shown below from Yahoo [1]

{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}


...I'd like to end up with output something like this:

{
  "query.count": 1,
  "query.created": "2017-09-15T11:20:26Z",
  "query.lang": "en-US",
  "query.results.channel.item.condition.code": "33",
  "query.results.channel.item.condition.date": "Fri, 15 Sep 2017 06:00 AM EDT",
  "query.results.channel.item.condition.temp": "63",
  "query.results.channel.item.condition.text": "Mostly Clear"
}


I checked out the JoltTransformJSON processor and some examples, such as
the nested data to "prefix soup" demo [2], but it seems as though I need to
enter information about the schema for the incoming data in order to
transform it. Ideally, I'd like to have a processor "just figure it out"
without explicit entry of a schema.
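
For reference, the transformation itself is small when written outside NiFi.
The following is a minimal Python sketch of the schema-free behaviour I have in
mind, not a NiFi processor or a Jolt spec. The function name flatten and its
sep parameter are made up for illustration, and lists are copied unchanged
because I have not decided how arrays should be handled.

```python
import json


def flatten(record, prefix="", sep="."):
    """Collapse nested objects into one flat dict keyed by joined paths."""
    flat = {}
    for key, child in record.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(child, dict):
            # Recurse into nested objects; an empty object contributes no keys.
            flat.update(flatten(child, path, sep))
        else:
            # Scalars (and lists, in this sketch) are copied unchanged.
            flat[path] = child
    return flat


doc = json.loads("""
{
  "query": {
    "count": 1,
    "created": "2017-09-15T11:20:26Z",
    "lang": "en-US",
    "results": {
      "channel": {
        "item": {
          "condition": {
            "code": "33",
            "date": "Fri, 15 Sep 2017 06:00 AM EDT",
            "temp": "63",
            "text": "Mostly Clear"
          }
        }
      }
    }
  }
}
""")

flat = flatten(doc)
print(json.dumps(flat, indent=2))
```

No field names appear in the code, which is the "just figure it out" property
I am after.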

Is there any way to accomplish this in a generic way with JoltTransformJSON
(or another native processor)?

If not, would a ticket requesting a "Field Flattener" processor much like
the one included in StreamSets Data Collector [3] be worthwhile?

Thanks in advance!

-Nick


[1]
https://query.yahooapis.com/v1/public/yql?q=select%20item.condition%20from%20weather.forecast%20where%20woeid%20%3D%202383558=json=store%3A%2F%2Fdatatables.org%2Falltableswithkeys

[2] http://jolt-demo.appspot.com/#bucketToPrefixSoup

[3]
https://github.com/streamsets/datacollector/tree/master/basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener


RE: Want to contribute Jira[NIFI-4360]

2017-09-15 Thread Milan Chandna
Sorry if you received this mail twice.

I'm currently working on the feature, but when I run the full build (mvn
-Pcontrib-check clean install) at the NiFi root, as suggested in the
Contribution Guide, the build fails in the test scope of another module.
I am certain the failure is not in my module, because it also occurs on the
master branch, and I hit it consistently.
The error stack is pasted at the end of this mail.

I searched for it and found that a similar issue has been reported and is
still unresolved.
I have also read somewhere that it is OK to skip it.
But that post is a little old (around May), so is this still an issue?
Should I just skip this test?
What is the flag to skip only a single test rather than all of them?


Running org.wali.TestMinimalLockingWriteAheadLog
Tests run: 12, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 7.901 sec <<< FAILURE! - in org.wali.TestMinimalLockingWriteAheadLog
testRecoverFileThatHasTrailingNULBytesAndTruncation(org.wali.TestMinimalLockingWriteAheadLog)  Time elapsed: 0.015 sec  <<< ERROR!
java.nio.channels.OverlappingFileLockException: null
    at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
    at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
    at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1063)
    at java.nio.channels.FileChannel.lock(FileChannel.java:1053)
    at org.wali.MinimalLockingWriteAheadLog.<init>(MinimalLockingWriteAheadLog.java:187)
    at org.wali.MinimalLockingWriteAheadLog.<init>(MinimalLockingWriteAheadLog.java:108)
    at org.wali.TestMinimalLockingWriteAheadLog.testRecoverFileThatHasTrailingNULBytesAndTruncation(TestMinimalLockingWriteAheadLog.java:472)


Results :

Tests in error:
  TestMinimalLockingWriteAheadLog.testRecoverFileThatHasTrailingNULBytesAndTruncation:472 » OverlappingFileLock

-Original Message-
From: Milan Chandna [mailto:milan.chan...@microsoft.com.INVALID] 
Sent: Friday, September 15, 2017 12:45 AM
To: dev@nifi.apache.org
Subject: RE: Want to contribute Jira[NIFI-4360]

Hi Aldrin,

I'm able to assign the issue to myself.
Thanks for your help.
Appreciate the quick replies.

Regards,
Milan.

-Original Message-
From: Aldrin Piri [mailto:aldrinp...@gmail.com]
Sent: Thursday, September 14, 2017 11:16 PM
To: dev 
Subject: Re: Want to contribute Jira[NIFI-4360]

Hi Milan,

I have added your JIRA account as a contributor to the NIFI project. You should 
be able to assign issues to yourself.  Please let us know if there are any 
problems.

On Thu, Sep 14, 2017 at 6:38 AM, Milan Chandna < 
milan.chan...@microsoft.com.invalid> wrote:

> Hi Andre,
>
> NIFI-4360 is about ADLS (Azure Data Lake Store) support, whereas
> NIFI-1922 is about WASB (Azure Blob Storage, also called Windows
> Azure Storage Blob) support.
>
> They are different kinds of repositories, meant for different purposes.
> Also, the user bases of the two technologies may overlap but are not the
> same, so we need to support ADLS as well.
> To learn more about the difference, please refer to the official
> Microsoft link below.
> https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
>
> Regards,
> Milan.
>
> -Original Message-
> From: Andre [mailto:andre-li...@fucs.org]
> Sent: Thursday, September 14, 2017 10:52 AM
> To: dev@nifi.apache.org
> Subject: Re: Want to contribute Jira[NIFI-4360]
>
> Milan,
>
> Would you be able to comment on the differences between NIFI-4360 and
> NIFI-1922 ?
>
> Cheers
>
> On Thu, Sep 14, 2017 at 1:00 PM, Milan Chandna < 
> milan.chan...@microsoft.com.invalid> wrote:
>
> > Hi,
> >
> > Can admins please make me a contributor so that I can assign
> > Jira [NIFI-4360] <https://issues.apache.org/jira/browse/NIFI-4360> (that I
> > raised) to myself.
> >
> > Regards,
> > Milan.
> >
>


Error while building in TestMinimalLockingWriteAheadLog

2017-09-15 Thread Milan Chandna
I'm currently working on one of the features, but when I run the full build
(mvn -Pcontrib-check clean install) at the NiFi root, as suggested in the
Contribution Guide, the build fails in the test scope of another module.
I am certain the failure is not in my module, because it also occurs on the
master branch, and I hit it consistently.
The error stack is pasted at the end of this mail.

I searched for it and found that a similar issue has been reported and is
still unresolved.
I have also read somewhere that it is OK to skip it.
But that post is a little old (around May), so is this still an issue?
Should I just skip this test?
What is the flag to skip only a single test rather than all of them?


Running org.wali.TestMinimalLockingWriteAheadLog
Tests run: 12, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 7.901 sec <<< FAILURE! - in org.wali.TestMinimalLockingWriteAheadLog
testRecoverFileThatHasTrailingNULBytesAndTruncation(org.wali.TestMinimalLockingWriteAheadLog)  Time elapsed: 0.015 sec  <<< ERROR!
java.nio.channels.OverlappingFileLockException: null
    at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
    at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
    at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1063)
    at java.nio.channels.FileChannel.lock(FileChannel.java:1053)
    at org.wali.MinimalLockingWriteAheadLog.<init>(MinimalLockingWriteAheadLog.java:187)
    at org.wali.MinimalLockingWriteAheadLog.<init>(MinimalLockingWriteAheadLog.java:108)
    at org.wali.TestMinimalLockingWriteAheadLog.testRecoverFileThatHasTrailingNULBytesAndTruncation(TestMinimalLockingWriteAheadLog.java:472)


Results :

Tests in error:
  TestMinimalLockingWriteAheadLog.testRecoverFileThatHasTrailingNULBytesAndTruncation:472 » OverlappingFileLock


Re: NIFI templates in template folder in sync with templates inside NIFI UI

2017-09-15 Thread 尹文才
Thanks Matt, I could make use of the REST API to automate the template sync
work.
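
The sync logic I have in mind could be sketched roughly like this in Python.
This is only an outline under assumptions: the base URL is a placeholder, the
helper names are invented, and the endpoint paths (removal under Templates,
upload under Process Groups) are my reading of the REST API documentation and
should be checked against the NiFi version in use. The HTTP calls themselves
are left out.

```python
from urllib.parse import quote

# Assumption: an unsecured local NiFi; adjust for your own instance.
API = "http://localhost:8080/nifi-api"


def plan_sync(folder_names, nifi_templates):
    """Work out what to change so NiFi matches the template folder.

    folder_names:   set of template names found in the folder
    nifi_templates: dict of template name -> template id as listed by NiFi
    Returns (ids to delete from NiFi, names to upload to NiFi).
    """
    to_delete = sorted(
        tid for name, tid in nifi_templates.items() if name not in folder_names
    )
    to_upload = sorted(name for name in folder_names if name not in nifi_templates)
    return to_delete, to_upload


def delete_url(template_id):
    # Assumed removal endpoint (HTTP DELETE), listed under "Templates".
    return f"{API}/templates/{quote(template_id)}"


def upload_url(process_group_id):
    # Assumed upload endpoint (HTTP POST, multipart), under "Process Groups".
    return f"{API}/process-groups/{quote(process_group_id)}/templates/upload"


to_delete, to_upload = plan_sync(
    {"new-flow"}, {"old-a": "id-1", "old-b": "id-2"}
)
for template_id in to_delete:
    print("DELETE", delete_url(template_id))
for name in to_upload:
    print("POST", upload_url("root"), "<-", name)
```

Matching by name alone would not detect a template whose content changed, so
such a template would have to be removed and uploaded again.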

2017-09-15 1:00 GMT+08:00 Matt Gilman :

> Ben,
>
> In the 0.x baseline, the templates were stored in the templates directory
> that you're referring to. Starting in the 1.x baseline, the templates were
> migrated to become part of the flow.xml.gz. In order to support users
> upgrading from 0.x to 1.x, templates in the directory are automatically
> moved to the flow.xml.gz. If you're looking to automate template
> importing/removal, I would recommend using the REST API. If you open up
> your browser you should be able to see these requests in action.
> Additionally, you can find documentation for them here [1]. The
> import/upload endpoints are under Process Groups and the removal endpoints
> are under Templates.
>
> Thanks
>
> Matt
>
> [1] https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
>
> On Thu, Sep 14, 2017 at 2:40 AM, 尹文才  wrote:
>
> > Hi guys, I put all my flow template XML files inside the NiFi template
> > directory specified in the nifi.properties config file so that NiFi
> > can read in all my templates. But sometimes I need to clear all the
> > templates currently inside NiFi and then add a new template, so I
> > remove all the template XML files from the template folder and put the
> > new template XML file into it. But when I restart NiFi, there are 5
> > templates available: the 4 old templates and the 1 new template.
> >
> > I know I could manually remove all templates inside NiFi, but I wish to
> > do this programmatically. I also know NiFi keeps the templates inside
> > the flow.xml.gz file.
> > So does NiFi simply add the new templates from the template folder to
> > the flow.xml.gz file without removing the old ones?
> >
> > I used Beyond Compare to compare the template XML file content
> > with the template section of the flow.xml extracted from flow.xml.gz.
> > They differ slightly, though most of the content is the same. Does anyone
> > know the relationship between these two?
> >
> > What I want to achieve is to keep the templates in the template folder
> > and the templates in NiFi in sync without having to do anything manually
> > in the NiFi UI. Does anyone know if it's possible, and if so, how? Thanks.
> >
> > /Ben
> >
>