Re: nifi unique json field list

2018-03-12 Thread scott

Thanks Matt and Boris for your comments/suggestions.

I came to the same conclusion as Boris yesterday and put together a 
Groovy script. After I merge all the records into one flowfile, the 
Groovy script iterates through and comes up with a complete unique set 
of fields.


I was hoping there was a solution using distributed cache or something 
similar, but I couldn't find a way to iterate through all keys/values in 
the cache without first knowing the names of the keys. Any thought's 
about that approach?



Thanks again,

Scott


On 03/12/2018 07:26 AM, Matt Burgess wrote:

Scott,

I'm assuming you're trying to create a single (possibly sparse) table
for all the incoming JSON records, and that you do not know the total
set of fields ahead of time. If you do know the fields ahead of time,
you can create the table ahead of time and use the JoltTransformJSON
processor to add default values for any missing fields in any records.
You may also be able to do this with ConvertRecord.

If you do not know the total set of fields ahead of time, how will you
know you have "enough" records in the flow whose fields fully define
the table? If you won't ever know for sure that your flow contains
records whose values fully define the table, you'd have to try to add
missing columns as they are encountered (see [1] for SQL Server
commands to do this).  That can get messy, you'd have to pull the
field names out into attributes and use PutSQL's SQL Statement
property (added in NiFi 1.5.0) with NiFi Expression Language to
execute the SQL command for adding a column if it doesn't exist.
Otherwise you'd have to use Wait/Notify or some other barrier stuff to
ensure the column was added before a record with that field is pushed
to the database.

Basically it's very difficult to do if you don't know all the possible
table columns up front. If you are consuming from a Web service, I
would hope their API at least documents all possible fields, and you
could create a table from that.

Regards,
Matt

[1] 
https://stackoverflow.com/questions/8870802/add-a-column-to-a-table-if-it-does-not-already-exist


On Sat, Mar 10, 2018 at 7:18 PM, scott  wrote:

Hello NiFi community,

I have a problem I've been working on that I was hoping to get some
suggestions for a solution. I'm ingesting a dump of json records from a web
service periodically. The json sometimes comes with some fields missing. My
goal is to insert the data into a SQLServer DB after dynamically creating
the table based on the fields in the json records. The problem is that when
I use the list of fields from a record that is missing some fields, I get
errors when I try to insert. How do I take all the fields from all records
in my dump and then create a unique list of all fields for me to create the
table with?


Thanks,

Scott





Re: nifi unique json field list

2018-03-12 Thread Matt Burgess
Scott,

I'm assuming you're trying to create a single (possibly sparse) table
for all the incoming JSON records, and that you do not know the total
set of fields ahead of time. If you do know the fields ahead of time,
you can create the table ahead of time and use the JoltTransformJSON
processor to add default values for any missing fields in any records.
You may also be able to do this with ConvertRecord.

If you do not know the total set of fields ahead of time, how will you
know you have "enough" records in the flow whose fields fully define
the table? If you won't ever know for sure that your flow contains
records whose values fully define the table, you'd have to try to add
missing columns as they are encountered (see [1] for SQL Server
commands to do this).  That can get messy, you'd have to pull the
field names out into attributes and use PutSQL's SQL Statement
property (added in NiFi 1.5.0) with NiFi Expression Language to
execute the SQL command for adding a column if it doesn't exist.
Otherwise you'd have to use Wait/Notify or some other barrier stuff to
ensure the column was added before a record with that field is pushed
to the database.

Basically it's very difficult to do if you don't know all the possible
table columns up front. If you are consuming from a Web service, I
would hope their API at least documents all possible fields, and you
could create a table from that.

Regards,
Matt

[1] 
https://stackoverflow.com/questions/8870802/add-a-column-to-a-table-if-it-does-not-already-exist


On Sat, Mar 10, 2018 at 7:18 PM, scott  wrote:
> Hello NiFi community,
>
> I have a problem I've been working on that I was hoping to get some
> suggestions for a solution. I'm ingesting a dump of json records from a web
> service periodically. The json sometimes comes with some fields missing. My
> goal is to insert the data into a SQLServer DB after dynamically creating
> the table based on the fields in the json records. The problem is that when
> I use the list of fields from a record that is missing some fields, I get
> errors when I try to insert. How do I take all the fields from all records
> in my dump and then create a unique list of all fields for me to create the
> table with?
>
>
> Thanks,
>
> Scott
>


Re: nifi unique json field list

2018-03-12 Thread Boris Tyukin
Hi Scott,

I am not sure why you would create a new table per file - that does not
seem like a good idea to me honestly. How are going to manage these tables
then? IMHO if you are on SQL Server 2016, you may consider storing
json natively in SQL Server and then use various Json functions. The huge
benefit of this approach is that you do not have to model data upfront and
you won't lose original data in case you mess things up (and we all do).
Normally you would want to retain your original raw data so you can
reprocess it.

But if you want to create a new table per file using totally dynamic and
changing list of columns,  I would probably use a custom Groovy script to
figure out your input file structure and then build CREATE TABLE statement
snippet for every file and INSERT snippet.

Record schema in NiFi assumes a known schema upfront so I do not think it
is a good use of this feature for your dynamic use case.

Here is a good post that shows how to work with Json
http://funnifi.blogspot.com/2016/02/executescript-json-to-json-conversion.html

I found Groovy very easy to learn so do not be afraid of it if it is
something you have not used before. You can also use other languages to do
a similar thing but Groovy is a very compact language and has good
performance in NiFi, since it is JVM based language.

Boris






On Sat, Mar 10, 2018 at 7:18 PM, scott  wrote:

> Hello NiFi community,
>
> I have a problem I've been working on that I was hoping to get some
> suggestions for a solution. I'm ingesting a dump of json records from a web
> service periodically. The json sometimes comes with some fields missing. My
> goal is to insert the data into a SQLServer DB after dynamically creating
> the table based on the fields in the json records. The problem is that when
> I use the list of fields from a record that is missing some fields, I get
> errors when I try to insert. How do I take all the fields from all records
> in my dump and then create a unique list of all fields for me to create the
> table with?
>
>
> Thanks,
>
> Scott
>
>