Re: nifi unique json field list
Thanks Matt and Boris for your comments/suggestions. I came to the same conclusion as Boris yesterday and put together a Groovy script. After I merge all the records into one flowfile, the Groovy script iterates through and comes up with a complete unique set of fields. I was hoping there was a solution using distributed cache or something similar, but I couldn't find a way to iterate through all keys/values in the cache without first knowing the names of the keys. Any thought's about that approach? Thanks again, Scott On 03/12/2018 07:26 AM, Matt Burgess wrote: Scott, I'm assuming you're trying to create a single (possibly sparse) table for all the incoming JSON records, and that you do not know the total set of fields ahead of time. If you do know the fields ahead of time, you can create the table ahead of time and use the JoltTransformJSON processor to add default values for any missing fields in any records. You may also be able to do this with ConvertRecord. If you do not know the total set of fields ahead of time, how will you know you have "enough" records in the flow whose fields fully define the table? If you won't ever know for sure that your flow contains records whose values fully define the table, you'd have to try to add missing columns as they are encountered (see [1] for SQL Server commands to do this). That can get messy, you'd have to pull the field names out into attributes and use PutSQL's SQL Statement property (added in NiFi 1.5.0) with NiFi Expression Language to execute the SQL command for adding a column if it doesn't exist. Otherwise you'd have to use Wait/Notify or some other barrier stuff to ensure the column was added before a record with that field is pushed to the database. Basically it's very difficult to do if you don't know all the possible table columns up front. If you are consuming from a Web service, I would hope their API at least documents all possible fields, and you could create a table from that. Regards, Matt [1] https://stackoverflow.com/questions/8870802/add-a-column-to-a-table-if-it-does-not-already-exist On Sat, Mar 10, 2018 at 7:18 PM, scottwrote: Hello NiFi community, I have a problem I've been working on that I was hoping to get some suggestions for a solution. I'm ingesting a dump of json records from a web service periodically. The json sometimes comes with some fields missing. My goal is to insert the data into a SQLServer DB after dynamically creating the table based on the fields in the json records. The problem is that when I use the list of fields from a record that is missing some fields, I get errors when I try to insert. How do I take all the fields from all records in my dump and then create a unique list of all fields for me to create the table with? Thanks, Scott
Re: nifi unique json field list
Scott, I'm assuming you're trying to create a single (possibly sparse) table for all the incoming JSON records, and that you do not know the total set of fields ahead of time. If you do know the fields ahead of time, you can create the table ahead of time and use the JoltTransformJSON processor to add default values for any missing fields in any records. You may also be able to do this with ConvertRecord. If you do not know the total set of fields ahead of time, how will you know you have "enough" records in the flow whose fields fully define the table? If you won't ever know for sure that your flow contains records whose values fully define the table, you'd have to try to add missing columns as they are encountered (see [1] for SQL Server commands to do this). That can get messy, you'd have to pull the field names out into attributes and use PutSQL's SQL Statement property (added in NiFi 1.5.0) with NiFi Expression Language to execute the SQL command for adding a column if it doesn't exist. Otherwise you'd have to use Wait/Notify or some other barrier stuff to ensure the column was added before a record with that field is pushed to the database. Basically it's very difficult to do if you don't know all the possible table columns up front. If you are consuming from a Web service, I would hope their API at least documents all possible fields, and you could create a table from that. Regards, Matt [1] https://stackoverflow.com/questions/8870802/add-a-column-to-a-table-if-it-does-not-already-exist On Sat, Mar 10, 2018 at 7:18 PM, scottwrote: > Hello NiFi community, > > I have a problem I've been working on that I was hoping to get some > suggestions for a solution. I'm ingesting a dump of json records from a web > service periodically. The json sometimes comes with some fields missing. My > goal is to insert the data into a SQLServer DB after dynamically creating > the table based on the fields in the json records. The problem is that when > I use the list of fields from a record that is missing some fields, I get > errors when I try to insert. How do I take all the fields from all records > in my dump and then create a unique list of all fields for me to create the > table with? > > > Thanks, > > Scott >
Re: nifi unique json field list
Hi Scott, I am not sure why you would create a new table per file - that does not seem like a good idea to me honestly. How are going to manage these tables then? IMHO if you are on SQL Server 2016, you may consider storing json natively in SQL Server and then use various Json functions. The huge benefit of this approach is that you do not have to model data upfront and you won't lose original data in case you mess things up (and we all do). Normally you would want to retain your original raw data so you can reprocess it. But if you want to create a new table per file using totally dynamic and changing list of columns, I would probably use a custom Groovy script to figure out your input file structure and then build CREATE TABLE statement snippet for every file and INSERT snippet. Record schema in NiFi assumes a known schema upfront so I do not think it is a good use of this feature for your dynamic use case. Here is a good post that shows how to work with Json http://funnifi.blogspot.com/2016/02/executescript-json-to-json-conversion.html I found Groovy very easy to learn so do not be afraid of it if it is something you have not used before. You can also use other languages to do a similar thing but Groovy is a very compact language and has good performance in NiFi, since it is JVM based language. Boris On Sat, Mar 10, 2018 at 7:18 PM, scottwrote: > Hello NiFi community, > > I have a problem I've been working on that I was hoping to get some > suggestions for a solution. I'm ingesting a dump of json records from a web > service periodically. The json sometimes comes with some fields missing. My > goal is to insert the data into a SQLServer DB after dynamically creating > the table based on the fields in the json records. The problem is that when > I use the list of fields from a record that is missing some fields, I get > errors when I try to insert. How do I take all the fields from all records > in my dump and then create a unique list of all fields for me to create the > table with? > > > Thanks, > > Scott > >