John,

It sounds like you've got this working, which is great, so thanks for following 
up and sharing
the transformation that you used. I'm curious, though, if you looked into using 
PartitionRecord
instead of the Split -> JsonPath -> Merge path. PartitionRecord should make 
that simpler and
will be far better performance-wise.You may still need a MergeRecord processor 
if your intent
is to merge several FlowFiles into a larger chunk, as opposed to just 
re-combining them, but
in any case you should get better performance from PartitionRecord.

Thanks
-Mark

> On Apr 12, 2019, at 3:17 PM, John McGinn <[email protected]> wrote:
> 
> So the solution I came up with was a Jolt Transformation as follows:
> 
> [
>  {
>    "operation": "shift",
>    "spec": {
>      "*": {
>        "state": {
>          "*": {
>            "@2": "&1[]"
>          }
>        }
>      }
>    }
>  },
>  {
>    "operation": "shift",
>    "spec": {
>      "*": {
>        "0": {
>          "state": "state",
>          "city": "city[0]"
>        },
>        "*": {
>          "city": "city[&1]"
>        }
>      }
>    }
>  }
> ]
> 
> This occurs after I've split and joined JSON files by state. So, 1 flowfile 
> for California, 1 for Arizona, etc. The Jolt Transformation has 2 steps. The 
> first one create a JSON map with the key being the name of the state, and the 
> value being an array of original state/city objects. The second portion goes 
> through each key (which is 1 per flowfile), and the first record in the array 
> of original items gets the state, and starts the city array, and subsequent 
> records add to the city array.
> 
> Since I had already had SplitRecord, EvaluateJsonPath and MergeRecord in 
> place before the JoltTransformJSON, I didn't go any further. I believe that I 
> could probably replace the 3 processors with a ConvertRecord to make a JSON 
> object, and then expand the Jolt transformation to handle the splitting by 
> states within it. As well, I subsequently have a MergeRecord processor after 
> the JoltTransform that could probably go away if I were to go that extra step.
> 
> John
> 
> --------------------------------------------
> On Fri, 4/12/19, Bryan Bende <[email protected]> wrote:
> 
> Subject: Re: Merge identical JSON records to single JSON with subarray
> To: [email protected]
> Date: Friday, April 12, 2019, 10:36 AM
> 
> Hello,
> 
> I think you will likely need to use a JOLT
> transform to perform this
> operation. I
> don't know JOLT well enough to suggest the correct
> operation, so maybe others can help there.
> 
> ConvertRecord is more about
> converting between formats like using a
> JSON
> reader and a CSV writer where the schema hasn't changed,
> or using
> forward/backward compatible schemas
> like read with a schema that has 4
> fields
> and write with a schema that has 2 fields, or read with a
> schema that has 4 fields and write with a
> schema that has 6 fields
> where the new
> fields have defaults.
> 
> In
> your situation you are actually trying to change the schema
> and
> manipulate the structure and there is no
> way for ConvertRecord to
> understand how to
> go from your first schema to your second schema.
> JOLT does have these kinds of operations
> though.
> 
> -Bryan
> 
> On Fri, Apr 12, 2019 at 10:24 AM John McGinn
> <[email protected]>
> wrote:
>> 
>> I've
> looked around and haven't found a solution, or a denial
> that it can occur. Most items I've seen relate to 2 JSON
> files that do not have the same format which people want to
> merge.
>> 
>> My
> situation is different. Let's say that I have state/city
> JSON model
> {"state":"California","city":"Sacramento"}
> and I can have multiple records for a single state, but with
> different cities.
>> 
>> 
> {"name":"StateCityList"
>> , "namespace":
> "nifi.statecitylist"
>> ,
> "type": "record"
>> ,
> "fields": [
>>    {
> "name": "state", "type":
> "string" }
>> , {
> "name": "city",
> "type":"string" }
>> 
> ]
>> }
>> 
>> I can use EvaluateJsonPath to put the
> state in as an attribute, and then Merge Content to get a
> single FlowFile per state. This will produce a FlowFile with
> multiple lines of that JSON model. What I've been trying
> to do is then send that into a ConvertRecord that takes that
> JSON model as the input reader and use an array as the
> sub-object to get an array of cities.
>> 
>> 
> {"name":"StateCityList"
>> , "namespace":
> "nifi.statecitylist"
>> ,
> "type": "record"
>> ,
> "fields": [
>>    {
> "name": "state", "type":
> "string" }
>> , {
> "name": "city", "type": {
> "type": "array", "items":
> ["null","string"] } }
>> ]
>> }
>> 
>> When I do this I get
> an error with "Cannot convert value [Sacramento] of
> type java.lang.String to Object Array for field city."
> Is there something wrong with my Avro schemas? Or is this
> just something that ConvertRecord (or MergeRecord) can
> do?
>> 
>> Regarding the
> other things I've found relating to merging 2 separate
> JSON objects, were using JoltTransformJSON or ExecuteScript
> with Groovy. Is this the same situation for my scenario?
>> 
>> Thanks,
>> John McGinn

Reply via email to