So the solution I came up with was a Jolt Transformation as follows:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "state": {
          "*": {
            "@2": "&1[]"
          }
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "0": {
          "state": "state",
          "city": "city[0]"
        },
        "*": {
          "city": "city[&1]"
        }
      }
    }
  }
]

This occurs after I've split and joined JSON files by state. So, 1 flowfile for 
California, 1 for Arizona, etc. The Jolt Transformation has 2 steps. The first 
one create a JSON map with the key being the name of the state, and the value 
being an array of original state/city objects. The second portion goes through 
each key (which is 1 per flowfile), and the first record in the array of 
original items gets the state, and starts the city array, and subsequent 
records add to the city array.

Since I had already had SplitRecord, EvaluateJsonPath and MergeRecord in place 
before the JoltTransformJSON, I didn't go any further. I believe that I could 
probably replace the 3 processors with a ConvertRecord to make a JSON object, 
and then expand the Jolt transformation to handle the splitting by states 
within it. As well, I subsequently have a MergeRecord processor after the 
JoltTransform that could probably go away if I were to go that extra step.

John

--------------------------------------------
On Fri, 4/12/19, Bryan Bende <[email protected]> wrote:

 Subject: Re: Merge identical JSON records to single JSON with subarray
 To: [email protected]
 Date: Friday, April 12, 2019, 10:36 AM
 
 Hello,
 
 I think you will likely need to use a JOLT
 transform to perform this
 operation. I
 don't know JOLT well enough to suggest the correct
 operation, so maybe others can help there.
 
 ConvertRecord is more about
 converting between formats like using a
 JSON
 reader and a CSV writer where the schema hasn't changed,
 or using
 forward/backward compatible schemas
 like read with a schema that has 4
 fields
 and write with a schema that has 2 fields, or read with a
 schema that has 4 fields and write with a
 schema that has 6 fields
 where the new
 fields have defaults.
 
 In
 your situation you are actually trying to change the schema
 and
 manipulate the structure and there is no
 way for ConvertRecord to
 understand how to
 go from your first schema to your second schema.
 JOLT does have these kinds of operations
 though.
 
 -Bryan
 
 On Fri, Apr 12, 2019 at 10:24 AM John McGinn
 <[email protected]>
 wrote:
 >
 > I've
 looked around and haven't found a solution, or a denial
 that it can occur. Most items I've seen relate to 2 JSON
 files that do not have the same format which people want to
 merge.
 >
 > My
 situation is different. Let's say that I have state/city
 JSON model
 {"state":"California","city":"Sacramento"}
 and I can have multiple records for a single state, but with
 different cities.
 >
 >
 {"name":"StateCityList"
 > , "namespace":
 "nifi.statecitylist"
 > ,
 "type": "record"
 > ,
 "fields": [
 >   {
 "name": "state", "type":
 "string" }
 > , {
 "name": "city",
 "type":"string" }
 >
 ]
 > }
 >
 > I can use EvaluateJsonPath to put the
 state in as an attribute, and then Merge Content to get a
 single FlowFile per state. This will produce a FlowFile with
 multiple lines of that JSON model. What I've been trying
 to do is then send that into a ConvertRecord that takes that
 JSON model as the input reader and use an array as the
 sub-object to get an array of cities.
 >
 >
 {"name":"StateCityList"
 > , "namespace":
 "nifi.statecitylist"
 > ,
 "type": "record"
 > ,
 "fields": [
 >   {
 "name": "state", "type":
 "string" }
 > , {
 "name": "city", "type": {
 "type": "array", "items":
 ["null","string"] } }
 > ]
 > }
 >
 > When I do this I get
 an error with "Cannot convert value [Sacramento] of
 type java.lang.String to Object Array for field city."
 Is there something wrong with my Avro schemas? Or is this
 just something that ConvertRecord (or MergeRecord) can
 do?
 >
 > Regarding the
 other things I've found relating to merging 2 separate
 JSON objects, were using JoltTransformJSON or ExecuteScript
 with Groovy. Is this the same situation for my scenario?
 >
 > Thanks,
 > John McGinn

Reply via email to