So the solution I came up with was a Jolt Transformation as follows:
[
{
"operation": "shift",
"spec": {
"*": {
"state": {
"*": {
"@2": "&1[]"
}
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"0": {
"state": "state",
"city": "city[0]"
},
"*": {
"city": "city[&1]"
}
}
}
}
]
This occurs after I've split and joined JSON files by state. So, 1 flowfile for
California, 1 for Arizona, etc. The Jolt Transformation has 2 steps. The first
one create a JSON map with the key being the name of the state, and the value
being an array of original state/city objects. The second portion goes through
each key (which is 1 per flowfile), and the first record in the array of
original items gets the state, and starts the city array, and subsequent
records add to the city array.
Since I had already had SplitRecord, EvaluateJsonPath and MergeRecord in place
before the JoltTransformJSON, I didn't go any further. I believe that I could
probably replace the 3 processors with a ConvertRecord to make a JSON object,
and then expand the Jolt transformation to handle the splitting by states
within it. As well, I subsequently have a MergeRecord processor after the
JoltTransform that could probably go away if I were to go that extra step.
John
--------------------------------------------
On Fri, 4/12/19, Bryan Bende <[email protected]> wrote:
Subject: Re: Merge identical JSON records to single JSON with subarray
To: [email protected]
Date: Friday, April 12, 2019, 10:36 AM
Hello,
I think you will likely need to use a JOLT
transform to perform this
operation. I
don't know JOLT well enough to suggest the correct
operation, so maybe others can help there.
ConvertRecord is more about
converting between formats like using a
JSON
reader and a CSV writer where the schema hasn't changed,
or using
forward/backward compatible schemas
like read with a schema that has 4
fields
and write with a schema that has 2 fields, or read with a
schema that has 4 fields and write with a
schema that has 6 fields
where the new
fields have defaults.
In
your situation you are actually trying to change the schema
and
manipulate the structure and there is no
way for ConvertRecord to
understand how to
go from your first schema to your second schema.
JOLT does have these kinds of operations
though.
-Bryan
On Fri, Apr 12, 2019 at 10:24 AM John McGinn
<[email protected]>
wrote:
>
> I've
looked around and haven't found a solution, or a denial
that it can occur. Most items I've seen relate to 2 JSON
files that do not have the same format which people want to
merge.
>
> My
situation is different. Let's say that I have state/city
JSON model
{"state":"California","city":"Sacramento"}
and I can have multiple records for a single state, but with
different cities.
>
>
{"name":"StateCityList"
> , "namespace":
"nifi.statecitylist"
> ,
"type": "record"
> ,
"fields": [
> {
"name": "state", "type":
"string" }
> , {
"name": "city",
"type":"string" }
>
]
> }
>
> I can use EvaluateJsonPath to put the
state in as an attribute, and then Merge Content to get a
single FlowFile per state. This will produce a FlowFile with
multiple lines of that JSON model. What I've been trying
to do is then send that into a ConvertRecord that takes that
JSON model as the input reader and use an array as the
sub-object to get an array of cities.
>
>
{"name":"StateCityList"
> , "namespace":
"nifi.statecitylist"
> ,
"type": "record"
> ,
"fields": [
> {
"name": "state", "type":
"string" }
> , {
"name": "city", "type": {
"type": "array", "items":
["null","string"] } }
> ]
> }
>
> When I do this I get
an error with "Cannot convert value [Sacramento] of
type java.lang.String to Object Array for field city."
Is there something wrong with my Avro schemas? Or is this
just something that ConvertRecord (or MergeRecord) can
do?
>
> Regarding the
other things I've found relating to merging 2 separate
JSON objects, were using JoltTransformJSON or ExecuteScript
with Groovy. Is this the same situation for my scenario?
>
> Thanks,
> John McGinn