Re: Unable to properly tally all keys in nested json

2023-05-12 Thread Paul King
Something like this worked for me:

def topValuesMap = [:].withDefault{ [:].withDefault{ 0 } }
def tallyMap = [:].withDefault{ 0 }
def tally
tally = { Map json, String prefix ->
json.each { k, v ->
String key = prefix + k
if (v instanceof List) {
  tallyMap[key] += 1
  v.each{ tally(it, key + '.') }
} else {
def val = v?.toString().trim()
if (v) {
tallyMap[key] += 1
topValuesMap[key][v] += 1
if (v instanceof Map) tally(v, key + '.')
}
}
}
}

def root = new JsonSlurper().parse(inputStream)
root.each { // each allows json to be a list, not needed if always a map
tally(it, '')
}
println tallyMap
println topValuesMap.collectEntries{ k, m -> [(k), m.sort{ _, v -> -v.value
}] }.take(10)



Virus-free.www.avast.com

<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

On Fri, May 12, 2023 at 8:27 PM James McMahon  wrote:

> Thank you for the response, Paul. I will integrate and try these
> suggestions within my Groovy code that runs in a nifi ExecuteScript
> processor. I'll be working on this once again tonight.
>
> The map topValuesMap is intended to capture this: for each key identified
> in the json, cross-tabulate for each value associated with that key how
> many times it occurs. After the json is fully processed for key, sort the
> resulting map and retain only the top ten values found in the json. If a
> set has a lastName key, the topValuesMap that results might look something
> like this after all keys have been cross-tabulated:
>
> ["lastName": ["Smith" : 1023, "Jones" : 976, "Chang": 899, "Doe": 511,
> ...],
>  "address.street": [.],
> .
> .
> .
> "a final key": [.]
> ]
>
> Each key would have ten values in its value map, unless it cross-tabulates
> to less than ten in total, in which case it will be sorted by count value
> and all values accepted.
> Again, many thanks.
> Jim
>
> On Fri, May 12, 2023 at 2:19 AM Paul King  wrote:
>
>> I am not 100% sure what you are trying to capture in topValuesMap but for
>> tallyMap you probably want something like:
>>
>> def tallyMap = [:].withDefault{ 0 }
>> def tally
>> tally = { Map json, String prefix ->
>> json.each { k, v ->
>> if (v instanceof List) {
>>   tallyMap[prefix + k] += 1
>>   v.each{ tally(it, "$prefix${k}.") }
>> } else if (v?.toString().trim()) {
>> tallyMap[prefix + k] += 1
>> if (v instanceof Map) tally(v, "$prefix${k}.")
>> }
>> }
>> }
>>
>> def root = new JsonSlurper().parse(inputStream)
>> def initialPrefix = ''
>> tally(root, initialPrefix)
>> println tallyMap
>>
>> Output:
>> [name:1, age:1, address:1, address.street:1, address.city:1,
>> address.state:1, address.zip:1, phoneNumbers:1, phoneNumbers.type:2,
>> phoneNumbers.number:2]
>>
>>
>>
>> 
>> Virus-free.www.avast.com
>> 
>> <#m_227840230113638997_m_3707991732434518544_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>>
>> On Fri, May 12, 2023 at 10:38 AM James McMahon 
>> wrote:
>>
>>> I have this incoming json: { "name": "John Doe", "age": 42, "address": {
>>> "street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"
>>> }, "phoneNumbers": [ { "type": "home", "number": "555-1234" }, { "type":
>>> "work", "number": "555-5678" } ] } I wish to tally all the keys in this
>>> json in a map that gives me the key name as its key, and a count of the
>>> number of times the key occurs in the json as its value. For this example,
>>> the keys I expect in my output should include name, age, address,
>>> address.street, address.city, address.state, address.zip, phoneNumbers,
>>> phoneNumbers.type, and phoneNumbers.number. But I do not get that. Instead,
>>> I get this for the list of fields: triage.json.fields
>>> name,age,address,phoneNumbers And I get this for my tally count by key:
>>> triage.json.tallyMap [name:1, age:1, address:1, phoneNumbers:1]
>>>
>>> I am close, but not quite there. I don't capture all the keys. Here is
>>> my code. How must I modify this to get the result I require? import
>>> groovy.json.JsonSlurper import org.apache.commons.io.IOUtils import
>>> java.nio.charset.StandardCharsets def keys = [] def tallyMap = [:] def
>>> topValuesMap = [:] def ff = session.get() if (!ff) return try {
>>> session.read(ff, { inputStream -> def json = new
>>> JsonSlurper().parseText(IOUtils.toString(inputStream,
>>> StandardCharsets.UTF_8)) json.each { k, v -> if (v != null &&
>>> !v.toString().trim().isEmpty()) { tallyMap[k] = tallyMap.containsKey(

Re: Unable to properly tally all keys in nested json

2023-05-12 Thread James McMahon
Thank you for the response, Paul. I will integrate and try these
suggestions within my Groovy code that runs in a nifi ExecuteScript
processor. I'll be working on this once again tonight.

The map topValuesMap is intended to capture this: for each key identified
in the json, cross-tabulate for each value associated with that key how
many times it occurs. After the json is fully processed for key, sort the
resulting map and retain only the top ten values found in the json. If a
set has a lastName key, the topValuesMap that results might look something
like this after all keys have been cross-tabulated:

["lastName": ["Smith" : 1023, "Jones" : 976, "Chang": 899, "Doe": 511, ...],
 "address.street": [.],
.
.
.
"a final key": [.]
]

Each key would have ten values in its value map, unless it cross-tabulates
to less than ten in total, in which case it will be sorted by count value
and all values accepted.
Again, many thanks.
Jim

On Fri, May 12, 2023 at 2:19 AM Paul King  wrote:

> I am not 100% sure what you are trying to capture in topValuesMap but for
> tallyMap you probably want something like:
>
> def tallyMap = [:].withDefault{ 0 }
> def tally
> tally = { Map json, String prefix ->
> json.each { k, v ->
> if (v instanceof List) {
>   tallyMap[prefix + k] += 1
>   v.each{ tally(it, "$prefix${k}.") }
> } else if (v?.toString().trim()) {
> tallyMap[prefix + k] += 1
> if (v instanceof Map) tally(v, "$prefix${k}.")
> }
> }
> }
>
> def root = new JsonSlurper().parse(inputStream)
> def initialPrefix = ''
> tally(root, initialPrefix)
> println tallyMap
>
> Output:
> [name:1, age:1, address:1, address.street:1, address.city:1,
> address.state:1, address.zip:1, phoneNumbers:1, phoneNumbers.type:2,
> phoneNumbers.number:2]
>
>
>
> 
> Virus-free.www.avast.com
> 
> <#m_3707991732434518544_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>
> On Fri, May 12, 2023 at 10:38 AM James McMahon 
> wrote:
>
>> I have this incoming json: { "name": "John Doe", "age": 42, "address": {
>> "street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"
>> }, "phoneNumbers": [ { "type": "home", "number": "555-1234" }, { "type":
>> "work", "number": "555-5678" } ] } I wish to tally all the keys in this
>> json in a map that gives me the key name as its key, and a count of the
>> number of times the key occurs in the json as its value. For this example,
>> the keys I expect in my output should include name, age, address,
>> address.street, address.city, address.state, address.zip, phoneNumbers,
>> phoneNumbers.type, and phoneNumbers.number. But I do not get that. Instead,
>> I get this for the list of fields: triage.json.fields
>> name,age,address,phoneNumbers And I get this for my tally count by key:
>> triage.json.tallyMap [name:1, age:1, address:1, phoneNumbers:1]
>>
>> I am close, but not quite there. I don't capture all the keys. Here is my
>> code. How must I modify this to get the result I require? import
>> groovy.json.JsonSlurper import org.apache.commons.io.IOUtils import
>> java.nio.charset.StandardCharsets def keys = [] def tallyMap = [:] def
>> topValuesMap = [:] def ff = session.get() if (!ff) return try {
>> session.read(ff, { inputStream -> def json = new
>> JsonSlurper().parseText(IOUtils.toString(inputStream,
>> StandardCharsets.UTF_8)) json.each { k, v -> if (v != null &&
>> !v.toString().trim().isEmpty()) { tallyMap[k] = tallyMap.containsKey(k) ?
>> tallyMap[k] + 1 : 1 if (topValuesMap.containsKey(k)) { def valuesMap =
>> topValuesMap[k] valuesMap[v] = valuesMap.containsKey(v) ? valuesMap[v] + 1
>> : 1 topValuesMap[k] = valuesMap } else { topValuesMap[k] = [:].withDefault{
>> 0 }.plus([v: 1]) } } } } as InputStreamCallback) keys =
>> tallyMap.keySet().toList() def tallyMapString = tallyMap.collectEntries {
>> k, v -> [(k): v] }.toString() def topValuesMapString =
>> topValuesMap.collectEntries { k, v -> [(k): v.sort{ -it.value }.take(10)]
>> }.toString() ff = session.putAttribute(ff, 'triage.json.fields',
>> keys.join(",")) ff = session.putAttribute(ff, 'triage.json.tallyMap',
>> tallyMapString) ff = session.putAttribute(ff, 'triage.json.topValuesMap',
>> topValuesMapString) session.transfer(ff, REL_SUCCESS) } catch (Exception e)
>> { log.error('Error processing json fields', e) session.transfer(ff,
>> REL_FAILURE) }
>>
>