Something like this worked for me:

def topValuesMap = [:].withDefault{ [:].withDefault{ 0 } }
def tallyMap = [:].withDefault{ 0 }
def tally
tally = { Map json, String prefix ->
    json.each { k, v ->
        String key = prefix + k
        if (v instanceof List) {
          tallyMap[key] += 1
          v.each{ tally(it, key + '.') }
        } else {
            def val = v?.toString().trim()
            if (v) {
                tallyMap[key] += 1
                topValuesMap[key][v] += 1
                if (v instanceof Map) tally(v, key + '.')
            }
        }
    }
}

def root = new JsonSlurper().parse(inputStream)
root.each { // each allows json to be a list, not needed if always a map
    tally(it, '')
}
println tallyMap
println topValuesMap.collectEntries{ k, m -> [(k), m.sort{ _, v -> -v.value
}] }.take(10)


<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
Virus-free.www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

On Fri, May 12, 2023 at 8:27 PM James McMahon <jsmcmah...@gmail.com> wrote:

> Thank you for the response, Paul. I will integrate and try these
> suggestions within my Groovy code that runs in a nifi ExecuteScript
> processor. I'll be working on this once again tonight.
>
> The map topValuesMap is intended to capture this: for each key identified
> in the json, cross-tabulate for each value associated with that key how
> many times it occurs. After the json is fully processed for key, sort the
> resulting map and retain only the top ten values found in the json. If a
> set has a lastName key, the topValuesMap that results might look something
> like this after all keys have been cross-tabulated:
>
> ["lastName": ["Smith" : 1023, "Jones" : 976, "Chang": 899, "Doe": 511,
> ...],
>  "address.street": [.....],
> .
> .
> .
> "a final key": [.....]
> ]
>
> Each key would have ten values in its value map, unless it cross-tabulates
> to less than ten in total, in which case it will be sorted by count value
> and all values accepted.
> Again, many thanks.
> Jim
>
> On Fri, May 12, 2023 at 2:19 AM Paul King <pa...@asert.com.au> wrote:
>
>> I am not 100% sure what you are trying to capture in topValuesMap but for
>> tallyMap you probably want something like:
>>
>> def tallyMap = [:].withDefault{ 0 }
>> def tally
>> tally = { Map json, String prefix ->
>>     json.each { k, v ->
>>         if (v instanceof List) {
>>           tallyMap[prefix + k] += 1
>>           v.each{ tally(it, "$prefix${k}.") }
>>         } else if (v?.toString().trim()) {
>>             tallyMap[prefix + k] += 1
>>             if (v instanceof Map) tally(v, "$prefix${k}.")
>>         }
>>     }
>> }
>>
>> def root = new JsonSlurper().parse(inputStream)
>> def initialPrefix = ''
>> tally(root, initialPrefix)
>> println tallyMap
>>
>> Output:
>> [name:1, age:1, address:1, address.street:1, address.city:1,
>> address.state:1, address.zip:1, phoneNumbers:1, phoneNumbers.type:2,
>> phoneNumbers.number:2]
>>
>>
>>
>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
>> Virus-free.www.avast.com
>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail>
>> <#m_227840230113638997_m_3707991732434518544_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>>
>> On Fri, May 12, 2023 at 10:38 AM James McMahon <jsmcmah...@gmail.com>
>> wrote:
>>
>>> I have this incoming json: { "name": "John Doe", "age": 42, "address": {
>>> "street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "12345"
>>> }, "phoneNumbers": [ { "type": "home", "number": "555-1234" }, { "type":
>>> "work", "number": "555-5678" } ] } I wish to tally all the keys in this
>>> json in a map that gives me the key name as its key, and a count of the
>>> number of times the key occurs in the json as its value. For this example,
>>> the keys I expect in my output should include name, age, address,
>>> address.street, address.city, address.state, address.zip, phoneNumbers,
>>> phoneNumbers.type, and phoneNumbers.number. But I do not get that. Instead,
>>> I get this for the list of fields: triage.json.fields
>>> name,age,address,phoneNumbers And I get this for my tally count by key:
>>> triage.json.tallyMap [name:1, age:1, address:1, phoneNumbers:1]
>>>
>>> I am close, but not quite there. I don't capture all the keys. Here is
>>> my code. How must I modify this to get the result I require? import
>>> groovy.json.JsonSlurper import org.apache.commons.io.IOUtils import
>>> java.nio.charset.StandardCharsets def keys = [] def tallyMap = [:] def
>>> topValuesMap = [:] def ff = session.get() if (!ff) return try {
>>> session.read(ff, { inputStream -> def json = new
>>> JsonSlurper().parseText(IOUtils.toString(inputStream,
>>> StandardCharsets.UTF_8)) json.each { k, v -> if (v != null &&
>>> !v.toString().trim().isEmpty()) { tallyMap[k] = tallyMap.containsKey(k) ?
>>> tallyMap[k] + 1 : 1 if (topValuesMap.containsKey(k)) { def valuesMap =
>>> topValuesMap[k] valuesMap[v] = valuesMap.containsKey(v) ? valuesMap[v] + 1
>>> : 1 topValuesMap[k] = valuesMap } else { topValuesMap[k] = [:].withDefault{
>>> 0 }.plus([v: 1]) } } } } as InputStreamCallback) keys =
>>> tallyMap.keySet().toList() def tallyMapString = tallyMap.collectEntries {
>>> k, v -> [(k): v] }.toString() def topValuesMapString =
>>> topValuesMap.collectEntries { k, v -> [(k): v.sort{ -it.value }.take(10)]
>>> }.toString() ff = session.putAttribute(ff, 'triage.json.fields',
>>> keys.join(",")) ff = session.putAttribute(ff, 'triage.json.tallyMap',
>>> tallyMapString) ff = session.putAttribute(ff, 'triage.json.topValuesMap',
>>> topValuesMapString) session.transfer(ff, REL_SUCCESS) } catch (Exception e)
>>> { log.error('Error processing json fields', e) session.transfer(ff,
>>> REL_FAILURE) }
>>>
>>

Reply via email to