Thank you, Matt, for the suggestion. I'm currently using JsonTreeReader with
PutDatabaseRecord; I wasn't sure whether there was a way to define the
desired content with this Record Reader.

The records I am currently working with look like this:
{"content":[
     {"owner":"blah",
      "share":false,
      "description": "Job URL: https:.....",
      "id":"5f6afdb03972e5000164886c",
      "name":"NAME",
      "number":38,
      "start_time":1600847279311,
      "end_time":1600847284052,
      "status":"PASSED",
      "statistics":{
            "executions":{
            "total":"142",
            "passed":"142",
            "failed":"0",
            "skipped":"0"
            },
      "defects":{
           "product_bug":{
                 "total":0,
                 "PB001":0
           },
          "automation_bug":{
                "AB001":0,
                "total":0
          },
          "system_issue":{
                "total":0,
                "SI001":0
           },
          "to_investigate":{
               "total":0,
               "TI001":0
           },
          "no_defect":{
               "ND001":0,
               "total":0
          }
      }
   },
  "tags":["tag1", "tag2", "tag3"],
  ...
]
}
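
If I'm following the Jolt suggestion correctly, a spec for these records
might look roughly like this (an untested sketch on my part: it hoists
each record out of "content" and joins the nested "statistics" keys with
underscores, mirroring what I get from sep='_'):

[
  {
    "operation": "shift",
    "spec": {
      "content": {
        "*": {
          "statistics": {
            "executions": {
              "*": "[&3].statistics_executions_&"
            },
            "defects": {
              "*": {
                "*": "[&4].statistics_defects_&1_&0"
              }
            }
          },
          "*": "[&1].&"
        }
      }
    }
  }
]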

For comparison, in Python I would typically do
pd.json_normalize(results.json()['content']) and then run
json_normalize() again on the statistics and defects columns with the
sep='_' option.
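
Condensed, that flattening looks something like this (the URL is just a
placeholder for the real endpoint):

import pandas as pd
import requests

# Placeholder endpoint; the real one returns the "content" shown above.
results = requests.get("https://example.com/api/launches")

# sep='_' joins nested keys, so statistics.executions.total becomes the
# column statistics_executions_total.
df = pd.json_normalize(results.json()["content"], sep="_")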

Thanks!
Eric

On Tue, Sep 29, 2020 at 2:12 PM Matt Burgess <mattyb...@apache.org> wrote:

> Eric,
>
> Depending on how large the JSON content is, you could use
> JoltTransformJSON to "hoist" the desired data to the top level. Given
> this example JSON:
>
> {
>   "json": {
>     "data": {
>       "value": 3
>     }
>   }
> }
>
> The spec would be:
>
> [
>   {
>     "operation": "shift",
>     "spec": {
>       "json": {
>         "data": {
>           "*": "&"
>         }
>       }
>     }
>   }
> ]
>
> It "walks down the tree" until it gets to what you want (in this case
> all fields under "data"), then outputs it at the top level.
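>
> Applied to the example above, the output would be:
>
> {
>   "value": 3
> }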
>
>
> For PutDatabaseRecord, there is currently no way to have it create the
> target table if it doesn't exist; I wrote up NIFI-7862 [1] to cover
> this improvement. For the "none of the fields map to columns" issue,
> make sure the table and field/column names match in upper/lower case.
> You may also need to set "Translate Field Names" to true so the
> matching is less strict.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-7862
>
> On Tue, Sep 29, 2020 at 4:44 PM Eric Sammons <elsamm...@gmail.com> wrote:
> >
> > Hello,
> >
> > As the subject line states, I am new to NiFi. For my first flow, I am
> > attempting to replace my Python code with a NiFi flow that extracts
> > data from a JSON endpoint and then writes the JSON to a Redshift
> > database.schema.table.
> >
> > I have set up my DBCP for Redshift and assigned it to my
> > PutDatabaseRecord processor; however, I am having (at least) two
> > issues so far.
> >
> > 1. How do I tell InvokeHTTP (or PutDatabaseRecord) to start reading
> > the JSON from a specific point? Most JSON responses are in the form
> > of result.json()['data'] or result.json()['content'].
> > 2. What is the best approach to having PutDatabaseRecord go to a
> > "Create Table" step when the table, and thus its columns, does not
> > exist? Is there a way to have PutDatabaseRecord create the table if
> > it doesn't exist?
> >
> > As a side note to #2, PutDatabaseRecord is reporting
> > "PutDatabaseRecord failed to process standardflowfilerecord.... due
> > to None of the fields in the record map to the columns defined by the
> > schema.table table." On failures the JSON contents are written to a
> > file, so I can see evidence that InvokeHTTP is working and that
> > PutDatabaseRecord has at least looked at / for the schema.table.
> >
> > Thank you!
>
