Thanks Bryan.

That's exactly where my head is, and I was hoping there was an easier way. A
custom processor would allow us to read in a ResultSet, essentially
modeling it after AbstractRecordProcessor (which, by the way, would be
great to be able to extend like AbstractProcessor). The disadvantage of
these approaches is that the custom code isn't taking advantage of the Avro
pools to reduce deserialization overhead. I also debated whether it would
be possible to create a new service that's essentially the same as
AvroReader and tie it into ConvertRecord, but I'm not familiar enough with
this approach to know whether it can be done and registered with NiFi.

Thanks,
Jason

On Mon, Sep 14, 2020 at 2:37 PM Bryan Bende <[email protected]> wrote:

> Hello,
>
> I think it likely requires a custom processor, or custom script with
> ExecuteScript.
>
> Coming out of the database processor, you are going to have two levels of
> Avro...
>
> The outer Avro is representing the rows from your database, so you'll have
> Avro records where one field in each record is itself another Avro object.
>
> You would likely need to split the outer records to one per flow file
> (not great for performance), then for each flow file use a custom
> processor/script to read the value of the field holding the Avro blob,
> overwrite the flow file content with that value, and then send all of
> these to a MergeRecord.
>
> -Bryan
>
>
> On Mon, Sep 14, 2020 at 2:29 PM Jason Iannone <[email protected]> wrote:
>
>> Anyone have thoughts on this? Essentially we have binary Avro stored as a
>> BLOB in Oracle, and I want to extract it via NiFi and write out the
>> contents.
>>
>> Thanks,
>> Jason
>>
>> On Mon, Aug 17, 2020 at 10:04 AM Jason Iannone <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a scenario where an Avro binary is being stored as a BLOB in an
>>> RDBMS. What's the recommended approach for querying this in bulk,
>>> extracting this specific field, and batching it to HDFS?
>>>
>>>    1. GenerateTableFetch OR QueryDatabaseTableRecord
>>>    2. Extract Avro column and assemble output <-- How?
>>>    3. MergeRecord
>>>    4. PutHDFS
>>>
>>> One additional clarification: ultimately I want to keep the Avro
>>> exactly as it is (content-wise), store it in HDFS, and put an external
>>> Hive table on top.
>>>
>>> Thanks,
>>> Jason
>>>
>>