I would think we could modify ExecuteSQL to have a property for an optional
RecordWriter. It would still create the Avro schema from the DB schema as
it does today, and then if the RecordWriter was specified it would use that
to write records, otherwise it would write the Avro as it does today. This
way you could go straight to CSV.

Without that, as Joe mentioned with the improvements in master (1.4.0
snapshot) you should be able to have an AvroReader using a "Schema Access
Strategy" of "Embedded Avro Schema" which is coming from ExecuteSQL, and
then a CsvRecordSetWriter with "Schema Access Strategy" of "Inherit Record
Schema". So you wouldn't need a schema registry at all and wouldn't need
the UpdateAttribute.

-Bryan


On Tue, Aug 1, 2017 at 9:02 AM, Joe Witt <[email protected]> wrote:

> There are some great points here.  Am on a phone right now so will be very
> brief.  The current 1.4.0 snapshot on master has some nice improvements for
> ease of use with record handling.  This includes writers inheriting schema
> from the reader and others.
>
> Thanks
>
> On Aug 1, 2017 1:20 AM, "Peter Wicks (pwicks)" <[email protected]> wrote:
>
>> I hate to respond with “me too”, but I haven’t seen a response and this
>> kind of simplification is of interest to me.
>>
>>
>>
>> The PutDatabaseRecord processor already does something similar, and I
>> have only needed the AvroReader processor without a schema registry.
>>
>>
>>
>>
>>
>> *From:* Márcio Faria [mailto:[email protected]]
>> *Sent:* Tuesday, July 25, 2017 11:09 AM
>> *To:* Users <[email protected]>
>> *Subject:* [EXT] NiFi 1.3: Simplest way possible of creating CSV files
>> from SQL queries
>>
>>
>>
>> Hi,
>>
>>
>>
>> I'm looking for the simplest way possible of creating CSV files from SQL
>> queries using Apache NiFi 1.3.
>>
>>
>>
>> The flow I currently have (the files are to be SFTP'ed to a remote
>> server):
>>
>>
>>
>> ExecuteSQL -> UpdateAttribute -> ConversionRecord [3 CSs] -> PutSFTP
>>
>>
>>
>> The concept of SchemaRegistry is new to me, but if I understood it
>> correctly in order for the ConversionRecord to work properly is necessary
>> to have 3 Controller Services ([3 CSs]) associated with it:
>>
>>    - AvroSchemaRegistry, with the schema defined in Avro Schema (JSON);
>>    - AvroReader, referring to the above schema;
>>    - CSVRecordSetWriter, also referring to the same schema.
>>
>>
>>
>> It seems there are many benefits in using the schema registry, including
>> versioning, validation, etc, but in my example a simpler configuration
>> would be welcome.
>>
>>
>>
>> Isn't the schema already defined by ExecuteSQL? Can I have the
>> ConversionRecord alone with no *dedicated* SchemaRegistry (property),
>> AvroReader,(instance), or CSVRecordSetWriter (instance)? Of course, we'd
>> still need to specify the output is a CSV, so perhaps a shared
>> CSVRecordSetWriter that also gets its schema from the flow file would still
>> be useful.
>>
>>
>>
>> By the way, would the Schema Access Strategy named "Use Embedded
>> Avro Schema" be part of a simpler solution? How?
>>
>>
>>
>> In the same vein, what about having the schema-name property optionally
>> defined by the ExecuteSQL itself, so we don't have to depend on the
>> UpdateAttribute component?
>>
>>
>>
>> In summary, I'm wondering if it's possible to have 3 (+ 1 generic)
>> components instead of 6 per query:
>>
>>
>>
>> ExecuteSQL -> ConversionRecord [CSVRecordSetWriter] -> PutSFTP
>>
>>
>>
>> That would make a difference when defining multiple conversions from SQL
>> to CSV, or other equivalent flows.
>>
>>
>>
>> In addition, consider that someone might want to have maximum
>> flexibility, meaning that it would be totally acceptable to change the
>> query and get a different layout for the resulting CSV file, without having
>> to change any SchemaRegistry, Reader, or Writer.
>>
>>
>>
>> I've found a few tickets out there covering a similar topic. In
>> particular, [1] mentions the difficulty with more complex Avro data types.
>> But I don't see that being a blocker when the data source is an
>> old-fashioned SQL query.
>>
>>
>>
>> Recommendations?
>>
>>
>>
>> P.S.1 Maybe templates would save the effort, but since Controller
>> Services are "global", I'm still wondering if having too many parts would
>> make it more difficult to manage lots of flows than it could be.
>>
>>
>>
>> P.S.2 Will my 1st flow have a good performance? I'm wondering if another
>> advantage of using SchemaRegistry etc is that it prevents the creation of
>> too many records at once.
>>
>>
>>
>> Thank you,
>>
>>
>>
>> Marcio
>>
>>
>>
>> [1] NIFI-1372 Create ConvertAvroToCSV
>> <https://issues.apache.org/jira/browse/NIFI-1372>
>>
>>
>> [NIFI-1372] Create ConvertAvroToCSV - ASF JIRA
>>
>>
>>
>

Reply via email to