[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996536#comment-14996536 ]
swetha k commented on PARQUET-124:
--
[~b...@cloudera.com]
I still see the issues. Please see the Warning
This sounds good to me.
We should have a UNION logical type in parquet-format to capture this
information.
A UNION type is defined as a GROUP and should always have exactly one field
populated.
By default, the name of each field is the type name, but in the case of
Thrift it is provided by the IDL.
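A sketch of what that mapping could look like, assuming a hypothetical UNION annotation (the comment proposes adding it to parquet-format; it does not exist yet) and field names taken from a made-up Thrift IDL:

```
// Hypothetical Thrift IDL:
//   union Contact { 1: string email; 2: i64 phone; }
//
// Possible Parquet schema: a group where exactly one
// optional field is populated per record.
message Example {
  optional group contact (UNION) {
    optional binary email (UTF8);
    optional int64 phone;
  }
}
```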
[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997989#comment-14997989 ]
swetha k commented on PARQUET-124:
--
[~rdblue]
I can create a JIRA issue for this. Just to confirm,
Selina,
I would use parquet-avro to create a writer. Kafka messages are commonly
encoded as Avro, so you may already be working with Avro objects. If
not, convert to Avro and then write with an AvroParquetWriter.
You can create the writer so that it creates S3 files by setting up your S3
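A minimal sketch of the parquet-avro approach described above, assuming parquet-avro and avro are on the classpath; the record schema and file path are illustrative placeholders, not from the original thread:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;

public class KafkaToParquetSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical schema for a Kafka message payload.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
      + "{\"name\":\"body\",\"type\":\"string\"}]}");

    // AvroParquetWriter takes Avro records and writes them as Parquet.
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(
            new Path("/tmp/messages.parquet"), schema);

    // Convert each consumed Kafka message to an Avro record, then write it.
    GenericRecord record = new GenericData.Record(schema);
    record.put("body", "example message");
    writer.write(record);
    writer.close();
  }
}
```

If the Kafka messages are already Avro-encoded, the conversion step disappears and the decoded records can be handed to the writer directly.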
[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996949#comment-14996949 ]
Ryan Blue commented on PARQUET-124:
---
[~swethakasireddy], it looks like this wasn't completely addressed
Selina,
You should be able to write to S3 without needing to flush to an output
stream. You would just use the S3 FileSystem to write data instead of
HDFS. This doesn't require Parquet to support writing to an OutputStream
instead of a file. Is there a reason why you want to supply an output
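A sketch of what "use the S3 FileSystem instead of HDFS" looks like in practice, assuming the Hadoop S3A connector is on the classpath with credentials configured; the bucket name and schema are hypothetical:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;

public class S3ParquetSketch {
  public static void main(String[] args) throws Exception {
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
      + "{\"name\":\"body\",\"type\":\"string\"}]}");

    // The only change from an HDFS or local write is the Path scheme:
    // Hadoop resolves "s3a://" to the S3 FileSystem, which handles the
    // upload, so no OutputStream plumbing is needed on the caller's side.
    Path s3Path = new Path("s3a://my-bucket/events/messages.parquet");
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(s3Path, schema);

    // ... write records here, then close() to finalize the file on S3.
    writer.close();
  }
}
```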
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997083#comment-14997083 ]
Ryan Blue commented on PARQUET-390:
---
You're right that my suggestion is a much larger issue. For this
Hi, Ryan:
Thanks a lot for your suggestion. I do not have to get the output
stream if I can write my continual stream of Kafka messages (in JSON, CSV,
or Avro format) to AWS S3 in Parquet format. Would you like to introduce a
little bit more detail about it, and then I can find some solution in