[jira] [Commented] (PARQUET-124) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-09 Thread swetha k (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998017#comment-14998017 ] swetha k commented on PARQUET-124: -- [~rdblue] Following are the new issues: https://iss

[jira] [Commented] (PARQUET-124) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-09 Thread swetha k (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997989#comment-14997989 ] swetha k commented on PARQUET-124: -- [~rdblue] I can create a JIRA issue for this. Just

Re: Reading Parquet data from input stream and write to output stream

2015-11-09 Thread Selina Tech
Hi, Ryan: Thanks a lot for your suggestion. I do not have to get the output stream if I can write my continuous Kafka messages (in JSON, CSV, or Avro format) to AWS S3 in Parquet format. Could you give a little bit more detail about it, and then I can find some solution in detail

Re: Reading Parquet data from input stream and write to output stream

2015-11-09 Thread Ryan Blue
Selina, I would use parquet-avro to create a writer. Kafka messages are commonly encoded as Avro, so you may already be working with Avro objects. If not, then convert to Avro and write with the AvroParquetWriter. You can create the writer so that it creates S3 files by setting up your S3 fil
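The setup Ryan describes can be sketched roughly as below. This is a hedged sketch, not code from the thread: it assumes parquet-avro and an S3-capable Hadoop FileSystem (hadoop-aws) are on the classpath, and the bucket name, schema, and record fields are placeholders invented for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class KafkaToS3Parquet {
    public static void main(String[] args) throws Exception {
        // Avro schema for the converted Kafka messages (placeholder field).
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Message\","
            + "\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}");

        // An S3 path makes Hadoop's S3 FileSystem do the writing;
        // no OutputStream is needed.
        Path out = new Path("s3a://my-bucket/events/part-00000.parquet");

        // The writer converts Avro records to Parquet as they are written.
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(out)
                     .withSchema(schema)
                     .build()) {
            GenericRecord rec = new GenericData.Record(schema);
            rec.put("body", "example payload"); // one converted Kafka message
            writer.write(rec);
        }
    }
}
```

If the incoming messages are JSON or CSV rather than Avro, the conversion step would map each message into a `GenericRecord` before calling `write`.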

Re: Proposal for Union type

2015-11-09 Thread Julien Le Dem
This sounds good to me. We should have a UNION logical type in parquet-format to capture this information. A UNION type is defined as a GROUP and should always have exactly one field populated. By default the name of the field is the type name but in the case of thrift it is provided by the IDL. We
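A schema for the proposed shape might look like the sketch below. Note the `(UNION)` annotation is the proposal under discussion, not an existing Parquet logical type, and the field names are illustrative:

```
message Event {
  required group payload (UNION) {          // proposed annotation
    optional int64 IntValue;                // field name = type name by default
    optional binary StringValue (UTF8);     // or taken from the Thrift IDL
  }
}
```

Per the proposal, exactly one of the group's fields would be populated in any given record.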

[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-09 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997083#comment-14997083 ] Ryan Blue commented on PARQUET-390: --- You're right that my suggestion is a much larger i

Re: Reading Parquet data from input stream and write to output stream

2015-11-09 Thread Ryan Blue
Selina, You should be able to write to S3 without needing to flush to an output stream. You would just use the S3 FileSystem to write data instead of HDFS. This doesn't require Parquet to write to an OutputStream instead of a file. Is there a reason why you want to supply an output st
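Pointing the job at the S3 FileSystem is a configuration change, not a Parquet change. A minimal sketch of the Hadoop configuration (e.g. in core-site.xml), assuming the s3n connector that was common at the time; the credential values are placeholders:

```
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

With this in place, writing to an `s3n://bucket/path` (or, with hadoop-aws, an `s3a://` path and its `fs.s3a.*` keys) works the same way as writing to an `hdfs://` path.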

[jira] [Commented] (PARQUET-124) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-09 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996949#comment-14996949 ] Ryan Blue commented on PARQUET-124: --- [~swethakasireddy], it looks like this wasn't comp

[jira] [Commented] (PARQUET-124) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException

2015-11-09 Thread swetha k (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996536#comment-14996536 ] swetha k commented on PARQUET-124: -- [~b...@cloudera.com] I still see the issues. Please