Adding the mailing list back and adding the benchmark script
I notice one likely-serious problem: you are spawning num_columns *
num_row_groups threads all at once. Based on what you've described
about your data, that's ~300 threads simultaneously. I would recommend
setting the number of threads e
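
A minimal sketch of the bounded-thread idea, not the benchmark script itself. It assumes a hypothetical `read_column_chunk` function standing in for the real per-column read, and the counts (10 row groups, 30 columns) and the pool size of 8 are illustrative values only, chosen so the task count comes to 300.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def read_column_chunk(row_group, column):
    # Hypothetical stand-in for reading one column chunk of one row group.
    return (row_group, column)


def read_all(num_row_groups, num_columns, max_workers=8):
    # One task per (row group, column) pair, but at most max_workers
    # threads run at any time instead of one thread per task.
    tasks = list(product(range(num_row_groups), range(num_columns)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the tasks.
        return list(pool.map(lambda task: read_column_chunk(*task), tasks))


# Illustrative sizes only: 10 * 30 = 300 tasks share 8 worker threads.
results = read_all(num_row_groups=10, num_columns=30)
```

The pool queues the remaining tasks and hands them to workers as they free up, so the number of live threads stays fixed regardless of how many column chunks there are.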
Yeah, sounds like something went wrong. What is your data model? Parquet
can handle Avro records pretty seamlessly if you already have them.
On Wed, Mar 14, 2018 at 9:20 AM, ALeX Wang wrote:
> Hi Ryan,
>
> Thanks for the reply,
>
> We are using samza for streaming,
>
> Regarding parquet java, th
Hi Ryan,
Thanks for the reply.
We are using Samza for streaming.
Regarding Parquet Java, I must not have used the APIs correctly, since
the last time we tried, 7 Hadoop processes were spawned to write a
single file, and it was much slower than our Parquet C++ alternative.
Thanks,
On 14
Hi Alex,
I don't think what you're trying to do makes sense. If you're using Scala,
then your data is already in the JVM and it is probably much easier to
write it to Parquet using the Java library. While that library depends on
Hadoop, you don't have to use it with HDFS. The Hadoop FileSystem int
Shrutika modi created PARQUET-1248:
--
Summary: java.lang.UnsupportedOperationException: Unimplemented
type: StringType
Key: PARQUET-1248
URL: https://issues.apache.org/jira/browse/PARQUET-1248
Projec
Shrutika modi created PARQUET-1247:
--
Summary:
org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
Key: PARQUET-1247
URL: https://issues.apache.org/jira/browse/PARQUET-1247
[
https://issues.apache.org/jira/browse/PARQUET-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398641#comment-16398641
]
ASF GitHub Bot commented on PARQUET-1242:
-
zivanfi opened a new pull request #87
[
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398633#comment-16398633
]
ASF GitHub Bot commented on PARQUET-1246:
-
zivanfi commented on a change in pull
[
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398630#comment-16398630
]
ASF GitHub Bot commented on PARQUET-1246:
-
zivanfi commented on a change in pull
[
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398631#comment-16398631
]
ASF GitHub Bot commented on PARQUET-1246:
-
zivanfi commented on a change in pull
[
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398632#comment-16398632
]
ASF GitHub Bot commented on PARQUET-1246:
-
zivanfi commented on a change in pull
[
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398629#comment-16398629
]
ASF GitHub Bot commented on PARQUET-1246:
-
zivanfi commented on a change in pull
[
https://issues.apache.org/jira/browse/PARQUET-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Szadovszky reassigned PARQUET-1212:
-
Assignee: Gabor Szadovszky
> Write column indexes: Show indexes in tools
>