Re: Estimated row-group size is significantly higher than the written one

2018-06-22 Thread Ryan Blue
I think you're right about the cause. The current estimate is what is buffered in memory, so it includes all of the intermediate data for the last page before it is finalized and compressed. We could probably get a better estimate by using the amount of buffered data and how large other pages in
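A rough sketch of the kind of correction described above, not parquet-mr code: it assumes the writer can report raw and written byte counts for pages it has already finalized, and scales the still-buffered bytes of the last page by the compression ratio those pages achieved. The function name and its parameters are hypothetical and chosen only for illustration.

```python
def estimate_row_group_size(finished_pages, buffered_bytes):
    """Estimate the on-disk size of a row group that is still being written.

    finished_pages: list of (raw_bytes, written_bytes) tuples, one per page
        that has already been finalized and compressed (hypothetical input).
    buffered_bytes: raw in-memory size of the last page, not yet finalized.
    """
    written = sum(w for _, w in finished_pages)
    raw = sum(r for r, _ in finished_pages)
    if raw == 0:
        # No finalized page to learn from: fall back to the raw buffered size.
        return buffered_bytes
    # Assume the last page will shrink about as much as the earlier ones did.
    ratio = written / raw
    return written + buffered_bytes * ratio


# Two finalized pages that shrank from 1000 to 250 bytes each, plus 800 raw
# bytes still buffered. Counting the buffer at full size would give 1300.
print(estimate_row_group_size([(1000, 250), (1000, 250)], 800))  # 700.0
```

The assumption that the last page shrinks like the earlier ones is only a heuristic; pages of the same column can differ a lot in how well they encode and compress.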

[jira] [Commented] (PARQUET-1335) Logical type names in parquet-mr are not consistent with parquet-format

2018-06-22 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520288#comment-16520288 ] ASF GitHub Bot commented on PARQUET-1335: - nandorKollar opened a new pull request #496:

[jira] [Updated] (PARQUET-1335) Logical type names in parquet-mr are not consistent with parquet-format

2018-06-22 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated PARQUET-1335: Labels: pull-request-available (was: ) > Logical type names in parquet-mr are not

[jira] [Commented] (PARQUET-1335) Logical type names in parquet-mr are not consistent with parquet-format

2018-06-22 Thread Nandor Kollar (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520290#comment-16520290 ] Nandor Kollar commented on PARQUET-1335: [~gszadovszky] sure, but since new API is not yet

[jira] [Updated] (PARQUET-1335) Logical type names in parquet-mr are not consistent with parquet-format

2018-06-22 Thread Gabor Szadovszky (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Szadovszky updated PARQUET-1335: -- Affects Version/s: 1.11.0 > Logical type names in parquet-mr are not consistent with

[jira] [Commented] (PARQUET-1335) Logical type names in parquet-mr are not consistent with parquet-format

2018-06-22 Thread Gabor Szadovszky (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520285#comment-16520285 ] Gabor Szadovszky commented on PARQUET-1335: --- Please, make sure it'll be pushed before

[jira] [Created] (PARQUET-1335) Logical type names in parquet-mr are not consistent with parquet-format

2018-06-22 Thread Nandor Kollar (JIRA)
Nandor Kollar created PARQUET-1335: -- Summary: Logical type names in parquet-mr are not consistent with parquet-format Key: PARQUET-1335 URL: https://issues.apache.org/jira/browse/PARQUET-1335

Same Travis failure for several parquet-format PRs

2018-06-22 Thread Nandor Kollar
Hi All, I recently noticed that Travis builds on parquet-format fail with [ERROR] Plugin org.apache.maven.plugins:maven-remote-resources-plugin:1.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for

[jira] [Updated] (PARQUET-1334) [C++] memory_map parameter seems missleading in parquet file opener

2018-06-22 Thread Philipp Hoch (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Hoch updated PARQUET-1334: -- Summary: [C++] memory_map parameter seems missleading in parquet file opener (was:

[jira] [Created] (PARQUET-1333) [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc

2018-06-22 Thread Philipp Hoch (JIRA)
Philipp Hoch created PARQUET-1333: - Summary: [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc Key: PARQUET-1333 URL: https://issues.apache.org/jira/browse/PARQUET-1333