[jira] [Updated] (PARQUET-520) Add version of LocalFileSource that uses memory-mapping for zero-copy reads
[ https://issues.apache.org/jira/browse/PARQUET-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated PARQUET-520:
---
    Description: Repurposed this JIRA after PARQUET-533. Memory-mapping will save us memory allocations and improve performance in some applications. We could offer it as an optional API.  (was: Noted this while working on PARQUET-497. If we are using a memory-mapped file, then copying data into a {{ScopedInMemoryInputStream}}, as we do now, is unnecessary, and avoiding the copy will improve performance. Perhaps this should be made a property of the {{InputStream}} (i.e. indicate whether it supports zero-copy reads, where the returned buffer does not become invalid after future reads as long as the stream -- the memory map specifically in this example -- is alive))
        Summary: Add version of LocalFileSource that uses memory-mapping for zero-copy reads  (was: Add support for zero-copy InputStreams on memory-mapped files)

> Add version of LocalFileSource that uses memory-mapping for zero-copy reads
> ---
>
>                 Key: PARQUET-520
>                 URL: https://issues.apache.org/jira/browse/PARQUET-520
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>
> Repurposed this JIRA after PARQUET-533. Memory-mapping will save us memory
> allocations and improve performance in some applications. We could offer it
> as an optional API.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
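
The zero-copy idea in PARQUET-520 can be illustrated outside parquet-cpp. The following is a minimal Python sketch, not the parquet-cpp API: the class name MemoryMappedSource and its read_at method are hypothetical names chosen for this example. It shows a read returning a view over the mapped pages instead of a copied buffer, and it shows the lifetime rule from the original description: a returned view stays valid only while the map is alive.

```python
import mmap
import os
import tempfile


class MemoryMappedSource:
    """Hypothetical read-only file source backed by a memory map."""

    def __init__(self, path):
        # Note: mmap refuses to map an empty file, so this sketch assumes size > 0.
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._map)

    def size(self):
        return len(self._view)

    def read_at(self, offset, nbytes):
        # Slicing a memoryview yields another view over the same mapped pages;
        # no bytes are copied until the caller asks for a copy.
        return self._view[offset:offset + nbytes]

    def close(self):
        # Every view handed out by read_at must be released before this call,
        # otherwise closing the map raises BufferError.
        self._view.release()
        self._map.close()
        self._file.close()


def demo():
    """Write a throwaway file and read 16 bytes of it through the map."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"PAR1" + bytes(range(16)) + b"PAR1")
        path = f.name
    source = MemoryMappedSource(path)
    try:
        # The with-block releases the view before the map is closed.
        with source.read_at(4, 16) as chunk:
            is_view = isinstance(chunk, memoryview)
            payload = bytes(chunk)  # explicit copy, made only for the return value
    finally:
        source.close()
        os.remove(path)
    return is_view, payload
```

Calling demo() maps a small temporary file, reads a slice without copying, and only copies the bytes at the end so they can outlive the map. An optional API along these lines would leave the existing copying path untouched for callers that need buffers that outlive the file.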
[jira] [Resolved] (PARQUET-525) Test coverage for malformed file failure modes on the read path
[ https://issues.apache.org/jira/browse/PARQUET-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved PARQUET-525.
---
       Resolution: Fixed
    Fix Version/s: cpp-0.1

Issue resolved by pull request 60
[https://github.com/apache/parquet-cpp/pull/60]

> Test coverage for malformed file failure modes on the read path
> ---
>
>                 Key: PARQUET-525
>                 URL: https://issues.apache.org/jira/browse/PARQUET-525
>             Project: Parquet
>          Issue Type: Test
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>             Fix For: cpp-0.1
>
>
> These code paths do not have test coverage. We should construct test cases
> that cover each possible kind of malformation.
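
The issue does not list the malformations it has in mind, and the tests added in pull request 60 are not shown here. As a rough illustration only, the Python sketch below checks a few failure modes that follow from the Parquet file layout (a 4-byte "PAR1" magic at both ends, and a 4-byte little-endian metadata length just before the trailing magic). The names locate_metadata and MalformedFileError are hypothetical, and the choice of cases is an assumption.

```python
import struct

MAGIC = b"PAR1"
FOOTER_SIZE = 8  # 4-byte little-endian metadata length + 4-byte trailing magic


class MalformedFileError(Exception):
    """Hypothetical error type used only in this sketch."""


def locate_metadata(data):
    """Return (start, length) of the footer metadata or raise MalformedFileError."""
    if len(data) < len(MAGIC) + FOOTER_SIZE:
        raise MalformedFileError("file too small")
    if data[:4] != MAGIC:
        raise MalformedFileError("bad magic at start")
    if data[-4:] != MAGIC:
        raise MalformedFileError("bad magic at end")
    (metadata_len,) = struct.unpack("<I", data[-8:-4])
    start = len(data) - FOOTER_SIZE - metadata_len
    # The metadata must fit between the leading magic and the footer.
    if start < len(MAGIC):
        raise MalformedFileError("metadata length exceeds file size")
    return start, metadata_len
```

Each raise corresponds to one test case: an empty or truncated file, a corrupted leading or trailing magic, and a metadata length that points before the start of the file. A fuller suite would presumably also feed the reader metadata bytes that fail to deserialize, which this sketch does not attempt.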
[jira] [Assigned] (PARQUET-543) Remove BoundedInt encodings
[ https://issues.apache.org/jira/browse/PARQUET-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue reassigned PARQUET-543:
---
    Assignee: Ryan Blue

> Remove BoundedInt encodings
> ---
>
>                 Key: PARQUET-543
>                 URL: https://issues.apache.org/jira/browse/PARQUET-543
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.8.1
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>
> The classes in org.apache.parquet.column.values.boundedint aren't used. It
> looks like this was intended to be the "right" way to use the RLE/BitPacking
> hybrid, but callers ended up instantiating the RLE encoder or writer directly.
> The ZeroIntegerValuesReader and DevNullValuesWriter are used, but should be
> relocated. The ZeroIntegerValuesReader is only used when the encoding is RLE
> (in
> [Encoding.java|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L119])
> and the DevNullValuesWriter actually writes BIT_PACKED values. It would be
> better to relocate those classes to the rle and bitpacking packages.
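
For readers unfamiliar with the RLE/bit-packing hybrid mentioned in the description, here is a small Python sketch of a decoder for the run layout described in the Parquet format documentation. It is an illustration, not parquet-mr code: the function names are hypothetical, the optional length prefix is ignored, and there is no input validation. The bit-width-zero shortcut is an assumption about why a reader that returns only zeros is useful with RLE; the issue itself does not say so.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_hybrid(buf, bit_width, num_values):
    """Decode num_values integers from a sequence of RLE and bit-packed runs."""
    if bit_width == 0:
        # Assumed shortcut: with zero bits per value, every value is 0.
        return [0] * num_values
    values = []
    pos = 0
    mask = (1 << bit_width) - 1
    while len(values) < num_values:
        header, pos = read_uleb128(buf, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values, packed LSB first.
            groups = header >> 1
            nbytes = groups * bit_width
            packed = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            for i in range(groups * 8):
                values.append((packed >> (i * bit_width)) & mask)
        else:
            # RLE run: (header >> 1) repeats of one value stored in whole bytes.
            run_len = header >> 1
            width_bytes = (bit_width + 7) // 8
            value = int.from_bytes(buf[pos:pos + width_bytes], "little")
            pos += width_bytes
            values.extend([value] * run_len)
    # A bit-packed run may carry padding values past num_values; drop them.
    return values[:num_values]
```

As a usage example, decode_hybrid(bytes([0x0A, 0x04, 0x03, 0x88, 0xC6, 0xFA]), 3, 13) decodes an RLE run of five 4s followed by one bit-packed group holding 0 through 7.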
[jira] [Created] (PARQUET-543) Remove BoundedInt encodings
Ryan Blue created PARQUET-543:
---
             Summary: Remove BoundedInt encodings
                 Key: PARQUET-543
                 URL: https://issues.apache.org/jira/browse/PARQUET-543
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
    Affects Versions: 1.8.1
            Reporter: Ryan Blue

The classes in org.apache.parquet.column.values.boundedint aren't used. It
looks like this was intended to be the "right" way to use the RLE/BitPacking
hybrid, but callers ended up instantiating the RLE encoder or writer directly.
The ZeroIntegerValuesReader and DevNullValuesWriter are used, but should be
relocated. The ZeroIntegerValuesReader is only used when the encoding is RLE
(in
[Encoding.java|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L119])
and the DevNullValuesWriter actually writes BIT_PACKED values. It would be
better to relocate those classes to the rle and bitpacking packages.