[jira] [Created] (PARQUET-1299) [C++] Upgrade Brotli to latest version
Phillip Cloud created PARQUET-1299:

Summary: [C++] Upgrade Brotli to latest version
Key: PARQUET-1299
URL: https://issues.apache.org/jira/browse/PARQUET-1299
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Affects Versions: cpp-1.3.1
Reporter: Phillip Cloud
Assignee: Phillip Cloud

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1276) [C++] Reduce the amount of memory used for writing null decimal values
Phillip Cloud created PARQUET-1276:

Summary: [C++] Reduce the amount of memory used for writing null decimal values
Key: PARQUET-1276
URL: https://issues.apache.org/jira/browse/PARQUET-1276
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Affects Versions: cpp-1.3.1
Reporter: Phillip Cloud
Assignee: Phillip Cloud
Fix For: cpp-1.4.0
[jira] [Resolved] (PARQUET-1168) [C++] Table::Make no longer exists in Arrow
[ https://issues.apache.org/jira/browse/PARQUET-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud resolved PARQUET-1168.

Resolution: Invalid

False alarm.

> [C++] Table::Make no longer exists in Arrow
>
> Key: PARQUET-1168
> URL: https://issues.apache.org/jira/browse/PARQUET-1168
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.2.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
> Fix For: cpp-1.3.0

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PARQUET-1168) [C++] Table::Make no longer exists in Arrow
Phillip Cloud created PARQUET-1168:

Summary: [C++] Table::Make no longer exists in Arrow
Key: PARQUET-1168
URL: https://issues.apache.org/jira/browse/PARQUET-1168
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.2.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
Fix For: cpp-1.3.0
[jira] [Created] (PARQUET-1167) [C++] FieldToNode function should return a status when throwing an exception
Phillip Cloud created PARQUET-1167:

Summary: [C++] FieldToNode function should return a status when throwing an exception
Key: PARQUET-1167
URL: https://issues.apache.org/jira/browse/PARQUET-1167
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.2.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
Fix For: cpp-1.3.0
[jira] [Commented] (PARQUET-1163) [C++] Add test function to compare underlying values of Arrow arrays, not including type
[ https://issues.apache.org/jira/browse/PARQUET-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258682#comment-16258682 ]

Phillip Cloud commented on PARQUET-1163:

It could be, though this particular issue came up because of the logical/physical type separation in parquet. I think in general we want to compare types when asserting that arrays are equal in tests, and keep special-case comparisons (like this one) as close as possible to where they are needed. However, if this is generally useful outside of parquet-cpp then I can move it to Arrow.

> [C++] Add test function to compare underlying values of Arrow arrays, not including type
>
> Key: PARQUET-1163
> URL: https://issues.apache.org/jira/browse/PARQUET-1163
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> For unsigned integer typed Arrow arrays read in from parquet files we want to compare the values (which are stored as signed integers) but we don't want to fail an assertion because the types of the Arrow arrays are different.
[jira] [Updated] (PARQUET-1163) [C++] Add test function to compare underlying values of Arrow arrays, not including type
[ https://issues.apache.org/jira/browse/PARQUET-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1163:

Summary: [C++] Add test function to compare underlying values of Arrow arrays, not including type (was: Add test function to compare underlying values, not including type)

> [C++] Add test function to compare underlying values of Arrow arrays, not including type
>
> Key: PARQUET-1163
> URL: https://issues.apache.org/jira/browse/PARQUET-1163
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> For unsigned integer typed Arrow arrays read in from parquet files we want to compare the values (which are stored as signed integers) but we don't want to fail an assertion because the types of the Arrow arrays are different.
[jira] [Created] (PARQUET-1163) Add test function to compare underlying values, not including type
Phillip Cloud created PARQUET-1163:

Summary: Add test function to compare underlying values, not including type
Key: PARQUET-1163
URL: https://issues.apache.org/jira/browse/PARQUET-1163
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Reporter: Phillip Cloud
Assignee: Phillip Cloud

For unsigned integer typed Arrow arrays read in from parquet files we want to compare the values (which are stored as signed integers) but we don't want to fail an assertion because the types of the Arrow arrays are different.
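A minimal sketch of the kind of comparison PARQUET-1163 asks for, assuming plain {{std::vector}} buffers rather than Arrow arrays. The helper name {{ValuesEqualIgnoringType}} is hypothetical and is not the function that was added to parquet-cpp or Arrow; it only illustrates comparing the stored bit patterns while ignoring whether the element type is signed or unsigned.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not an Arrow or parquet-cpp API): returns true when two
// buffers of equally wide elements hold identical bit patterns, regardless of
// whether the element types are signed or unsigned.
template <typename T, typename U>
bool ValuesEqualIgnoringType(const std::vector<T>& left,
                             const std::vector<U>& right) {
  static_assert(sizeof(T) == sizeof(U), "element widths must match");
  if (left.size() != right.size()) {
    return false;
  }
  if (left.empty()) {
    return true;  // nothing to compare; also avoids memcmp on null pointers
  }
  // Compare raw bytes so that uint32_t{4294967295} matches int32_t{-1}.
  return std::memcmp(left.data(), right.data(), left.size() * sizeof(T)) == 0;
}
```

A type-aware equality check would reject a {{uint32}} buffer against an {{int32}} buffer outright; this sketch deliberately skips that check, which is why the ticket's comment suggests keeping such a helper close to the tests that need it.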
[jira] [Comment Edited] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253968#comment-16253968 ]

Phillip Cloud edited comment on PARQUET-1160 at 11/15/17 7:13 PM:

[~b...@cloudera.com] Are you aware of any projects that currently implement this? We can implement this but we'd need a way to generate some data to add to our test suite.

was (Author: cpcloud):
[~blue] Are you aware of any projects that currently implement this? We can implement this but we'd need a way to generate some data to add to our test suite.

> [C++] Implement BYTE_ARRAY-backed Decimal reads
>
> Key: PARQUET-1160
> URL: https://issues.apache.org/jira/browse/PARQUET-1160
> Project: Parquet
> Issue Type: Task
> Components: parquet-cpp
> Affects Versions: cpp-1.3.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
> What systems support writing Decimals with this underlying type?
[jira] [Commented] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253968#comment-16253968 ]

Phillip Cloud commented on PARQUET-1160:

[~blue] Are you aware of any projects that currently implement this? We can implement this but we'd need a way to generate some data to add to our test suite.

> [C++] Implement BYTE_ARRAY-backed Decimal reads
>
> Key: PARQUET-1160
> URL: https://issues.apache.org/jira/browse/PARQUET-1160
> Project: Parquet
> Issue Type: Task
> Components: parquet-cpp
> Affects Versions: cpp-1.3.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
> What systems support writing Decimals with this underlying type?
[jira] [Updated] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1160:

Description:
These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
What systems support writing Decimals with this underlying type?

was:
These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
We should determine whether this is YAGNI, or if it's actually in use anywhere.

> [C++] Implement BYTE_ARRAY-backed Decimal reads
>
> Key: PARQUET-1160
> URL: https://issues.apache.org/jira/browse/PARQUET-1160
> Project: Parquet
> Issue Type: Task
> Components: parquet-cpp
> Affects Versions: cpp-1.3.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
> What systems support writing Decimals with this underlying type?
[jira] [Updated] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1160:

Summary: [C++] Implement BYTE_ARRAY-backed Decimal reads (was: [C++] Determine whether we need to implement BYTE_ARRAY-backed Decimal reads)

> [C++] Implement BYTE_ARRAY-backed Decimal reads
>
> Key: PARQUET-1160
> URL: https://issues.apache.org/jira/browse/PARQUET-1160
> Project: Parquet
> Issue Type: Task
> Components: parquet-cpp
> Affects Versions: cpp-1.3.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
> We should determine whether this is YAGNI, or if it's actually in use anywhere.
[jira] [Created] (PARQUET-1160) [C++] Determine whether we need to implement BYTE_ARRAY-backed Decimal reads
Phillip Cloud created PARQUET-1160:

Summary: [C++] Determine whether we need to implement BYTE_ARRAY-backed Decimal reads
Key: PARQUET-1160
URL: https://issues.apache.org/jira/browse/PARQUET-1160
Project: Parquet
Issue Type: Task
Components: parquet-cpp
Reporter: Phillip Cloud
Assignee: Phillip Cloud

These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
We should determine whether this is YAGNI, or if it's actually in use anywhere.
[jira] [Updated] (PARQUET-1160) [C++] Determine whether we need to implement BYTE_ARRAY-backed Decimal reads
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1160:

Affects Version/s: cpp-1.3.0

> [C++] Determine whether we need to implement BYTE_ARRAY-backed Decimal reads
>
> Key: PARQUET-1160
> URL: https://issues.apache.org/jira/browse/PARQUET-1160
> Project: Parquet
> Issue Type: Task
> Components: parquet-cpp
> Affects Versions: cpp-1.3.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
>
> These are valid in the parquet spec, but it seems like no system in use today implements a writer for this type.
> We should determine whether this is YAGNI, or if it's actually in use anywhere.
[jira] [Created] (PARQUET-1159) Compatibility with C++ iterators
Phillip Cloud created PARQUET-1159:

Summary: Compatibility with C++ iterators
Key: PARQUET-1159
URL: https://issues.apache.org/jira/browse/PARQUET-1159
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.2.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
Fix For: cpp-1.4.0

There are some places where using C++ STL iterators, and being compatible with their APIs, would clean up the code quite a bit.
Additionally, in this PR (https://github.com/apache/parquet-cpp/pull/403) I had to allocate a separate vector to hold byte-swapped values, when what I really want to do is iterate over the existing values in reverse (starting at the last valid byte) so I don't have to copy them into a separate container. This can be done with a {{std::reverse_iterator}}, which allows one to use the {{++}} operator everywhere.
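The idea in PARQUET-1159 can be sketched as follows. This is an illustration only, not the code from the linked PR: the function name {{ReadBytesReversed}} and the choice of a 64-bit accumulator are assumptions made to keep the example small. It walks an existing byte range from the last byte to the first with {{std::reverse_iterator}} and {{++}}, so no byte-swapped copy is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Hypothetical helper: folds `length` bytes into an integer while visiting them
// from the last byte to the first. The final byte of the input therefore ends
// up as the most significant byte of the result, and no temporary buffer of
// swapped bytes is needed.
uint64_t ReadBytesReversed(const uint8_t* bytes, size_t length) {
  assert(length <= sizeof(uint64_t));
  // A reverse_iterator built from one-past-the-end dereferences to the last
  // element; ++ then moves toward the front of the range.
  std::reverse_iterator<const uint8_t*> it(bytes + length);
  const std::reverse_iterator<const uint8_t*> end(bytes);
  uint64_t value = 0;
  for (; it != end; ++it) {
    value = (value << 8) | *it;
  }
  return value;
}
```

Because the loop is written against ordinary iterator operations, the same body would work unchanged with forward iterators when no reversal is wanted, which is the API compatibility the ticket is after.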
[jira] [Created] (PARQUET-1127) [C++] Fix AssertArraysEqual call
Phillip Cloud created PARQUET-1127:

Summary: [C++] Fix AssertArraysEqual call
Key: PARQUET-1127
URL: https://issues.apache.org/jira/browse/PARQUET-1127
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.2.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
Fix For: cpp-1.3.0
[jira] [Updated] (PARQUET-1123) [C++] Update parquet-cpp to use Arrow's AssertArraysEqual
[ https://issues.apache.org/jira/browse/PARQUET-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1123:

Summary: [C++] Update parquet-cpp to use Arrow's AssertArraysEqual (was: Update parquet-cpp to use Arrow's AssertArraysEqual)

> [C++] Update parquet-cpp to use Arrow's AssertArraysEqual
>
> Key: PARQUET-1123
> URL: https://issues.apache.org/jira/browse/PARQUET-1123
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.2.0
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
> Fix For: cpp-1.3.0
[jira] [Created] (PARQUET-1123) Update parquet-cpp to use Arrow's AssertArraysEqual
Phillip Cloud created PARQUET-1123:

Summary: Update parquet-cpp to use Arrow's AssertArraysEqual
Key: PARQUET-1123
URL: https://issues.apache.org/jira/browse/PARQUET-1123
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.2.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
Fix For: cpp-1.3.0
[jira] [Assigned] (PARQUET-1095) [C++] Read and write Arrow decimal values
[ https://issues.apache.org/jira/browse/PARQUET-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud reassigned PARQUET-1095:

Assignee: Phillip Cloud

> [C++] Read and write Arrow decimal values
>
> Key: PARQUET-1095
> URL: https://issues.apache.org/jira/browse/PARQUET-1095
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-cpp
> Reporter: Wes McKinney
> Assignee: Phillip Cloud
> Fix For: cpp-1.4.0
>
> We now have 16-byte decimal values in Arrow which have been validated against the Java implementation. We need to be able to read and write these to Parquet format.
> Making these values readable by Impala or some other Parquet readers may require some work. Impala expects the storage size to match the decimal precision exactly. So in parquet-cpp we will need to write the correct non-zero bytes into a FIXED_LEN_BYTE_ARRAY of the appropriate size.
> We should validate this against Java Parquet implementations.
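A rough sketch of the sizing step described in PARQUET-1095, under stated assumptions. It takes "storage size matches the precision" to mean the smallest number of bytes whose signed two's-complement range covers every value of that precision, and it writes the unscaled value big-endian, as the Parquet format specifies for decimals. The function names are hypothetical, and the unscaled value is held in an {{int64_t}} for brevity even though Arrow decimals are 16 bytes wide; this is not the parquet-cpp implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: smallest byte width n such that a signed
// two's-complement integer of n bytes can hold any value with `precision`
// decimal digits, i.e. 10^precision <= 2^(8n - 1).
int MinBytesForPrecision(int precision) {
  assert(precision >= 1);
  return static_cast<int>(
      std::ceil((precision * std::log2(10.0) + 1.0) / 8.0));
}

// Hypothetical helper: emits the low `num_bytes` bytes of `unscaled` in
// big-endian order. Assumes the value actually fits in `num_bytes` bytes.
std::vector<uint8_t> ToBigEndianBytes(int64_t unscaled, int num_bytes) {
  assert(num_bytes >= 1 && num_bytes <= 8);
  std::vector<uint8_t> out(num_bytes);
  const uint64_t bits = static_cast<uint64_t>(unscaled);
  for (int i = 0; i < num_bytes; ++i) {
    // out[0] receives the most significant of the bytes being kept.
    out[i] = static_cast<uint8_t>(bits >> (8 * (num_bytes - 1 - i)));
  }
  return out;
}
```

Under this rule a precision of 9 needs 4 bytes and a precision of 38 needs the full 16, so a 16-byte Arrow value with a small precision would be truncated to its low-order bytes before being written.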
[jira] [Created] (PARQUET-1053) Fix unused result warnings due to unchecked Statuses
Phillip Cloud created PARQUET-1053:

Summary: Fix unused result warnings due to unchecked Statuses
Key: PARQUET-1053
URL: https://issues.apache.org/jira/browse/PARQUET-1053
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Affects Versions: cpp-1.1.0
Reporter: Phillip Cloud
Fix For: cpp-1.2.0
[jira] [Commented] (PARQUET-1039) PARQUET-911 Breaks Arrow
[ https://issues.apache.org/jira/browse/PARQUET-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060408#comment-16060408 ]

Phillip Cloud commented on PARQUET-1039:

Now I'm not sure that PARQUET-911 is the culprit here. I think PARQUET-911 exposed a possible bug in that test, which is that there are duplicate column names in the call to {{pa.Table.from_arrays}}. I'm still not sure why the data would have ever compared equal in repeated testing. I'm going to get to the bottom of this tomorrow.

> PARQUET-911 Breaks Arrow
>
> Key: PARQUET-1039
> URL: https://issues.apache.org/jira/browse/PARQUET-1039
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.1.0
> Reporter: Phillip Cloud
> Fix For: cpp-1.2.0
>
> When comparing the results of a parquet file to expected data, after PARQUET-911, a single arrow test is failing: {{pyarrow/tests/test_parquet.py::test_date_time_types}}. It's not entirely clear how PARQUET-911 affects this code, but the data in {{time32}} and {{time64}} columns do not compare equal.
[jira] [Created] (PARQUET-1039) PARQUET-911 Breaks Arrow
Phillip Cloud created PARQUET-1039:

Summary: PARQUET-911 Breaks Arrow
Key: PARQUET-1039
URL: https://issues.apache.org/jira/browse/PARQUET-1039
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.1.0
Reporter: Phillip Cloud
Fix For: cpp-1.2.0

When comparing the results of a parquet file to expected data, after PARQUET-911, a single arrow test is failing: {{pyarrow/tests/test_parquet.py::test_date_time_types}}. It's not entirely clear how PARQUET-911 affects this code, but the data in {{time32}} and {{time64}} columns do not compare equal.
[jira] [Updated] (PARQUET-1038) Key value metadata should be nullptr if not set
[ https://issues.apache.org/jira/browse/PARQUET-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1038:

Summary: Key value metadata should be nullptr if not set (was: [C++] Key value metadata should be nullptr if not set)

> Key value metadata should be nullptr if not set
>
> Key: PARQUET-1038
> URL: https://issues.apache.org/jira/browse/PARQUET-1038
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.1.0
> Reporter: Phillip Cloud
> Fix For: cpp-1.2.0
>
> Key value metadata is initialized with {{std::make_shared<KeyValueMetadata>()}} which returns a pointer to an empty instance of {{KeyValueMetadata}}. This breaks Arrow's deserialization of metadata because it distinguishes between {{nullptr}} and empty metadata.
[jira] [Updated] (PARQUET-1038) [C++] Key value metadata should be nullptr if not set
[ https://issues.apache.org/jira/browse/PARQUET-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1038:

Summary: [C++] Key value metadata should be nullptr if not set (was: Key value metadata should be nullptr if not set)

> [C++] Key value metadata should be nullptr if not set
>
> Key: PARQUET-1038
> URL: https://issues.apache.org/jira/browse/PARQUET-1038
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.1.0
> Reporter: Phillip Cloud
> Fix For: cpp-1.2.0
>
> Key value metadata is initialized with {{std::make_shared<KeyValueMetadata>()}} which returns a pointer to an empty instance of {{KeyValueMetadata}}. This breaks Arrow's deserialization of metadata because it distinguishes between {{nullptr}} and empty metadata.
[jira] [Created] (PARQUET-1038) Key value metadata should be nullptr if not set
Phillip Cloud created PARQUET-1038:

Summary: Key value metadata should be nullptr if not set
Key: PARQUET-1038
URL: https://issues.apache.org/jira/browse/PARQUET-1038
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Affects Versions: cpp-1.1.0
Reporter: Phillip Cloud
Fix For: cpp-1.2.0

Key value metadata is initialized with {{std::make_shared<KeyValueMetadata>()}} which returns a pointer to an empty instance of {{KeyValueMetadata}}. This breaks Arrow's deserialization of metadata because it distinguishes between {{nullptr}} and empty metadata.
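The distinction PARQUET-1038 relies on can be shown with a small sketch. The {{Metadata}} alias below is a stand-in for the real {{KeyValueMetadata}} class, and {{Classify}} and {{MetadataState}} are hypothetical names introduced only for this example; none of them are parquet-cpp or Arrow APIs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for KeyValueMetadata (an assumption made for illustration).
using Metadata = std::map<std::string, std::string>;

enum class MetadataState { kAbsent, kEmpty, kPopulated };

// Hypothetical helper: a null pointer means "no metadata was set", while a
// non-null pointer to an empty container means "metadata was set, but empty".
MetadataState Classify(const std::shared_ptr<const Metadata>& metadata) {
  if (metadata == nullptr) {
    return MetadataState::kAbsent;
  }
  return metadata->empty() ? MetadataState::kEmpty : MetadataState::kPopulated;
}
```

A default-constructed {{std::shared_ptr}} is null, whereas the result of {{std::make_shared}} never is, so a consumer that branches as {{Classify}} does will treat the two initializations differently.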
[jira] [Updated] (PARQUET-1038) Key value metadata should be nullptr if not set
[ https://issues.apache.org/jira/browse/PARQUET-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phillip Cloud updated PARQUET-1038:

Issue Type: Bug (was: Improvement)

> Key value metadata should be nullptr if not set
>
> Key: PARQUET-1038
> URL: https://issues.apache.org/jira/browse/PARQUET-1038
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Affects Versions: cpp-1.1.0
> Reporter: Phillip Cloud
> Fix For: cpp-1.2.0
>
> Key value metadata is initialized with {{std::make_shared<KeyValueMetadata>()}} which returns a pointer to an empty instance of {{KeyValueMetadata}}. This breaks Arrow's deserialization of metadata because it distinguishes between {{nullptr}} and empty metadata.
[jira] [Created] (PARQUET-997) Fix override compiler warnings
Phillip Cloud created PARQUET-997:

Summary: Fix override compiler warnings
Key: PARQUET-997
URL: https://issues.apache.org/jira/browse/PARQUET-997
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Reporter: Phillip Cloud
Priority: Minor

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PARQUET-595) Add API for key-value metadata
[ https://issues.apache.org/jira/browse/PARQUET-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970080#comment-15970080 ]

Phillip Cloud commented on PARQUET-595:

I'm working on this in support of writing pandas DataFrames' indexes to parquet files.

> Add API for key-value metadata
>
> Key: PARQUET-595
> URL: https://issues.apache.org/jira/browse/PARQUET-595
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp
> Reporter: Uwe L. Korn