[jira] [Assigned] (PARQUET-456) Add zlib codec support
[ https://issues.apache.org/jira/browse/PARQUET-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-456: Assignee: Wes McKinney > Add zlib codec support > -- > > Key: PARQUET-456 > URL: https://issues.apache.org/jira/browse/PARQUET-456 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > See https://github.com/apache/parquet-cpp/pull/11 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PARQUET-497) Decouple Parquet physical file structure from FileReader class
Wes McKinney created PARQUET-497: Summary: Decouple Parquet physical file structure from FileReader class Key: PARQUET-497 URL: https://issues.apache.org/jira/browse/PARQUET-497 Project: Parquet Issue Type: Improvement Components: parquet-cpp Reporter: Wes McKinney It should be possible to unit test this class without creating an actual Parquet file. We can do this while also keeping the file-based initialization code path (see parquet_reader.cc) about as simple as it is now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PARQUET-499) Complete PlainEncoder implementation for all primitive types and test end to end
Wes McKinney created PARQUET-499: Summary: Complete PlainEncoder implementation for all primitive types and test end to end Key: PARQUET-499 URL: https://issues.apache.org/jira/browse/PARQUET-499 Project: Parquet Issue Type: New Feature Components: parquet-cpp Reporter: Wes McKinney As part of PARQUET-485, I added a partial {{Encoding::PLAIN}} encoder implementation. This needs to be finished, with a test suite that validates data round-trips across all primitive types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
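The round-trip testing the issue above asks for can be illustrated outside parquet-cpp: Parquet's PLAIN encoding stores fixed-width primitives as consecutive little-endian words, so a round trip is just pack/unpack. A minimal, hypothetical Python sketch (not the parquet-cpp API; the function and type names are illustrative):

```python
import struct

# Little-endian struct format codes for an illustrative subset of
# Parquet primitive types under Encoding::PLAIN.
PLAIN_FORMATS = {
    "INT32": "<i",
    "INT64": "<q",
    "FLOAT": "<f",
    "DOUBLE": "<d",
}

def plain_encode(values, type_name):
    """Encode values as consecutive little-endian fixed-width words."""
    fmt = PLAIN_FORMATS[type_name]
    return b"".join(struct.pack(fmt, v) for v in values)

def plain_decode(buf, type_name):
    """Decode a PLAIN-encoded buffer back into a list of values."""
    fmt = PLAIN_FORMATS[type_name]
    width = struct.calcsize(fmt)
    return [struct.unpack(fmt, buf[i:i + width])[0]
            for i in range(0, len(buf), width)]

# Validate that data round-trips across several primitive types.
for type_name, values in [("INT32", [1, -2, 3]),
                          ("INT64", [2**40, -5]),
                          ("DOUBLE", [1.5, -0.25])]:
    assert plain_decode(plain_encode(values, type_name), type_name) == values
```

A real test suite would also cover BOOLEAN (bit-packed) and BYTE_ARRAY (length-prefixed), which PLAIN handles differently from the fixed-width types above.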
[jira] [Assigned] (PARQUET-436) Implement ParquetFileWriter class entry point for generating new Parquet files
[ https://issues.apache.org/jira/browse/PARQUET-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-436: Assignee: Wes McKinney > Implement ParquetFileWriter class entry point for generating new Parquet files > -- > > Key: PARQUET-436 > URL: https://issues.apache.org/jira/browse/PARQUET-436 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-496) Update to the latest cpplint
[ https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126869#comment-15126869 ] Wes McKinney commented on PARQUET-496: -- Our {{make lint}} target is misconfigured. Patch in the works > Update to the latest cpplint > > > Key: PARQUET-496 > URL: https://issues.apache.org/jira/browse/PARQUET-496 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Wes McKinney > > Indentation errors and other issues are passing through the Travis CI checks > (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why > this is and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PARQUET-496) Fix cpplint configuration to be more restrictive
[ https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nong Li resolved PARQUET-496. - Resolution: Fixed Fix Version/s: cpp-0.1 Issue resolved by pull request 33 [https://github.com/apache/parquet-cpp/pull/33] > Fix cpplint configuration to be more restrictive > > > Key: PARQUET-496 > URL: https://issues.apache.org/jira/browse/PARQUET-496 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > Fix For: cpp-0.1 > > > Indentation errors and other issues are passing through the Travis CI checks > (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why > this is and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PARQUET-454) Address inconsistencies in boolean decoding
[ https://issues.apache.org/jira/browse/PARQUET-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-454: Assignee: Wes McKinney > Address inconsistencies in boolean decoding > --- > > Key: PARQUET-454 > URL: https://issues.apache.org/jira/browse/PARQUET-454 > Project: Parquet > Issue Type: Bug > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > See patch https://github.com/apache/parquet-cpp/pull/12 > I suggest adding unit tests to verify the fix proposed in this patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PARQUET-496) Fix cpplint configuration to be more restrictive
[ https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated PARQUET-496: - Summary: Fix cpplint configuration to be more restrictive (was: Update to the latest cpplint) > Fix cpplint configuration to be more restrictive > > > Key: PARQUET-496 > URL: https://issues.apache.org/jira/browse/PARQUET-496 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Wes McKinney > > Indentation errors and other issues are passing through the Travis CI checks > (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why > this is and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PARQUET-500) Enable coveralls.io for apache/parquet-cpp
Wes McKinney created PARQUET-500: Summary: Enable coveralls.io for apache/parquet-cpp Key: PARQUET-500 URL: https://issues.apache.org/jira/browse/PARQUET-500 Project: Parquet Issue Type: Improvement Components: parquet-cpp Reporter: Wes McKinney This will enable me to upload code coverage re: PARQUET-486. This can be handled by anyone with admin on parquet-cpp. Please let me know the API token details by some means when you do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-454) Address inconsistencies in boolean decoding
[ https://issues.apache.org/jira/browse/PARQUET-454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127125#comment-15127125 ] Wes McKinney commented on PARQUET-454: -- Fixed in https://github.com/apache/parquet-cpp/pull/34. Will rebase when PARQUET-485 is merged > Address inconsistencies in boolean decoding > --- > > Key: PARQUET-454 > URL: https://issues.apache.org/jira/browse/PARQUET-454 > Project: Parquet > Issue Type: Bug > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > See patch https://github.com/apache/parquet-cpp/pull/12 > I suggest adding unit tests to verify the fix proposed in this patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger
[ https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127535#comment-15127535 ] Liwei Lin commented on PARQUET-401: --- I looked through the code base and found hundreds of Log.xxx() usages within 90+ classes. It should take about 3 days to replace all of them. Do we want to get this in 1.9.0? I think it'd be better not to delay the release. > Deprecate Log and move to SLF4J Logger > -- > > Key: PARQUET-401 > URL: https://issues.apache.org/jira/browse/PARQUET-401 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.8.1 >Reporter: Ryan Blue > > The current Log class is intended to allow swapping out logger back-ends, but > SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, > which can handle formatting to avoid the cost of building log messages that > won't be used. I think we should deprecate the org.apache.parquet.Log class > and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305). > This will require deprecating the current Log class and replacing the current > uses of it with SLF4J. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PARQUET-438) Update RLE encoder/decoder modules from Impala upstream changes and adapt unit tests
[ https://issues.apache.org/jira/browse/PARQUET-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PARQUET-438. --- Resolution: Fixed Fix Version/s: cpp-0.1 Issue resolved by pull request 31 [https://github.com/apache/parquet-cpp/pull/31] > Update RLE encoder/decoder modules from Impala upstream changes and adapt > unit tests > > > Key: PARQUET-438 > URL: https://issues.apache.org/jira/browse/PARQUET-438 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > Fix For: cpp-0.1 > > > Depends on PARQUET-437 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger
[ https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127515#comment-15127515 ] Liwei Lin commented on PARQUET-401: --- hi [~julienledem], [~rdblue], and [~liancheng]: Now that [Parquet-305|https://issues.apache.org/jira/browse/PARQUET-305] has been merged, maybe we should consider replacing all Log.java usages with slf4j? If no one has started on it yet, I'd like to do this. I will remove the +if (Log.DEBUG)+ condition and replace the original +LOG.debug("msg is " + msg)+ with the slf4j parameterized form +LOG.debug("msg is {}", msg)+, leaving it to slf4j to judge whether the given log level is enabled or not. > Deprecate Log and move to SLF4J Logger > -- > > Key: PARQUET-401 > URL: https://issues.apache.org/jira/browse/PARQUET-401 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.8.1 >Reporter: Ryan Blue > > The current Log class is intended to allow swapping out logger back-ends, but > SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, > which can handle formatting to avoid the cost of building log messages that > won't be used. I think we should deprecate the org.apache.parquet.Log class > and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305). > This will require deprecating the current Log class and replacing the current > uses of it with SLF4J. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-485) Decouple data page delimiting from column reader / scanner classes, create test fixtures
[ https://issues.apache.org/jira/browse/PARQUET-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126474#comment-15126474 ] Wes McKinney commented on PARQUET-485: -- See https://github.com/apache/parquet-cpp/pull/32. This is ready for review and merge after addressing CR comments > Decouple data page delimiting from column reader / scanner classes, create > test fixtures > > > Key: PARQUET-485 > URL: https://issues.apache.org/jira/browse/PARQUET-485 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > It is difficult to test the column reader classes with mock data because the > data page resolution is tightly coupled to the actual file format layout in > {{ColumnReader::ReadNewPage}}. > I plan to separate these concerns, so that the column readers can be tested > with a sequence of data pages encoded in memory, but never actually assembled > into a file stream layout with thrift-serialized page headers. Patch > forthcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
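The decoupling described above (column readers consuming pages from an abstract source rather than from the file layout) can be sketched in a few lines; this is a hypothetical Python illustration of the testing pattern, not the parquet-cpp classes:

```python
class MockPageReader:
    """Serves 'pages' held in memory, so a reader can be exercised without
    a file stream or thrift-serialized page headers (hypothetical sketch)."""
    def __init__(self, pages):
        self._pages = iter(pages)
    def next_page(self):
        # Returns the next page, or None when the column chunk is exhausted.
        return next(self._pages, None)

class ColumnValuesReader:
    """Consumes pages from any page source; page delimiting lives behind
    next_page(), decoupled from the physical file format."""
    def __init__(self, page_reader):
        self._pages = page_reader
    def read_all(self):
        out = []
        page = self._pages.next_page()
        while page is not None:
            out.extend(page)  # each 'page' is just a list of decoded values here
            page = self._pages.next_page()
        return out

# A unit test can now feed pages directly, never assembling a file layout.
reader = ColumnValuesReader(MockPageReader([[1, 2], [3]]))
assert reader.read_all() == [1, 2, 3]
```

A file-backed page reader with the same next_page() interface would then plug into the identical reader class, which is the substance of the refactoring.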
[jira] [Assigned] (PARQUET-460) Parquet files concat tool
[ https://issues.apache.org/jira/browse/PARQUET-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] flykobe cheng reassigned PARQUET-460: - Assignee: flykobe cheng > Parquet files concat tool > - > > Key: PARQUET-460 > URL: https://issues.apache.org/jira/browse/PARQUET-460 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.7.0, 1.8.0 >Reporter: flykobe cheng >Assignee: flykobe cheng > > Currently, parquet file generation is time consuming; most of the time is > spent on serialization and compression. It takes about 10 minutes to generate a ~100MB parquet > file in our scenario. We want to improve write performance without generating > too many small files, which would impact read performance. > We propose to: > 1. generate several small parquet files concurrently > 2. merge the small files into one file: concatenate the parquet blocks in binary > (without SerDe), merge the footers, and modify the path and offset metadata. > We created a ParquetFilesConcat class to finish step 2. It can be invoked by > parquet.tools.command.ConcatCommand. If this function is approved by the parquet > community, we will integrate it in spark. > It will impact compression and introduce more dictionary pages, but this can > be improved by adjusting the concurrency of step 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PARQUET-496) Fix cpplint configuration to be more restrictive
[ https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-496: Assignee: Wes McKinney > Fix cpplint configuration to be more restrictive > > > Key: PARQUET-496 > URL: https://issues.apache.org/jira/browse/PARQUET-496 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > Indentation errors and other issues are passing through the Travis CI checks > (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why > this is and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-496) Fix cpplint configuration to be more restrictive
[ https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126889#comment-15126889 ] Wes McKinney commented on PARQUET-496: -- See https://github.com/apache/parquet-cpp/pull/33 > Fix cpplint configuration to be more restrictive > > > Key: PARQUET-496 > URL: https://issues.apache.org/jira/browse/PARQUET-496 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Wes McKinney > > Indentation errors and other issues are passing through the Travis CI checks > (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why > this is and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PARQUET-468) Add a cmake option to generate the Parquet thrift headers with the thriftc in the environment
[ https://issues.apache.org/jira/browse/PARQUET-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-468: Assignee: Wes McKinney > Add a cmake option to generate the Parquet thrift headers with the thriftc in > the environment > - > > Key: PARQUET-468 > URL: https://issues.apache.org/jira/browse/PARQUET-468 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > Follow-up to PARQUET-449. This will help toolchains which are unable to > upgrade to the latest version of Thrift. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PARQUET-478) Reassembly algorithms for nested in-memory columnar memory layout
[ https://issues.apache.org/jira/browse/PARQUET-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-478: Assignee: Wes McKinney > Reassembly algorithms for nested in-memory columnar memory layout > - > > Key: PARQUET-478 > URL: https://issues.apache.org/jira/browse/PARQUET-478 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > I plan to use parquet-cpp primarily in conjunction with columnar data > structures. > Specifically, this requires the interpretation of repetition / definition > levels: > * Computing null bits / bytes for each logical level of nested tree (group, > array, primitive leaf) > * Computing implied array sizes for each repeated group (according to 1, 2, > or 3-level array encoding) > The results of this reconstruction will simply be C arrays accompanied by the > parquet-cpp logical schema; this way we can make it easy to adapt to > different in-memory columnar memory schemes. > As far as implementation, it would make sense to proceed first with > functional unit tests of the reassembly algorithms using repetition / > definition levels declared in the test suite as C++ vectors -- otherwise it's > going to be too tedious trying to produce valid Parquet test data files which > explore all of the different edge cases. > Several other teams (Spark, Drill, Parquet-Java) are currently working on > related efforts along these lines, so we can engage when appropriate to > collaborate on algorithms and nuances of this approach to avoid unnecessary > code churn / bugs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
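The two computations listed in the issue above (null bits from definition levels, implied array sizes from repetition levels) can be sketched for the simplest cases; this hypothetical Python sketch assumes a flat optional column and a single repeated level with non-empty lists, glossing over the 1/2/3-level encodings and nested groups the issue covers:

```python
def reconstruct_optional_column(def_levels, values, max_def_level):
    """A value is present only when its definition level equals the maximum;
    any lower level means null at that slot (or at some ancestor, in nested
    schemas). Non-null values are stored densely, so walk them in order."""
    out, vi = [], 0
    for d in def_levels:
        if d == max_def_level:
            out.append(values[vi])
            vi += 1
        else:
            out.append(None)
    return out

def array_lengths(rep_levels):
    """For a single repeated level: repetition level 0 starts a new record,
    any higher level continues the current one. Counting entries per record
    gives the implied array sizes (assumes every record has >= 1 entry)."""
    lengths = []
    for r in rep_levels:
        if r == 0:
            lengths.append(1)
        else:
            lengths[-1] += 1
    return lengths

# schema: optional int32 x  (max definition level 1)
assert reconstruct_optional_column([1, 0, 1], [7, 9], 1) == [7, None, 9]
# schema: repeated int32 xs (single repetition level)
assert array_lengths([0, 1, 1, 0, 0, 1]) == [3, 1, 2]
```

Empty and null lists are where the edge cases live (they consume a level entry but no value), which is why the issue proposes driving the real algorithms from level vectors declared directly in the test suite.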
[jira] [Assigned] (PARQUET-498) Add a ColumnChunk builder abstraction as part of creating new row groups
[ https://issues.apache.org/jira/browse/PARQUET-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned PARQUET-498: Assignee: Wes McKinney > Add a ColumnChunk builder abstraction as part of creating new row groups > > > Key: PARQUET-498 > URL: https://issues.apache.org/jira/browse/PARQUET-498 > Project: Parquet > Issue Type: New Feature > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Wes McKinney > > Necessary for PARQUET-452, but we should treat it as an independent task. > This class will be responsible for encapsulating the creation of a serialized > sequence of data pages. This way, users on the write path need only specify > the desired data page size, then write arrays of values, repetition, and > definition levels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
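The write-path role described above (turning a stream of written values into a sequence of data pages cut at a desired page size) can be sketched as follows; this is a hypothetical illustration of the idea, not the parquet-cpp ColumnChunk builder API:

```python
def build_data_pages(encoded_values, target_page_size):
    """Group already-encoded values (as bytes) into pages, closing a page
    once it reaches the target size. A real builder would also attach page
    headers, statistics, and repetition/definition levels per page."""
    pages, current, current_size = [], [], 0
    for v in encoded_values:
        current.append(v)
        current_size += len(v)
        if current_size >= target_page_size:
            pages.append(b"".join(current))
            current, current_size = [], 0
    if current:  # flush the final, possibly short, page
        pages.append(b"".join(current))
    return pages

# Five 4-byte values with an 8-byte page target: two full pages plus a tail.
pages = build_data_pages([b"\x01" * 4] * 5, target_page_size=8)
assert [len(p) for p in pages] == [8, 8, 4]
```

The point of the abstraction is that callers only choose the page size; the cut points, and eventually the page headers, stay encapsulated in the builder.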
[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message
[ https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127838#comment-15127838 ] Matt Martin commented on PARQUET-388: - I agree with [~wxiang7] that there seems to be something a little bit off here. I'm also getting an unexpected ClassCastException in the following Scala code: {code} val reader = ParquetReader.builder(new ProtoReadSupport[SomeMessageClass](), new Path(file.toURI)).build ... reader.read {code} At reader.read I get the following exception: {code} ClassCastException: : SomeMessageClass$Builder cannot be cast to SomeMessageClass {code} I cannot change the declaration of reader to the following: {code} val reader = ParquetReader.builder(new ProtoReadSupport[SomeMessageClass$Builder](), new Path(file.toURI)).build {code} because then I get the following error: {code} type arguments [SomeMessageClass$Builder] do not conform to class ProtoReadSupport's type parameter bounds [T <: com.google.protobuf.Message] {code} > ProtoRecordConverter might wrongly cast a Message.Builder to Message > > > Key: PARQUET-388 > URL: https://issues.apache.org/jira/browse/PARQUET-388 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Wu Xiang >Assignee: Reuben Kuhnert > > ProtoRecordConverter returns current record as follows: > {code} > public T getCurrentRecord() { > if (buildBefore) { > return (T) this.reusedBuilder.build(); > } else { > return (T) this.reusedBuilder; > } > } > {code} > However this might fail if T is subclass of Message and buildBefore == false, > since it's actually casting a Message.Builder instance to Message type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Not able to compile parquet-tools
Hi All, I’m not able to compile the latest parquet-tools project from the repo. The error was due to not being able to find the parquet-hadoop jar with version 1.6.0rc3-SNAPSHOT. It looks like parquet-hadoop has been updated to version 1.6.1-SNAPSHOT, which is available. I used that version and compilation+test worked fine.

Erroneous URL: https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-hadoop/1.6.0rc3-SNAPSHOT/parquet-hadoop-1.6.0rc3-SNAPSHOT.jar
Good URL: https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-hadoop/1.6.1-SNAPSHOT/parquet-hadoop-1.6.1-SNAPSHOT.jar

Changes in repo:
[root@sandbox parquet-mr]# git diff
diff --git a/parquet-tools/pom.xml b/parquet-tools/pom.xml
index 5ac37c8..0d1c0c1 100644
--- a/parquet-tools/pom.xml
+++ b/parquet-tools/pom.xml
@@ -21,7 +21,7 @@
 com.twitter
 parquet
 ../pom.xml
-1.6.0rc3-SNAPSHOT
+1.6.1-SNAPSHOT
 4.0.0
diff --git a/pom.xml b/pom.xml
index 6153d09..8bcb032 100644
--- a/pom.xml
+++ b/pom.xml
@@ -9,7 +9,7 @@
 com.twitter
 parquet
- 1.6.0rc3-SNAPSHOT
+ 1.6.1-SNAPSHOT
 pom
 Apache Parquet MR (Incubating)

I apologize for taking a short cut and not creating a JIRA + PR. Maybe some other time…

Thanks,
Vipin Rathor
Hortonworks, Inc.
[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger
[ https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127668#comment-15127668 ] Cheng Lian commented on PARQUET-401: Fix of this issue is nice to have but probably shouldn't block 1.9.0. > Deprecate Log and move to SLF4J Logger > -- > > Key: PARQUET-401 > URL: https://issues.apache.org/jira/browse/PARQUET-401 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.8.1 >Reporter: Ryan Blue > > The current Log class is intended to allow swapping out logger back-ends, but > SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, > which can handle formatting to avoid the cost of building log messages that > won't be used. I think we should deprecate the org.apache.parquet.Log class > and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305). > This will require deprecating the current Log class and replacing the current > uses of it with SLF4J. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PARQUET-475) Run DebugPrint on all data files in the data/ directory
[ https://issues.apache.org/jira/browse/PARQUET-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksei Sandryhaila reassigned PARQUET-475: Assignee: Aliaksei Sandryhaila > Run DebugPrint on all data files in the data/ directory > --- > > Key: PARQUET-475 > URL: https://issues.apache.org/jira/browse/PARQUET-475 > Project: Parquet > Issue Type: Test > Components: parquet-cpp >Reporter: Wes McKinney >Assignee: Aliaksei Sandryhaila > > As a smoke test. Follow-up to PARQUET-453 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-481) Refactor and expand reader-test
[ https://issues.apache.org/jira/browse/PARQUET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126425#comment-15126425 ] Aliaksei Sandryhaila commented on PARQUET-481: -- If we want to be consistent with other codebases using parquet, let's keep unit tests next to the code. I'll separate scanner and reader tests, too. > Refactor and expand reader-test > --- > > Key: PARQUET-481 > URL: https://issues.apache.org/jira/browse/PARQUET-481 > Project: Parquet > Issue Type: Sub-task > Components: parquet-cpp >Affects Versions: cpp-0.1 >Reporter: Aliaksei Sandryhaila >Assignee: Aliaksei Sandryhaila > Fix For: cpp-0.1 > > > reader-test currently tests with a single parquet file and only verifies that > we can read it, not the correctness of the output. > Proposed changes: > - Move reader-test.cc to a separate directory parquet-cpp/tests (in the > future, all unit tests will be located there) > - Expand it to work with multiple files > - Add method ParquetFileReader::JsonPrint() that prints a file contents in a > json format, so we can consistently compare the output with the ground truth > stored in parquet-cpp/data. This method will also be more handy than > DebugPrint when we start working with nested columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PARQUET-481) Refactor and expand reader-test
[ https://issues.apache.org/jira/browse/PARQUET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksei Sandryhaila updated PARQUET-481: - Description: reader-test currently tests with a single parquet file and only verifies that we can read it, not the correctness of the output. Proposed changes: - Expand it to work with multiple files - Move tests for Scanner to scanner-test.cc - Add method ParquetFileReader::JsonPrint() that prints a file contents in a json format, so we can consistently compare the output with the ground truth stored in parquet-cpp/data. This method will also be more handy than DebugPrint when we start working with nested columns. was: reader-test currently tests with a single parquet file and only verifies that we can read it, not the correctness of the output. Proposed changes: - Move reader-test.cc to a separate directory parquet-cpp/tests (in the future, all unit tests will be located there) - Expand it to work with multiple files - Add method ParquetFileReader::JsonPrint() that prints a file contents in a json format, so we can consistently compare the output with the ground truth stored in parquet-cpp/data. This method will also be more handy than DebugPrint when we start working with nested columns. > Refactor and expand reader-test > --- > > Key: PARQUET-481 > URL: https://issues.apache.org/jira/browse/PARQUET-481 > Project: Parquet > Issue Type: Sub-task > Components: parquet-cpp >Affects Versions: cpp-0.1 >Reporter: Aliaksei Sandryhaila >Assignee: Aliaksei Sandryhaila > Fix For: cpp-0.1 > > > reader-test currently tests with a single parquet file and only verifies that > we can read it, not the correctness of the output.
> Proposed changes: > - Expand it to work with multiple files > - Move tests for Scanner to scanner-test.cc > - Add method ParquetFileReader::JsonPrint() that prints a file contents in a > json format, so we can consistently compare the output with the ground truth > stored in parquet-cpp/data. This method will also be more handy than > DebugPrint when we start working with nested columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PARQUET-481) Refactor and expand reader-test
[ https://issues.apache.org/jira/browse/PARQUET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksei Sandryhaila updated PARQUET-481: - Issue Type: Improvement (was: Sub-task) Parent: (was: PARQUET-479) > Refactor and expand reader-test > --- > > Key: PARQUET-481 > URL: https://issues.apache.org/jira/browse/PARQUET-481 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Affects Versions: cpp-0.1 >Reporter: Aliaksei Sandryhaila >Assignee: Aliaksei Sandryhaila > Fix For: cpp-0.1 > > > reader-test currently tests with a single parquet file and only verifies that > we can read it, not the correctness of the output. > Proposed changes: > - Expand it to work with multiple files > - Move tests for Scanner to scanner-test.cc > - Add method ParquetFileReader::JsonPrint() that prints a file contents in a > json format, so we can consistently compare the output with the ground truth > stored in parquet-cpp/data. This method will also be more handy than > DebugPrint when we start working with nested columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PARQUET-479) Improve/expand functional unit tests
[ https://issues.apache.org/jira/browse/PARQUET-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aliaksei Sandryhaila resolved PARQUET-479. -- Resolution: Won't Fix This is not an issue, but rather a discussion on functional and integration tests. It has been moved to https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#. > Improve/expand functional unit tests > > > Key: PARQUET-479 > URL: https://issues.apache.org/jira/browse/PARQUET-479 > Project: Parquet > Issue Type: Improvement > Components: parquet-cpp >Affects Versions: cpp-0.1 >Reporter: Aliaksei Sandryhaila >Assignee: Aliaksei Sandryhaila > Fix For: cpp-0.1 > > > We need to add a testing framework for unit tests, and run it as a part of > each Travis CI build. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PARQUET-495) Fix mismatches in Types class comments
[ https://issues.apache.org/jira/browse/PARQUET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved PARQUET-495. Resolution: Fixed Issue resolved by pull request 317 [https://github.com/apache/parquet-mr/pull/317] > Fix mismatches in Types class comments > -- > > Key: PARQUET-495 > URL: https://issues.apache.org/jira/browse/PARQUET-495 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.8.0, 1.8.1 >Reporter: Liwei Lin >Assignee: Liwei Lin >Priority: Trivial > Fix For: 1.9.0 > > > To produce: > required group User { > required int64 id; > optional binary email (UTF8); > } > we should do: > Types.requiredGroup() > .required(INT64).named("id") > .optional(BINARY).as(UTF8).named("email") (not .required(BINARY).as(UTF8).named("email")) > .named("User") -- This message was sent by Atlassian JIRA (v6.3.4#6332)