[jira] [Assigned] (PARQUET-456) Add zlib codec support

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-456:


Assignee: Wes McKinney

> Add zlib codec support
> --
>
> Key: PARQUET-456
> URL: https://issues.apache.org/jira/browse/PARQUET-456
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> See https://github.com/apache/parquet-cpp/pull/11



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PARQUET-497) Decouple Parquet physical file structure from FileReader class

2016-02-01 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-497:


 Summary: Decouple Parquet physical file structure from FileReader 
class
 Key: PARQUET-497
 URL: https://issues.apache.org/jira/browse/PARQUET-497
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-cpp
Reporter: Wes McKinney


It should be possible to unit test this class without creating an actual 
Parquet file. We can do this while also keeping the file-based initialization 
code path (see parquet_reader.cc) about as simple as it is now. 
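A minimal Java sketch of this kind of decoupling, assuming the reader only needs positioned byte reads rather than a file on disk. RandomAccessSource, ByteArraySource and FooterLocator are hypothetical names, not parquet-cpp classes (parquet-cpp itself is C++). The trailer layout checked here (footer bytes, then a 4-byte little-endian footer length, then the PAR1 magic) is the one defined by the Parquet format; decoding the thrift-encoded footer is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical abstraction: the reader only needs positioned reads.
interface RandomAccessSource {
    long size();

    byte[] readAt(long position, int length);
}

// In-memory implementation for unit tests; no Parquet file is created.
final class ByteArraySource implements RandomAccessSource {
    private final byte[] data;

    ByteArraySource(byte[] data) {
        this.data = data;
    }

    @Override
    public long size() {
        return data.length;
    }

    @Override
    public byte[] readAt(long position, int length) {
        int start = Math.toIntExact(position);
        return Arrays.copyOfRange(data, start, start + length);
    }
}

// Finds the serialized (still thrift-encoded) footer using the trailer:
// ... <footer> <4-byte little-endian footer length> "PAR1"
final class FooterLocator {
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    static byte[] readFooterBytes(RandomAccessSource source) {
        long size = source.size();
        if (size < 12) { // leading magic + footer length + trailing magic
            throw new IllegalArgumentException("too small to be a Parquet file");
        }
        byte[] trailer = source.readAt(size - 8, 8);
        if (!Arrays.equals(Arrays.copyOfRange(trailer, 4, 8), MAGIC)) {
            throw new IllegalArgumentException("missing trailing PAR1 magic");
        }
        int footerLength = ByteBuffer.wrap(trailer, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        long footerStart = size - 8 - footerLength;
        if (footerLength < 0 || footerStart < 4) {
            throw new IllegalArgumentException("corrupt footer length");
        }
        return source.readAt(footerStart, footerLength);
    }
}
```

A test can then assemble a byte array by hand, while a file-backed RandomAccessSource keeps the file-based initialization path about as simple as before.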





[jira] [Created] (PARQUET-499) Complete PlainEncoder implementation for all primitive types and test end to end

2016-02-01 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-499:


 Summary: Complete PlainEncoder implementation for all primitive 
types and test end to end
 Key: PARQUET-499
 URL: https://issues.apache.org/jira/browse/PARQUET-499
 Project: Parquet
  Issue Type: New Feature
  Components: parquet-cpp
Reporter: Wes McKinney


As part of PARQUET-485, I added a partial {{Encoding::PLAIN}} encoder 
implementation. This needs to be finished, with a test suite that validates 
data round-trips across all primitive types. 
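A round-trip test needs a reference for the byte layout. The Java sketch below covers three physical types as the Parquet format spec defines PLAIN: INT32 and DOUBLE as fixed-width little-endian values, and BYTE_ARRAY as a 4-byte little-endian length followed by the raw bytes. Plain is a hypothetical helper, not the parquet-cpp PlainEncoder API; BOOLEAN, INT64, FLOAT, INT96 and FIXED_LEN_BYTE_ARRAY are omitted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical PLAIN encode/decode helpers for three physical types.
final class Plain {
    static byte[] encodeInt32(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static int[] decodeInt32(byte[] data, int count) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    static byte[] encodeDouble(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(8 * values.length).order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    static double[] decodeDouble(byte[] data, int count) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getDouble();
        }
        return out;
    }

    // BYTE_ARRAY: 4-byte little-endian length prefix, then the bytes.
    static byte[] encodeByteArrays(byte[][] values) {
        int total = 0;
        for (byte[] v : values) {
            total += 4 + v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
        for (byte[] v : values) {
            buf.putInt(v.length).put(v);
        }
        return buf.array();
    }

    static byte[][] decodeByteArrays(byte[] data, int count) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        byte[][] out = new byte[count][];
        for (int i = 0; i < count; i++) {
            out[i] = new byte[buf.getInt()];
            buf.get(out[i]);
        }
        return out;
    }
}
```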





[jira] [Assigned] (PARQUET-436) Implement ParquetFileWriter class entry point for generating new Parquet files

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-436:


Assignee: Wes McKinney

> Implement ParquetFileWriter class entry point for generating new Parquet files
> --
>
> Key: PARQUET-436
> URL: https://issues.apache.org/jira/browse/PARQUET-436
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>






[jira] [Commented] (PARQUET-496) Update to the latest cpplint

2016-02-01 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126869#comment-15126869
 ] 

Wes McKinney commented on PARQUET-496:
--

Our {{make lint}} target is misconfigured. A patch is in the works.

> Update to the latest cpplint
> 
>
> Key: PARQUET-496
> URL: https://issues.apache.org/jira/browse/PARQUET-496
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Wes McKinney
>
> Indentation errors and other issues are passing through the Travis CI checks 
> (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why 
> this is and fix it.





[jira] [Resolved] (PARQUET-496) Fix cpplint configuration to be more restrictive

2016-02-01 Thread Nong Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nong Li resolved PARQUET-496.
-
   Resolution: Fixed
Fix Version/s: cpp-0.1

Issue resolved by pull request 33
[https://github.com/apache/parquet-cpp/pull/33]

> Fix cpplint configuration to be more restrictive
> 
>
> Key: PARQUET-496
> URL: https://issues.apache.org/jira/browse/PARQUET-496
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: cpp-0.1
>
>
> Indentation errors and other issues are passing through the Travis CI checks 
> (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why 
> this is and fix it.





[jira] [Assigned] (PARQUET-454) Address inconsistencies in boolean decoding

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-454:


Assignee: Wes McKinney

> Address inconsistencies in boolean decoding
> ---
>
> Key: PARQUET-454
> URL: https://issues.apache.org/jira/browse/PARQUET-454
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> See patch https://github.com/apache/parquet-cpp/pull/12
> I suggest adding unit tests to verify the fix proposed in this patch. 
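The ticket does not say which inconsistency the patch addresses, so the following is only a reference point for such unit tests: a Java sketch of PLAIN BOOLEAN encoding, which the Parquet format spec defines as bit-packed, least-significant bit first. PlainBoolean is a hypothetical helper, not parquet-cpp code.

```java
// Hypothetical PLAIN BOOLEAN codec: value i lives in bit (i % 8) of byte (i / 8).
final class PlainBoolean {
    static byte[] encode(boolean[] values) {
        byte[] out = new byte[(values.length + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            if (values[i]) {
                out[i / 8] |= (byte) (1 << (i % 8)); // LSB first
            }
        }
        return out;
    }

    static boolean[] decode(byte[] data, int count) {
        boolean[] out = new boolean[count];
        for (int i = 0; i < count; i++) {
            out[i] = ((data[i / 8] >> (i % 8)) & 1) != 0;
        }
        return out;
    }
}
```

Asserting on the exact bytes, not just on the round trip, is what would catch an encoder and decoder that agree with each other but not with the spec.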





[jira] [Updated] (PARQUET-496) Fix cpplint configuration to be more restrictive

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated PARQUET-496:
-
Summary: Fix cpplint configuration to be more restrictive  (was: Update to 
the latest cpplint)

> Fix cpplint configuration to be more restrictive
> 
>
> Key: PARQUET-496
> URL: https://issues.apache.org/jira/browse/PARQUET-496
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Wes McKinney
>
> Indentation errors and other issues are passing through the Travis CI checks 
> (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why 
> this is and fix it.





[jira] [Created] (PARQUET-500) Enable coveralls.io for apache/parquet-cpp

2016-02-01 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-500:


 Summary: Enable coveralls.io for apache/parquet-cpp
 Key: PARQUET-500
 URL: https://issues.apache.org/jira/browse/PARQUET-500
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-cpp
Reporter: Wes McKinney


This will let me upload code coverage reports for PARQUET-486. Anyone with 
admin rights on parquet-cpp can set this up; please send me the API token 
details by some means when you do. 





[jira] [Commented] (PARQUET-454) Address inconsistencies in boolean decoding

2016-02-01 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127125#comment-15127125
 ] 

Wes McKinney commented on PARQUET-454:
--

Fixed in https://github.com/apache/parquet-cpp/pull/34. I will rebase when 
PARQUET-485 is merged.

> Address inconsistencies in boolean decoding
> ---
>
> Key: PARQUET-454
> URL: https://issues.apache.org/jira/browse/PARQUET-454
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> See patch https://github.com/apache/parquet-cpp/pull/12
> I suggest adding unit tests to verify the fix proposed in this patch. 





[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger

2016-02-01 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127535#comment-15127535
 ] 

Liwei Lin commented on PARQUET-401:
---

I looked through the code base and found hundreds of Log.xxx() usages across 
90+ classes. Replacing all of them should take about 3 days. Do we want to get 
this into 1.9.0? I think it shouldn't delay the release.

> Deprecate Log and move to SLF4J Logger
> --
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.1
>Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but 
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, 
> which can handle formatting to avoid the cost of building log messages that 
> won't be used. I think we should deprecate the org.apache.parquet.Log class 
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current 
> uses of it with SLF4J.





[jira] [Resolved] (PARQUET-438) Update RLE encoder/decoder modules from Impala upstream changes and adapt unit tests

2016-02-01 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PARQUET-438.
---
   Resolution: Fixed
Fix Version/s: cpp-0.1

Issue resolved by pull request 31
[https://github.com/apache/parquet-cpp/pull/31]

> Update RLE encoder/decoder modules from Impala upstream changes and adapt 
> unit tests
> 
>
> Key: PARQUET-438
> URL: https://issues.apache.org/jira/browse/PARQUET-438
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: cpp-0.1
>
>
> Depends on PARQUET-437
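For readers unfamiliar with the module: the Parquet RLE / bit-packing hybrid encoding alternates two kinds of runs, each introduced by an unsigned LEB128 varint header whose low bit selects the kind. The Java sketch below handles only RLE runs (header = run length << 1, followed by the repeated value in ceil(bit width / 8) little-endian bytes). RleRuns is hypothetical, not the Impala-derived code, and it rejects bit-packed runs instead of decoding them.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical RLE-run-only codec for the Parquet hybrid encoding.
final class RleRuns {
    private static void writeUleb128(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 payload bits + continuation bit
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encode(int[] values, int bitWidth) {
        int valueBytes = (bitWidth + 7) / 8;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < values.length) {
            int j = i;
            while (j < values.length && values[j] == values[i]) {
                j++;
            }
            writeUleb128(out, (j - i) << 1); // low bit 0 marks an RLE run
            for (int b = 0; b < valueBytes; b++) {
                out.write((values[i] >>> (8 * b)) & 0xFF);
            }
            i = j;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] data, int bitWidth) {
        int valueBytes = (bitWidth + 7) / 8;
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int header = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                header |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if ((header & 1) != 0) {
                throw new UnsupportedOperationException("bit-packed runs are not handled in this sketch");
            }
            int value = 0;
            for (int k = 0; k < valueBytes; k++) {
                value |= (data[pos++] & 0xFF) << (8 * k);
            }
            for (int k = 0; k < (header >>> 1); k++) {
                out.add(value);
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }
}
```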





[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger

2016-02-01 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127515#comment-15127515
 ] 

Liwei Lin commented on PARQUET-401:
---

Hi [~julienledem], [~rdblue], and [~liancheng]:
Now that [PARQUET-305|https://issues.apache.org/jira/browse/PARQUET-305] has 
been merged, maybe we should consider replacing all Log.java usages with SLF4J? 
If nobody has started on it yet, I'd like to do this.

I will remove the +if (Log.DEBUG)+ condition and replace the original 
+LOG.debug("msg is" + msg)+ with the SLF4J parameterized form 
+LOG.debug("msg is {}", msg)+, leaving it to SLF4J to check whether the 
given log level is enabled.
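The benefit of the parameterized form is that message formatting is skipped when the level is disabled. With SLF4J the call site would be LOG.debug("msg is {}", msg) on an org.slf4j.Logger. The self-contained Java sketch below only imitates that behaviour so it can run without dependencies; TinyLogger is hypothetical and is not the SLF4J API.

```java
import java.util.function.Consumer;

// Hypothetical stand-in mimicking an SLF4J-style debug(format, arg) call.
// This is NOT the SLF4J API.
final class TinyLogger {
    private final boolean debugEnabled;
    private final Consumer<String> sink;

    TinyLogger(boolean debugEnabled, Consumer<String> sink) {
        this.debugEnabled = debugEnabled;
        this.sink = sink;
    }

    void debug(String format, Object arg) {
        if (!debugEnabled) {
            return; // arg.toString() is never called when the level is off
        }
        int at = format.indexOf("{}"); // substitute the first placeholder only
        String message = at < 0 ? format : format.substring(0, at) + arg + format.substring(at + 2);
        sink.accept(message);
    }
}
```

Because the level check happens inside debug(), the explicit if (Log.DEBUG) guard becomes unnecessary for simple messages; SLF4J's isDebugEnabled() guard can still pay off when computing the argument itself is expensive.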

> Deprecate Log and move to SLF4J Logger
> --
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.1
>Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but 
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, 
> which can handle formatting to avoid the cost of building log messages that 
> won't be used. I think we should deprecate the org.apache.parquet.Log class 
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current 
> uses of it with SLF4J.





[jira] [Commented] (PARQUET-485) Decouple data page delimiting from column reader / scanner classes, create test fixtures

2016-02-01 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126474#comment-15126474
 ] 

Wes McKinney commented on PARQUET-485:
--

See https://github.com/apache/parquet-cpp/pull/32. This is ready for review, and 
can be merged once the CR comments are addressed.

> Decouple data page delimiting from column reader / scanner classes, create 
> test fixtures
> 
>
> Key: PARQUET-485
> URL: https://issues.apache.org/jira/browse/PARQUET-485
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> It is difficult to test the column reader classes with mock data because the 
> data page resolution is tightly coupled to the actual file format layout in 
> {{ColumnReader::ReadNewPage}}.
> I plan to separate these concerns, so that the column readers can be tested 
> with a sequence of data pages encoded in memory, but never actually assembled 
> into a file stream layout with thrift-serialized page headers. Patch 
> forthcoming.
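One way to separate these concerns, sketched in Java with hypothetical names (DataPage, PageReader, InMemoryPageReader and Int32ColumnReader are not parquet-cpp classes): the column reader pulls pages through an interface, and tests supply pages built in memory. To stay short, pages here carry only PLAIN-encoded INT32 values, with no levels and no page headers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical page: PLAIN-encoded INT32 values only.
final class DataPage {
    final int numValues;
    final byte[] data;

    DataPage(int numValues, byte[] data) {
        this.numValues = numValues;
        this.data = data;
    }

    static DataPage ofInts(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return new DataPage(values.length, buf.array());
    }
}

// The column reader depends only on this interface, not on the file layout.
interface PageReader {
    DataPage nextPage(); // null when no pages remain
}

// Test fixture: pages held in memory, never serialized with thrift page headers.
final class InMemoryPageReader implements PageReader {
    private final Deque<DataPage> pages;

    InMemoryPageReader(List<DataPage> pages) {
        this.pages = new ArrayDeque<>(pages);
    }

    @Override
    public DataPage nextPage() {
        return pages.pollFirst();
    }
}

final class Int32ColumnReader {
    private final PageReader pages;
    private ByteBuffer current;
    private int remaining;

    Int32ColumnReader(PageReader pages) {
        this.pages = pages;
    }

    boolean hasNext() {
        while (remaining == 0) { // advance past exhausted or empty pages
            DataPage page = pages.nextPage();
            if (page == null) {
                return false;
            }
            current = ByteBuffer.wrap(page.data).order(ByteOrder.LITTLE_ENDIAN);
            remaining = page.numValues;
        }
        return true;
    }

    int next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return current.getInt();
    }
}
```

A file-backed PageReader would then be the only component that knows about thrift-serialized page headers.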





[jira] [Assigned] (PARQUET-460) Parquet files concat tool

2016-02-01 Thread flykobe cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

flykobe cheng reassigned PARQUET-460:
-

Assignee: flykobe cheng

> Parquet files concat tool
> -
>
> Key: PARQUET-460
> URL: https://issues.apache.org/jira/browse/PARQUET-460
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.7.0, 1.8.0
>Reporter: flykobe cheng
>Assignee: flykobe cheng
>
> Currently, Parquet file generation is time-consuming; most of the time is 
> spent on serialization and compression. It takes about 10 minutes to generate 
> a ~100MB Parquet file in our scenario. We want to improve write performance 
> without generating too many small files, which would hurt read performance.
> We propose to:
> 1. generate several small Parquet files concurrently
> 2. merge the small files into one file: concatenate the Parquet blocks in 
> binary (without SerDe), merge the footers, and modify the path and offset 
> metadata.
> We created a ParquetFilesConcat class to perform step 2. It can be invoked by 
> parquet.tools.command.ConcatCommand. If the Parquet community approves this 
> feature, we will integrate it into Spark.
> This will hurt compression and introduce more dictionary pages, but that can 
> be mitigated by adjusting the concurrency of step 1.
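A rough Java sketch of the offset rewriting in step 2, assuming each input's data section (everything between the leading PAR1 magic and the footer) is copied verbatim, in order, after a single leading PAR1 in the output. ChunkMeta, InputFile and ConcatPlanner are hypothetical simplifications; the real footer records several offsets per column chunk (for example data_page_offset and dictionary_page_offset), and each of them would need the same shift.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified per-column-chunk metadata.
final class ChunkMeta {
    final long offset; // absolute offset of the chunk's first page in its file
    final long size;   // bytes occupied by the chunk

    ChunkMeta(long offset, long size) {
        this.offset = offset;
        this.size = size;
    }
}

// One input: its data section is [4, dataEnd), after "PAR1" and before the footer.
final class InputFile {
    final long dataEnd;
    final List<ChunkMeta> chunks;

    InputFile(long dataEnd, List<ChunkMeta> chunks) {
        this.dataEnd = dataEnd;
        this.chunks = chunks;
    }
}

final class ConcatPlanner {
    static final long MAGIC_LENGTH = 4; // "PAR1"

    // Every chunk of an input shifts by the same amount: the distance between
    // where its data section started and where it lands in the merged file.
    static List<ChunkMeta> relocate(List<InputFile> inputs) {
        List<ChunkMeta> merged = new ArrayList<>();
        long outPos = MAGIC_LENGTH; // where the next data section will be written
        for (InputFile file : inputs) {
            long shift = outPos - MAGIC_LENGTH;
            for (ChunkMeta c : file.chunks) {
                merged.add(new ChunkMeta(c.offset + shift, c.size));
            }
            outPos += file.dataEnd - MAGIC_LENGTH;
        }
        return merged;
    }
}
```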





[jira] [Assigned] (PARQUET-496) Fix cpplint configuration to be more restrictive

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-496:


Assignee: Wes McKinney

> Fix cpplint configuration to be more restrictive
> 
>
> Key: PARQUET-496
> URL: https://issues.apache.org/jira/browse/PARQUET-496
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> Indentation errors and other issues are passing through the Travis CI checks 
> (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why 
> this is and fix it.





[jira] [Commented] (PARQUET-496) Fix cpplint configuration to be more restrictive

2016-02-01 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126889#comment-15126889
 ] 

Wes McKinney commented on PARQUET-496:
--

See https://github.com/apache/parquet-cpp/pull/33

> Fix cpplint configuration to be more restrictive
> 
>
> Key: PARQUET-496
> URL: https://issues.apache.org/jira/browse/PARQUET-496
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Wes McKinney
>
> Indentation errors and other issues are passing through the Travis CI checks 
> (e.g. https://github.com/apache/parquet-cpp/pull/30), let's figure out why 
> this is and fix it.





[jira] [Assigned] (PARQUET-468) Add a cmake option to generate the Parquet thrift headers with the thriftc in the environment

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-468:


Assignee: Wes McKinney

> Add a cmake option to generate the Parquet thrift headers with the thriftc in 
> the environment
> -
>
> Key: PARQUET-468
> URL: https://issues.apache.org/jira/browse/PARQUET-468
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> Follow-up to PARQUET-449. This will help toolchains which are unable to 
> upgrade to the latest version of Thrift. 





[jira] [Assigned] (PARQUET-478) Reassembly algorithms for nested in-memory columnar memory layout

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-478:


Assignee: Wes McKinney

> Reassembly algorithms for nested in-memory columnar memory layout
> -
>
> Key: PARQUET-478
> URL: https://issues.apache.org/jira/browse/PARQUET-478
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> I plan to use parquet-cpp primarily in conjunction with columnar data 
> structures. 
> Specifically, interpreting the repetition / definition levels requires:
> * Computing null bits / bytes for each logical level of nested tree (group, 
> array, primitive leaf)
> * Computing implied array sizes for each repeated group (according to 1, 2, 
> or 3-level array encoding)
> The results of this reconstruction will simply be C arrays accompanied by the 
> parquet-cpp logical schema; this way we can make it easy to adapt to 
> different in-memory columnar memory schemes. 
> As far as implementation, it would make sense to proceed first with 
> functional unit tests of the reassembly algorithms using repetition / 
> definition levels declared in the test suite as C++ vectors -- otherwise it's 
> going to be too tedious trying to produce valid Parquet test data files which 
> explore all of the different edge cases.
> Several other teams (Spark, Drill, Parquet-Java) are currently working on 
> related efforts along these lines, so we can engage when appropriate to 
> collaborate on algorithms and nuances of this approach to avoid unnecessary 
> code churn / bugs. 
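To make the level interpretation concrete, here is a Java sketch for one case: a 3-level list of nullable int32, optional group my_list (LIST) { repeated group list { optional int32 element; } }, so the maximum definition level is 3 and the maximum repetition level is 1. It computes a null flag per list, an offsets array (record i's elements are offsets[i] up to offsets[i + 1]), and a null flag per element. ListReassembly is a hypothetical name; other nesting shapes need their own level thresholds.

```java
import java.util.ArrayList;
import java.util.List;

final class ListReassembly {
    // Per-record list validity, offsets into the element arrays, and
    // per-element validity and values (0 is a placeholder for null elements).
    static final class Result {
        final boolean[] listValid;
        final int[] offsets;
        final boolean[] elementValid;
        final int[] elementValues;

        Result(boolean[] listValid, int[] offsets, boolean[] elementValid, int[] elementValues) {
            this.listValid = listValid;
            this.offsets = offsets;
            this.elementValid = elementValid;
            this.elementValues = elementValues;
        }
    }

    // Assumes max definition level 3 and max repetition level 1 (schema above).
    // values holds only the non-null leaf values, as a Parquet column chunk does.
    static Result reassemble(int[] rep, int[] def, int[] values) {
        List<Boolean> listValid = new ArrayList<>();
        List<Integer> offsets = new ArrayList<>();
        List<Boolean> elementValid = new ArrayList<>();
        List<Integer> elementValues = new ArrayList<>();
        offsets.add(0);
        int count = 0; // elements emitted so far
        int next = 0;  // index of the next non-null value
        for (int i = 0; i < rep.length; i++) {
            if (rep[i] == 0) { // repetition level 0 starts a new record
                if (i > 0) {
                    offsets.add(count); // close the previous record's list
                }
                listValid.add(def[i] >= 1); // def 0: the list itself is null
            }
            if (def[i] >= 2) { // def 1: empty list, so no element
                boolean present = def[i] == 3; // def 2: null element
                elementValid.add(present);
                elementValues.add(present ? values[next++] : 0);
                count++;
            }
        }
        if (rep.length > 0) {
            offsets.add(count); // close the last record's list
        }
        return new Result(toBooleans(listValid), toInts(offsets),
                toBooleans(elementValid), toInts(elementValues));
    }

    private static boolean[] toBooleans(List<Boolean> xs) {
        boolean[] out = new boolean[xs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = xs.get(i);
        }
        return out;
    }

    private static int[] toInts(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

For example, the records [1, null, 2], [], null, [3] are stored as rep = {0, 1, 1, 0, 0, 0} and def = {3, 2, 3, 1, 0, 3} with non-null values {1, 2, 3}, and reassemble to offsets {0, 3, 3, 3, 4}.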





[jira] [Assigned] (PARQUET-498) Add a ColumnChunk builder abstraction as part of creating new row groups

2016-02-01 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned PARQUET-498:


Assignee: Wes McKinney

> Add a ColumnChunk builder abstraction as part of creating new row groups
> 
>
> Key: PARQUET-498
> URL: https://issues.apache.org/jira/browse/PARQUET-498
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>
> Necessary for PARQUET-452, but we should treat it as an independent task.
> This class will be responsible for encapsulating the creation of a serialized 
> sequence of data pages. This way, users on the write path need only specify 
> the desired data page size, then write arrays of values, repetition levels, 
> and definition levels.
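A minimal Java sketch of the page-cutting part of such a builder, restricted to required INT32 values written with PLAIN encoding, so there are no repetition or definition levels and no page headers. Int32ChunkBuilder and its methods are hypothetical; the page size is treated as a byte threshold checked after each value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: cuts a stream of required INT32 values into
// PLAIN-encoded pages of at most pageSizeBytes bytes each.
final class Int32ChunkBuilder {
    private final int pageSizeBytes;
    private final List<byte[]> pages = new ArrayList<>();
    private ByteArrayOutputStream current = new ByteArrayOutputStream();

    Int32ChunkBuilder(int pageSizeBytes) {
        if (pageSizeBytes < 4) {
            throw new IllegalArgumentException("a page must hold at least one value");
        }
        this.pageSizeBytes = pageSizeBytes;
    }

    void write(int[] values) {
        for (int v : values) {
            byte[] le = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
            current.write(le, 0, le.length);
            if (current.size() + 4 > pageSizeBytes) {
                flushPage(); // the next value would not fit in this page
            }
        }
    }

    // Flushes any partial page and returns all pages in write order.
    List<byte[]> finish() {
        if (current.size() > 0) {
            flushPage();
        }
        return pages;
    }

    private void flushPage() {
        pages.add(current.toByteArray());
        current = new ByteArrayOutputStream();
    }
}
```

Callers only choose the page size and hand over arrays; page boundaries can fall in the middle of a write() call.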





[jira] [Commented] (PARQUET-388) ProtoRecordConverter might wrongly cast a Message.Builder to Message

2016-02-01 Thread Matt Martin (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127838#comment-15127838
 ] 

Matt Martin commented on PARQUET-388:
-

I agree with [~wxiang7] that there seems to be something a little bit off here. 
 I'm also getting an unexpected ClassCastException in the following Scala code:

{code}
val reader = ParquetReader.builder(new ProtoReadSupport[SomeMessageClass](), 
new Path(file.toURI)).build
...
reader.read
{code}

At reader.read I get the following exception:

{code}
ClassCastException: : SomeMessageClass$Builder cannot be cast to 
SomeMessageClass
{code}

I cannot change the declaration of reader to the following:

{code}
val reader = ParquetReader.builder(new 
ProtoReadSupport[SomeMessageClass$Builder](), new Path(file.toURI)).build
{code}

because then I get the following error:

{code}
type arguments [SomeMessageClass$Builder] do not conform to class 
ProtoReadSupport's type parameter bounds [T <: com.google.protobuf.Message]
{code}

> ProtoRecordConverter might wrongly cast a Message.Builder to Message
> 
>
> Key: PARQUET-388
> URL: https://issues.apache.org/jira/browse/PARQUET-388
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Wu Xiang
>Assignee: Reuben Kuhnert
>
> ProtoRecordConverter returns current record as follows:
> {code}
>   public T getCurrentRecord() {
> if (buildBefore) {
>   return (T) this.reusedBuilder.build();
> } else {
>   return (T) this.reusedBuilder;
> }
>   }
> {code}
> However, this might fail if T is a subclass of Message and buildBefore == false, 
> since it's actually casting a Message.Builder instance to the Message type.
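The reported behaviour appears consistent with Java type erasure: an unchecked cast to a type parameter, such as (T) this.reusedBuilder, compiles to no runtime check, so the ClassCastException surfaces only where the caller uses the result as the concrete message type (e.g. at reader.read). A self-contained Java illustration, with a hypothetical Holder class and Integer/String standing in for the protobuf builder and message types:

```java
// Hypothetical holder mirroring the shape of getCurrentRecord(): it stores an
// Object and hands it back as T.
final class Holder<T> {
    private final Object value;

    Holder(Object value) {
        this.value = value;
    }

    @SuppressWarnings("unchecked")
    T getUnchecked() {
        return (T) value; // erased to Object: no runtime type check happens here
    }

    T getChecked(Class<T> type) {
        return type.cast(value); // Class.cast checks, and fails right here
    }
}
```

With getUnchecked() the failure is reported at the caller's assignment; with Class.cast it would point at the conversion itself.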





Not able to compile parquet-tools

2016-02-01 Thread Vipin Rathor
Hi All,
I’m not able to compile the latest parquet-tools project from the repo. The 
build fails because it cannot find the parquet-hadoop jar with version 
1.6.0rc3-SNAPSHOT. It looks like parquet-hadoop has been updated to version 
1.6.1-SNAPSHOT, which is available. I switched to that version and 
compilation and tests worked fine.

Erroneous URL:
https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-hadoop/1.6.0rc3-SNAPSHOT/parquet-hadoop-1.6.0rc3-SNAPSHOT.jar

Good URL:
https://oss.sonatype.org/content/repositories/snapshots/com/twitter/parquet-hadoop/1.6.1-SNAPSHOT/parquet-hadoop-1.6.1-SNAPSHOT.jar

Changes in repo:
[root@sandbox parquet-mr]# git diff
diff --git a/parquet-tools/pom.xml b/parquet-tools/pom.xml
index 5ac37c8..0d1c0c1 100644
--- a/parquet-tools/pom.xml
+++ b/parquet-tools/pom.xml
@@ -21,7 +21,7 @@
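     <groupId>com.twitter</groupId>
     <artifactId>parquet</artifactId>
     <relativePath>../pom.xml</relativePath>
-    <version>1.6.0rc3-SNAPSHOT</version>
+    <version>1.6.1-SNAPSHOT</version>
   </parent>
 
   <modelVersion>4.0.0</modelVersion>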
diff --git a/pom.xml b/pom.xml
index 6153d09..8bcb032 100644
--- a/pom.xml
+++ b/pom.xml
@@ -9,7 +9,7 @@

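   <groupId>com.twitter</groupId>
   <artifactId>parquet</artifactId>
-  <version>1.6.0rc3-SNAPSHOT</version>
+  <version>1.6.1-SNAPSHOT</version>
   <packaging>pom</packaging>
 
   <name>Apache Parquet MR (Incubating)</name>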

I apologize for taking a shortcut and not creating a JIRA + PR.
Maybe some other time…

Thanks,
Vipin Rathor
Hortonworks, Inc.


[jira] [Commented] (PARQUET-401) Deprecate Log and move to SLF4J Logger

2016-02-01 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127668#comment-15127668
 ] 

Cheng Lian commented on PARQUET-401:


A fix for this issue would be nice to have, but it probably shouldn't block 1.9.0.

> Deprecate Log and move to SLF4J Logger
> --
>
> Key: PARQUET-401
> URL: https://issues.apache.org/jira/browse/PARQUET-401
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.1
>Reporter: Ryan Blue
>
> The current Log class is intended to allow swapping out logger back-ends, but 
> SLF4J already does this. It also doesn't expose as nice of an API as SLF4J, 
> which can handle formatting to avoid the cost of building log messages that 
> won't be used. I think we should deprecate the org.apache.parquet.Log class 
> and move to using SLF4J directly, instead of wrapping SLF4J (PARQUET-305).
> This will require deprecating the current Log class and replacing the current 
> uses of it with SLF4J.





[jira] [Assigned] (PARQUET-475) Run DebugPrint on all data files in the data/ directory

2016-02-01 Thread Aliaksei Sandryhaila (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aliaksei Sandryhaila reassigned PARQUET-475:


Assignee: Aliaksei Sandryhaila

> Run DebugPrint on all data files in the data/ directory
> ---
>
> Key: PARQUET-475
> URL: https://issues.apache.org/jira/browse/PARQUET-475
> Project: Parquet
>  Issue Type: Test
>  Components: parquet-cpp
>Reporter: Wes McKinney
>Assignee: Aliaksei Sandryhaila
>
> As a smoke test. Follow-up to PARQUET-453





[jira] [Commented] (PARQUET-481) Refactor and expand reader-test

2016-02-01 Thread Aliaksei Sandryhaila (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126425#comment-15126425
 ] 

Aliaksei Sandryhaila commented on PARQUET-481:
--

If we want to be consistent with other codebases using parquet, let's keep unit 
tests next to the code. I'll separate scanner and reader tests, too.

> Refactor and expand reader-test
> ---
>
> Key: PARQUET-481
> URL: https://issues.apache.org/jira/browse/PARQUET-481
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-cpp
>Affects Versions: cpp-0.1
>Reporter: Aliaksei Sandryhaila
>Assignee: Aliaksei Sandryhaila
> Fix For: cpp-0.1
>
>
> reader-test currently tests with a single parquet file and only verifies that 
> we can read it, not the correctness of the output.
> Proposed changes:
> - Move reader-test.cc to a separate directory parquet-cpp/tests (in the 
> future, all unit tests will be located there)
> - Expand it to work with multiple files
> - Add method ParquetFileReader::JsonPrint() that prints a file's contents in 
> JSON format, so we can consistently compare the output with the ground truth 
> stored in parquet-cpp/data. This method will also be handier than DebugPrint 
> when we start working with nested columns.





[jira] [Updated] (PARQUET-481) Refactor and expand reader-test

2016-02-01 Thread Aliaksei Sandryhaila (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aliaksei Sandryhaila updated PARQUET-481:
-
Description: 
reader-test currently tests with a single parquet file and only verifies that 
we can read it, not the correctness of the output.

Proposed changes:
- Expand it to work with multiple files
- Move tests for Scanner to scanner-test.cc
- Add method ParquetFileReader::JsonPrint() that prints a file's contents in 
JSON format, so we can consistently compare the output with the ground truth 
stored in parquet-cpp/data. This method will also be handier than DebugPrint 
when we start working with nested columns.

  was:
reader-test currently tests with a single parquet file and only verifies that 
we can read it, not the correctness of the output.

Proposed changes:
- Move reader-test.cc to a separate directory parquet-cpp/tests (in the future, 
all unit tests will be located there)
- Expand it to work with multiple files
- Add method ParquetFileReader::JsonPrint() that prints a file's contents in 
JSON format, so we can consistently compare the output with the ground truth 
stored in parquet-cpp/data. This method will also be handier than DebugPrint 
when we start working with nested columns.


> Refactor and expand reader-test
> ---
>
> Key: PARQUET-481
> URL: https://issues.apache.org/jira/browse/PARQUET-481
> Project: Parquet
>  Issue Type: Sub-task
>  Components: parquet-cpp
>Affects Versions: cpp-0.1
>Reporter: Aliaksei Sandryhaila
>Assignee: Aliaksei Sandryhaila
> Fix For: cpp-0.1
>
>
> reader-test currently tests with a single parquet file and only verifies that 
> we can read it, not the correctness of the output.
> Proposed changes:
> - Expand it to work with multiple files
> - Move tests for Scanner to scanner-test.cc
> - Add method ParquetFileReader::JsonPrint() that prints a file's contents in 
> JSON format, so we can consistently compare the output with the ground truth 
> stored in parquet-cpp/data. This method will also be handier than DebugPrint 
> when we start working with nested columns.





[jira] [Updated] (PARQUET-481) Refactor and expand reader-test

2016-02-01 Thread Aliaksei Sandryhaila (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aliaksei Sandryhaila updated PARQUET-481:
-
Issue Type: Improvement  (was: Sub-task)
Parent: (was: PARQUET-479)

> Refactor and expand reader-test
> ---
>
> Key: PARQUET-481
> URL: https://issues.apache.org/jira/browse/PARQUET-481
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Affects Versions: cpp-0.1
>Reporter: Aliaksei Sandryhaila
>Assignee: Aliaksei Sandryhaila
> Fix For: cpp-0.1
>
>
> reader-test currently tests with a single parquet file and only verifies that 
> we can read it, not the correctness of the output.
> Proposed changes:
> - Expand it to work with multiple files
> - Move tests for Scanner to scanner-test.cc
> - Add method ParquetFileReader::JsonPrint() that prints a file's contents in 
> JSON format, so we can consistently compare the output with the ground truth 
> stored in parquet-cpp/data. This method will also be handier than DebugPrint 
> when we start working with nested columns.





[jira] [Resolved] (PARQUET-479) Improve/expand functional unit tests

2016-02-01 Thread Aliaksei Sandryhaila (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aliaksei Sandryhaila resolved PARQUET-479.
--
Resolution: Won't Fix

This is not an issue, but rather a discussion of functional and integration 
tests. It has been moved to 
https://docs.google.com/document/d/1WyquzupLc3UkErO2OhqLJNQ9a84Cccc8LVUSuLQz39o/edit#.

> Improve/expand functional unit tests
> 
>
> Key: PARQUET-479
> URL: https://issues.apache.org/jira/browse/PARQUET-479
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Affects Versions: cpp-0.1
>Reporter: Aliaksei Sandryhaila
>Assignee: Aliaksei Sandryhaila
> Fix For: cpp-0.1
>
>
> We need to add a testing framework for unit tests, and run it as a part of 
> each Travis CI build.





[jira] [Resolved] (PARQUET-495) Fix mismatches in Types class comments

2016-02-01 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved PARQUET-495.

Resolution: Fixed

Issue resolved by pull request 317
[https://github.com/apache/parquet-mr/pull/317]

> Fix mismatches in Types class comments
> --
>
> Key: PARQUET-495
> URL: https://issues.apache.org/jira/browse/PARQUET-495
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.0, 1.8.1
>Reporter: Liwei Lin
>Assignee: Liwei Lin
>Priority: Trivial
> Fix For: 1.9.0
>
>
> To produce:
> required group User \{
> required int64 id;
> *optional* binary email (UTF8);
> \}
> we should do:
> Types.requiredGroup()
>   .required(INT64).named("id")
>   .-*required* (BINARY).as(UTF8).named("email")-
>   .*optional* (BINARY).as(UTF8).named("email")
>   .named("User")


