[jira] [Created] (PARQUET-467) Check for and raise error for deprecated BIT_PACKED encoding

2016-01-25 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-467: Summary: Check for and raise error for deprecated BIT_PACKED encoding Key: PARQUET-467 URL: https://issues.apache.org/jira/browse/PARQUET-467 Project: Parquet

[jira] [Created] (PARQUET-466) Make parquet-format a git submodule and add tool for updating generated Thrift code

2016-01-25 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-466: Summary: Make parquet-format a git submodule and add tool for updating generated Thrift code Key: PARQUET-466 URL: https://issues.apache.org/jira/browse/PARQUET-466 P

[jira] [Comment Edited] (PARQUET-465) Parquet-Avro does not support field removal

2016-01-25 Thread Thomas Omans (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116441#comment-15116441 ] Thomas Omans edited comment on PARQUET-465 at 1/26/16 1:26 AM:

[jira] [Resolved] (PARQUET-238) Unable to Install C++ Driver - reference to 'share_ptr' is ambiguous

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved PARQUET-238. -- Resolution: Resolved This is resolved with PARQUET-418 and PARQUET-267. Please let us know if y

[jira] [Comment Edited] (PARQUET-238) Unable to Install C++ Driver - reference to 'share_ptr' is ambiguous

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116456#comment-15116456 ] Wes McKinney edited comment on PARQUET-238 at 1/26/16 1:20 AM:

[jira] [Commented] (PARQUET-465) Parquet-Avro does not support field removal

2016-01-25 Thread Thomas Omans (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116441#comment-15116441 ] Thomas Omans commented on PARQUET-465: -- Ryan, Thank you for the quick response. U

[jira] [Commented] (PARQUET-465) Parquet-Avro does not support field removal

2016-01-25 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116413#comment-15116413 ] Ryan Blue commented on PARQUET-465: --- [~eggsby], sorry for the confusion here. Right now

[jira] [Created] (PARQUET-465) Parquet-Avro does not support field removal

2016-01-25 Thread Thomas Omans (JIRA)
Thomas Omans created PARQUET-465: Summary: Parquet-Avro does not support field removal Key: PARQUET-465 URL: https://issues.apache.org/jira/browse/PARQUET-465 Project: Parquet Issue Type: Bug

[jira] [Created] (PARQUET-464) Add cmake option and #defines to enable/disable struct packing

2016-01-25 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-464: Summary: Add cmake option and #defines to enable/disable struct packing Key: PARQUET-464 URL: https://issues.apache.org/jira/browse/PARQUET-464 Project: Parquet

Re: Parquet-cpp

2016-01-25 Thread Ryan Blue
Aliaksei, thanks for being understanding here. I agree with you that it is too difficult. We really want to get the cpp side bootstrapped as soon as possible. Let's go with what you suggested, to have contributors review one another's patches and then ask a committer for a final review once bot

[jira] [Commented] (PARQUET-449) Update to latest parquet.thrift

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116337#comment-15116337 ] Wes McKinney commented on PARQUET-449: -- [~nongli] the GitHub PR is still outstanding

Re: Parquet-cpp

2016-01-25 Thread Wes McKinney
I am happy to help out with the patch maintenance when there are conflicts. With PARQUET-437 we'll want to write more unit tests which will help make sure we aren't breaking each other's code. On Mon, Jan 25, 2016 at 2:33 PM, Aliaksei Sandryhaila wrote: > Hi Ryan, > > This sounds very reasonable

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
PARQUET-222 is mostly a memory issue caused by the # of columns. On the write path, each column comes with write buffers, and they can accumulate to a large amount. In the case investigated in PARQUET-222, it took more than 10G to write a single row consisting of 26k integer columns. I.e., this i
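A rough back-of-the-envelope sketch of the figures quoted above, not taken from the thread itself: it assumes "10G" means 10 GiB and "26k" means 26,000 columns, and it treats the write-path memory as spread evenly across columns. The function names are hypothetical and are not part of any Parquet API.

```python
GIB = 1024 ** 3  # assumption: "10G" in the message is read as 10 GiB


def per_column_bytes(total_bytes: float, num_columns: int) -> float:
    """Average write-path memory per column, assuming an even split."""
    return total_bytes / num_columns


def estimate_write_memory(num_columns: int, bytes_per_column: float) -> float:
    """Total write-buffer memory if every column costs bytes_per_column."""
    return num_columns * bytes_per_column


# Figures from the message: more than 10G for one row of 26k integer columns.
# The 10 GiB value is therefore a lower bound, not a measurement.
avg = per_column_bytes(10 * GIB, 26_000)
print(f"average per column: {avg / 1024:.1f} KiB")

# Scaling the same per-column cost to a hypothetical 100,000-column table.
print(f"100k columns: {estimate_write_memory(100_000, avg) / GIB:.1f} GiB")
```

Under these assumptions the cost is driven by the number of columns rather than by the amount of data per cell, which is the point the message makes about write buffers.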

Re: Parquet-cpp

2016-01-25 Thread Aliaksei Sandryhaila
Hi Ryan, This sounds very reasonable. I do not argue to disregard the standard Apache approach to promoting contributors to committers. I am just pointing out that without the input from current committers it is hard for us to productively contribute to the project. As a consequence, it is ha

[jira] [Resolved] (PARQUET-449) Update to latest parquet.thrift

2016-01-25 Thread Nong Li (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nong Li resolved PARQUET-449. - Resolution: Fixed Fix Version/s: cpp-0.1 Issue resolved by pull request 21 [https://github.com/apa

Re: Parquet-cpp

2016-01-25 Thread Ryan Blue
Hi everyone, Sorry about the current backlog on the parquet-cpp side. Most of the current committer base works on the Java implementation so it's either slow or not reliable for us to do those reviews. I think the best way to move forward is to review patches for each other. That will keep t

[jira] [Created] (PARQUET-463) Add DCHECK* macros for assertions in debug builds

2016-01-25 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-463: Summary: Add DCHECK* macros for assertions in debug builds Key: PARQUET-463 URL: https://issues.apache.org/jira/browse/PARQUET-463 Project: Parquet Issue Typ

Re: Parquet-cpp

2016-01-25 Thread Aliaksei Sandryhaila
Hi Nong and Julien, As Wes has pointed out, we have a number of patches for parquet-cpp outstanding. Wes, Deepak, and I have been reviewing each other's pull requests. At this point, the patches need to be reviewed and approved by Parquet committers in order to be committed to master. Unfort

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115883#comment-15115883 ] Wes McKinney commented on PARQUET-462: -- Sounds good. I'm going to work on PARQUET-43

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115868#comment-15115868 ] Deepak Majeti commented on PARQUET-462: --- The code duplication will happen during th

Re: Parquet for very wide table

2016-01-25 Thread Krishna
Thanks Cheng, Nong. Data in the matrix is homogeneous (cells are booleans), so I don't expect to face memory-related issues. Is the limitation on the # of columns or memory issues caused by the # of columns? To me it sounds more like memory issues. On Mon, Jan 25, 2016 at 10:16 AM, Cheng Lian wr

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115825#comment-15115825 ] Wes McKinney commented on PARQUET-462: -- The methods for reading the definition and r

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115820#comment-15115820 ] Deepak Majeti commented on PARQUET-462: --- The main idea is to prevent code duplicati

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115790#comment-15115790 ] Wes McKinney commented on PARQUET-462: -- Could you explain this in more detail, espec

[jira] [Comment Edited] (PARQUET-433) Specialize ColumnReaders based on the column type

2016-01-25 Thread Aliaksei Sandryhaila (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115751#comment-15115751 ] Aliaksei Sandryhaila edited comment on PARQUET-433 at 1/25/16 6:53 PM:

[jira] [Commented] (PARQUET-433) Specialize ColumnReaders based on the column type

2016-01-25 Thread Aliaksei Sandryhaila (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115751#comment-15115751 ] Aliaksei Sandryhaila commented on PARQUET-433: -- Yes, your commit looks very

[jira] [Created] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Deepak Majeti (JIRA)
Deepak Majeti created PARQUET-462: - Summary: Create a new Level class for definition and repetition values Key: PARQUET-462 URL: https://issues.apache.org/jira/browse/PARQUET-462 Project: Parquet

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
Aside from Nong's comment, I think PARQUET-222, where we discussed a performance issue of writing wide tables, can be helpful. Cheng On 1/23/16 4:53 PM, Nong Li wrote: I expect this to be difficult. This is roughly 3 orders of magnitude more than even a typical wide table use case. Answers in

[jira] [Updated] (PARQUET-461) Improve ColumnReader API

2016-01-25 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Majeti updated PARQUET-461: -- Description: I would like to add some more extensions to the ColumnReader API. These extensions

[jira] [Created] (PARQUET-461) Improve ColumnReader API

2016-01-25 Thread Deepak Majeti (JIRA)
Deepak Majeti created PARQUET-461: - Summary: Improve ColumnReader API Key: PARQUET-461 URL: https://issues.apache.org/jira/browse/PARQUET-461 Project: Parquet Issue Type: Improvement