[jira] [Commented] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Jonathan Underwood (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585292#comment-16585292 ] Jonathan Underwood commented on PARQUET-1241: - Here are some questions that we should

[jira] [Commented] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Jonathan Underwood (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585291#comment-16585291 ] Jonathan Underwood commented on PARQUET-1241: - [~ee07b291] - not sure there's any need for

[jira] [Commented] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Alex Wang (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585290#comment-16585290 ] Alex Wang commented on PARQUET-1241: Thanks a lot [~jonathan.underw...@gmail.com] for the

[jira] [Comment Edited] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Alex Wang (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585285#comment-16585285 ] Alex Wang edited comment on PARQUET-1241 at 8/19/18 11:26 PM: --

[jira] [Comment Edited] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Jonathan Underwood (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585288#comment-16585288 ] Jonathan Underwood edited comment on PARQUET-1241 at 8/19/18 11:10 PM:

[jira] [Commented] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Jonathan Underwood (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585288#comment-16585288 ] Jonathan Underwood commented on PARQUET-1241: - I think there's a danger here of

[jira] [Commented] (PARQUET-1241) Use LZ4 frame format

2018-08-19 Thread Alex Wang (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585285#comment-16585285 ] Alex Wang commented on PARQUET-1241: [~wesmckinn] sorry for this delayed reply, I'd like to

Re: Doing a 1.5.0 C++ release

2018-08-19 Thread Deepak Majeti
Uwe, I would like to get https://issues.apache.org/jira/browse/PARQUET-1372 into this release as well. There is a PR already open for this JIRA and I got some feedback. I will address the feedback in the next couple of days. On Sun, Aug 19, 2018 at 8:48 AM Uwe L. Korn wrote: > Hello, > > as we

[jira] [Updated] (PARQUET-1372) [C++] Add an API to allow writing RowGroups based on their size rather than num_rows

2018-08-19 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Majeti updated PARQUET-1372: --- Fix Version/s: (was: 1.5.0) cpp-1.5.0 > [C++] Add an API to allow

Re: [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-08-19 Thread Wes McKinney
OK. I'm a bit -0 on doing anything that results in Arrow having a nonlinear git history (and rebasing is not really an option), but we can discuss that more later. On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn wrote: > +1 on this but also see my comments in the mail on the discussions. > > We

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-08-19 Thread Wes McKinney
hi Uwe, I agree with your points. Currently we have 3 software artifacts: 1. Arrow C++ libraries 2. Parquet C++ libraries with Arrow columnar integration 3. C++ interop layer for Python + Cython bindings Changes in #1 prompt an awkward workflow involving multiple PRs; as a result of this we

[jira] [Created] (PARQUET-1393) [C++] Change parquet::arrow::FileReader::ReadRowGroups to read into continuous arrays

2018-08-19 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1393: Summary: [C++] Change parquet::arrow::FileReader::ReadRowGroups to read into continuous arrays Key: PARQUET-1393 URL: https://issues.apache.org/jira/browse/PARQUET-1393

[jira] [Created] (PARQUET-1392) [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable

2018-08-19 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1392: Summary: [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable Key: PARQUET-1392 URL: https://issues.apache.org/jira/browse/PARQUET-1392 Project:

[jira] [Assigned] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1158: Assignee: Uwe L. Korn > [C++] Basic RowGroup filtering > --

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1158: - Summary: [C++] Basic RowGroup filtering (was: C++: Basic RowGroup filtering) > [C++] Basic

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1158: - Fix Version/s: (was: cpp-1.5.0) cpp-1.6.0 > [C++] Basic RowGroup

Re: [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-08-19 Thread Uwe L. Korn
+1 on this but also see my comments in the mail on the discussions. We should also keep the git history of parquet-cpp, that should not be hard with git and there is probably a StackOverflow answer out there that gives you the commands to do the merge. Uwe On Fri, Aug 17, 2018, at 12:57 AM,

Doing a 1.5.0 C++ release

2018-08-19 Thread Uwe L. Korn
Hello, as we are in the process of doing/voting on a repo merge with the Arrow project and also because there was some time since the last release, I would like to proceed with a 1.5.0 release soon. Please have a look over the issues at

[jira] [Updated] (PARQUET-1122) [C++] Support 2-level list encoding in Arrow decoding

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1122: - Fix Version/s: (was: cpp-1.5.0) cpp-1.6.0 > [C++] Support 2-level list

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-08-19 Thread Uwe L. Korn
Back from vacation, I also want to finally raise my voice. With the current state of the Parquet<->Arrow development, I see a benefit in merging the code base for now, but not necessarily forever. Parquet C++ is the main code base of an artefact for which an Arrow C++ adapter is built and that

Re: num_level in Parquet Cpp library & how to add a JSON field?

2018-08-19 Thread Uwe L. Korn
Hello Ivy, > Is there any way to read the data in logical format? because I want to > check if my final output is correct. I usually use the parquet-cli from the parquet-mr project to check if my file is written correctly. This should give you much more informative output. Simple usage: git

[jira] [Resolved] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1390. -- Resolution: Fixed Issue resolved by pull request 516

[jira] [Assigned] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1390: Assignee: Andy Grove > [Java] Upgrade to Arrow 0.10.0 > --

[jira] [Updated] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1390: - Fix Version/s: 1.11.0 > [Java] Upgrade to Arrow 0.10.0 > -- > >

[jira] [Commented] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585087#comment-16585087 ] ASF GitHub Bot commented on PARQUET-1390: - xhochy closed pull request #516: PARQUET-1390:

Re: Status of column index in parquet-mr

2018-08-19 Thread Uwe L. Korn
Hello Gabor, comment in-line > The implementation was done based on the original design of column indexes > meaning > that no row alignment is required between the pages (the only requirement > is for the pages to respect row

Status of Bloom filter

2018-08-19 Thread 俊杰陈
Hi, Status as of the sync-up in June: The Bloom filter benchmark was uploaded to the PARQUET-41 JIRA. PARQUET-41 was broken into several sub-tasks as follows: - parquet format: PARQUET-319. - Add Bloom filter utility class: PARQUET-1342 for java, PARQUET-1332 for c++. - read/write side

[jira] [Created] (PARQUET-1391) [java] Integrate Bloom filter logic

2018-08-19 Thread Junjie Chen (JIRA)
Junjie Chen created PARQUET-1391: Summary: [java] Integrate Bloom filter logic Key: PARQUET-1391 URL: https://issues.apache.org/jira/browse/PARQUET-1391 Project: Parquet Issue Type: Sub-task

[jira] [Updated] (PARQUET-1329) [C++] Integrate Bloom filter into row group filter logic

2018-08-19 Thread Junjie Chen (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junjie Chen updated PARQUET-1329: - Summary: [C++] Integrate Bloom filter into row group filter logic (was: integrate parquet

[jira] [Updated] (PARQUET-1328) [java]Bloom filter read/write implementation

2018-08-19 Thread Junjie Chen (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junjie Chen updated PARQUET-1328: - Summary: [java]Bloom filter read/write implementation (was: parquet bloom filter writer

[jira] [Assigned] (PARQUET-1328) [java]Bloom filter read/write implementation

2018-08-19 Thread Junjie Chen (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junjie Chen reassigned PARQUET-1328: Assignee: Junjie Chen > [java]Bloom filter read/write implementation >

[jira] [Updated] (PARQUET-1327) [C++]Bloom filter read/write implementation

2018-08-19 Thread Junjie Chen (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junjie Chen updated PARQUET-1327: - Summary: [C++]Bloom filter read/write implementation (was: parquet bloom filter reader