Re: [RESULT] Release Apache Parquet Format 2.9.0 RC0

2021-04-15 Thread Uwe L. Korn
Published the release. On Wed, Apr 14, 2021, at 6:30 PM, Driesprong, Fokko wrote: > Yes, you'll need PMC permissions to do that. > > A PMC could fetch the artifacts from > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/ > and push them into svn as described below

Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-12 Thread Uwe L. Korn
+1 (binding) Verified signature and checksum on the artifact, passed the tests on macOS 11 (ARM64) with mamba create -p $(pwd)/../env maven thrift-cpp=0.13 conda activate $(pwd)/../env mvn test On Fri, Apr 9, 2021, at 10:27 AM, Gabor Szadovszky wrote: > Thanks, Wes. If this is the case I am

Re: [C++] Changing the versioning string for Parquet-CPP

2021-03-12 Thread Uwe L. Korn
When we merged this into the Arrow repo, at least from my side, there was the intention to revert that maybe at some stage again. The thought behind moving parquet-cpp out of the Arrow repo again was based on the idea that Parquet was one of the many interfaces Arrow does provide access to but

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-06-30 Thread Uwe L. Korn
I'm also in favor of disabling support for now. Having to deal with broken files or the detection of various incompatible implementations in the long-term will harm more than not supporting LZ4 for a while. Snappy is generally more used than LZ4 in this category as it has been available since

Re: Updating parquet web site

2019-10-18 Thread Uwe L. Korn
Hello Gabor, can we call this for clarity https://github.com/apache/parquet-site ? Thanks Uwe On Fri, Oct 18, 2019, at 9:46 AM, Gabor Szadovszky wrote: > Dear All, > > There are some stuff on our web site that is ready for update (since a > while). To spin up the process it would be great if

[jira] [Updated] (PARQUET-1586) [C++] Add --dump options to parquet-reader tool to dump def/rep levels

2019-05-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1586: - Fix Version/s: (was: 1.10.1) cpp-1.6.0 > [C++] Add --dump opti

[jira] [Resolved] (PARQUET-1586) [C++] Add --dump options to parquet-reader tool to dump def/rep levels

2019-05-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1586. -- Resolution: Fixed Fix Version/s: 1.10.1 Issue resolved by pull request 4385 [https

[jira] [Resolved] (PARQUET-1583) [C++] Remove parquet::Vector class

2019-05-21 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1583. -- Resolution: Fixed Issue resolved by pull request 4354 [https://github.com/apache/arrow/pull

Re: Parquet vs. other Open Source Columnar Formats

2019-05-09 Thread Uwe L. Korn
Hello, Be aware that Avro and Protobuf are general serialization formats, not columnar ones such as Parquet or ORC. They are good for RPC or row-wise streaming whereas the latter two are perfect for analytics. Uwe > On 09.05.2019 at 20:33, David Mollitor wrote: > > I'm sure there are many
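To illustrate the distinction drawn in this message, here is a minimal Python sketch contrasting a row-wise layout with a columnar one. It is not Parquet, Avro, or Protobuf code; the sample records and the helper name `to_columns` are made up for illustration.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"id": 1, "city": "Berlin", "amount": 10.0},
    {"id": 2, "city": "Paris", "amount": 12.5},
    {"id": 3, "city": "Berlin", "amount": 7.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-wise access: each record is read as a unit (suits RPC or streaming).
first_record = rows[0]

# Columnar access: an aggregate only touches the one column it needs.
total_amount = sum(columns["amount"])
```

In the row-wise form every field of a record sits together, while in the columnar form all values of one field sit together, which is why scans over a few columns are cheaper there.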

[jira] [Commented] (PARQUET-1022) [C++] Append mode in parquet-cpp

2019-03-13 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791492#comment-16791492 ] Uwe L. Korn commented on PARQUET-1022: -- There is no implementation of merging concatenating files

[jira] [Commented] (PARQUET-1022) [C++] Append mode in parquet-cpp

2019-03-12 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790518#comment-16790518 ] Uwe L. Korn commented on PARQUET-1022: -- [~thamha] The solution here is to write more files

Re: parquet using encoding other than UTF-8

2019-02-05 Thread Uwe L. Korn
Hello Manik, this is not possible at the moment. As Parquet is a portable on-disk format, we focus on having a single representation for each data type. Thus implementing readers/writers is limited to these to make their implementation simpler. Especially as you are the producer but not the
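The snippet above is truncated and does not say how to proceed. One option, offered here purely as an assumption, is for the producer to transcode its text to UTF-8 before writing. The sketch below uses only the Python standard library; the function name `to_utf8` and the sample string are illustrative and not part of any Parquet API.

```python
def to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Assumed input: text that the producer holds in Latin-1.
latin1_bytes = "Grüße".encode("latin-1")  # one byte per character

# ü and ß each take two bytes in UTF-8, so the result is longer.
utf8_bytes = to_utf8(latin1_bytes)
```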

[jira] [Assigned] (PARQUET-1521) [C++] Do not use "extern template class" with parquet::ColumnWriter

2019-02-05 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1521: Assignee: Wes McKinney > [C++] Do not use "extern template class"

[jira] [Resolved] (PARQUET-1521) [C++] Do not use "extern template class" with parquet::ColumnWriter

2019-02-05 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1521. -- Resolution: Fixed Issue resolved by pull request 3551 [https://github.com/apache/arrow/pull

[jira] [Commented] (PARQUET-1523) [C++] Vectorize comparator interface

2019-02-04 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760517#comment-16760517 ] Uwe L. Korn commented on PARQUET-1523: -- This would probably benefit from arrow-compute kernel

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-31 Thread Uwe L. Korn
+1 (binding) Built and tested using Ryan's script on Ubuntu 16.04. The script helped me a bit as it included the necessary maven options. Thanks! For the future, it would be good to include one as we have in Arrow that also checks the signature. We have that in the main tree and the script also

Re: [DISCUSS] Remove old modules?

2019-01-29 Thread Uwe L. Korn
Hello Fokko, I have put up a PR for the Scala update https://github.com/apache/parquet-mr/pull/605. parquet-scrooge fails due to a Thrift parsing error but parquet-scala succeeds with Scala 2.12. By dropping scrooge, we could at least move this forward. Uwe > On 29.01.2019 at 11:40,

Re: [DISCUSS] Bump Apache Thrift dependency to 0.12.0

2019-01-27 Thread Uwe L. Korn
oexist because we shade the one in parquet-format. Thrift should also be > > binary compatible, although I don't think they publish any guarantees. > > > > On Fri, Jan 25, 2019 at 12:53 AM Uwe L. Korn wrote: > > > > > As an FYI: parquet-cpp already uses Thrift 0.12 in

Re: [DISCUSS] Parquet Java 1.10.1 release?

2019-01-27 Thread Uwe L. Korn
Hello Ryan, Making a bugfix release sounds fine for this case. Sadly as with all other RCs, it would help to have better instructions on how to verify the release candidate. Uwe On Fri, Jan 25, 2019, at 8:19 PM, Ryan Blue wrote: > Hi everyone, > > The Spark community caught a correctness bug

[jira] [Assigned] (PARQUET-1504) Add an option to convert Parquet Int96 to Arrow Timestamp

2019-01-27 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1504: Assignee: Yongyan Wang > Add an option to convert Parquet Int96 to Arrow Timest

[jira] [Resolved] (PARQUET-1504) Add an option to convert Parquet Int96 to Arrow Timestamp

2019-01-27 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1504. -- Resolution: Fixed Fix Version/s: 1.12.0 Issue resolved by PR https://github.com

[jira] [Updated] (PARQUET-1496) [Java] Update Scala to 2.12

2019-01-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1496: - Description: When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9

[jira] [Updated] (PARQUET-1496) [Java] Update Scala to 2.12

2019-01-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1496: - Summary: [Java] Update Scala to 2.12 (was: [Java] Build fails on OSX and Java 10) > [J

[jira] [Assigned] (PARQUET-1496) [Java] Update Scala to 2.12

2019-01-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1496: Assignee: Uwe L. Korn > [Java] Update Scala to 2

Re: [DISCUSS] Bump Apache Thrift dependency to 0.12.0

2019-01-25 Thread Uwe L. Korn
As an FYI: parquet-cpp already uses Thrift 0.12 in some of its binary distributions. So when there is a problem with old readers, one has to notice that we already have files out in the wild. Cheers Uwe On Fri, Jan 25, 2019, at 9:13 AM, Gabor Szadovszky wrote: > May it cause any problems that

Re: [VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-22 Thread Uwe L. Korn
at these issues shall be fixed but I think we should > not stop the release because of them. > > Cheers, > Gabor > > On Mon, Jan 21, 2019 at 6:40 PM Uwe L. Korn wrote: > > > Hi, > > > > I'm sadly giving a +0 here. The signatures look good but I was unable to >

[jira] [Resolved] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1501. -- Resolution: Won't Fix > v1.8.x to be fixed with PARQUET-952 solut

[jira] [Updated] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1501: - Description: The following issue is fixed in Avro Parquet v1.11.0. PARQUET-952: Avro union

[jira] [Commented] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748498#comment-16748498 ] Uwe L. Korn commented on PARQUET-1501: -- [~nvijayrech] We probably won't make a bugfix release

[jira] [Updated] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1501: - Flags: Patch (was: Patch,Important) > v1.8.x to be fixed with PARQUET-952 solut

Re: [VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-21 Thread Uwe L. Korn
Hi, I'm sadly giving a +0 here. The signatures look good but I was unable to build with JDK 9/10/11 on OSX. https://issues.apache.org/jira/browse/PARQUET-1497 and https://issues.apache.org/jira/browse/PARQUET-1496 are the problems I'm running into. I've also opened a PR to document how to

[jira] [Created] (PARQUET-1497) [Java] Building on OSX fails with OpenJDK 11

2019-01-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1497: Summary: [Java] Building on OSX fails with OpenJDK 11 Key: PARQUET-1497 URL: https://issues.apache.org/jira/browse/PARQUET-1497 Project: Parquet Issue Type

[jira] [Created] (PARQUET-1498) [Java] Add instructions to install thrift via homebrew

2019-01-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1498: Summary: [Java] Add instructions to install thrift via homebrew Key: PARQUET-1498 URL: https://issues.apache.org/jira/browse/PARQUET-1498 Project: Parquet

[jira] [Created] (PARQUET-1496) [Java] Build fails on OSX and Java 10

2019-01-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1496: Summary: [Java] Build fails on OSX and Java 10 Key: PARQUET-1496 URL: https://issues.apache.org/jira/browse/PARQUET-1496 Project: Parquet Issue Type: Bug

Re: [Draft REPORT] Apache Parquet - January 2019

2019-01-07 Thread Uwe L. Korn
+1 Uwe On Mon, Jan 7, 2019, at 9:14 PM, Ryan Blue wrote: > +1 > > On Mon, Jan 7, 2019 at 11:39 AM Julien Le Dem > wrote: > > > ## Description: > > Parquet is a standard and interoperable columnar file format > > for efficient analytics. Parquet has 3 sub-projects: > > - parquet-format: format

[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726776#comment-16726776 ] Uwe L. Korn commented on PARQUET-1481: -- Can you describe how you generated this Parquet file

Re: [Discuss] Code of conduct

2018-12-09 Thread Uwe L. Korn
Hello Julien, As per the ASF guidelines, https://www.apache.org/foundation/policies/conduct.html also applies to the Apache Parquet channels. Would that be sufficient for you? Cheers Uwe On Sat, Dec 8, 2018, at 2:14 AM, Julien Le Dem wrote: > We currently don’t have an explicit code of conduct.

[jira] [Updated] (PARQUET-490) [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

2018-11-02 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-490: Summary: [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

[jira] [Updated] (PARQUET-492) [C++] Incorporate DELTA_BYTE_ARRAY value encoder into library and add unit tests

2018-11-02 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-492: Summary: [C++] Incorporate DELTA_BYTE_ARRAY value encoder into library and add unit tests

[jira] [Updated] (PARQUET-491) [C++] Incorporate DELTA_LENGTH_BYTE_ARRAY value encoder into library and add unit tests

2018-11-02 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-491: Summary: [C++] Incorporate DELTA_LENGTH_BYTE_ARRAY value encoder into library and add unit tests

[jira] [Commented] (PARQUET-1454) ld-linux-x86-64.so.2 is missing

2018-10-31 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669869#comment-16669869 ] Uwe L. Korn commented on PARQUET-1454: -- {{ld-linux-x86-64.so.2}} is a library that is normally

[jira] [Assigned] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads

2018-09-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1160: Assignee: Ted Haining (was: Phillip Cloud) > [C++] Implement BYTE_ARRAY-backed Deci

[jira] [Resolved] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads

2018-09-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1160. -- Resolution: Fixed Fix Version/s: 1.10.1 Issue resolved by pull request 2646 [https

[jira] [Updated] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads

2018-09-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1160: - Fix Version/s: (was: 1.10.1) cpp-1.6.0 > [C++] Implement BYTE_AR

Re: Donate C (GLib) bindings for C++ implementation

2018-09-25 Thread Uwe L. Korn
Hello Kou, this was already mentioned on the pull request but copying it here for others: We would very much like this in the Apache repository and are very grateful for the code donation. We should do the normal format code donation vote and then merge it. Uwe On Tue, Sep 25, 2018, at 7:20

[jira] [Commented] (PARQUET-1422) [C++] Use Arrow IO interfaces natively rather than current parquet:: wrappers

2018-09-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624704#comment-16624704 ] Uwe L. Korn commented on PARQUET-1422: -- +1 from me, that's one of the refactoring benefits I

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-09-20 Thread Uwe L. Korn
Hello Wes, I'm definitely +1 on archiving the master branch. I'm not sure what you mean exactly with this. I would have simply added a final commit that deletes all code and adds a message to the README that the repository has moved into another repo. Cheers Uwe On Thu, Sep 13, 2018, at

[RESULT][VOTE] Release Apache Parquet C++ 1.5.0 RC0

2018-09-06 Thread Uwe L. Korn
ld working properly > > > > Thanks Uwe for managing the release > > On Thu, Aug 30, 2018 at 4:36 PM Wes McKinney wrote: > > > > > > It may take me until sometime tomorrow to run the build since I want > > > to check on Windows also > > > On Wed, Aug

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-09-04 Thread Uwe L. Korn
k place > > > > > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53 > > > > > > It would be my preference to have a single squashed commit whose > > > message attributes the developers of the code and provides links back &

Re: [VOTE] Release Apache Parquet C++ 1.5.0 RC0

2018-08-29 Thread Uwe L. Korn
+1 (binding) Verified on Ubuntu 16.04 using `./dev/release/verify-release-candidate 1.5.0 0` On Wed, Aug 29, 2018, at 5:09 PM, Uwe L. Korn wrote: > All, > > I propose that we accept the following release candidate as the official > Apache Parquet C++ 1.5.0 release. > > Par

[VOTE] Release Apache Parquet C++ 1.5.0 RC0

2018-08-29 Thread Uwe L. Korn
All, I propose that we accept the following release candidate as the official Apache Parquet C++ 1.5.0 release. Parquet C++ 1.5.0-rc0 includes the following: --- The CHANGELOG for the release is available at:

Re: Doing a 1.5.0 C++ release

2018-08-27 Thread Uwe L. Korn
> > Thanks > >> On Sun, Aug 26, 2018 at 1:06 PM, Wes McKinney wrote: >> I think we should be able to cut a release now? We can also proceed >> with the Arrow merge at the same time once we agree how particularly >> to do that. >> >>> On Wed, Aug

Re: Date and time for next Parquet sync

2018-08-27 Thread Uwe L. Korn
Hello Nandor, probably I can make this time. Just a timezone question: Is it 6pm CET or 6pm CEST? I guess the latter. See http://timesched.pocoo.org/?date=2018-08-28=central-europe-standard-time!,pacific-standard-time=1080,1140 Uwe On Mon, Aug 27, 2018, at 12:20 PM, Nandor Kollar wrote: > Hi
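The CET-versus-CEST question can be checked mechanically. The sketch below is an illustration only, using the Python standard library and the EU summer-time rule (last Sunday of March to last Sunday of October, switching at 01:00 UTC); the helper names are hypothetical, and the date is the one that appears in the link above.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year: int, month: int) -> datetime:
    """Return 01:00 UTC on the last Sunday of the given month."""
    # Step to the first day of the following month, then back one day.
    first_of_next = datetime(year + (month == 12), month % 12 + 1, 1,
                             tzinfo=timezone.utc)
    day = first_of_next - timedelta(days=1)
    while day.weekday() != 6:  # Monday is 0, Sunday is 6
        day -= timedelta(days=1)
    return day.replace(hour=1)


def central_european_offset(moment_utc: datetime) -> timedelta:
    """UTC offset of Central European time under the EU summer-time rule."""
    start = last_sunday(moment_utc.year, 3)
    end = last_sunday(moment_utc.year, 10)
    in_summer_time = start <= moment_utc < end
    return timedelta(hours=2 if in_summer_time else 1)


# Noon UTC on the date from the link; +2 hours means CEST, +1 hour means CET.
offset = central_european_offset(datetime(2018, 8, 28, 12, 0, tzinfo=timezone.utc))
```

For that date the offset comes out as two hours, which matches the guess of CEST in the message.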

[jira] [Resolved] (PARQUET-1372) [C++] Add an API to allow writing RowGroups based on their size rather than num_rows

2018-08-25 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1372. -- Resolution: Fixed Issue resolved by pull request 484 [https://github.com/apache/parquet-cpp

[jira] [Resolved] (PARQUET-1392) [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable

2018-08-23 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1392. -- Resolution: Fixed Resolved by PR https://github.com/apache/parquet-cpp/pull/492 >

[jira] [Moved] (PARQUET-1403) Can't save a df using Parquet if using float16

2018-08-23 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn moved ARROW-3112 to PARQUET-1403: - Affects Version/s: (was: 0.10.0) cpp-1.4.0

Re: PARQUET-1399: Move parquet-mr related code from parquet-format

2018-08-22 Thread Uwe L. Korn
Hello Gabor > I've just realized that merge commit in github is "not enabled for this > repository". Any suggestions how we can workaround this? You have to merge manually on your commandline using "git merge … && git push origin master". Uwe

Re: Doing a 1.5.0 C++ release

2018-08-22 Thread Uwe L. Korn
would like to get https://issues.apache.org/jira/browse/PARQUET-1372 into > > this release as well. There is a PR already open for this JIRA and I got > > some feedback. I will address the feedback in the next couple of days. > > > > On Sun, Aug 19, 2018 at 8:48 AM Uwe L. Korn w

Re: [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-08-21 Thread Uwe L. Korn
, 2018, at 7:36 PM, Wes McKinney wrote: > OK. I'm a bit -0 on doing anything that results in Arrow having a > nonlinear git history (and rebasing is not really an option) but we > can discuss that more later > > On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn wrote: > > +1 on

[jira] [Comment Edited] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585947#comment-16585947 ] Uwe L. Korn edited comment on PARQUET-1395 at 8/20/18 1:58 PM: --- Ok

[jira] [Commented] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585947#comment-16585947 ] Uwe L. Korn commented on PARQUET-1395: -- Ok, that is definitely the root of the problem

[jira] [Commented] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585931#comment-16585931 ] Uwe L. Korn commented on PARQUET-1395: -- [~pitrou] can you post the output of ? {code} % objdump

[jira] [Commented] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585927#comment-16585927 ] Uwe L. Korn commented on PARQUET-1395: -- We should have a look at what {{conda}} is doing

[jira] [Created] (PARQUET-1393) [C++] Change parquet::arrow::FileReader::ReadRowGroups to read into continuous arrays

2018-08-19 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1393: Summary: [C++] Change parquet::arrow::FileReader::ReadRowGroups to read into continuous arrays Key: PARQUET-1393 URL: https://issues.apache.org/jira/browse/PARQUET-1393

[jira] [Created] (PARQUET-1392) [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable

2018-08-19 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1392: Summary: [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable Key: PARQUET-1392 URL: https://issues.apache.org/jira/browse/PARQUET-1392 Project

[jira] [Assigned] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1158: Assignee: Uwe L. Korn > [C++] Basic RowGroup filter

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1158: - Summary: [C++] Basic RowGroup filtering (was: C++: Basic RowGroup filtering) > [C++] Ba

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1158: - Fix Version/s: (was: cpp-1.5.0) cpp-1.6.0 > [C++] Basic RowGr

Re: [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-08-19 Thread Uwe L. Korn
+1 on this but also see my comments in the mail on the discussions. We should also keep the git history of parquet-cpp, that should not be hard with git and there is probably a StackOverflow answer out there that gives you the commands to do the merge. Uwe On Fri, Aug 17, 2018, at 12:57 AM,

Doing a 1.5.0 C++ release

2018-08-19 Thread Uwe L. Korn
Hello, as we are in the process of doing/voting on a repo merge with the Arrow project and also because there was some time since the last release, I would like to proceed with a 1.5.0 release soon. Please have a look over the issues at

[jira] [Updated] (PARQUET-1122) [C++] Support 2-level list encoding in Arrow decoding

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1122: - Fix Version/s: (was: cpp-1.5.0) cpp-1.6.0 > [C++] Support 2-level l

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-08-19 Thread Uwe L. Korn
Back from vacation, I also want to finally raise my voice. With the current state of the Parquet<->Arrow development, I see a benefit in merging the code base for now, but not necessarily forever. Parquet C++ is the main code base of an artefact for which an Arrow C++ adapter is built and that

Re: num_level in Parquet Cpp library & how to add a JSON field?

2018-08-19 Thread Uwe L. Korn
ything breaks I can highly recommend reading parquet-mr's READMEs. Uwe > > Thanks! > -Ivy > > On 2018/08/03 13:46:15, "Uwe L. Korn" wrote: > > Hello Ivy, > > > > "primitive binary" means `Type::BYTE_ARRAY`, so you're correct. I have n

[jira] [Resolved] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1390. -- Resolution: Fixed Issue resolved by pull request 516 [https://github.com/apache/parquet-mr

[jira] [Assigned] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1390: Assignee: Andy Grove > [Java] Upgrade to Arrow 0.1

[jira] [Updated] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1390: - Fix Version/s: 1.11.0 > [Java] Upgrade to Arrow 0.1

Re: Status of column index in parquet-mr

2018-08-19 Thread Uwe L. Korn
Hello Gabor, comment in-line > The implementation was done based on the original design of column indexes > meaning > that no row alignment is required between the pages (the only requirement > is for the pages to respect row

Re: Date and time for next Parquet sync

2018-08-12 Thread Uwe L. Korn
As the meeting falls into my summer vacation I cannot participate but will try to join again if there is a meeting two weeks later. Uwe > On 08.08.2018 at 16:43, Nandor Kollar wrote: > > Hi All, > > It has been a while since we had a Parquet sync, therefore I'd like to > propose to have one

[jira] [Commented] (PARQUET-1370) Read consecutive column chunks in a single scan

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568403#comment-16568403 ] Uwe L. Korn commented on PARQUET-1370: -- I'm doing the same, my code looks as follows: {code:java

[jira] [Commented] (PARQUET-1370) Read consecutive column chunks in a single scan

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568368#comment-16568368 ] Uwe L. Korn commented on PARQUET-1370: -- [~rgruener] I was also plagued by this issue but I wrapped

[jira] [Commented] (PARQUET-1369) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568247#comment-16568247 ] Uwe L. Korn commented on PARQUET-1369: -- [~rgruener] Moved it. > [Python] Unavailable Parq

[jira] [Moved] (PARQUET-1369) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn moved ARROW-2800 to PARQUET-1369: - Fix Version/s: (was: 0.11.0) cpp-1.5.0

Re: num_level in Parquet Cpp library & how to add a JSON field?

2018-08-03 Thread Uwe L. Korn
Hello Ivy, "primitive binary" means `Type::BYTE_ARRAY`, so you're correct. I have not yet seen anyone use the JSON field with parquet-cpp but the JSON type is simply a binary string with an annotation so I would expect everything to just work. Uwe On Thu, Aug 2, 2018, at 7:59 PM,

[jira] [Resolved] (PARQUET-1357) [C++] FormatStatValue truncates binary statistics on zero character

2018-08-01 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1357. -- Resolution: Fixed Issue resolved by PR https://github.com/apache/parquet-cpp/pull/479 >

[jira] [Commented] (PARQUET-1361) [C++] 1.4.1 library allows creation of parquet file w/NULL values for INT types

2018-07-31 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563874#comment-16563874 ] Uwe L. Korn commented on PARQUET-1361: -- What is the problem with the generated Parquet file? I

[jira] [Commented] (PARQUET-1363) Add IP address logical type

2018-07-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562055#comment-16562055 ] Uwe L. Korn commented on PARQUET-1363: -- [~tmgstev] You would probably need two types: IPv4

[jira] [Assigned] (PARQUET-1348) [C++] Allow Arrow FileWriter To Write FileMetaData

2018-07-28 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1348: Assignee: Robert Gruener > [C++] Allow Arrow FileWriter To Write FileMetaD

[jira] [Resolved] (PARQUET-1348) [C++] Allow Arrow FileWriter To Write FileMetaData

2018-07-28 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1348. -- Resolution: Fixed Fix Version/s: cpp-1.5.0 Issue resolved by pull request 481 [https

[jira] [Resolved] (PARQUET-1358) [C++] index_page_offset should be unset as it is not supported.

2018-07-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1358. -- Resolution: Fixed Issue resolved by pull request 480 [https://github.com/apache/parquet-cpp

[jira] [Created] (PARQUET-1358) [C++] index_page_offset should be unset as it is not supported.

2018-07-26 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1358: Summary: [C++] index_page_offset should be unset as it is not supported. Key: PARQUET-1358 URL: https://issues.apache.org/jira/browse/PARQUET-1358 Project: Parquet

[jira] [Created] (PARQUET-1357) [C++] FormatStatValue truncates binary statistics on zero characters

2018-07-26 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1357: Summary: [C++] FormatStatValue truncates binary statistics on zero characters Key: PARQUET-1357 URL: https://issues.apache.org/jira/browse/PARQUET-1357 Project

[jira] [Updated] (PARQUET-1357) [C++] FormatStatValue truncates binary statistics on zero character

2018-07-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1357: - Summary: [C++] FormatStatValue truncates binary statistics on zero character (was: [C

[jira] [Created] (PARQUET-1349) [C++] PARQUET_RPATH_ORIGIN is not picked by the build

2018-07-14 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1349: Summary: [C++] PARQUET_RPATH_ORIGIN is not picked by the build Key: PARQUET-1349 URL: https://issues.apache.org/jira/browse/PARQUET-1349 Project: Parquet

[jira] [Resolved] (PARQUET-1346) [C++] Protect against null values data in empty Arrow array

2018-07-12 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1346. -- Resolution: Fixed Fix Version/s: cpp-1.5.0 Issue resolved by pull request 474 [https

[jira] [Commented] (PARQUET-1343) Unable to read a parquet file

2018-06-29 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527609#comment-16527609 ] Uwe L. Korn commented on PARQUET-1343: -- This sounds like your file really got corrupted. When

[jira] [Updated] (PARQUET-1343) Unable to read a parquet file

2018-06-29 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1343: - Priority: Minor (was: Blocker) > Unable to read a parquet f

[jira] [Updated] (PARQUET-1333) [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc

2018-06-25 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1333: - Fix Version/s: cpp-1.5.0 > [C++] Reading of files with dictionary size 0 fails on Wind

[jira] [Assigned] (PARQUET-1158) C++: Basic RowGroup filtering

2018-06-14 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1158: Assignee: (was: Uwe L. Korn) > C++: Basic RowGroup filter

[jira] [Commented] (PARQUET-1158) C++: Basic RowGroup filtering

2018-06-14 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512624#comment-16512624 ] Uwe L. Korn commented on PARQUET-1158: -- [~keithgchapman] No, I'm not actively working
