Re: [C++] Parquet and Arrow overlap

2024-05-17 Thread Uwe L. Korn
On Fri, May 17, 2024, at 10:36 AM, Antoine Pitrou wrote: > Hi Julien, > > On Thu, 16 May 2024 18:23:33 -0700 > Julien Le Dem wrote: >> >> As discussed, that code was moved in the arrow repo for convenience: >> https://lists.apache.org/thread/gkvbm6yyly1r4cg3f6xtnqkjz6ogn6o2 >> >> To take an

Re: [DISCUSS] Parquet C++ under which PMC?

2024-05-16 Thread Uwe L. Korn
i, >> >> >> >> I share the same feeling with Antoine that parquet-cpp seems to be fully >> >> governed by Apache Arrow PMC, not the Apache Parquet PMC. I have >> >> once discussed this with Xinli and he told me that the contribution to >> >>

Re: [DISCUSS] Parquet C++ under which PMC?

2024-05-16 Thread Uwe L. Korn
at 4:29 PM Antoine Pitrou wrote: > >> On Thu, 16 May 2024 10:08:42 +0200 >> "Uwe L. Korn" wrote: >> > On Tue, May 14, 2024, at 6:30 PM, Antoine Pitrou wrote: >> > > AFAIK, the only Parquet implementation under the Apache Parquet project >> >

Re: [DISCUSS] Parquet Reference Implementation ?

2024-05-16 Thread Uwe L. Korn
On Tue, May 14, 2024, at 6:30 PM, Antoine Pitrou wrote: > AFAIK, the only Parquet implementation under the Apache Parquet project > is parquet-mr :-) This is not true. The parquet-cpp that resides in the arrow repository is still controlled by the Apache Parquet PMC. Back then, we only merged

Re: [DISCUSS] rename parquet-mr to parquet-java?

2024-05-16 Thread Uwe L. Korn
very heavy +1 This would help a lot. On Thu, May 16, 2024, at 4:19 AM, Gang Wu wrote: > +1 on renaming the repo to reduce confusion. > > However, the java library still uses the "parquet-mr" prefix to write its > application version [1] and it is consumed by downstream projects like >

Re: [DISCUSS] Propose changing the default branch of the parquet-site repo

2024-05-12 Thread Uwe L. Korn
+1 On Sun, May 12, 2024, at 9:31 AM, Gang Wu wrote: > +1 > > This makes sense. I was also confused when I had access to > parquet-site for the first time. > > Thanks Andrew! > > Best, > Gang > > On Sun, May 12, 2024 at 3:15 AM Vinoo Ganesh wrote: > >> +1, this would be great. It's something

Archival of parquet-cpp repository

2024-05-06 Thread Uwe L. Korn
Hi, Given that we haven't used the parquet-cpp repository for over six years now, I made a PR https://github.com/apache/parquet-cpp/pull/504 that removes most of its contents. This should make it even clearer to everyone that the repository is no longer

Re: Fwd: [C++] Parquet and Arrow overlap

2024-04-24 Thread Uwe L. Korn
> > Best, > Gang > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn wrote: > >> I would be very supportive of this move. The Parquet C++ development has >> been under the umbrella of the Arrow repository for more than five(six?) >> years now. Thus, the issues shoul

Re: Fwd: [C++] Parquet and Arrow overlap

2024-04-24 Thread Uwe L. Korn
I would be very supportive of this move. Parquet C++ development has been under the umbrella of the Arrow repository for more than five (six?) years now. Thus, the issues should also be aligned with the Arrow project. Uwe On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote: > Bumping this

Re: [VOTE] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64

2024-03-07 Thread Uwe L. Korn
+1 (binding) On Thu, Mar 7, 2024, at 3:08 PM, Gábor Szádovszky wrote: > +1 (binding) - Not sure if "binding" matters for this case > Thanks, Antoine, for working on this! > > Antoine Pitrou wrote (on Thu, Mar 7, 2024, > 14:18): > >> >> Hello, >> >> As discussed previously on this ML
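For context on what this vote extends: BYTE_STREAM_SPLIT scatters the K bytes of each fixed-width value into K separate streams (byte j of value i lands at position i of stream j) and stores the streams back to back. It does not shrink the data by itself, but it tends to help a compressor applied afterwards. Below is a minimal sketch over raw little-endian bytes; `ByteStreamSplitEncode` and `ByteStreamSplitDecode` are hypothetical names for illustration, not Arrow's or Parquet's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's API): encode `n` fixed-width values of
// `width` bytes each. Byte j of value i goes to stream j at position i, and
// the `width` streams are stored back to back, so the output is exactly as
// large as the input.
std::vector<uint8_t> ByteStreamSplitEncode(const uint8_t* values, std::size_t n,
                                           std::size_t width) {
  assert(width > 0 && "value width must be positive");
  std::vector<uint8_t> out(n * width);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < width; ++j) {
      out[j * n + i] = values[i * width + j];
    }
  }
  return out;
}

// Inverse transform: gather byte j of value i back from stream j.
std::vector<uint8_t> ByteStreamSplitDecode(const uint8_t* encoded, std::size_t n,
                                           std::size_t width) {
  assert(width > 0 && "value width must be positive");
  std::vector<uint8_t> out(n * width);
  for (std::size_t j = 0; j < width; ++j) {
    for (std::size_t i = 0; i < n; ++i) {
      out[i * width + j] = encoded[j * n + i];
    }
  }
  return out;
}
```

For INT32 the width would be 4 and for INT64 it would be 8; for FIXED_LEN_BYTE_ARRAY it would be the column's declared type length. With small integers the high-byte streams become runs of zeros, which is where the compression benefit comes from.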

Re: parquet-format status

2024-03-07 Thread Uwe L. Korn
I strongly second Antoine's response here. It is a small but very important repository that holds crucial information for the project. Best Uwe On Thu, Mar 7, 2024, at 1:17 PM, Antoine Pitrou wrote: > Hello, > > I am surprised that this is suggesting to deprecate or delete a > repository just

Re: [VOTE][Format] Add Float16 type to specification

2023-10-10 Thread Uwe L. Korn
+1 (binding) On Sat, Oct 7, 2023, at 5:49 AM, Daniel Weeks wrote: > +1 > > On Fri, Oct 6, 2023, 8:33 PM Gang Wu wrote: > >> +1 (non-binding) >> >> Best, >> Gang >> >> On Sat, Oct 7, 2023 at 11:05 AM Micah Kornfield >> wrote: >> >> > I'm +1 (non-binding) for the proposal in general. >> > >> > I

Re: [Request] Send automated notifications to a separate mailing-list

2023-08-22 Thread Uwe L. Korn
+1 On Tue, Aug 22, 2023, at 5:29 AM, Gang Wu wrote: > +1 on this. > > We may create the following mailing lists: > - iss...@parquet.apache.org : notifications from JIRA issues. > - comm...@parquet.apache.org : notifications from Github PRs and comments. > > This is what the Apache ORC community

Re: [RESULT] Release Apache Parquet Format 2.9.0 RC0

2021-04-15 Thread Uwe L. Korn
Published the release. On Wed, Apr 14, 2021, at 6:30 PM, Driesprong, Fokko wrote: > Yes, you'll need PMC permissions to do that. > > A PMC could fetch the artifacts from > https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/ > and push them into svn as described below

Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-12 Thread Uwe L. Korn
+1 (binding)

Verified the signature and checksum on the artifact; the tests passed on macOS 11 (ARM64) with:

    mamba create -p $(pwd)/../env maven thrift-cpp=0.13
    conda activate $(pwd)/../env
    mvn test

On Fri, Apr 9, 2021, at 10:27 AM, Gabor Szadovszky wrote: > Thanks, Wes. If this is the case I am

Re: [C++] Changing the versioning string for Parquet-CPP

2021-03-12 Thread Uwe L. Korn
When we merged this into the Arrow repo, at least from my side, there was the intention to possibly revert that again at some stage. The thought behind moving parquet-cpp out of the Arrow repo again was based on the idea that Parquet was one of the many interfaces Arrow provides access to but

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-06-30 Thread Uwe L. Korn
I'm also in favor of disabling support for now. In the long term, having to deal with broken files or with detecting various incompatible implementations will do more harm than not supporting LZ4 for a while. Snappy is generally more widely used than LZ4 in this category as it has been available since

Re: Updating parquet web site

2019-10-18 Thread Uwe L. Korn
Hello Gabor, can we call this for clarity https://github.com/apache/parquet-site ? Thanks Uwe On Fri, Oct 18, 2019, at 9:46 AM, Gabor Szadovszky wrote: > Dear All, > > There are some stuff on our web site that is ready for update (since a > while). To spin up the process it would be great if

[jira] [Updated] (PARQUET-1586) [C++] Add --dump options to parquet-reader tool to dump def/rep levels

2019-05-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1586: - Fix Version/s: (was: 1.10.1) cpp-1.6.0 > [C++] Add --dump opti

[jira] [Resolved] (PARQUET-1586) [C++] Add --dump options to parquet-reader tool to dump def/rep levels

2019-05-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1586. -- Resolution: Fixed Fix Version/s: 1.10.1 Issue resolved by pull request 4385 [https

[jira] [Resolved] (PARQUET-1583) [C++] Remove parquet::Vector class

2019-05-21 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1583. -- Resolution: Fixed Issue resolved by pull request 4354 [https://github.com/apache/arrow/pull

Re: Parquet vs. other Open Source Columnar Formats

2019-05-09 Thread Uwe L. Korn
Hello, Be aware that Avro and Protobuf are general serialization formats, not columnar ones such as Parquet or ORC. They are good for RPC or row-wise streaming whereas the latter two are perfect for analytics. Uwe > On 09.05.2019 at 20:33, David Mollitor wrote: > > I'm sure there are many
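To illustrate the row-wise versus columnar distinction in memory terms, here is a minimal C++ sketch. The `Row`/`Columns` types and the `ToColumns`/`SumPrice` helpers are hypothetical; the sketch only contrasts the two layouts and says nothing about how Avro, Protobuf, Parquet, or ORC actually encode data on disk.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Row-wise: each record's fields are stored together, which suits
// RPC messages and streaming one record at a time.
struct Row {
  int64_t id;
  double price;
};

// Columnar: all values of one field are stored together, which suits
// analytics that scan a few columns across many rows.
struct Columns {
  std::vector<int64_t> id;
  std::vector<double> price;
};

// Transpose a batch of records into per-field buffers.
Columns ToColumns(const std::vector<Row>& rows) {
  Columns cols;
  cols.id.reserve(rows.size());
  cols.price.reserve(rows.size());
  for (const Row& r : rows) {
    cols.id.push_back(r.id);
    cols.price.push_back(r.price);
  }
  return cols;
}

// An aggregate over one column reads a single contiguous buffer and never
// touches the other fields.
double SumPrice(const Columns& cols) {
  double total = 0.0;
  for (double p : cols.price) total += p;
  return total;
}
```

The columnar form is what makes per-column compression, encodings, and statistics practical, which is why it fits analytical scans better than record-at-a-time serialization.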

[jira] [Commented] (PARQUET-1022) [C++] Append mode in parquet-cpp

2019-03-13 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791492#comment-16791492 ] Uwe L. Korn commented on PARQUET-1022: -- There is no implementation of merging/concatenating files

[jira] [Commented] (PARQUET-1022) [C++] Append mode in parquet-cpp

2019-03-12 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790518#comment-16790518 ] Uwe L. Korn commented on PARQUET-1022: -- [~thamha] The solution here is to write more files

Re: parquet using encoding other than UTF-8

2019-02-05 Thread Uwe L. Korn
Hello Manik, this is not possible at the moment. As Parquet is a portable on-disk format, we focus on having a single representation for each data type. Readers and writers therefore only need to support these representations, which keeps their implementation simpler. Especially as you are the producer but not the

[jira] [Assigned] (PARQUET-1521) [C++] Do not use "extern template class" with parquet::ColumnWriter

2019-02-05 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1521: Assignee: Wes McKinney > [C++] Do not use "extern template class"

[jira] [Resolved] (PARQUET-1521) [C++] Do not use "extern template class" with parquet::ColumnWriter

2019-02-05 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1521. -- Resolution: Fixed Issue resolved by pull request 3551 [https://github.com/apache/arrow/pull

[jira] [Commented] (PARQUET-1523) [C++] Vectorize comparator interface

2019-02-04 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760517#comment-16760517 ] Uwe L. Korn commented on PARQUET-1523: -- This would probably benefit from arrow-compute kernel

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-31 Thread Uwe L. Korn
+1 (binding) Built and tested using Ryan's script on Ubuntu 16.04. The script helped me a bit as it included the necessary Maven options. Thanks! In the future, it would be good to include a script like the one we have in Arrow, which also checks the signature. We have that in the main tree and the script also

Re: [DISCUSS] Remove old modules?

2019-01-29 Thread Uwe L. Korn
Hello Fokko, I have put up a PR for the Scala update https://github.com/apache/parquet-mr/pull/605. parquet-scrooge fails due to a Thrift parsing error, but parquet-scala succeeds with Scala 2.12. By dropping scrooge, we could at least move this forward. Uwe > Am 29.01.2019 um 11:40 schrieb

Re: [DISCUSS] Bump Apache Thrift dependency to 0.12.0

2019-01-27 Thread Uwe L. Korn
oexist because we shade the one in parquet-format. Thrift should also be > > binary compatible, although I don't think they publish any guarantees. > > > > On Fri, Jan 25, 2019 at 12:53 AM Uwe L. Korn wrote: > > > > > As an FYI: parquet-cpp already uses Thrift 0.12 in

Re: [DISCUSS] Parquet Java 1.10.1 release?

2019-01-27 Thread Uwe L. Korn
Hello Ryan, Making a bugfix release sounds fine for this case. Sadly as with all other RCs, it would help to have better instructions on how to verify the release candidate. Uwe On Fri, Jan 25, 2019, at 8:19 PM, Ryan Blue wrote: > Hi everyone, > > The Spark community caught a correctness bug

[jira] [Assigned] (PARQUET-1504) Add an option to convert Parquet Int96 to Arrow Timestamp

2019-01-27 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1504: Assignee: Yongyan Wang > Add an option to convert Parquet Int96 to Arrow Timest

[jira] [Resolved] (PARQUET-1504) Add an option to convert Parquet Int96 to Arrow Timestamp

2019-01-27 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1504. -- Resolution: Fixed Fix Version/s: 1.12.0 Issue resolved by PR https://github.com

[jira] [Updated] (PARQUET-1496) [Java] Update Scala to 2.12

2019-01-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1496: - Description: When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9

[jira] [Updated] (PARQUET-1496) [Java] Update Scala to 2.12

2019-01-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1496: - Summary: [Java] Update Scala to 2.12 (was: [Java] Build fails on OSX and Java 10) > [J

[jira] [Assigned] (PARQUET-1496) [Java] Update Scala to 2.12

2019-01-26 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1496: Assignee: Uwe L. Korn > [Java] Update Scala to 2

Re: [DISCUSS] Bump Apache Thrift dependency to 0.12.0

2019-01-25 Thread Uwe L. Korn
As an FYI: parquet-cpp already uses Thrift 0.12 in some of its binary distributions. So if this causes problems with old readers, keep in mind that such files are already out in the wild. Cheers Uwe On Fri, Jan 25, 2019, at 9:13 AM, Gabor Szadovszky wrote: > May it cause any problems that

Re: [VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-22 Thread Uwe L. Korn
at these issues shall be fixed but I think we should > not stop the release because of them. > > Cheers, > Gabor > > On Mon, Jan 21, 2019 at 6:40 PM Uwe L. Korn wrote: > > > Hi, > > > > I'm sadly giving a +0 here. The signatures look good but I was unable to >

[jira] [Resolved] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1501. -- Resolution: Won't Fix > v1.8.x to be fixed with PARQUET-952 solut

[jira] [Updated] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1501: - Description: The following issue is fixed in Avro Parquet v1.11.0. PARQUET-952: Avro union

[jira] [Commented] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748498#comment-16748498 ] Uwe L. Korn commented on PARQUET-1501: -- [~nvijayrech] We probably won't make a bugfix release

[jira] [Updated] (PARQUET-1501) v1.8.x to be fixed with PARQUET-952 solution

2019-01-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1501: - Flags: Patch (was: Patch,Important) > v1.8.x to be fixed with PARQUET-952 solut

Re: [VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-21 Thread Uwe L. Korn
Hi, I'm sadly giving a +0 here. The signatures look good but I was unable to build with JDK 9/10/11 on OSX. https://issues.apache.org/jira/browse/PARQUET-1497 and https://issues.apache.org/jira/browse/PARQUET-1496 are the problems I'm running into. I've also opened a PR to document how to

[jira] [Created] (PARQUET-1497) [Java] Building on OSX fails with OpenJDK 11

2019-01-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1497: Summary: [Java] Building on OSX fails with OpenJDK 11 Key: PARQUET-1497 URL: https://issues.apache.org/jira/browse/PARQUET-1497 Project: Parquet Issue Type

[jira] [Created] (PARQUET-1498) [Java] Add instructions to install thrift via homebrew

2019-01-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1498: Summary: [Java] Add instructions to install thrift via homebrew Key: PARQUET-1498 URL: https://issues.apache.org/jira/browse/PARQUET-1498 Project: Parquet

[jira] [Created] (PARQUET-1496) [Java] Build fails on OSX and Java 10

2019-01-21 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1496: Summary: [Java] Build fails on OSX and Java 10 Key: PARQUET-1496 URL: https://issues.apache.org/jira/browse/PARQUET-1496 Project: Parquet Issue Type: Bug

Re: [Draft REPORT] Apache Parquet - January 2019

2019-01-07 Thread Uwe L. Korn
+1 Uwe On Mon, Jan 7, 2019, at 9:14 PM, Ryan Blue wrote: > +1 > > On Mon, Jan 7, 2019 at 11:39 AM Julien Le Dem > wrote: > > > ## Description: > > Parquet is a standard and interoperable columnar file format > > for efficient analytics. Parquet has 3 sub-projects: > > - parquet-format: format

[jira] [Commented] (PARQUET-1481) [C++] SEGV when reading corrupt parquet file

2018-12-21 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726776#comment-16726776 ] Uwe L. Korn commented on PARQUET-1481: -- Can you describe how you generated this Parquet file

Re: [Discuss] Code of conduct

2018-12-09 Thread Uwe L. Korn
Hello Julien, As per ASF guidelines, the code of conduct at https://www.apache.org/foundation/policies/conduct.html also applies to the Apache Parquet channels. Would that be sufficient for you? Cheers Uwe On Sat, Dec 8, 2018, at 2:14 AM, Julien Le Dem wrote: > We currently don’t have an explicit code of conduct.

[jira] [Updated] (PARQUET-490) [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

2018-11-02 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-490: Summary: [C++] Incorporate DELTA_BINARY_PACKED value encoder into library and add unit tests

[jira] [Updated] (PARQUET-492) [C++] Incorporate DELTA_BYTE_ARRAY value encoder into library and add unit tests

2018-11-02 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-492: Summary: [C++] Incorporate DELTA_BYTE_ARRAY value encoder into library and add unit tests

[jira] [Updated] (PARQUET-491) [C++] Incorporate DELTA_LENGTH_BYTE_ARRAY value encoder into library and add unit tests

2018-11-02 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-491: Summary: [C++] Incorporate DELTA_LENGTH_BYTE_ARRAY value encoder into library and add unit tests

[jira] [Commented] (PARQUET-1454) ld-linux-x86-64.so.2 is missing

2018-10-31 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669869#comment-16669869 ] Uwe L. Korn commented on PARQUET-1454: -- {{ld-linux-x86-64.so.2}} is a library that is normally

[jira] [Assigned] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads

2018-09-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1160: Assignee: Ted Haining (was: Phillip Cloud) > [C++] Implement BYTE_ARRAY-backed Deci

[jira] [Resolved] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads

2018-09-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1160. -- Resolution: Fixed Fix Version/s: 1.10.1 Issue resolved by pull request 2646 [https

[jira] [Updated] (PARQUET-1160) [C++] Implement BYTE_ARRAY-backed Decimal reads

2018-09-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1160: - Fix Version/s: (was: 1.10.1) cpp-1.6.0 > [C++] Implement BYTE_AR

Re: Donate C (GLib) bindings for C++ implementation

2018-09-25 Thread Uwe L. Korn
Hello Kou, this was already mentioned on the pull request, but I'm copying it here for others: We would very much like to have this in the Apache repository and are very grateful for the code donation. We should hold the usual formal code donation vote and then merge it. Uwe On Tue, Sep 25, 2018, at 7:20

[jira] [Commented] (PARQUET-1422) [C++] Use Arrow IO interfaces natively rather than current parquet:: wrappers

2018-09-22 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624704#comment-16624704 ] Uwe L. Korn commented on PARQUET-1422: -- +1 from me, that's one of the refactoring benefits I

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-09-20 Thread Uwe L. Korn
Hello Wes, I'm definitely +1 on archiving the master branch. I'm not sure what exactly you mean by this. I would simply have added a final commit that deletes all code and adds a message to the README that the repository has moved into another repo. Cheers Uwe On Thu, Sep 13, 2018, at

[RESULT][VOTE] Release Apache Parquet C++ 1.5.0 RC0

2018-09-06 Thread Uwe L. Korn
ld working properly > > > > Thanks Uwe for managing the release > > On Thu, Aug 30, 2018 at 4:36 PM Wes McKinney wrote: > > > > > > It may take me until sometime tomorrow to run the build since I want > > > to check on Windows also > > > On Wed, Aug

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-09-04 Thread Uwe L. Korn
k place > > > > > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53 > > > > > > It would be my preference to have a single squashed commit whose > > > message attributes the developers of the code and provides links back &

Re: [VOTE] Release Apache Parquet C++ 1.5.0 RC0

2018-08-29 Thread Uwe L. Korn
+1 (binding) Verified on Ubuntu 16.04 using `./dev/release/verify-release-candidate 1.5.0 0` On Wed, Aug 29, 2018, at 5:09 PM, Uwe L. Korn wrote: > All, > > I propose that we accept the following release candidate as the official > Apache Parquet C++ 1.5.0 release. > > Par

[VOTE] Release Apache Parquet C++ 1.5.0 RC0

2018-08-29 Thread Uwe L. Korn
All, I propose that we accept the following release candidate as the official Apache Parquet C++ 1.5.0 release. Parquet C++ 1.5.0-rc0 includes the following: --- The CHANGELOG for the release is available at:

Re: Doing a 1.5.0 C++ release

2018-08-27 Thread Uwe L. Korn
> > Thanks > >> On Sun, Aug 26, 2018 at 1:06 PM, Wes McKinney wrote: >> I think we should be able to cut a release now? We can also proceed >> with the Arrow merge at the same time once we agree how particularly >> to do that. >> >>> On Wed, Aug

Re: Date and time for next Parquet sync

2018-08-27 Thread Uwe L. Korn
Hello Nandor, I can probably make this time. Just a timezone question: Is it 6pm CET or 6pm CEST? I guess the latter. See http://timesched.pocoo.org/?date=2018-08-28=central-europe-standard-time!,pacific-standard-time=1080,1140 Uwe On Mon, Aug 27, 2018, at 12:20 PM, Nandor Kollar wrote: > Hi

[jira] [Resolved] (PARQUET-1372) [C++] Add an API to allow writing RowGroups based on their size rather than num_rows

2018-08-25 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1372. -- Resolution: Fixed Issue resolved by pull request 484 [https://github.com/apache/parquet-cpp

[jira] [Resolved] (PARQUET-1392) [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable

2018-08-23 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1392. -- Resolution: Fixed Resolved by PR https://github.com/apache/parquet-cpp/pull/492 >

[jira] [Moved] (PARQUET-1403) Can't save a df using Parquet if using float16

2018-08-23 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn moved ARROW-3112 to PARQUET-1403: - Affects Version/s: (was: 0.10.0) cpp-1.4.0

Re: PARQUET-1399: Move parquet-mr related code from parquet-format

2018-08-22 Thread Uwe L. Korn
Hello Gabor > I've just realized that merge commit in github is "not enabled for this > repository". Any suggestions how we can workaround this? You have to merge manually on your command line using "git merge … && git push origin master". Uwe

Re: Doing a 1.5.0 C++ release

2018-08-22 Thread Uwe L. Korn
would like to get https://issues.apache.org/jira/browse/PARQUET-1372 into > > this release as well. There is a PR already open for this JIRA and I got > > some feedback. I will address the feedback in the next couple of days. > > > > On Sun, Aug 19, 2018 at 8:48 AM Uwe L. Korn w

Re: [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-08-21 Thread Uwe L. Korn
, 2018, at 7:36 PM, Wes McKinney wrote: > OK. I'm a bit -0 on doing anything that results in Arrow having a > nonlinear git history (and rebasing is not really an option) but we > can discuss that more later > > On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn wrote: > > +1 on

[jira] [Comment Edited] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585947#comment-16585947 ] Uwe L. Korn edited comment on PARQUET-1395 at 8/20/18 1:58 PM: --- Ok

[jira] [Commented] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585947#comment-16585947 ] Uwe L. Korn commented on PARQUET-1395: -- Ok, that is definitely the root of the problem

[jira] [Commented] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585931#comment-16585931 ] Uwe L. Korn commented on PARQUET-1395: -- [~pitrou] can you post the output of ? {code} % objdump

[jira] [Commented] (PARQUET-1395) [C++] Tests fail due to not finding libboost_system.so

2018-08-20 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16585927#comment-16585927 ] Uwe L. Korn commented on PARQUET-1395: -- We should have a look at what {{conda}} is doing

[jira] [Created] (PARQUET-1393) [C++] Change parquet::arrow::FileReader::ReadRowGroups to read into continuous arrays

2018-08-19 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1393: Summary: [C++] Change parquet::arrow::FileReader::ReadRowGroups to read into continuous arrays Key: PARQUET-1393 URL: https://issues.apache.org/jira/browse/PARQUET-1393

[jira] [Created] (PARQUET-1392) [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable

2018-08-19 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1392: Summary: [C++] Supply row group indices to parquet::arrow::FileReader::ReadTable Key: PARQUET-1392 URL: https://issues.apache.org/jira/browse/PARQUET-1392 Project

[jira] [Assigned] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1158: Assignee: Uwe L. Korn > [C++] Basic RowGroup filter

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1158: - Summary: [C++] Basic RowGroup filtering (was: C++: Basic RowGroup filtering) > [C++] Ba

[jira] [Updated] (PARQUET-1158) [C++] Basic RowGroup filtering

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1158: - Fix Version/s: (was: cpp-1.5.0) cpp-1.6.0 > [C++] Basic RowGr

Re: [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-08-19 Thread Uwe L. Korn
+1 on this, but also see my comments in the discussion thread. We should also keep the git history of parquet-cpp; that should not be hard with git, and there is probably a StackOverflow answer out there that gives you the commands to do the merge. Uwe On Fri, Aug 17, 2018, at 12:57 AM,

Doing a 1.5.0 C++ release

2018-08-19 Thread Uwe L. Korn
Hello, as we are in the process of doing/voting on a repo merge with the Arrow project, and also because it has been some time since the last release, I would like to proceed with a 1.5.0 release soon. Please have a look over the issues at

[jira] [Updated] (PARQUET-1122) [C++] Support 2-level list encoding in Arrow decoding

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1122: - Fix Version/s: (was: cpp-1.5.0) cpp-1.6.0 > [C++] Support 2-level l

Re: [DISCUSS] Solutions for improving the Arrow-Parquet C++ development morass

2018-08-19 Thread Uwe L. Korn
Back from vacation, I also want to finally raise my voice. With the current state of the Parquet<->Arrow development, I see a benefit in merging the code base for now, but not necessarily forever. Parquet C++ is the main code base of an artefact for which an Arrow C++ adapter is built and that

Re: num_level in Parquet Cpp library & how to add a JSON field?

2018-08-19 Thread Uwe L. Korn
ything breaks I can highly recommend reading parquet-mr's READMEs. Uwe > > Thanks! > -Ivy > > On 2018/08/03 13:46:15, "Uwe L. Korn" wrote: > > Hello Ivy, > > > > "primitive binary" means `Type::BYTE_ARRAY`, so you're correct. I have n

[jira] [Resolved] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1390. -- Resolution: Fixed Issue resolved by pull request 516 [https://github.com/apache/parquet-mr

[jira] [Assigned] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned PARQUET-1390: Assignee: Andy Grove > [Java] Upgrade to Arrow 0.1

[jira] [Updated] (PARQUET-1390) [Java] Upgrade to Arrow 0.10.0

2018-08-19 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn updated PARQUET-1390: Fix Version/s: 1.11.0

Re: Status of column index in parquet-mr

2018-08-19 Thread Uwe L. Korn
Hello Gabor, comments inline. > The implementation was done based on the original design of column indexes meaning that no row alignment is required between the pages (the only requirement is for the pages to respect row

Re: Date and time for next Parquet sync

2018-08-12 Thread Uwe L. Korn
As the meeting falls into my summer vacation, I cannot participate, but I will try to join again if there is a meeting two weeks later. Uwe > On 08.08.2018 at 16:43, Nandor Kollar wrote: > > Hi All, > > It has been a while since we had a Parquet sync, therefore I'd like to propose to have one

[jira] [Commented] (PARQUET-1370) Read consecutive column chunks in a single scan

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568403#comment-16568403 ] Uwe L. Korn commented on PARQUET-1370: I'm doing the same; my code looks as follows: {code:java

[jira] [Commented] (PARQUET-1370) Read consecutive column chunks in a single scan

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568368#comment-16568368 ] Uwe L. Korn commented on PARQUET-1370: [~rgruener] I was also plagued by this issue, but I wrapped

[jira] [Commented] (PARQUET-1369) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568247#comment-16568247 ] Uwe L. Korn commented on PARQUET-1369: [~rgruener] Moved it.

[jira] [Moved] (PARQUET-1369) [Python] Unavailable Parquet column statistics from Spark-generated file

2018-08-03 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn moved ARROW-2800 to PARQUET-1369: Fix Version/s: cpp-1.5.0 (was: 0.11.0)

Re: num_level in Parquet Cpp library & how to add a JSON field?

2018-08-03 Thread Uwe L. Korn
Hello Ivy, "primitive binary" means `Type::BYTE_ARRAY`, so you're correct. I have not yet seen anyone use the JSON field with parquet-cpp, but the JSON type is simply a binary string with an annotation, so I would expect everything to just work. Uwe On Thu, Aug 2, 2018, at 7:59 PM,

[jira] [Resolved] (PARQUET-1357) [C++] FormatStatValue truncates binary statistics on zero character

2018-08-01 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved PARQUET-1357. Resolution: Fixed. Issue resolved by PR https://github.com/apache/parquet-cpp/pull/479

[jira] [Commented] (PARQUET-1361) [C++] 1.4.1 library allows creation of parquet file w/NULL values for INT types

2018-07-31 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563874#comment-16563874 ] Uwe L. Korn commented on PARQUET-1361: -- What is the problem with the generated Parquet file? I

[jira] [Commented] (PARQUET-1363) Add IP address logical type

2018-07-30 Thread Uwe L. Korn (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562055#comment-16562055 ] Uwe L. Korn commented on PARQUET-1363: -- [~tmgstev] You would probably need two types: IPv4
