Re: Removal of deprecated code in parquet-format

2024-03-30 Thread Driesprong, Fokko
I'm also in favor of removing the old code. Thanks for working on this, Vinoo. Kind regards, Fokko On Sat, 30 Mar 2024 at 19:16, Vinoo Ganesh wrote: > Thanks, Gang! If there is no objection, I'll put together a PR staging the > removal. > > > > > > On Wed, Mar 27, 2024 at 10:16 PM Gang Wu

Re: [VOTE] Release Apache Parquet Format 2.10.0 RC0

2023-11-20 Thread Driesprong, Fokko
Thanks Gang for running this! +1 (non-binding) from my end. I've included the new release in the Thrift 0.19 PR and it now compiles (still some failing tests because of the new UUID type, but will fix that sometime today). Kind regards, Fokko

Re: Drop parquet-thrift

2023-09-28 Thread Driesprong, Fokko
Hey Gang, It is also used in some of the code: - org.apache.parquet.hadoop.thrift.AbstractThriftWriteSupport - org.apache.parquet.thrift.AbstractThriftWriteSupport - org.apache.parquet.thrift.ThriftSchemaConverter - org.apache.parquet.thrift.TupleToThriftWriteSupport Yesterday I

Re: [DISCUSS] Time to release parquet format 2.10.0?

2023-04-28 Thread Driesprong, Fokko
Hey Gang, I think it would be great to get another release out! Kind regards, Fokko On Thu, 27 Apr 2023 at 09:52, Gang Wu wrote: > Hi, > > The latest parquet format is v2.9.0 [1] which was released two years ago. > Is it a good time to release the next version? If there is no objection, I >

Re: Parquet Website Launched

2022-03-28 Thread Driesprong, Fokko
Hey Vinoo, Thanks for sharing. The new website looks absolutely amazing! Kind regards, Fokko On Mon, 28 Mar 2022 at 14:08, Vinoo Ganesh wrote: > Hi Maya, > Thanks for the feedback. A search feature will be released soon. We're > waiting for Algolia to approve Parquet as a valid open source

Re: [ANNOUNCEMENT] Gidon Gershinsky as Apache Parquet PMC

2021-11-24 Thread Driesprong, Fokko
Congrats Gidon, well deserved! On Wed, 24 Nov 2021 at 22:46, Chao Sun wrote: > Congratulations Gidon! > > On Wed, Nov 24, 2021 at 1:27 PM Xinli shang > wrote: > > > Hi all, > > > > The Project Management Committee (PMC) for Apache Parquet has invited > Gidon > > Gershinsky to become a PMC

Re: [RESULT] Release Apache Parquet Format 2.9.0 RC0

2021-04-14 Thread Driesprong, Fokko
Yes, you'll need PMC permissions to do that. A PMC member could fetch the artifacts from https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.9.0-rc0/ and push them into svn as described below :) Cheers, Fokko On Wed, 14 Apr 2021 at 17:38, Antoine Pitrou wrote: > On Wed, 14 Apr

Re: [VOTE] Release Apache Parquet Format 2.9.0 RC0

2021-04-13 Thread Driesprong, Fokko
Thanks! Thanks for picking this up. +1 (non-binding) Checked locally, and looks good. Verified the signature and checksum on the artifact. Cheers, Fokko On Tue, 13 Apr 2021 at 14:55, Antoine Pitrou wrote: > > We have three binding +1s now. > Does anyone else want to give a vote? > > Regards
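The checksum half of the verification mentioned above can be sketched in a few lines of Python. This is only an illustration, not the project's release procedure: it assumes a SHA-512 checksum file whose first whitespace-separated token is the hex digest, and the function names `sha512_of` and `checksum_matches` are made up for this example. Signature verification against the KEYS file is a separate step that is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large release tarballs are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's digest with the first token of the checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    Other layouts would need different parsing.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

A caller would pass the downloaded artifact and its accompanying checksum file; a `False` result means the download should not be trusted.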

Re: [Announce] new committer: Gidon Gershinsky

2021-04-07 Thread Driesprong, Fokko
Congrats Gidon, well deserved :) On Wed, 7 Apr 2021 at 18:11, Dongjoon Hyun wrote: > Congrats, Gidon! :) > > Bests, > Dongjoon. > > On Wed, Apr 7, 2021 at 9:06 AM Chao Sun wrote: > > > Congrats Gidon! > > > > On Wed, Apr 7, 2021 at 8:27 AM Micah Kornfield > > wrote: > > > > > Congrats Gidon,

Re: [VOTE] Release Apache Parquet 1.12.0 RC1

2021-01-27 Thread Driesprong, Fokko
Thanks for running the release Gabor! The signature checks out: MacBook-Pro-van-Fokko:Downloads fokkodriesprong$ curl https://downloads.apache.org/parquet/KEYS > KEYS % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload

Re: [ANNOUNCE] New Parquet PMC member - Xinli Shang

2020-11-09 Thread Driesprong, Fokko
Cool! Well deserved Xinli! Cheers, Fokko On Tue, 10 Nov 2020 at 07:35, Gabor Szadovszky wrote: > Well deserved! Congrats, Xinli! > > Regards, > Gabor > > Gidon Gershinsky wrote (on Tue, 10 Nov 2020, > 6:27): > > > Congratulations, Xinli! > > > > Cheers, Gidon > > > > > > On Tue, Nov

Re: Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail

2020-07-22 Thread Driesprong, Fokko
I've added an answer to the ticket: https://issues.apache.org/jira/browse/PARQUET-1887 And created a PR for anyone who's interested: https://github.com/apache/parquet-mr/pull/804 Cheers, Fokko On Wed, 22 Jul 2020 at 12:29, Driesprong, Fokko wrote: > Thanks for reaching out Øyvind. > > Whic

Re: [Announce] new committer: Xinli Shang

2020-03-12 Thread Driesprong, Fokko
Great to have you onboard Xinli, welcome! Cheers, Fokko On Thu, 12 Mar 2020 at 21:50, Julien Le Dem wrote: > The Project Management Committee (PMC) for Apache Parquet > has invited Xinli Shang to become a committer and we are pleased > to announce that he has accepted. > > Welcome Xinli! >

Re: Spotless

2020-01-22 Thread Driesprong, Fokko
> > On Wed, Jan 8, 2020 at 7:36 PM Ryan Blue > wrote: > > > +1 for spotless checks. > > > > On Wed, Jan 8, 2020 at 7:13 AM Driesprong, Fokko > > wrote: > > > > > Y'all, > > > > > > Recently Chen Junjie brought up the removal of

Re: Preparing release 1.11.1

2020-01-22 Thread Driesprong, Fokko
Thank you Gabor, What kind of issues were found? Let me know if I can help in any way. Cheers, Fokko On Wed, 22 Jan 2020 at 11:10, Gabor Szadovszky wrote: > Dear All, > > During the migration to 1.11.0 in Spark we discovered some issues in the > parquet release. I am preparing the minor

Spotless

2020-01-08 Thread Driesprong, Fokko
Y'all, Recently Chen Junjie brought up the removal of trailing spaces within the code and the headers: https://github.com/apache/parquet-mr/pull/727#issuecomment-571562392 I've been looking into this, and into whether we can apply something like checkstyle to make the build fail on trailing whitespace.
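The kind of check discussed above, failing on trailing whitespace, can be illustrated with a small Python sketch. This is not how Spotless or checkstyle implement it; the function name `find_trailing_whitespace` is hypothetical and the sketch only shows the rule itself: a line offends when it ends in a space or tab before its line terminator.

```python
from typing import Iterable, List, Tuple


def find_trailing_whitespace(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines ending in spaces or tabs."""
    offenders = []
    for number, line in enumerate(lines, start=1):
        # Drop the line terminator first so "\n" itself is not flagged.
        content = line.rstrip("\r\n")
        if content != content.rstrip(" \t"):
            offenders.append((number, content))
    return offenders
```

A build step could run this over each source file and fail when the returned list is non-empty, which mirrors the "let it fail" behaviour asked for in the message.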

Re: [VOTE] Release Apache Parquet Format 2.8.0 RC0

2020-01-06 Thread Driesprong, Fokko
Thank you, Gabor, for running the vote. A +1 (non-binding) from me as well. Checked the changes and it looks good to me. Cheers, Fokko On Mon, 6 Jan 2020 at 13:39, Gabor Szadovszky wrote: > Hi All, > > We have two +1 binding votes so far. > Anyone is interested in checking/voting this

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-19 Thread Driesprong, Fokko
efault because the benchmarks did not show significant performance > > > > penalties. See https://github.com/apache/parquet-mr/pull/647 for > > > details. > > > > > > > > About the file size change. 1.11.0 is introducing column indexes, CRC > > > > c

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-17 Thread Driesprong, Fokko
Unfortunately, a -1 from my side (non-binding). I've updated Iceberg to Parquet 1.11.0, and found three things: - We've broken backward compatibility of the constructor of ColumnChunkPageWriteStore

Re: release process - using rc tags

2019-10-30 Thread Driesprong, Fokko
+1 On Wed, 30 Oct 2019 at 16:51, Ryan Blue wrote: > +1 > > I recently built the release process for Iceberg and that's what we decided > to go with in that community. Here are the docs if you'd like to copy them > to update the Parquet docs: >

Re: Stalebot

2019-10-24 Thread Driesprong, Fokko
> wrote: > > > > > >> Sounds good to have it. We might want to set the expiration limit to a > > >> larger value according to commit history. > > >> > > >> Driesprong, Fokko wrote on Wed, 23 Oct 2019 at 9:32 PM: > > >>> > > &

Re: PARQUET-1441/parquet-mr #560 in 1.11.0 release?

2019-10-24 Thread Driesprong, Fokko
Hi Michael, Thanks for asking. The commit will be included in Parquet 1.11, as this version will be tagged from master. It looks like the commit went into master: https://github.com/apache/parquet-mr/commit/1e5fda5310687b0856e74f00a4ea420b6b1ab34d So it will be included in 1.11. Cheers,

Re: multi threading support

2019-10-23 Thread Driesprong, Fokko
the responsibility of Parquet. > > You can parallelize by writing more Parquet files in separate threads. > > Adding locks to Parquet doesn't make much sense and is unlikely to speed > up > > your application without huge changes to Parquet. > > > > On M
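The approach quoted above, parallelizing by giving each thread its own output file instead of sharing one writer behind locks, can be sketched generically in Python. This is an illustration under stated assumptions: `write_partition` is a hypothetical stand-in that writes plain text, not a real Parquet writer, and the file naming scheme is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Sequence


def write_partition(path: Path, rows: Sequence[str]) -> Path:
    """Hypothetical stand-in for a single-threaded file writer.

    A real application would open its own Parquet writer here; the point is
    that each call owns exactly one output file, so no locking is needed.
    """
    path.write_text("\n".join(rows))
    return path


def write_in_parallel(out_dir: Path, partitions: List[Sequence[str]],
                      workers: int = 4) -> List[Path]:
    """Write each partition to its own file, one task per partition."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_partition, out_dir / f"part-{index:05d}.txt", rows)
            for index, rows in enumerate(partitions)
        ]
        # result() re-raises any exception from the worker thread.
        return [future.result() for future in futures]
```

Because no state is shared between tasks, a failure in one writer cannot leave another writer in a broken state, which is the main appeal of the one-file-per-thread design.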

Re: custom CompressionCodec support

2019-10-21 Thread Driesprong, Fokko
> We will have a look how its working for spark and not for us. > > Regards > Manik Singla > +91-9996008893 > +91-9665639677 > > "Life doesn't consist in holding good cards but playing those you hold > well." > > > On Fri, Oct 18, 2019 at 5:20 PM Driespron

Re: multi threading support

2019-10-21 Thread Driesprong, Fokko
ill increase locking if we implement concurrent access but > leave users carefree. > > > > Regards > Manik Singla > +91-9996008893 > +91-9665639677 > > "Life doesn't consist in holding good cards but playing those you hold > well." > > > On Thu, Oct

Re: Help on Parquet Write Slowness and UUID support

2019-10-21 Thread Driesprong, Fokko
Thank you Felix, Could you share some minimal examples of how you ran the benchmarking? I saw the code on the ticket, but it would be better to open a new repo on which you run the benchmark end to end. I'm also curious about how you did the Avro performance measurement. While writing a specific

Re: Updating parquet web site

2019-10-18 Thread Driesprong, Fokko
Great work! On Fri, 18 Oct 2019 at 17:53, Ryan Blue wrote: > Sounds good to me! Thanks for taking care of this. > > On Fri, Oct 18, 2019 at 1:44 AM Gabor Szadovszky wrote: > > > Hi Uwe, > > > > parquet-site sounds good to me. > > > > Cheers, > > Gabor > > > > On Fri, Oct 18, 2019 at 10:19 AM

Re: Working on 1.11.0 RC7

2019-10-18 Thread Driesprong, Fokko
Perfect, thanks Gabor. Cheers, Fokko On Fri, 18 Oct 2019 at 14:24, Gabor Szadovszky wrote: > Hi Fokko, > > There is no separate branch. Based on the discussion on the yesterday > parquet sync 1.11.0 is planned to be released from master. > > Cheers, > Gabor > >

Re: Working on 1.11.0 RC7

2019-10-18 Thread Driesprong, Fokko
Thanks for doing the release Gabor, Is there a branch for 1.11.0? Please let me know. Cheers, Fokko On Fri, 18 Oct 2019 at 09:55, Gabor Szadovszky wrote: > Dear All, > > In the next couple of weeks I'll be working on the next release candidate > of 1.11.0. If you have any ongoing issues that

Re: custom CompressionCodec support

2019-10-18 Thread Driesprong, Fokko
t there are > only smoke tests that nothing crashes. > > > Regards, > > Martin > -- > *From:* Falak Kansal > *Sent:* Thursday, October 17, 2019 4:43:54 PM > *To:* Driesprong, Fokko > *Cc:* dev@parquet.apache.org > *Subject:* Re: custom Com

Re: multi threading support

2019-10-17 Thread Driesprong, Fokko
Thank you for your question Manik, First of all, I think most of the people working on this project are guys, but I would not exclude any other gender. Secondly, Parquet is widely used in different open source projects such as Hive, Presto and Spark. These frameworks scale-out by design. For

Re: custom CompressionCodec support

2019-10-17 Thread Driesprong, Fokko
Hi Manik, The supported compression codecs that ship with Parquet are tested and validated in the CI pipeline. Sometimes there are issues with compressors, therefore they are not easily pluggable. Feel free to open up a PR to the project if you believe there are compressors missing, then we

Re: [VOTE] Release Apache Parquet Format 2.7.0 RC0

2019-09-26 Thread Driesprong, Fokko
Checked signature and checksums +1 (non-binding) Cheers, Fokko On Thu, 26 Sep 2019 at 10:16, Gabor Szadovszky wrote: > Checksums/signatures are correct. Tarball content is correct. Unit tests > pass. > > +1 (binding) > > On Thu, Sep 26, 2019 at 6:02 AM 俊杰陈 wrote: > > > +1, downloaded,

Re: zstd codec

2019-08-01 Thread Driesprong, Fokko
> well." > > > On Wed, Jul 31, 2019 at 3:34 PM Manik Singla wrote: > > > Hadoop common fails with native zStandard library not available: this > > version of libhadoop was built without zstd support. > > > > I don't want to compile libhadoop for myself

Re: zstd codec

2019-07-31 Thread Driesprong, Fokko
Hi Manik, Can you explain how you en- and decode the messages? Please keep in mind that with Parquet only the actual data is compressed. For example, the metadata in the footer isn't compressed. Cheers, Fokko On Wed, 31 Jul 2019 at 11:33, Manik Singla wrote: > Hi Guys > > We are trying to use

Re: Floating point data compression for Apache Parquet

2019-07-16 Thread Driesprong, Fokko
Thank you Roman. I'm looking forward to the proposal. Regarding the process, there is more information on the Apache page: https://www.apache.org/foundation/voting.html Cheers, Fokko Driesprong On Tue, 16 Jul 2019 at 16:03, Roman Karlstetter < roman.karlstet...@gmail.com> wrote: > Ok, thanks

Re: New PMC member: Gabor Szadovszky

2019-06-29 Thread Driesprong, Fokko
Congrats Gabor, well deserved! On Fri, 28 Jun 2019 at 19:13, Tim Armstrong wrote: > Congrats Gabor! > > On Fri, Jun 28, 2019 at 10:08 AM Wes McKinney wrote: > > > Congrats! > > > > On Fri, Jun 28, 2019 at 10:34 AM Lars Volker > > wrote: > > > > > > Congratulations Gabor! > > > > > > On Fri,

Re: [DISCUSS] Prepare release for parquet-format 2.7.0?

2019-06-28 Thread Driesprong, Fokko
Ryan has a valid point here. Once the Bloom filters get released, it won't be as easy anymore to change it because we will break an already released API. There was a related discussion a while ago:

Re: [DISCUSS] Prepare release for parquet-format 2.7.0?

2019-06-27 Thread Driesprong, Fokko
If there are no other volunteers, I can cut the branch and prepare RC1 tomorrow morning. Cheers, Fokko On Thu, 27 Jun 2019 at 17:38, Jim Apple wrote: > > Looks like we don't have any blocking issue since there is no update in > the > > Jira(https://jira.apache.org/jira/browse/PARQUET-1608)

Re: New committer: Nandor Kollar

2019-06-25 Thread Driesprong, Fokko
Great job Nandor, congratulations! On Tue, 25 Jun 2019 at 19:32, Xinli shang wrote: > Congratulations Nandor! > > On Tue, Jun 25, 2019 at 8:55 AM Lars Volker > wrote: > > > Congratulations Nandor! > > > > On Tue, Jun 25, 2019 at 8:52 AM Tim Armstrong > > wrote: > > > > > Congratulations! > >

Re: [DISCUSS] Prepare release for parquet-format 2.7.0?

2019-06-20 Thread Driesprong, Fokko
Good point Jim, Personally, I'm also looking forward to the next release of Apache Parquet format. I took the liberty of creating an umbrella ticket to get an overview of the blockers that we want to get in the 2.7 release. The ticket: https://jira.apache.org/jira/browse/PARQUET-1608 Regarding

Re: Add support for Java 11

2019-06-11 Thread Driesprong, Fokko
> its children? There are some more blocking issues mentioned there. > > Br, > > Zoltan > > On Mon, Jun 10, 2019 at 9:19 PM Driesprong, Fokko > wrote: > > > > Hi all, > > > > I'm working towards making Parquet compatible with Java 11. Would it be > > possi

Add support for Java 11

2019-06-10 Thread Driesprong, Fokko
Hi all, I'm working towards making Parquet compatible with Java 11. Would it be possible to get a review on the following PRs? https://github.com/apache/parquet-format/pull/134 https://github.com/apache/parquet-format/pull/136 Because Thrift has been JDK 11 compatible since 0.12.0, we need to fix this

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-06-03 Thread Driesprong, Fokko
Ismaël, From what I understand from Jacob Tolar, I believe we have a blocking issue in Apache Avro: https://issues.apache.org/jira/browse/AVRO-2400 We would need to release 1.9.1 first to get this sorted out. Regarding Parquet 1.11.0, I would expect a branch which has been cut from master.

Re: Current Parquet Version

2019-04-09 Thread Driesprong, Fokko
Hi Brian, You could take a look at the GitHub repository of the Apache Parquet Format itself: https://github.com/apache/parquet-format Cheers, Fokko On Mon, 8 Apr 2019 at 20:19, Brian Bowman wrote: > What is most current Apache Parquet file format version? Where is this > designated on the official

Re: [DISCUSS] Upgrade to Jackson 2.x and remove the shading

2019-02-19 Thread Driesprong, Fokko
caused a lot of headache. Why go backward and make Parquet vulnerable > to > >>>> those problems? I don't see a good justification for it. > >>>> > >>>> On Mon, Feb 18, 2019 at 8:29 AM Jacques Nadeau > >>>> wrote: > >>>>

[DISCUSS] Upgrade to Jackson 2.x and remove the shading

2019-02-18 Thread Driesprong, Fokko
Hi all, Recently I've opened a PR to move from Jackson 1.x to Jackson 2.9. I've also removed the shading project since most libraries are up to date with Jackson 2.x. Gabor suggested having a discussion on the mailing list to discuss the removal of

Re: [DISCUSS] Remove old modules?

2019-02-11 Thread Driesprong, Fokko
That is true. Shall we move this forward by creating Jira tickets for dropping the modules? Then we can have further discussion on the tickets themselves. For me, I would suggest dropping the following: - parquet-hive-* - parquet-hadoop-bundle (shaded deps) - parquet-cascading - parquet-pig -

Re: [DISCUSS] Bump Apache Thrift dependency to 0.12.0

2019-01-29 Thread Driesprong, Fokko
with a different > version > > as > > > > > > before? It might require some tests if the older readers are able > > to read > > > > > > the files written with the new thrift. > > > > > > Any thoughts? > > > > > > &

Re: [DISCUSS] Remove old modules?

2019-01-29 Thread Driesprong, Fokko
Hi Ryan, I think that would be a great idea. Having these old modules around if we don't use them doesn't make any sense. - For the parquet-scala module, I might take the time to bump this to Scala 2.12 if people are still interested in this. - Personally, I'm using parquet-tools, mostly

[DISCUSS] Bump Apache Thrift dependency to 0.12.0

2019-01-24 Thread Driesprong, Fokko
Hi all, I would like to discuss updating Parquet's Thrift dependency to 0.12.0. In my effort to make Parquet forward compatible with JDK 11, I stumbled upon some issues. One of them is that we still rely, in both the CI and documentation, on Thrift