Re: Updates to Apache Parquet Twitter account

2024-05-23 Thread Julien Le Dem
Hello, This is correct. I have updated the website and bio. Julien On Mon, May 13, 2024 at 4:53 PM Vinoo Ganesh wrote: > We looked into this about a year ago and I think @Julien Le Dem > may be the person with access to the Parquet > twitter. > > > > > On Mon, May 1

Re: Is Parquet Meant As a Standalone Database or is a Catalog/Metastore Required?

2024-05-23 Thread Julien Le Dem
I would agree it's a bit of both. The metadata overhead (per data volume) doesn't increase when you have fewer files. That being said, you could use fewer of the metadata features in that use case if the goal is to exchange well formed data without ambiguity. For wide schema it would be useful to

Re: [DISCUSS] Parquet 3 "wide schema" metadata draft

2024-05-23 Thread Julien Le Dem
of the docs that we'll label accordingly. On Sat, May 18, 2024 at 4:30 AM Antoine Pitrou wrote: > On Fri, 17 May 2024 07:37:37 -0700 > Julien Le Dem wrote: > > This context should be added in the PR description itself. > > Good point, I've added context in the PR descri

Re: [DISCUSS] Parquet 3 metadata draft / strawman proposal

2024-05-23 Thread Julien Le Dem
I just wanted to follow up and say thank you Antoine for updating the description of your PR and bringing the discussion back to the doc. This is helpful. https://github.com/apache/parquet-format/pull/242 On Fri, May 17, 2024 at 10:37 AM Julien Le Dem wrote: > This context should be ad

Re: [DISCUSS] rename parquet-mr to parquet-java?

2024-05-20 Thread Julien Le Dem
> >>> > > >>> GitHub will handle all the redirects from the old to the new name, so > > no > > >>> reason from my end to not rename it :) > > >>> > > >>> Cheers, Fokko > > >>> > > >>> Op v

Re: [C++] Parquet and Arrow overlap

2024-05-17 Thread Julien Le Dem
in the arrow repo? (just an idea) On Fri, May 17, 2024 at 2:49 AM Uwe L. Korn wrote: > > > On Fri, May 17, 2024, at 10:36 AM, Antoine Pitrou wrote: > > Hi Julien, > > > > On Thu, 16 May 2024 18:23:33 -0700 > > Julien Le Dem wrote: > >> > >>

Re: [DISCUSS] rename parquet-mr to parquet-java?

2024-05-17 Thread Julien Le Dem
+1 I should have named it that to start with. On Fri, May 17, 2024 at 3:27 AM Wang, Yuming wrote: > +10086 > > From: Uwe L. Korn > Date: Thursday, May 16, 2024 at 15:41 > To: dev@parquet.apache.org > Subject: Re: [DISCUSS] rename parquet-mr to parquet-java? > External Email > > very heavy +1

Re: [DISCUSS] Parquet Reference Implementation ?

2024-05-17 Thread Julien Le Dem
hors are active here; especially as the Parquet Java and C++ > > teams seem to have some overlap historically, and a third > > implementation helps bring different perspectives. > > > > Regards > > > > Antoine. > > > > > > On Thu, 16 May 2024 17:37:35 -0700

Re: [DISCUSS] Parquet 3 metadata draft / strawman proposal

2024-05-17 Thread Julien Le Dem
d > when I saw that it seemed actually doable I decided it would be worth > posting the initial sketch instead of keeping it for myself. > > Regards > > Antoine. > > > On Thu, 16 May 2024 18:41:26 -0700 > Julien Le Dem wrote: > > Hi Antoine

Re: [DISCUSS] Parquet 3 metadata draft / strawman proposal

2024-05-16 Thread Julien Le Dem
Hi Antoine, On the other thread Micah is collecting feedback in a document. https://lists.apache.org/thread/61z98xgq2f76jxfjgn5xfq1jhxwm3jwf Would you mind putting your feedback there? We should collect the goals before jumping to solutions. It is a bit difficult to discuss those directly in the

Re: [C++] Parquet and Arrow overlap

2024-05-16 Thread Julien Le Dem
live in different repositories so > I think the same solutions could apply for Parquet. > > On Thu, May 16, 2024 at 12:57 AM Antoine Pitrou > wrote: > > > On Tue, 14 May 2024 10:22:37 -0700 > > Julien Le Dem wrote: > > > 1. I think we should make it easy for peop

Re: [DISCUSS] Parquet Reference Implementation ?

2024-05-16 Thread Julien Le Dem
et outside the parquet-mr/arrow world. > > > > > > Just my (non-binding) two cents. > > > > > > Cheers, > > > Ed > > > > > > [1] > > > > > > https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-length-byt

Re: [DISCUSS] Parquet C++ under which PMC?

2024-05-16 Thread Julien Le Dem
Here is the thread we voted on at the time: https://lists.apache.org/thread/gkvbm6yyly1r4cg3f6xtnqkjz6ogn6o2 and the thread calling the result: https://lists.apache.org/thread/76wzx2lsbwjl363bg066g8kdsocd03rw This thread calls for giving access of Parquet committers to this part of the repo and

Re: Interest in Parquet V3

2024-05-15 Thread Julien Le Dem
ld be >amazing, as it would allow us to quickly do a feature check with a > binary > OR to see if our engine has all necessary features to read a Parquet > file. >I agree that having a compatibility matrix in a prominent spot is an >important thing to have. > > Th
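The feature check described in this message can be sketched as a bitmask comparison. This is only an illustration under stated assumptions: the feature names and bit positions below are hypothetical and are not defined by the Parquet format. The idea is that a file advertises the features it needs as a bitmask, and an engine can read it if OR-ing that mask into its own supported set adds no new bits.

```python
# Hypothetical feature bits, for illustration only; the Parquet spec
# does not define these names or positions.
FEATURE_DELTA_ENCODING = 1 << 0
FEATURE_PAGE_INDEX = 1 << 1
FEATURE_BLOOM_FILTER = 1 << 2
FEATURE_ENCRYPTION = 1 << 3


def can_read(file_features: int, engine_features: int) -> bool:
    """Return True if the engine supports every feature the file requires.

    OR-ing the file's mask into the engine's mask leaves the engine's
    mask unchanged exactly when the file needs no unsupported feature.
    """
    return (engine_features | file_features) == engine_features


# An engine that supports everything except encryption.
ENGINE = FEATURE_DELTA_ENCODING | FEATURE_PAGE_INDEX | FEATURE_BLOOM_FILTER

plain_file = FEATURE_DELTA_ENCODING | FEATURE_PAGE_INDEX
encrypted_file = FEATURE_DELTA_ENCODING | FEATURE_ENCRYPTION

readable_plain = can_read(plain_file, ENGINE)
readable_encrypted = can_read(encrypted_file, ENGINE)
```

A single integer comparison like this is what makes a feature mask cheaper to evaluate than parsing and comparing version numbers.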

Re: Repeated fields spec clarification

2024-05-15 Thread Julien Le Dem
+1 The semantics of a row group is that it contains rows and therefore starts on R=0 I generally echo Ed's sentiment here. On Wed, May 15, 2024 at 8:01 AM Andrew Lamb wrote: > Thank you all -- I have filed > https://issues.apache.org/jira/browse/PARQUET-2473 to track clarifying the > spec and

Re: [DISCUSS] Propose changing the default branch of the parquet-site repo

2024-05-15 Thread Julien Le Dem
+1 On Wed, May 15, 2024 at 4:15 AM Andrew Lamb wrote: > I plan to wait until next week to allow any one else who has an opinion to > share it here and then assuming no objections will file a ticket with ASF > Infra. > > Andrew > > On Sun, May 12, 2024 at 3:57 AM Uwe L. Korn wrote: > > > +1 > >

Re: [C++] Parquet and Arrow overlap

2024-05-14 Thread Julien Le Dem
taset <- pyarrow > > > > Best, > > Gang > > > > On Tue, May 14, 2024 at 12:38 PM Julien Le Dem < > julien-1odqgaof3lkdnm+yrof...@public.gmane.org> wrote: > > > > > It is great to see more momentum building. > > > I have myself a littl

Re: Interest in Parquet V3

2024-05-14 Thread Julien Le Dem
+1 on Micah starting a doc and following up by commenting in it. @Raphael, Wish Maple: agreed that changing the metadata representation is less important. Most engines can externalize and index metadata in some way. It is an option to propose a standard way to do it without changing the format.

Re: [DISCUSS] Parquet Reference Implementation ?

2024-05-14 Thread Julien Le Dem
I agree that parquet-mr implementation is a requirement to evolve the spec. It makes sense to me that we call parquet-mr the reference implementation and make it a requirement to evolve the spec. I would add the requirement to implement it in the parquet cpp implementation that lives in apache

Re: Fwd: [C++] Parquet and Arrow overlap

2024-05-13 Thread Julien Le Dem
It is great to see more momentum building. I have myself a little bit more time to contribute to Parquet. Personally I think moving it back would make sense. *However* I have personally contributed a lot more to the Java than the C++ code base. That move was done initially because people

Re: Interest in Parquet V3

2024-05-13 Thread Julien Le Dem
It's great to see this thread. Thank you Micah for facilitating the discussion. my 2cts: 1. I like the idea of having feature checks rather than an absolute version number. I am sorry for the confusion created by the V2 moniker. Those were indeed incremental and backwards compatible additions to

Re: Repeated fields spec clarification

2024-05-10 Thread Julien Le Dem
Jan, your understanding of the Parquet spec is correct. The semantics of "num_rows" and "first_row_index" do require records to *not* be split across pages. Push downs and page skipping require this to be true. I would consider the behavior of splitting a record across pages as a bug in
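The constraint described here can be sketched as a check on repetition levels. This is a simplified model and not a real Parquet reader API: each page is assumed to be a plain list of repetition levels, and the function names are hypothetical. A record starts wherever the repetition level is 0, so a page that does not split a record must begin with R=0.

```python
def pages_start_on_record_boundary(pages):
    """Return True if every non-empty page begins with repetition level 0.

    `pages` is a list of pages; each page is a list of integer
    repetition levels (a simplified stand-in for a decoded data page).
    """
    return all(not levels or levels[0] == 0 for levels in pages)


def rows_per_page(pages):
    """Count records per page by counting R=0 entries.

    The count is only meaningful as a row count when no record is
    split across pages.
    """
    return [sum(1 for r in levels if r == 0) for levels in pages]


# Three records of a repeated field: [a, b, c], [d], [e, f]
# give repetition levels 0 1 1 | 0 | 0 1.
good_pages = [[0, 1, 1], [0, 0, 1]]   # page break between records
bad_pages = [[0, 1], [1, 0, 0, 1]]    # first record split across pages
```

With `good_pages`, the per-page row counts add up to the three records and each page can be skipped independently; with `bad_pages`, the second page starts in the middle of a record, which is the situation the message treats as a bug.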

Re: [Request] Send automated notifications to a separate mailing-list

2023-08-21 Thread Julien Le Dem
+1 On Mon, Aug 21, 2023 at 10:16 AM Antoine Pitrou wrote: > > Hello, > > I would like to request that automated notifications (from GitHub, > Jira... whatever) be sent to a separate mailing-list and GMane mirror. > Currently, the endless stream of automated notifications in this > mailing-list

Re: [VOTE] Release Apache Parquet 1.12.1 RC1

2021-09-14 Thread Julien Le Dem
+1 (binding) I verified the signature; the build and tests pass (with Java 8). On Tue, Sep 14, 2021 at 4:14 PM Xinli shang wrote: > I also vote +1 (binding). Thanks everybody for verifying! > > On Tue, Sep 14, 2021 at 2:00 PM Chao Sun wrote: > > > +1 (non-binding). > > > > - tested on the Spark

New Parquet PMC chair

2021-05-28 Thread Julien Le Dem
Hello Parquet community, The Parquet PMC discussed and decided some time ago to move to a rotating chair. Every year around this time the PMC will elect a new chair to represent the project to the board. I'm happy to announce that Xinli Shang is the first to be elected to be VP Apache Parquet

Re: [VOTE] Release Apache Parquet 1.12.0 RC4

2021-03-24 Thread Julien Le Dem
+1 (binding) I verified the signature and built from source. All tests pass. Looks good. On Wed, Mar 24, 2021 at 2:07 AM Gabor Szadovszky wrote: > I currently have the feeling that the Avro/Jackson related issue has been > discussed and the community agrees on moving forward with this RC as is

[jira] [Commented] (PARQUET-1666) Remove Unused Modules

2020-12-02 Thread Julien Le Dem (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242864#comment-17242864 ] Julien Le Dem commented on PARQUET-1666: that sounds good to me too > Remove Unused Modu

[ANNOUNCE] New Parquet PMC member - Xinli Shang

2020-11-09 Thread Julien Le Dem
On behalf of the Apache Parquet PMC, I'm happy to announce that Xinli Shang has accepted to join the PMC. Congrats Xinli!

Re: Metadata summary file deprecation

2020-09-29 Thread Julien Le Dem
Hi Jason, Thank you for bringing this up. A correctness issue would only come up when more parquet files are added to the same folder or files are modified. Historically folders have been considered immutables and the summary file reflects the metadata for all the files in the folder. The summary

[Announce] new committer: Xinli Shang

2020-03-12 Thread Julien Le Dem
The Project Management Committee (PMC) for Apache Parquet has invited Xinli Shang to become a committer and we are pleased to announce that he has accepted. Welcome Xinli!

[jira] [Assigned] (PARQUET-1777) add Parquet logo vector files to repo

2020-01-24 Thread Julien Le Dem (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PARQUET-1777: -- Assignee: Julien Le Dem > add Parquet logo vector files to r

[jira] [Created] (PARQUET-1777) add Parquet logo vector files to repo

2020-01-24 Thread Julien Le Dem (Jira)
Julien Le Dem created PARQUET-1777: -- Summary: add Parquet logo vector files to repo Key: PARQUET-1777 URL: https://issues.apache.org/jira/browse/PARQUET-1777 Project: Parquet Issue Type

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-12-05 Thread Julien Le Dem
I verified the signatures and ran the build and tests. It looks like the compatibility changes being discussed are not blockers. +1 (binding) On Wed, Nov 27, 2019 at 1:43 AM Gabor Szadovszky wrote: > Thanks, Zoltan. > > I also vote +1 (binding) > > Cheers, > Gabor > > On Tue, Nov 26, 2019 at 1:46

Re: Parquet sync zoom - invalid meeting ID

2019-11-21 Thread Julien Le Dem
that worked, thanks! On Thu, Nov 21, 2019 at 9:11 AM Xinli shang wrote: > Can you try https://uber.zoom.us/j/142456544? > > On Thu, Nov 21, 2019 at 9:07 AM Gabor Szadovszky wrote: > > > Hi, > > > > Is it just me who cannot join to the meeting? It says "Invalid meeting > > ID"... > > > >

Re: Parquet sync zoom - invalid meeting ID

2019-11-21 Thread Julien Le Dem
same for me. can someone send a new link? On Thu, Nov 21, 2019 at 9:08 AM Jim Apple wrote: > The same is happening to me. Additionally, one of the toll-free phone > numbers did not pick up. > > No outages I see: https://statusgator.com/services/zoom, > https://status.zoom.us/ > > On 2019/11/21

Re: Parquet Sync - 10/17/2019 - Meeting Notes

2019-10-17 Thread Julien Le Dem
;> • >> non...@gmail.com >> • >> robe...@palantir.com >> • >> szonyi.a...@gmail.com >> • >> szo...@cloudera.com >> • >> m.lac...@criteo.com >> • >> csringho...@cloudera.com >> • >> rzam...@nvidia.com >> • >> borokna...

Re: [VOTE] Release Apache Parquet Format 2.7.0 RC0

2019-09-26 Thread Julien Le Dem
verified signature, build, ran tests +1 For information: You can verify the signature by following: https://httpd.apache.org/dev/verification.html (import the KEYS file listed in the email) To build on a mac: brew install maven brew install thrift mvn test mvn package On Thu, Sep 26,

Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet

2019-08-29 Thread Julien Le Dem
I think this looks promising to me. At first glance it seems combining simplicity and efficiency. I'd like to hear more from other members of the PMC. On Tue, Aug 27, 2019 at 5:30 AM Radev, Martin wrote: > Dear all, > > > there was some earlier discussion on adding a new encoding for better >

Re: Writing INT96 timestamp in parquet from either avro/protobuf records

2019-05-10 Thread Julien Le Dem
Hi Arup, You are correct, you would have to use the lower level APIs or contribute the int96 support to either protobuf or avro integrations. However we are recommending users to migrate away from the int96 type so I would not recommend adding that support.

Re: Parquet Sync

2019-04-15 Thread Julien Le Dem
s fine for me, too, of course. > > Cheers, Lars > > On Mon, Apr 15, 2019, 22:14 Julien Le Dem wrote: > > > Hello all, > > Since I have been away with the new baby the Parquet syncs have fallen > > behind. > > I'd like a volunteer to run those. > > Res

Re: Parquet Sync

2019-04-15 Thread Julien Le Dem
No requirement to be a PMC member no. On Mon, Apr 15, 2019 at 10:41 PM Xinli shang wrote: > Is there any requirement like being PMC of Parquet? > > On Mon, Apr 15, 2019 at 10:14 PM Julien Le Dem > wrote: > > > Hello all, > > Since I have been away with the new

Parquet Sync

2019-04-15 Thread Julien Le Dem
Hello all, Since I have been away with the new baby the Parquet syncs have fallen behind. I'd like a volunteer to run those. Responsibilities include taking notes and posting them on the list. Also occasionally finding a good time for the meeting. Any takers? This could be a rotating duty as well.

[Draft REPORT] Apache Parquet - January 2019

2019-01-07 Thread Julien Le Dem
## Description: Parquet is a standard and interoperable columnar file format for efficient analytics. Parquet has 3 sub-projects: - parquet-format: format reference doc along with thrift based metadata definition (used by both sub-projects below) - parquet-mr: Java APIs and implementation of the

Re: [Discuss] Code of conduct

2018-12-11 Thread Julien Le Dem
ien, > > As per ASF guideline > https://www.apache.org/foundation/policies/conduct.html applies also to > the Apache Parquet channels. Would that be sufficient for you? > > Cheers > Uwe > > On Sat, Dec 8, 2018, at 2:14 AM, Julien Le Dem wrote: > > We currently don’t hav

[Discuss] Code of conduct

2018-12-07 Thread Julien Le Dem
We currently don’t have an explicit code of conduct. We’ve always encouraged respectful discussions and as far as I know all discussions have been that way. However, I don’t think we should wait for an incident to create the need for an explicit code of conduct. I suggest we adopt the contributor

parquet-sync notes December 5 2018

2018-12-05 Thread Julien Le Dem
Deepak: encryption, column statistics Zoltan: vote on the release Nandor: Ryan (netflix): release candidate, validation of the release, encryption Gidon (IBM): update encryption Lars (Cloudera Impala): Qinghui (Criteo): PR in parquet-proto, next release. Replace current proto compiler:

Re: Parquet sync meeting notes

2018-11-06 Thread Julien Le Dem
- I reached out to Ryan who will get back on the PR - I reached out to Jacques regarding page level stats - also advertised it on twitter: https://twitter.com/J_/status/1059860813115052032 On Tue, Nov 6, 2018 at 9:30 AM Julien Le Dem wrote: > Attendees: > >- Gabor (Cloudera) >

Parquet sync meeting notes

2018-11-06 Thread Julien Le Dem
Attendees: - Gabor (Cloudera) - Nandor (Cloudera) - Zoltan (Cloudera): new parquet-mr release - Anna (Cloudera): new parquet-mr release. Would like encryption update - Gidon (IBM): status of encryption design sign off - Xinli (Uber): encryption - Steven (Yelp) - Julien

Re: How to reduce the "committed time" for contributions

2018-10-17 Thread Julien Le Dem
Thanks for starting this discussion, Anna. I agree we need to improve and should try your suggestions. What do others think? On Tue, Oct 16, 2018 at 11:46 Anna Szonyi wrote: > Hi All, > > I wanted to follow up on the discussion we had on the weekly sync > meeting, namely: how can we reduce the

Re: [RESULT] [VOTE] Moving Apache Parquet C++ development process to a monorepo structure with Apache Arrow C++

2018-10-15 Thread Julien Le Dem
What does archiving the master branch look like? Are we renaming master and leaving a readme pointing to the new repo? On Thu, Sep 20, 2018 at 3:30 PM Wes McKinney wrote: > OK. There is still some code (examples, CLI tools) that needs to be > moved over. Once that's done and all the

parquet sync notes

2018-10-09 Thread Julien Le Dem
Gabor (Cloudera): column index, benchmark, nested types (filter, indexes) Anna (Cloudera): process, feature branches, etiquette of waiting for someone? Blocked Zoltan (Cloudera): Feature branches? When to review them? Nandor (Cloudera) parquet file with multiple row groups, schema evolution

Re: BitWeaving in Parquet?

2018-10-08 Thread Julien Le Dem
If you want (and if you don't already know him) I'm happy to ask Jignesh if he wants an intro. I think he would be happy to tell you about it. On Mon, Oct 8, 2018 at 4:04 PM Jim Apple wrote: > > That sounds like an interesting possibility. It's not that fresh in my > mind > > but I'd say from

Re: BitWeaving in Parquet?

2018-10-08 Thread Julien Le Dem
Hi Jim, I remember chatting with Jignesh Patel about it at the time. Since his company locomatix was acquired by twitter we had him as an adviser for some time. That sounds like an interesting possibility. It's not that fresh in my mind but I'd say from the storage perspective it's a variation of

parquet sync notes

2018-09-25 Thread Julien Le Dem
Lars (Cloudera Impala): listen in. Zoltan, Gabor and Nandor (Cloudera): - feature branch reviewed and merged - Parquet-format release - - Define scope Ryan (Netflix) Junjie (tencent): bloom filter Jim Apple (cloud service): bloom filter in parquet-mr? Since they got in parquet-cpp

Re: Date and time for next Parquet sync

2018-08-28 Thread Julien Le Dem
Notes: Anna (Cloudera): Bloom filter update, Iceberg Gabor, Nandor (Cloudera): - Value skipping implementation to be reviewed. Move Java code from parquet-format to parquet-mr. PR ready - How can users of Parquet handle timestamps and TZs. Allow for writing timestamp in java. Refactor

Parquet sync notes

2018-06-12 Thread Julien Le Dem
QingHui (Criteo): parquet-protobuf Lars (impala), Jim (Cloudera): Bloom filter benchmarks Ryan (Netflix): JunJie (Intel): Bloomfilter and dictionary comparison benchmarks Gidon (IBM): Encryption, feedback Xinli Shang (Uber): Encryption Bloomfilter and dictionary comparison benchmarks: -

Parquet sync notes

2018-06-07 Thread Julien Le Dem
Attendees / Agenda: Gidon (IBM): Parquet encryption. Uber, Vertica, Amazon Anna, Gabor, Nandor (Cloudera): Review for column indexing Junjie (tencent): Bloom filter Lars (Cloudera impala) Jim (Cloudera): Bloom filter Deepak (Vertica): Encryption Qinghui, Benoit (Criteo): parquet protobuf. Parquet

[Announce] new Parquet committer Benoit Hanotte

2018-05-29 Thread Julien Le Dem
We are happy to announce that Benoit has accepted to become a Parquet committer. Welcome Benoit!

Re: Permissions for committers

2018-05-22 Thread Julien Le Dem
You don’t push commits to GitHub. You push them to the Apache git and they get replicated to GitHub. On Tue, May 22, 2018 at 09:37 Julien Le Dem <julien.le...@wework.com> wrote: > Do you have your github id configured in id.apache.org ? > > On Tue, May 22, 2018 at 06:18 Gabor

Re: Permissions for committers

2018-05-22 Thread Julien Le Dem
Do you have your github id configured in id.apache.org ? On Tue, May 22, 2018 at 06:18 Gabor Szadovszky wrote: > Hi, > > Could someone help me to have the required permissions on github so I can > push commits? > > Thanks a lot, > Gabor >

[Announce] new Parquet committer Constantin Muraru

2018-05-21 Thread Julien Le Dem
We are happy to announce that Constantin has accepted to become a Parquet committer. Welcome Constantin!

Re: Parquet Data Help

2018-05-21 Thread Julien Le Dem
This sounds like a hive question rather than a parquet question. Did you try posting on the hive mailing list? On Mon, May 21, 2018 at 12:59 AM, Shubham gurav wrote: > Hey Dev, > > Currently using Hive 0.13 and our database is in parquet format. When i > extract the

notes parquet sync May 9 2018

2018-05-10 Thread Julien Le Dem
Attendees and agenda building Deepak (vertica) : encryption cpp code Jim (Cloudera Palo Alto) Lars (Cloudera, impala) Nandor, Zoltan, Anna (Cloudera Budapest): - Breaking changes: avoid backwards incompatible changes Benoit (Criteo) Ryan (netflix) Julien (WeWork) Notes: - encryption

[jira] [Resolved] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-26 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PARQUET-968. --- Resolution: Fixed Fix Version/s: 1.11 > Add Hive/Presto support in ProtoParq

[jira] [Assigned] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-26 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PARQUET-968: - Assignee: Constantin Muraru (was: Julien Le Dem) > Add Hive/Presto supp

[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-26 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453972#comment-16453972 ] Julien Le Dem commented on PARQUET-968: --- merged in  https://github.com/apache/parquet-mr/commit

[jira] [Assigned] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-26 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem reassigned PARQUET-968: - Assignee: Julien Le Dem > Add Hive/Presto support in ProtoParq

[jira] [Commented] (PARQUET-1281) Jackson dependency

2018-04-24 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450163#comment-16450163 ] Julien Le Dem commented on PARQUET-1281: parquet-hadoop should have its build include shading

Parquet sync

2018-04-24 Thread Julien Le Dem
Happening now: https://meet.google.com/esu-yiit-mun

Re: Date and time for the next Parquet sync

2018-04-20 Thread Julien Le Dem
+1 On Wed, Apr 18, 2018 at 9:23 AM, Zoltan Ivanfi wrote: > +1, thanks Lars! > > On Wed, Apr 18, 2018 at 6:20 PM Lars Volker wrote: > > > Hi All, > > > > It has been 3 weeks since our last Parquet community sync and I think it > > would be great to have one

Re: [VOTE] Release Apache Parquet Format 2.5.0 RC0

2018-04-12 Thread Julien Le Dem
+1 (binding) checked signature, ran build and tests On Mon, Apr 9, 2018 at 8:44 AM, Ryan Blue wrote: > +1 (binding) > > Checked this for the last vote. > > On Mon, Apr 9, 2018 at 4:53 AM, Gabor Szadovszky < > gabor.szadovs...@cloudera.com> wrote: > > > Hi everyone, > >

Re: [RESULT][VOTE] Release Apache Parquet Format 2.5.0 RC0

2018-04-12 Thread Julien Le Dem
the release verification script for parquet-cpp is a good reference: https://github.com/apache/parquet-cpp/blob/master/dev/release/verify-release-candidate On Fri, Apr 6, 2018 at 8:57 AM, Ryan Blue wrote: > Yeah, I thought it was a hard limit when I wrote that. Then I

Re: parquet-mr next release with PARQUET-1217?

2018-04-12 Thread Julien Le Dem
If someone wants a 1.9.1 it can be done, we'll need someone to own the release process though. On Tue, Apr 10, 2018 at 3:53 PM, Henry Robinson wrote: > Thanks! Sorry to miss the vote - was AFK for a few days. I look forward to > testing it out anyhow. > > On 5 April 2018 at

Re: [VOTE] Release Apache Parquet Java 1.10.0 RC0

2018-04-06 Thread Julien Le Dem
+1 (binding) * verified signature and checksum * build and tested on osx On Fri, Apr 6, 2018 at 9:55 AM, Uwe L. Korn wrote: > +1 (binding) > > * Verified signature and checksum > * Build and tested on OSX using `mvn clean install >

[jira] [Resolved] (PARQUET-1259) Parquet-protobuf support both protobuf 2 and protobuf 3

2018-04-04 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PARQUET-1259. Resolution: Workaround supporting more than one version adds complexity. It sounds like

Re: String interning in parquet-format

2018-04-03 Thread Julien Le Dem
The main reason for the string interning is saving memory. Some of the early parquet design is using the column names in the metadata to refer to columns. When deserializing metadata we have a new string instance when we deserialize even though it is the same string. We don't need to rely on the
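The memory argument above can be illustrated with a small sketch. It is written in Python as an analogy and is not the parquet-format code, which is Java; the function name and the sample column names are hypothetical. Each deserialized name is passed through an intern table so that equal names share one string object instead of one copy per occurrence.

```python
import sys


def deserialize_column_names(raw_names):
    """Decode column names, interning each one.

    Interning returns a single canonical string object per distinct
    value, so a name repeated across many metadata entries is stored
    once rather than once per occurrence.
    """
    return [sys.intern(raw.decode("utf-8")) for raw in raw_names]


# Hypothetical serialized names; "user.id" appears twice, as it would
# if several metadata entries referred to the same column.
raw = [b"user.id", b"user.name", b"user.id"]
names = deserialize_column_names(raw)

# Both occurrences now refer to the same object.
shared = names[0] is names[2]
```

The trade-off is that interned strings live in a shared table, so this mainly pays off when the same names recur many times, as column names do across the metadata of many row groups.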

parquet sync happening now

2018-03-28 Thread Julien Le Dem
https://meet.google.com/xpc-gwie-sem

[jira] [Updated] (PARQUET-1222) Definition of float and double sort order is ambiguous

2018-03-13 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PARQUET-1222: --- Summary: Definition of float and double sort order is ambiguous (was: Definition of float

Parquet sync starting now

2018-03-13 Thread Julien Le Dem
https://meet.google.com/jpy-mump-ngc

[jira] [Resolved] (PARQUET-1135) upgrade thrift and protobuf dependencies

2018-03-09 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem resolved PARQUET-1135. Resolution: Fixed merged in: https://github.com/apache/parquet-mr/commit

[jira] [Updated] (PARQUET-1135) upgrade thrift and protobuf dependencies

2018-03-09 Thread Julien Le Dem (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Le Dem updated PARQUET-1135: --- Fix Version/s: 1.9.1 Description: thrift 0.7.0 -> 0.9.3 protobuf 3.2 ->

Re: Date for next Parquet sync

2018-03-08 Thread Julien Le Dem
hours Budapest (Hungary) Tuesday, March 13, 2018 at 6:00:00 pm CET UTC+1 hour Paris (France - Île-de-France) Tuesday, March 13, 2018 at 6:00:00 pm CET UTC+1 hour Corresponding UTC (GMT) Tuesday, March 13, 2018 at 17:00:00 On Thu, Mar 8, 2018 at 4:12 PM, Julien Le Dem <julien.le...@gmail.com>

Re: Date for next Parquet sync

2018-03-08 Thread Julien Le Dem
or 10am PST but it's a little late for the team in Budapest. On Thu, Mar 8, 2018 at 4:11 PM, Julien Le Dem <julien.le...@gmail.com> wrote: > I'm sorry, it turns out I now have a conflict at this particular time. > Maybe Wednesday? > > On Mon, Mar 5, 2018 at 10:55

Re: Date for next Parquet sync

2018-03-08 Thread Julien Le Dem
I'm sorry, it turns out I now have a conflict at this particular time. Maybe Wednesday? On Mon, Mar 5, 2018 at 10:55 AM, Lars Volker wrote: > Hi All, > > It has been almost 3 weeks since the last sync and there are a bunch of > ongoing discussions on the mailing list. Let's

Re: parquet sync

2018-02-14 Thread Julien Le Dem
rst version writing into parquet-mr - Action: - - Ryan to review - Ryan and Zoltan to follow up on making parquet-format release On Wed, Feb 14, 2018 at 9:02 AM, Julien Le Dem <julien.le...@wework.com> wrote: > starting now on google hangout: > https://meet.google.com/nhj-cvpt-atx >

parquet sync

2018-02-14 Thread Julien Le Dem
starting now on google hangout: https://meet.google.com/nhj-cvpt-atx

Re: Date and Time for next Parquet sync

2018-02-09 Thread Julien Le Dem
If you have received an invitation for next Wednesday, please disregard it for now. I was just adding people to the list of reminders. I'll move it to whenever is the conclusion of this thread. I have a conflict on Tuesday though. I am available on Wednesday. On Wed, Feb 7, 2018 at 11:29 PM,

Re: parquet sync

2018-01-30 Thread Julien Le Dem
- Next sync, Tuesday in 2 weeks. On Tue, Jan 30, 2018 at 6:59 PM, Julien Le Dem <julien.le...@gmail.com> wrote: > happening now: meet.google.com/nhj-cvpt-atx >

parquet sync

2018-01-30 Thread Julien Le Dem
happening now: meet.google.com/nhj-cvpt-atx

Re: Next parquet sync

2018-01-10 Thread Julien Le Dem
e some measurements - Restart the conversation - PARQUET-787: needs a review https://github.com/apache/parquet-mr/pull/390 - Releases - - Ryan: create release jira On Tue, Jan 9, 2018 at 8:54 AM, Julien Le Dem <julien.le...@wework.com> wrote: > The sync is starting i

Re: Next parquet sync

2018-01-09 Thread Julien Le Dem
s to join but was not on the > invite, please let me know. > > Cheers, Lars > > On Mon, Jan 8, 2018 at 10:24 PM, Julien Le Dem <julien.le...@wework.com> > wrote: > > > It sounds like we're doing the parquet sync tomorrow Tuesday January 9th > at > > 9am PT (5pm UTC) >

Re: Iceberg table format

2018-01-03 Thread Julien Le Dem
Happy new year! I'm interested as well. Did you get to publish your code on github? Thanks On Fri, Dec 8, 2017 at 8:42 AM, Ryan Blue wrote: > I'm working on getting the code out to our open source github org, probably > early next week. I'll set up a mailing list for

Next parquet sync

2018-01-03 Thread Julien Le Dem
Any day of the week/time preference for the next Parquet sync? It is usually held at 9am PT (5pm UTC) on a Wednesday.

parquet sync starting now

2017-12-06 Thread Julien Le Dem
https://meet.google.com/ttv-rton-ber (all welcome)

Re: parquet sync starting in a few minutes

2017-11-22 Thread Julien Le Dem
. same time. On Wed, Nov 22, 2017 at 8:57 AM, Julien Le Dem <julien.le...@gmail.com> wrote: > https://meet.google.com/udi-dvmo-sva >

parquet sync starting in a few minutes

2017-11-22 Thread Julien Le Dem
https://meet.google.com/udi-dvmo-sva

parquet sync starting now

2017-11-08 Thread Julien Le Dem
https://meet.google.com/oto-xpdf-kug

[Announce] Congrats to our new Parquet committers

2017-10-27 Thread Julien Le Dem
Zoltan Ivanfi and Lars Volker are now Parquet committers. Deepak Majeti became a committer in July. Thank you all for your sustained contribution to the project. Welcome and congrats!

Parquet sync

2017-10-25 Thread Julien Le Dem
Starting now: https://meet.google.com/oto-xpdf-kug

Re: parquet sync starting now

2017-10-11 Thread Julien Le Dem
-format release: blocked on page index parquet related table format discussion: will happen separately. next meeting in 2 weeks. On Wed, Oct 11, 2017 at 9:06 AM, Julien Le Dem <julien.le...@gmail.com> wrote: > https://meet.google.com/oto-xpdf-kug >
