RE: Date and time for next parquet sync

2019-01-17 Thread Santlal J Gupta
Hi team,

This email ID will not be available from this Friday onwards. Please add my
other email ID (santlal561...@gmail.com) to the Parquet sync meeting.

I have already sent a meeting request from my other email ID (santlal561...@gmail.com).

Thanks
Santlal J Gupta

-----Original Message-----
From: Santlal J Gupta 
Sent: Friday, September 29, 2017 10:19 AM
To: dev@parquet.apache.org
Subject: RE: Date and time for next parquet sync

Yes, I want to join.

-----Original Message-----
From: Lars Volker [mailto:l...@cloudera.com] 
Sent: Thursday, September 28, 2017 8:40 PM
To: dev@parquet.apache.org
Subject: Date and time for next parquet sync

I sent out a meeting request for the next Parquet sync on Wednesday, October
11th at 9am PST. Please reply to this email if you'd like to join and find
that you are not on the invite yet.


Date and time of the next Parquet sync: December 5 at 9AM PT (6PM CET)

2018-11-27 Thread Zoltan Ivanfi
Hi,

I have sent an invitation for the next Parquet Sync for next Wednesday
(December 5) at 9AM PT (6PM CET). The meeting is open to anybody interested
in Parquet. If you have not received the invitation but would like to
attend, please send me a mail in private and I will add you.

Thanks,

Zoltan


Date and time of the next Parquet sync

2018-10-30 Thread Zoltan Ivanfi
Hi,

I have sent an invitation for the next Parquet Sync for next Tuesday
(November 6) at 6 PM CET / 9 AM PT. The meeting is open to anybody
interested in Parquet. If you have not received the invitation but would
like to attend, please send me a mail in private and I will add you.

Thanks,

Zoltan


Re: Date and time for next Parquet sync

2018-09-18 Thread Nandor Kollar
Hi All,

Since it seems that, apart from you, several other community members
can't attend the meeting tomorrow, would anyone mind if we rescheduled it
for next Tuesday at the same time?

Thanks,
Nandor



Re: Date and time for next Parquet sync

2018-09-18 Thread Zoltan Ivanfi
Hi,

It seems that I won't be able to attend after all, sorry for the late
decline.

Zoltan



Re: Date and time for next Parquet sync

2018-09-10 Thread Ryan Blue
Sorry, looks like I was wrong on the dates. Thanks, Nandor.



-- 
Ryan Blue
Software Engineer
Netflix


Re: Date and time for next Parquet sync

2018-09-10 Thread Nandor Kollar
Ryan, I was aware of Strata; I actually wanted to schedule it for September
18th but forgot to change 'next week' in the email. So in fact I had already
pushed it out one week. Sorry for the confusion.

Gidon, the 19th is fine for me. If there's no objection, we can have it then!

Thanks,
Nandor



Re: Date and time for next Parquet sync

2018-09-09 Thread Gidon Gershinsky
Thanks to the dozens of folks who have found time to read the design
googledoc since the last Parquet sync.

Now that the traffic peak at the doc is over, I'll be handling the overlap
with the new Encryption.md file. It is becoming difficult and unnecessary
to maintain two versions in parallel, so the overlapping part will be
removed from the googledoc. The Encryption.md (formatted here) and the
current Thrift file together provide a technically accurate, down to a
single byte, description of the encryption format and the writer/reader
protocol. You can leave new comments at the document pull request.

Old comments are still available at the google doc; press the comments
button for the Dec'17 to Aug'18 comment history. You can also read the
review comments at the pull requests, both merged (94, 103 and 104 in
parquet-format; 463 and 464 in parquet-cpp) and open (95, 471 and 472 in
parquet-mr; 475 in parquet-cpp).

Besides comment history, the google doc will keep the API description
("Usage samples" section). The sample code is in Java, but the same API is
available in the C++ Parquet version (thanks Tham Ha for the hard work on
this!).

Cheers, Gidon.




Re: Date and time for next Parquet sync

2018-09-07 Thread Ryan Blue
We may want to push this out another week because it also conflicts with
Strata NY. I think a few of us will be travelling Tuesday and both Julien
and I have talks on Wednesday.



-- 
Ryan Blue
Software Engineer
Netflix


Re: Date and time for next Parquet sync

2018-09-07 Thread Gidon Gershinsky
Hi Nandor,

Can we make it Wed this time, Sept 19? Or any Tue/Wed in another week.
Sept 18 is Yom Kippur eve, which basically means I won't technically be
able to join a call.

Regarding the Google doc vs. reviewed PR + .md file: it is indeed becoming
difficult and unnecessary to maintain two versions of the same
documentation. Following your last mail, there was a high volume of review
activity on the Google doc, but now that the spike is winding down, I'll be
removing the duplicate part from the Google doc (keeping the samples), and
new comments should go to the PRs (md and code). I'll send a detailed mail
early next week.


Cheers, Gidon.



Date and time for next Parquet sync

2018-09-07 Thread Nandor Kollar
Hi All,

I'd like to propose a Parquet Sync next week Tuesday (September
18th) at 6pm CEST / 9 am PST.

Some of the topics which would be nice to discuss:
- review column indexes (PRs and feature branch)
- move Java code from format to mr (PR #517)
- Bloom filter spec
- columnar encryption spec (and general question, where to track
specs, Google doc vs reviewed PR + .md file)
- Refactor modules to use the new logical type API (PR under review)
- new format release scope (nano precision timestamp, bloom filter?,
columnar encryption?)

I'll send the meeting invite shortly. Feel free to propose another time
slot if this one is not suitable for you, and bring any additional topics
you'd like to discuss.

Regards,
Nandor


Re: Date and time for next Parquet sync

2018-08-29 Thread Nandor Kollar
Hi all,

Yesterday we talked about the status of columnar encryption and agreed
that before anything related to it gets released, we need a reviewed
spec. Gidon has actually already opened a PR for this:
https://github.com/apache/parquet-format/pull/101. It is based on the
design doc
(https://docs.google.com/document/d/1T89G7xR0zHFV1f2pjTO28jtfVm8qoNVGEJQ70Rsk-bY/edit)
he wrote. Julien, Ryan, what do you think: is anything else needed?

Regards,
Nandor



Re: Date and time for next Parquet sync

2018-08-28 Thread Julien Le Dem
Notes:
Anna (Cloudera): Bloom filter update, Iceberg
Gabor, Nandor (Cloudera):

   - Value skipping implementation to be reviewed. Move Java code from
   parquet-format to parquet-mr. PR ready.
   - How can users of Parquet handle timestamps and time zones? Allow for
   writing timestamps in Java. Refactor original type logic to the more
   flexible new logical type API.
   - Column indexes and alignment of pages.
   - Limiting the number of records in a page to avoid skewed splits when
   compression is really good.
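The idea behind the per-page record limit in the last bullet can be illustrated with a toy model. The sketch below is a hypothetical page-cutting policy written in Python, not parquet-mr's writer or its configuration API: a page is closed when its estimated encoded size reaches a byte threshold (assumed to be 1 MiB here) or, optionally, when it holds a maximum number of records (10K, the tentative figure floated later in these notes). For a very compressible column, the size threshold alone is almost never reached, so a single page can end up holding every value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageSplitter:
    """Hypothetical page-cutting policy for one column (not the parquet-mr API).

    A page is closed once its estimated encoded size reaches page_size_bytes,
    or, if max_records is set, once it holds max_records values.
    """
    page_size_bytes: float
    max_records: Optional[int] = None
    pages: List[int] = field(default_factory=list, init=False)  # records per closed page
    _records: int = field(default=0, init=False)
    _bytes: float = field(default=0.0, init=False)

    def add(self, encoded_size: float) -> None:
        """Append one value whose encoded size, in bytes, is encoded_size."""
        self._records += 1
        self._bytes += encoded_size
        size_full = self._bytes >= self.page_size_bytes
        count_full = self.max_records is not None and self._records >= self.max_records
        if size_full or count_full:
            self.flush()

    def flush(self) -> None:
        """Close the current page if it holds any records."""
        if self._records:
            self.pages.append(self._records)
            self._records = 0
            self._bytes = 0.0


def split(n_values: int, bytes_per_value: float,
          max_records: Optional[int] = None) -> List[int]:
    """Return the record count of every page produced for n_values values."""
    splitter = PageSplitter(page_size_bytes=1024 * 1024, max_records=max_records)
    for _ in range(n_values):
        splitter.add(bytes_per_value)
    splitter.flush()
    return splitter.pages


# A highly compressible column: roughly 0.001 encoded bytes per value.
print(split(200_000, 0.001))                           # [200000]
print(len(split(200_000, 0.001, max_records=10_000)))  # 20
```

The same kind of cap could apply per row group, as the notes suggest; a sensible default would still have to come from benchmarks.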

Ryan (Netflix): Iceberg stuff back to Parquet: expression library for push
down. Dictionary and stats based row group filtering.
JunJie (Intel): Bloom filter. Need more reviews. Have a vote on the design
and add it to parquet-format.
Julien (WeWork): Encryption.


   - Bloom filter:
   https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-41
      - Committed utility class to parquet-cpp.
      - Uploaded the benchmark result.
      - Ready to add into the spec.
      - Submit a PR for the Parquet reader spec.
      - *Action*: review the Parquet Java utility class:
      https://github.com/apache/parquet-mr/pull/425
   - Encryption:
      - Nandor, Gabor reviewing.
      - APIs to allow pluggable key management.
      - Need to have a proper review of the spec.
      - Need more testing.
   - Column indices:
      - PR to be reviewed: https://github.com/apache/parquet-mr/pull/514
      - Ryan: to review the feature branch.
   - Moving Java code from parquet-format to parquet-mr:
      - *Action*: review https://github.com/apache/parquet-mr/pull/517
      - Gets the Thrift file from the released parquet-format artifact.
   - Maximum number of records per page:
      - We should add a property with a maximum number of records per page
      and per row group.
      - Need to benchmark to figure out a good default. 10K?
   - Iceberg:
      - Some of the Iceberg code should be in Parquet:
         - Rewrote the record reconstruction stack:
            - Reuses the page reader and decoder.
            - Then does a triple iterator that returns an entire column in a
            file (iterator of triples).
            - Record reconstruction class that handles everything that the
            current one does, but with {list, map} factories:
               - 20% faster to write, 5% faster to read
               - Easier to write object mappers
            - Helps with page level skipping.
         - High level abstractions in the Iceberg library:
            - Take an expression and simplify it (not, ...) to run on
            metadata.
            - Take a complex expression and split it into the part on
            partition/min/max and the remaining part.
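One plausible reading of "simplify it (not, ...) to run on metadata" is pushing NOT down to the leaves: once no NOT nodes remain, every leaf is a plain comparison that min/max statistics can evaluate directly. The Python sketch below does this with De Morgan's laws and comparison-operator negation. The `Pred`/`Not`/`And`/`Or` classes and `rewrite_not` are hypothetical illustrations, not Iceberg's API, and the sketch ignores floating-point NaN, for which `x < v` and `x >= v` can both be false.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Pred:
    """Leaf predicate: column <op> value (hypothetical, for illustration)."""
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, Not, And, Or]

# Each comparison operator mapped to its logical negation.
NEGATED_OP = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "==": "!=", "!=": "=="}


def rewrite_not(e: Expr) -> Expr:
    """Return an equivalent expression that contains no Not nodes."""
    if isinstance(e, Pred):
        return e
    if isinstance(e, And):
        return And(rewrite_not(e.left), rewrite_not(e.right))
    if isinstance(e, Or):
        return Or(rewrite_not(e.left), rewrite_not(e.right))
    if not isinstance(e, Not):
        raise TypeError(f"unknown expression: {e!r}")
    c = e.child
    if isinstance(c, Pred):   # NOT (a < 5)  ->  a >= 5
        return Pred(c.column, NEGATED_OP[c.op], c.value)
    if isinstance(c, Not):    # NOT NOT x  ->  x
        return rewrite_not(c.child)
    if isinstance(c, And):    # NOT (x AND y)  ->  NOT x OR NOT y
        return Or(rewrite_not(Not(c.left)), rewrite_not(Not(c.right)))
    if isinstance(c, Or):     # NOT (x OR y)  ->  NOT x AND NOT y
        return And(rewrite_not(Not(c.left)), rewrite_not(Not(c.right)))
    raise TypeError(f"unknown expression: {c!r}")


expr = Not(And(Pred("a", "<", 5), Not(Pred("b", "==", "x"))))
print(rewrite_not(expr))
# Or(left=Pred(column='a', op='>=', value=5), right=Pred(column='b', op='==', value='x'))
```

After the rewrite, a leaf such as `a >= 5` can be checked against a row group's max statistic alone, which is what makes the expression usable on metadata.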








Re: Date and time for next Parquet sync

2018-08-27 Thread Nandor Kollar
Yes, CEST.



Re: Date and time for next Parquet sync

2018-08-27 Thread Uwe L. Korn
Hello Nandor,

I can probably make this time. Just a time zone question: is it 6pm CET or
6pm CEST? I guess the latter.

See 
http://timesched.pocoo.org/?date=2018-08-28=central-europe-standard-time!,pacific-standard-time=1080,1140

Uwe



Date and time for next Parquet sync

2018-08-27 Thread Nandor Kollar
Hi All,

As discussed at the last Parquet sync, I propose having another meeting
on August 28th, at 6pm CET / 9 am PST, to discuss the topics we didn't
have time for at the August 15th sync, and of course any new topics too.

Sorry for the late notice; feel free to propose another time slot if this
one is not suitable for you! Calendar entry to follow.

Regards,
Nandor
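Uwe's question in this thread (6pm CET or 6pm CEST?) matters because the Pacific hour each label maps to depends on daylight saving time. The minimal Python sketch below assumes hardcoded UTC offsets for the 2018 dates discussed here rather than consulting a real time zone database; it shows that 6pm CEST on August 28 is 9am PDT, whereas "6pm CET" read literally in August would be 10am PDT, and that in December 6pm CET is again 9am PST. The `to_pacific` helper is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets, hardcoded as an assumption for the 2018 dates in this
# thread: summer time applied on August 28, standard time on December 5.
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def to_pacific(local: datetime, pacific: timezone) -> datetime:
    """Convert an aware Central European time to the given Pacific offset."""
    return local.astimezone(pacific)


# August 28, 2018, read as 6pm CEST (the answer given in the thread).
aug = datetime(2018, 8, 28, 18, 0, tzinfo=CEST)
# The same wall-clock time read literally as CET, one hour off in summer.
aug_cet = datetime(2018, 8, 28, 18, 0, tzinfo=CET)
# December 5, 2018: both regions are back on standard time.
dec = datetime(2018, 12, 5, 18, 0, tzinfo=CET)

print(to_pacific(aug, PDT).strftime("%H:%M %Z"))      # 09:00 PDT
print(to_pacific(aug_cet, PDT).strftime("%H:%M %Z"))  # 10:00 PDT
print(to_pacific(dec, PST).strftime("%H:%M %Z"))      # 09:00 PST
```

With a time zone database available, `zoneinfo.ZoneInfo("Europe/Berlin")` and `zoneinfo.ZoneInfo("America/Los_Angeles")` would select the correct offset for each date automatically; the fixed offsets here only make the arithmetic explicit.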


Re: Date and time for next Parquet sync

2018-08-12 Thread Uwe L. Korn
As the meeting falls during my summer vacation, I cannot participate, but I
will try to join again if there is a meeting two weeks later.

Uwe




Date and time for next Parquet sync

2018-08-08 Thread Nandor Kollar
Hi All,

It has been a while since we had a Parquet sync, so I'd like to propose
having one next week on August 15th, at 6pm CET / 9 am PST.

I'll send a meeting invite with the details soon; let me know if this time
is not suitable for you!

Since the last sync, a couple of topics have come up to discuss, such as:
- Status of Parquet encryption
- Release a new minor version, scope of the new release
- Bloom filters
- Move Java specific code from parquet-format to parquet-mr
- parquet.thrift usage best practices in different language bindings (Java,
C++, Python, Rust)
- LZ4 incompatibility

The agenda is open for suggestions.

Regards,
Nandor


Re: Date and time for the next Parquet sync

2018-05-08 Thread Lars Volker
I sent an invite for the proposed time. Please let me know if you would
like to be added to the meeting but haven't received an invite.

Cheers, Lars


On Mon, May 7, 2018 at 9:27 AM, Lars Volker  wrote:

> Hi All,
>
> I'd like to propose to have a Parquet Sync this week on Wednesday, May
> 9th, at 6pm CET / 9 am PST. Last time we met on a Tuesday, so this time
> it should be Wednesday.
>
> Please speak up if that time does not work for you. Otherwise I will send
> out the meeting request tomorrow morning.
>
> Cheers, Lars
>
>


Date and time for the next Parquet sync

2018-05-07 Thread Lars Volker
Hi All,

I'd like to propose to have a Parquet Sync this week on Wednesday, May 9th,
at 6pm CET / 9 am PST. Last time we met on a Tuesday, so this time it
should be Wednesday.

Please speak up if that time does not work for you. Otherwise I will send
out the meeting request tomorrow morning.

Cheers, Lars


Re: Date and time for the next Parquet sync

2018-04-21 Thread Lars Volker
I sent an invite for the proposed time. Please let me know if you would
like to be added to the meeting but haven't received an invite.

Cheers, Lars

On Fri, Apr 20, 2018 at 3:11 PM, Julien Le Dem 
wrote:

> +1
>
> On Wed, Apr 18, 2018 at 9:23 AM, Zoltan Ivanfi  wrote:
>
> > +1, thanks Lars!
> >
> > On Wed, Apr 18, 2018 at 6:20 PM Lars Volker  wrote:
> >
> > > Hi All,
> > >
> > > It has been 3 weeks since our last Parquet community sync and I think it
> > > would be great to have one next week. Last time we met on a Wednesday, so
> > > this time it should be Tuesday.
> > >
> > > I'd like to propose next Tuesday, April 24th, at 6pm CET / 9 am PST.
> > >
> > > Please speak up if that time does not work for you.
> > >
> > > Cheers, Lars
> > >
> >
>


Re: Date and time for the next Parquet sync

2018-04-20 Thread Julien Le Dem
+1

On Wed, Apr 18, 2018 at 9:23 AM, Zoltan Ivanfi  wrote:

> +1, thanks Lars!
>
> On Wed, Apr 18, 2018 at 6:20 PM Lars Volker  wrote:
>
> > Hi All,
> >
> > It has been 3 weeks since our last Parquet community sync and I think it
> > would be great to have one next week. Last time we met on a Wednesday, so
> > this time it should be Tuesday.
> >
> > I'd like to propose next Tuesday, April 24th, at 6pm CET / 9 am PST.
> >
> > Please speak up if that time does not work for you.
> >
> > Cheers, Lars
> >
>


Re: Date and time for the next Parquet sync

2018-04-18 Thread Zoltan Ivanfi
+1, thanks Lars!

On Wed, Apr 18, 2018 at 6:20 PM Lars Volker  wrote:

> Hi All,
>
> It has been 3 weeks since our last Parquet community sync and I think it
> would be great to have one next week. Last time we met on a Wednesday, so
> this time it should be Tuesday.
>
> I'd like to propose next Tuesday, April 24th, at 6pm CET / 9 am PST.
>
> Please speak up if that time does not work for you.
>
> Cheers, Lars
>


Date and time for the next Parquet sync

2018-04-18 Thread Lars Volker
Hi All,

It has been 3 weeks since our last Parquet community sync and I think it
would be great to have one next week. Last time we met on a Wednesday, so
this time it should be Tuesday.

I'd like to propose next Tuesday, April 24th, at 6pm CET / 9 am PST.

Please speak up if that time does not work for you.

Cheers, Lars


Date and time for the next Parquet sync

2018-03-22 Thread Lars Volker
Following our biweekly cadence we should have a Parquet community sync next
week. Last time we met on a Tuesday, so this time it should be Wednesday.

I propose to meet next Wednesday, March 28th, at 6pm CET / 9am PST. Europe
switches to daylight saving time this weekend, so we will be back to a
9-hour difference.
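The 9-hour figure follows from simple offset arithmetic. Below is a minimal Python sketch, assuming hard-coded UTC offsets for late March 2018 (the US moved to PDT, UTC-7, on March 11; Europe moved to CEST, UTC+2, on March 25) instead of a time-zone database lookup. The helper name `hours_ahead` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Hard-coded offsets (assumption, not a tz-database lookup):
# the US switched to PDT on March 11, 2018; Europe switched to CEST on March 25.
PDT = timezone(timedelta(hours=-7), "PDT")
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")


def hours_ahead(west: timezone, east: timezone) -> float:
    """Return how many hours `east` is ahead of `west` (fixed offsets only)."""
    return (east.utcoffset(None) - west.utcoffset(None)) / timedelta(hours=1)


# Wednesday, March 28, 2018 at 6pm in Budapest, after Europe's switch.
meeting = datetime(2018, 3, 28, 18, 0, tzinfo=CEST)
print(meeting.astimezone(PDT).strftime("%a %b %d %H:%M %Z"))  # Wed Mar 28 09:00 PDT
print(hours_ahead(PDT, CEST))  # 9.0 (both sides on summer time)
print(hours_ahead(PDT, CET))   # 8.0 (the gap between March 11 and March 25)
```

Between the two switches only one side had moved its clocks, which is why the gap temporarily shrank to 8 hours.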

Please speak up if that time does not work for you.

Cheers, Lars


Re: Date for next Parquet sync

2018-03-12 Thread Lars Volker
I sent out a meeting request for tomorrow, Tuesday, 10am PDT, 6pm CET, 5pm
UTC. If you want to join and have not received an invite, please reach out
to me.

Cheers, Lars

On Thu, Mar 8, 2018 at 4:22 PM, Julien Le Dem 
wrote:

> Actually, because of daylight saving time, we will have one less hour next
> week.
> https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018=3=13=17=0=0=224=50=195
> Location                          Local Time                              Time Zone  UTC Offset
> San Francisco (USA - California)  Tuesday, March 13, 2018 at 10:00:00 am  PDT        UTC-7 hours
> Budapest (Hungary)                Tuesday, March 13, 2018 at 6:00:00 pm   CET        UTC+1 hour
> Paris (France - Île-de-France)    Tuesday, March 13, 2018 at 6:00:00 pm   CET        UTC+1 hour
> Corresponding UTC (GMT)           Tuesday, March 13, 2018 at 17:00:00
>
>
> On Thu, Mar 8, 2018 at 4:12 PM, Julien Le Dem 
> wrote:
>
> > or 10am PST but it's a little late for the team in Budapest.
> >
> > On Thu, Mar 8, 2018 at 4:11 PM, Julien Le Dem 
> > wrote:
> >
> >> I'm sorry, it turns out I now have a conflict at this particular time.
> >> Maybe Wednesday?
> >>
> >> On Mon, Mar 5, 2018 at 10:55 AM, Lars Volker  wrote:
> >>
> >>> Hi All,
> >>>
> >>> It has been almost 3 weeks since the last sync and there are a bunch of
> >>> ongoing discussions on the mailing list. Let's find a date for the next
> >>> Parquet community sync. Last time we met on a Wednesday, so this time it
> >>> should be Tuesday.
> >>>
> >>> I propose to meet next Tuesday, March 13th, at 6pm CET / 9am PST. That
> >>> allows us to get back to the biweekly cadence without overlapping with
> >>> the Arrow sync, which happens this week.
> >>>
> >>> Please speak up if that time does not work for you.
> >>>
> >>> Cheers, Lars
> >>>
> >>
> >>
> >
>


Re: Date for next Parquet sync

2018-03-08 Thread Julien Le Dem
Actually, because of daylight saving time, we will have one less hour next
week.
https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018=3=13=17=0=0=224=50=195
Location                          Local Time                              Time Zone  UTC Offset
San Francisco (USA - California)  Tuesday, March 13, 2018 at 10:00:00 am  PDT        UTC-7 hours
Budapest (Hungary)                Tuesday, March 13, 2018 at 6:00:00 pm   CET        UTC+1 hour
Paris (France - Île-de-France)    Tuesday, March 13, 2018 at 6:00:00 pm   CET        UTC+1 hour
Corresponding UTC (GMT)           Tuesday, March 13, 2018 at 17:00:00
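The "one less hour" follows from the offsets in the table. Here is a minimal Python sketch, assuming the fixed offsets listed there (PDT = UTC-7, CET = UTC+1) rather than a time-zone database. It converts the 17:00 UTC slot into each local time and shows that keeping the usual 9am Pacific start would have moved the meeting to 5pm in Budapest. The helper `local_time` is illustrative, not part of any tool mentioned in the thread.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Tuesday, March 13, 2018 (assumption: hard-coded to match
# the table): California was already on PDT, Hungary and France still on CET.
PDT = timezone(timedelta(hours=-7), "PDT")
CET = timezone(timedelta(hours=1), "CET")


def local_time(instant: datetime, tz: timezone) -> str:
    """Format an aware datetime as HH:MM plus zone name in the given zone."""
    return instant.astimezone(tz).strftime("%H:%M %Z")


slot = datetime(2018, 3, 13, 17, 0, tzinfo=timezone.utc)
for tz in (PDT, CET, timezone.utc):
    print(local_time(slot, tz))  # 10:00 PDT, then 18:00 CET, then 17:00 UTC

# Keeping the usual 9am Pacific start would land at 5pm in Budapest instead.
print(local_time(datetime(2018, 3, 13, 9, 0, tzinfo=PDT), CET))  # 17:00 CET
```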


On Thu, Mar 8, 2018 at 4:12 PM, Julien Le Dem 
wrote:

> or 10am PST but it's a little late for the team in Budapest.
>
> On Thu, Mar 8, 2018 at 4:11 PM, Julien Le Dem 
> wrote:
>
>> I'm sorry, it turns out I now have a conflict at this particular time.
>> Maybe Wednesday?
>>
>> On Mon, Mar 5, 2018 at 10:55 AM, Lars Volker  wrote:
>>
>>> Hi All,
>>>
>>> It has been almost 3 weeks since the last sync and there are a bunch of
>>> ongoing discussions on the mailing list. Let's find a date for the next
>>> Parquet community sync. Last time we met on a Wednesday, so this time it
>>> should be Tuesday.
>>>
>>> I propose to meet next Tuesday, March 13th, at 6pm CET / 9am PST. That
>>> allows us to get back to the biweekly cadence without overlapping with
>>> the Arrow sync, which happens this week.
>>>
>>> Please speak up if that time does not work for you.
>>>
>>> Cheers, Lars
>>>
>>
>>
>


Re: Date for next Parquet sync

2018-03-08 Thread Julien Le Dem
or 10am PST but it's a little late for the team in Budapest.

On Thu, Mar 8, 2018 at 4:11 PM, Julien Le Dem 
wrote:

> I'm sorry, it turns out I now have a conflict at this particular time.
> Maybe Wednesday?
>
> On Mon, Mar 5, 2018 at 10:55 AM, Lars Volker  wrote:
>
>> Hi All,
>>
>> It has been almost 3 weeks since the last sync and there are a bunch of
>> ongoing discussions on the mailing list. Let's find a date for the next
>> Parquet community sync. Last time we met on a Wednesday, so this time it
>> should be Tuesday.
>>
>> I propose to meet next Tuesday, March 13th, at 6pm CET / 9am PST. That
>> allows us to get back to the biweekly cadence without overlapping with the
>> Arrow sync, which happens this week.
>>
>> Please speak up if that time does not work for you.
>>
>> Cheers, Lars
>>
>
>


Re: Date for next Parquet sync

2018-03-08 Thread Julien Le Dem
I'm sorry, it turns out I now have a conflict at this particular time.
Maybe Wednesday?

On Mon, Mar 5, 2018 at 10:55 AM, Lars Volker  wrote:

> Hi All,
>
> It has been almost 3 weeks since the last sync and there are a bunch of
> ongoing discussions on the mailing list. Let's find a date for the next
> Parquet community sync. Last time we met on a Wednesday, so this time it
> should be Tuesday.
>
> I propose to meet next Tuesday, March 13th, at 6pm CET / 9am PST. That
> allows us to get back to the biweekly cadence without overlapping with the
> Arrow sync, which happens this week.
>
> Please speak up if that time does not work for you.
>
> Cheers, Lars
>


Re: Date and Time for next Parquet sync

2018-02-09 Thread Julien Le Dem
If you have received an invitation for next Wednesday, please disregard it
for now; I was just adding people to the list of reminders. I'll move it to
whatever day this thread settles on. I have a conflict on Tuesday, though;
I am available on Wednesday.

On Wed, Feb 7, 2018 at 11:29 PM, Gabor Szadovszky <
gabor.szadovs...@cloudera.com> wrote:

> Hi All,
>
> I would vote for Tuesday, but I don’t mind skipping this one if Wednesday
> suits the others better.
>
> Cheers,
> Gabor
>
> > On 7 Feb 2018, at 19:00, Lars Volker  wrote:
> >
> > Hi All,
> >
> > I propose to have the next regular Parquet sync next week, either on
> > Tuesday or Wednesday at 9am PST / 6pm CET.
> >
> > The last one was on a Tuesday so this one would default to Wednesday.
> > Let's have a quick vote here by replying to this email with your day of
> > choice.
> > Feel free to propose any other time if neither of these work for you.
> >
> > Cheers, Lars
>
>


Re: Date and Time for next Parquet sync

2018-02-07 Thread Gabor Szadovszky
Hi All,

I would vote for Tuesday, but I don’t mind skipping this one if Wednesday
suits the others better.

Cheers,
Gabor

> On 7 Feb 2018, at 19:00, Lars Volker  wrote:
> 
> Hi All,
> 
> I propose to have the next regular Parquet sync next week, either on
> Tuesday or Wednesday at 9am PST / 6pm CET.
> 
> The last one was on a Tuesday so this one would default to Wednesday. Let's
> have a quick vote here by replying to this email with your day of choice.
> Feel free to propose any other time if neither of these work for you.
> 
> Cheers, Lars



Date and Time for next Parquet sync

2018-02-07 Thread Lars Volker
Hi All,

I propose to have the next regular Parquet sync next week, either on
Tuesday or Wednesday at 9am PST / 6pm CET.

The last one was on a Tuesday so this one would default to Wednesday. Let's
have a quick vote here by replying to this email with your day of choice.
Feel free to propose any other time if neither of these work for you.

Cheers, Lars


Re: Date and time for next parquet sync

2018-01-29 Thread Lars Volker
Thanks all who replied, I sent an invite for Tuesday. Cheers, Lars

On Mon, Jan 29, 2018 at 10:56 AM, Marcel Kornacker 
wrote:

> +1 for Tuesday
>
> On Mon, Jan 29, 2018 at 4:03 AM, Uwe L. Korn  wrote:
> > +1, Tuesday to Thursday are ok for me but I would prefer Tuesday this
> > week.
> >
> > Uwe
> >
> > On Mon, Jan 29, 2018, at 12:54 PM, Zoltan Ivanfi wrote:
> >> +1 for Tuesday, this week I can't attend on Wednesday.
> >>
> >> Zoltan
> >>
> >> On Mon, Jan 29, 2018 at 7:29 AM Lars Volker  wrote:
> >>
> >> > I'm good with either day. Does anyone prefer Wednesday over Tuesday?
> >> >
> >> > On Tue, Jan 23, 2018 at 11:27 PM, Gabor Szadovszky <
> >> > gabor.szadovs...@cloudera.com> wrote:
> >> >
> >> > > Hi All,
> >> > >
> >> > > As usual, I’m the one who complains…
> >> > > Tuesday/Thursday would be better for me. If one of these days is
> >> > > suitable for everyone I would be happy to participate. If not, I’m
> >> > > fine with going to the next meeting instead.
> >> > >
> >> > > Cheers,
> >> > > Gabor
> >> > >
> >> > > > On 24 Jan 2018, at 00:56, Lars Volker  wrote:
> >> > > >
> >> > > > Hi All,
> >> > > >
> >> > > > After chatting with Julien I'd like to propose to do the next
> >> > > > regular Parquet sync on next Wednesday, January 31st, at 5pm GMT
> >> > > > (6pm CET, 9am PST). This will get us back to alternating weeks with
> >> > > > the arrow sync. If that doesn't work for you, please let me know.
> >> > > >
> >> > > > Cheers, Lars
> >> > >
> >> > >
> >> >
>


Re: Date and time for next parquet sync

2018-01-29 Thread Marcel Kornacker
+1 for Tuesday

On Mon, Jan 29, 2018 at 4:03 AM, Uwe L. Korn  wrote:
> +1, Tuesday to Thursday are ok for me but I would prefer Tuesday this week.
>
> Uwe
>
> On Mon, Jan 29, 2018, at 12:54 PM, Zoltan Ivanfi wrote:
>> +1 for Tuesday, this week I can't attend on Wednesday.
>>
>> Zoltan
>>
>> On Mon, Jan 29, 2018 at 7:29 AM Lars Volker  wrote:
>>
>> > I'm good with either day. Does anyone prefer Wednesday over Tuesday?
>> >
>> > On Tue, Jan 23, 2018 at 11:27 PM, Gabor Szadovszky <
>> > gabor.szadovs...@cloudera.com> wrote:
>> >
>> > > Hi All,
>> > >
>> > > As usual, I’m the one who complains…
>> > > Tuesday/Thursday would be better for me. If one of these days is suitable
>> > > for everyone I would be happy to participate. If not, I’m fine with going
>> > > to the next meeting instead.
>> > >
>> > > Cheers,
>> > > Gabor
>> > >
>> > > > On 24 Jan 2018, at 00:56, Lars Volker  wrote:
>> > > >
>> > > > Hi All,
>> > > >
>> > > > After chatting with Julien I'd like to propose to do the next regular
>> > > > Parquet sync on next Wednesday, January 31st, at 5pm GMT (6pm CET, 9am
>> > > > PST). This will get us back to alternating weeks with the arrow sync.
>> > > > If that doesn't work for you, please let me know.
>> > > >
>> > > > Cheers, Lars
>> > >
>> > >
>> >


Re: Date and time for next parquet sync

2018-01-29 Thread Uwe L. Korn
+1, Tuesday to Thursday are ok for me but I would prefer Tuesday this week.

Uwe

On Mon, Jan 29, 2018, at 12:54 PM, Zoltan Ivanfi wrote:
> +1 for Tuesday, this week I can't attend on Wednesday.
> 
> Zoltan
> 
> On Mon, Jan 29, 2018 at 7:29 AM Lars Volker  wrote:
> 
> > I'm good with either day. Does anyone prefer Wednesday over Tuesday?
> >
> > On Tue, Jan 23, 2018 at 11:27 PM, Gabor Szadovszky <
> > gabor.szadovs...@cloudera.com> wrote:
> >
> > > Hi All,
> > >
> > > As usual, I’m the one who complains…
> > > Tuesday/Thursday would be better for me. If one of these days is suitable
> > > for everyone I would be happy to participate. If not, I’m fine with going
> > > to the next meeting instead.
> > >
> > > Cheers,
> > > Gabor
> > >
> > > > On 24 Jan 2018, at 00:56, Lars Volker  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > After chatting with Julien I'd like to propose to do the next regular
> > > > Parquet sync on next Wednesday, January 31st, at 5pm GMT (6pm CET, 9am
> > > > PST). This will get us back to alternating weeks with the arrow sync.
> > > > If that doesn't work for you, please let me know.
> > > >
> > > > Cheers, Lars
> > >
> > >
> >


Re: Date and time for next parquet sync

2018-01-29 Thread Zoltan Ivanfi
+1 for Tuesday, this week I can't attend on Wednesday.

Zoltan

On Mon, Jan 29, 2018 at 7:29 AM Lars Volker  wrote:

> I'm good with either day. Does anyone prefer Wednesday over Tuesday?
>
> On Tue, Jan 23, 2018 at 11:27 PM, Gabor Szadovszky <
> gabor.szadovs...@cloudera.com> wrote:
>
> > Hi All,
> >
> > As usual, I’m the one who complains…
> > Tuesday/Thursday would be better for me. If one of these days is suitable
> > for everyone I would be happy to participate. If not, I’m fine with going
> > to the next meeting instead.
> >
> > Cheers,
> > Gabor
> >
> > > On 24 Jan 2018, at 00:56, Lars Volker  wrote:
> > >
> > > Hi All,
> > >
> > > After chatting with Julien I'd like to propose to do the next regular
> > > Parquet sync on next Wednesday, January 31st, at 5pm GMT (6pm CET, 9am
> > > PST). This will get us back to alternating weeks with the arrow sync.
> > > If that doesn't work for you, please let me know.
> > >
> > > Cheers, Lars
> >
> >
>


Re: Date and time for next parquet sync

2018-01-28 Thread Lars Volker
I'm good with either day. Does anyone prefer Wednesday over Tuesday?

On Tue, Jan 23, 2018 at 11:27 PM, Gabor Szadovszky <
gabor.szadovs...@cloudera.com> wrote:

> Hi All,
>
> As usual, I’m the one who complains…
> Tuesday/Thursday would be better for me. If one of these days is suitable
> for everyone I would be happy to participate. If not, I’m fine with going
> to the next meeting instead.
>
> Cheers,
> Gabor
>
> > On 24 Jan 2018, at 00:56, Lars Volker  wrote:
> >
> > Hi All,
> >
> > After chatting with Julien I'd like to propose to do the next regular
> > Parquet sync on next Wednesday, January 31st, at 5pm GMT (6pm CET, 9am
> > PST). This will get us back to alternating weeks with the arrow sync. If
> > that doesn't work for you, please let me know.
> >
> > Cheers, Lars
>
>


Re: Date and time for next parquet sync

2018-01-23 Thread Gabor Szadovszky
Hi All,

As usual, I’m the one who complains…
Tuesday/Thursday would be better for me. If one of these days is suitable for 
everyone I would be happy to participate. If not, I’m fine with going to the 
next meeting instead.

Cheers,
Gabor

> On 24 Jan 2018, at 00:56, Lars Volker  wrote:
> 
> Hi All,
> 
> After chatting with Julien I'd like to propose to do the next regular
> Parquet sync on next Wednesday, January 31st, at 5pm GMT (6pm CET, 9am
> PST). This will get us back to alternating weeks with the arrow sync. If
> that doesn't work for you, please let me know.
> 
> Cheers, Lars



Date and time for next parquet sync

2018-01-23 Thread Lars Volker
Hi All,

After chatting with Julien I'd like to propose to do the next regular
Parquet sync on next Wednesday, January 31st, at 5pm GMT (6pm CET, 9am
PST). This will get us back to alternating weeks with the arrow sync. If
that doesn't work for you, please let me know.

Cheers, Lars


Re: Next parquet sync

2018-01-10 Thread Julien Le Dem
Notes:
Agenda and attendees:

   - Anuj Phadke (Impala team)
   - Uwe (Blue Yonder, parquet-cpp):
      - Discuss the Parquet dotnet project
   - Lars (Impala):
      - int96 timestamps: deprecate ordering
   - Nandor (file formats team at Cloudera)
   - Zoltan (Cloudera):
      - Discuss page size recommendation
   - Gabor (file formats)
   - Ryan (Netflix):
      - Has been working on a better read API; rewrite of record
        construction in parquet-avro: +5%
      - Discuss PARQUET-787 (how we build the decoders for byte arrays)
   - Marcel
   - Julien (WeWork):
      - Releases


Agenda:

   - Deprecating ordering for int96 timestamps:
     https://issues.apache.org/jira/browse/PARQUET-1065
      - It was decided to deprecate ordering for int96:
         - Do not use existing min/max stats for int96
         - Label int96 as not supporting ordering
         - Do not write int96 anymore
         - Always support reading int96 for backward compatibility with
           existing files
      - PR from Zoltan to change int96 ordering from unsigned to undefined.
      - Lars: Impala actually uses int96 min/max and ordering and will
        continue to do so for some time.
      - Conclusion:
         - Add language saying that writing int96 is not allowed (with the
           caveat that people can do it anyway). People should use 64-bit
           timestamps instead.
         - Add the int96 spec to the docs, with a warning that its only
           purpose is to enable reading existing files.
         - PRs:
            - https://github.com/apache/parquet-format/pull/77
            - https://github.com/apache/parquet-format/pull/49
            - Action: Lars to update #49



   - Parquet dotnet project
      - Discussion on whether we should import it into the Apache Parquet
        project
      - General advice is to make sure the authors are engaged enough with
        the project to maintain it long term.
      - We should keep reaching out and support this effort.



   - Page size recommendation
      - Zoltan: create a JIRA.
      - Wait for the page skipping implementation to get numbers on the
        impact of page size
      - Look at different strategies for page size (bytes before
        compression, #values, ...)
      - Make some measurements
      - Restart the conversation
   - PARQUET-787: needs a review
     https://github.com/apache/parquet-mr/pull/390


   - Releases
      - Ryan: create release JIRA
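For readers moving from int96 to 64-bit timestamps, as the int96 conclusion above recommends, here is a minimal Python sketch of converting between the two. It assumes the int96 layout commonly written by Impala and Hive (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number). It is not taken from the PRs listed above, and the function names are hypothetical.

```python
import struct

# Julian Day Number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400 * 1_000_000

# Assumed int96 layout: int64 nanoseconds-of-day, then int32 Julian day,
# both little-endian ("<qi" packs to exactly 12 bytes with no padding).
INT96 = struct.Struct("<qi")


def int96_to_unix_micros(raw: bytes) -> int:
    """Decode a 12-byte int96 timestamp into microseconds since the Unix epoch.

    Any sub-microsecond part of the nanosecond field is truncated.
    """
    if len(raw) != INT96.size:
        raise ValueError(f"expected {INT96.size} bytes, got {len(raw)}")
    nanos_of_day, julian_day = INT96.unpack(raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def unix_micros_to_int96(micros: int) -> bytes:
    """Encode microseconds since the Unix epoch in the assumed int96 layout."""
    # divmod floors, so micros_of_day stays non-negative even before 1970.
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return INT96.pack(micros_of_day * 1_000, days + JULIAN_DAY_OF_EPOCH)


ts = 1_515_488_400_000_000  # an arbitrary instant in January 2018
assert int96_to_unix_micros(unix_micros_to_int96(ts)) == ts
print(len(unix_micros_to_int96(ts)))  # 12
```

Storing the decoded value as a 64-bit microsecond count keeps a well-defined ordering, which is what makes min/max statistics usable again.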



On Tue, Jan 9, 2018 at 8:54 AM, Julien Le Dem <julien.le...@wework.com>
wrote:

> The sync is starting in a few minutes:
> https://meet.google.com/cxa-nppv-caa
> (as a reminder, everybody is welcome to join if only to be a fly on the
> wall)
>
> On Tue, Jan 9, 2018 at 2:31 AM, Lars Volker <l...@cloudera.com> wrote:
>
>> Great, I sent out an invite. If anyone wants to join but was not on the
>> invite, please let me know.
>>
>> Cheers, Lars
>>
>> On Mon, Jan 8, 2018 at 10:24 PM, Julien Le Dem <julien.le...@wework.com>
>> wrote:
>>
>> > It sounds like we're doing the parquet sync tomorrow Tuesday January
>> > 9th at 9am PT (5pm UTC)
>> >
>> > On Thu, Jan 4, 2018 at 9:17 AM, Marcel Kornacker <marc...@gmail.com>
>> > wrote:
>> >
>> > > My preference for next week would be Tuesday as well.
>> > >
>> > > On Thu, Jan 4, 2018 at 8:25 AM, Zoltan Ivanfi <z...@cloudera.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > According to the latest results of the availability poll, Tuesdays seem
>> > > > to work for slightly more people than Wednesdays. I'll try to post the
>> > > > chart below; let's see whether the mailing list allows it or removes it:
>> > > > [image: pasted1]
>> > > >
>> > > > I would suggest either using Tuesdays or alternating between Tuesdays and
>> > > > Wednesdays (since the group of 9 Tuesday voters does not contain all 8
>> > > > Wednesday voters). The last sync was on a Tuesday, so the next one can be
>> > > > on Wednesday if you would like to follow this alternating scheme.
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Zoltan
>> > > >
>> > > >
>> > > > On Thu, Jan 4, 2018 at 4:27 PM Wes McKinney <wesmck...@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> We have been staggering the Arrow syncs by 1 week, also on
>> > > >> Wednesdays at 9am PT. If you are going to have the next Parquet sync
>> > > >> on 1/10, we would have the next Arrow sync on 1/17. Let me know what
>> > > >> you prefer
>> > > >>
>> > > >> On Thu, Jan 4, 2018 at 4:10 AM, Lars Volker <l...@cloudera.com>
>> > > >> wrote:
>> > > >> > 1/10 would work for me.
>> > > >> >
>> > > >> > On Thu, Jan 4, 2018 at 3:22 AM, Julien Le Dem <
>> > > >> > julien.le...@gmail.com> wrote:
>> > > >> >
>> > > >> >> Any day of the week/time preference for the next Parquet sync?
>> > > >> >> It is usually held at 9am PT (5pm UTC) on a Wednesday.
>> > > >> >>
>> > > >>
>> > > >
>> > >
>> >
>>
>
>


Re: Next parquet sync

2018-01-09 Thread Julien Le Dem
The sync is starting in a few minutes:
https://meet.google.com/cxa-nppv-caa
(as a reminder, everybody is welcome to join if only to be a fly on the
wall)

On Tue, Jan 9, 2018 at 2:31 AM, Lars Volker <l...@cloudera.com> wrote:

> Great, I sent out an invite. If anyone wants to join but was not on the
> invite, please let me know.
>
> Cheers, Lars
>
> On Mon, Jan 8, 2018 at 10:24 PM, Julien Le Dem <julien.le...@wework.com>
> wrote:
>
> > It sounds like we're doing the parquet sync tomorrow Tuesday January 9th
> > at 9am PT (5pm UTC)
> >
> > On Thu, Jan 4, 2018 at 9:17 AM, Marcel Kornacker <marc...@gmail.com>
> > wrote:
> >
> > > My preference for next week would be Tuesday as well.
> > >
> > > On Thu, Jan 4, 2018 at 8:25 AM, Zoltan Ivanfi <z...@cloudera.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > According to the latest results of the availability poll, Tuesdays seem
> > > > to work for slightly more people than Wednesdays. I'll try to post the
> > > > chart below; let's see whether the mailing list allows it or removes it:
> > > > [image: pasted1]
> > > >
> > > > I would suggest either using Tuesdays or alternating between Tuesdays and
> > > > Wednesdays (since the group of 9 Tuesday voters does not contain all 8
> > > > Wednesday voters). The last sync was on a Tuesday, so the next one can be
> > > > on Wednesday if you would like to follow this alternating scheme.
> > > >
> > > > Best regards,
> > > >
> > > > Zoltan
> > > >
> > > >
> > > > On Thu, Jan 4, 2018 at 4:27 PM Wes McKinney <wesmck...@gmail.com>
> > > > wrote:
> > > >
> > > >> We have been staggering the Arrow syncs by 1 week, also on
> > > >> Wednesdays at 9am PT. If you are going to have the next Parquet sync
> > > >> on 1/10, we would have the next Arrow sync on 1/17. Let me know what
> > > >> you prefer
> > > >>
> > > >> On Thu, Jan 4, 2018 at 4:10 AM, Lars Volker <l...@cloudera.com>
> > > >> wrote:
> > > >> > 1/10 would work for me.
> > > >> >
> > > >> > On Thu, Jan 4, 2018 at 3:22 AM, Julien Le Dem <
> > > >> > julien.le...@gmail.com> wrote:
> > > >> >
> > > >> >> Any day of the week/time preference for the next Parquet sync?
> > > >> >> It is usually held at 9am PT (5pm UTC) on a Wednesday.
> > > >> >>
> > > >>
> > > >
> > >
> >
>


Re: Next parquet sync

2018-01-09 Thread Lars Volker
Great, I sent out an invite. If anyone wants to join but was not on the
invite, please let me know.

Cheers, Lars

On Mon, Jan 8, 2018 at 10:24 PM, Julien Le Dem <julien.le...@wework.com>
wrote:

> It sounds like we're doing the parquet sync tomorrow Tuesday January 9th at
> 9am PT (5pm UTC)
>
> On Thu, Jan 4, 2018 at 9:17 AM, Marcel Kornacker <marc...@gmail.com>
> wrote:
>
> > My preference for next week would be Tuesday as well.
> >
> > On Thu, Jan 4, 2018 at 8:25 AM, Zoltan Ivanfi <z...@cloudera.com> wrote:
> >
> > > Hi,
> > >
> > > According to the latest results of the availability poll, Tuesdays seem
> > > to work for slightly more people than Wednesdays. I'll try to post the
> > > chart below; let's see whether the mailing list allows it or removes it:
> > > [image: pasted1]
> > >
> > > I would suggest either using Tuesdays or alternating between Tuesdays and
> > > Wednesdays (since the group of 9 Tuesday voters does not contain all 8
> > > Wednesday voters). The last sync was on a Tuesday, so the next one can be
> > > on Wednesday if you would like to follow this alternating scheme.
> > >
> > > Best regards,
> > >
> > > Zoltan
> > >
> > >
> > > On Thu, Jan 4, 2018 at 4:27 PM Wes McKinney <wesmck...@gmail.com>
> > > wrote:
> > >
> > >> We have been staggering the Arrow syncs by 1 week, also on Wednesdays
> > >> at 9am PT. If you are going to have the next Parquet sync on 1/10, we
> > >> would have the next Arrow sync on 1/17. Let me know what you prefer
> > >>
> > >> On Thu, Jan 4, 2018 at 4:10 AM, Lars Volker <l...@cloudera.com> wrote:
> > >> > 1/10 would work for me.
> > >> >
> > >> > On Thu, Jan 4, 2018 at 3:22 AM, Julien Le Dem <
> > >> > julien.le...@gmail.com> wrote:
> > >> >
> > >> >> Any day of the week/time preference for the next Parquet sync?
> > >> >> It is usually held at 9am PT (5pm UTC) on a Wednesday.
> > >> >>
> > >>
> > >
> >
>


Re: Next parquet sync

2018-01-04 Thread Marcel Kornacker
My preference for next week would be Tuesday as well.

On Thu, Jan 4, 2018 at 8:25 AM, Zoltan Ivanfi <z...@cloudera.com> wrote:

> Hi,
>
> According to the latest results of the availability poll, Tuesdays seem
> to work for slightly more people than Wednesdays. I'll try to post the
> chart below; let's see whether the mailing list allows it or removes it:
> [image: pasted1]
>
> I would suggest either using Tuesdays or alternating between Tuesdays and
> Wednesdays (since the group of 9 Tuesday voters does not contain all 8
> Wednesday voters). The last sync was on a Tuesday, so the next one can be
> on Wednesday if you would like to follow this alternating scheme.
>
> Best regards,
>
> Zoltan
>
>
> On Thu, Jan 4, 2018 at 4:27 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> We have been staggering the Arrow syncs by 1 week, also on Wednesdays
>> at 9am PT. If you are going to have the next Parquet sync on 1/10, we
>> would have the next Arrow sync on 1/17. Let me know what you prefer
>>
>> On Thu, Jan 4, 2018 at 4:10 AM, Lars Volker <l...@cloudera.com> wrote:
>> > 1/10 would work for me.
>> >
>> > On Thu, Jan 4, 2018 at 3:22 AM, Julien Le Dem <julien.le...@gmail.com>
>> > wrote:
>> >
>> >> Any day of the week/time preference for the next Parquet sync?
>> >> It is usually held at 9am PT (5pm UTC) on a Wednesday.
>> >>
>>
>


Re: Next parquet sync

2018-01-04 Thread Zoltan Ivanfi
Hi,

According to the latest results of the availability poll, Tuesdays seem to
work for slightly more people than Wednesdays. I'll try to post the chart
below; let's see whether the mailing list allows it or removes it:
[image: pasted1]

I would suggest either using Tuesdays or alternating between Tuesdays and
Wednesdays (since the group of 9 Tuesday voters does not contain all 8
Wednesday voters). The last sync was on a Tuesday, so the next one can be
on Wednesday if you would like to follow this alternating scheme.

Best regards,

Zoltan

On Thu, Jan 4, 2018 at 4:27 PM Wes McKinney <wesmck...@gmail.com> wrote:

> We have been staggering the Arrow syncs by 1 week, also on Wednesdays
> at 9am PT. If you are going to have the next Parquet sync on 1/10, we
> would have the next Arrow sync on 1/17. Let me know what you prefer
>
> On Thu, Jan 4, 2018 at 4:10 AM, Lars Volker <l...@cloudera.com> wrote:
> > 1/10 would work for me.
> >
> > On Thu, Jan 4, 2018 at 3:22 AM, Julien Le Dem <julien.le...@gmail.com>
> > wrote:
> >
> >> Any day of the week/time preference for the next Parquet sync?
> >> It is usually held at 9am PT (5pm UTC) on a Wednesday.
> >>
>


Re: Next parquet sync

2018-01-04 Thread Lars Volker
1/10 would work for me.

On Thu, Jan 4, 2018 at 3:22 AM, Julien Le Dem <julien.le...@gmail.com>
wrote:

> Any day of the week/time preference for the next Parquet sync?
> It is usually held at 9am PT (5pm UTC) on a Wednesday.
>


Next parquet sync

2018-01-03 Thread Julien Le Dem
Any day of the week/time preference for the next Parquet sync?
It is usually held at 9am PT (5pm UTC) on a Wednesday.


Next Parquet Sync on Tue, Dec 19th, 9am PST

2017-12-08 Thread Lars Volker
Hi All,

In the last Parquet sync we scheduled the next one for Dec 19th, 9am PST. I
just sent out an invite to everyone who was on the last invite. If you
would like to receive an invite, too, please reply to this email.

If that day and time does not work for you, please speak up.

Cheers, Lars


Next Parquet Sync on Wednesday, Dec 6th, 9am PST

2017-11-30 Thread Lars Volker
Hi All,

In the last Hangout we seemed to agree to have the next sync in two weeks.
I have just sent out a meeting request for next Wednesday, Dec 6th at 9am
PST to everyone who was on the last invite. If you're not yet on the invite
and would like to join, please reply to this email and I'll be happy to add
you.

If the date doesn't work for you and you'd like us to find another time,
please speak up.

Cheers, Lars


Re: Date and time for next parquet sync

2017-09-29 Thread Lars Volker
I added you to the invite.

On Thu, Sep 28, 2017 at 9:48 PM, Santlal J Gupta <
santlal.gu...@bitwiseglobal.com> wrote:

> Yes, I want to join.
>
> -Original Message-
> From: Lars Volker [mailto:l...@cloudera.com]
> Sent: Thursday, September 28, 2017 8:40 PM
> To: dev@parquet.apache.org
> Subject: Date and time for next parquet sync
>
> I sent out a meeting request for the next Parquet sync on Wednesday,
> October 11th at 9am PST. Please reply to this email if you'd like to join
> and are not on the invite yet.
>


RE: Date and time for next parquet sync

2017-09-28 Thread Santlal J Gupta
Yes I want to join.

-Original Message-
From: Lars Volker [mailto:l...@cloudera.com] 
Sent: Thursday, September 28, 2017 8:40 PM
To: dev@parquet.apache.org
Subject: Date and time for next parquet sync

I sent out a meeting request for the next Parquet sync on Wednesday, October 
11th at 9am PST. Please reply to this email if you'd like to join and find 
that you are not on the invite yet.


Date and time for next parquet sync

2017-09-28 Thread Lars Volker
I sent out a meeting request for the next Parquet sync on Wednesday,
October 11th at 9am PST. Please reply to this email if you'd like to join
and find that you are not on the invite yet.


Re: Date and time for next Parquet Sync

2017-09-13 Thread Julien Le Dem
Notes:
Parquet Sync Sept 13 2017:

Lars (Impala Cloudera - CA): wants feedback on Puja’s pull request for page
index
Anna (Cloudera - Hungary)
Jim (Cloudera - CA): Bloom Filters
Ryan (Netflix - CA): parquet-cli zstd/lz4 to try out. Parquet format
release, logical type PR.
Junjie (Intel - Shanghai): Bloom filter status
Bikramjeet (Cloudera Impala - CA): clarify specification for column stats
and type for min/max storage
Wes (Two Sigma - NY): C++
Julien (CA): patch release of parquet-mr

TZs: GMT-8, GMT-5, GMT+1, GMT+8
Time: 9am (SF), 12pm (NY), 6pm (Budapest), 1am (Shanghai) !

 - Bloom Filter:
   - Junjie submitted pull requests for parquet-format and parquet-mr (bloom
     filter utility + tests).
     - https://github.com/apache/parquet-format/pull/62/files
       - not to be merged right away; posted for feedback
     - https://github.com/apache/parquet-mr/pull/425/files
       - move to package-protected classes or tests to start an incremental
         merge without making it public
   - Need review: Ryan, Julien, Jim
   - compatibility, integration tests?
     - old compatibility test repo:
       https://github.com/Parquet/parquet-compatibility
     - Arrow integration tests:
       https://github.com/apache/arrow/tree/master/integration
   - Action: Anna, Lars to follow up with Cloudera

Build: travis-ci is broken by a thrift-7 incompatibility on the latest Linux
 - parquet-mr should move to thrift-9: PARQUET-1103
 - pin thrift to a fixed version in the build, like in parquet-format.

 - Page Index: https://github.com/apache/parquet-format/pull/63
   - Action: Julien, Ryan, Marcel to review by end of next week
   - TODO (Lars?): move design doc to markdown in parquet-format
   - should add (brief) comments in thrift definition (clarify in review)

 - zstd/lz4:
   - Ryan has a version of parquet-cli working with zstd, lz4 and brotli
     for experimentation
   - building with zstd backported was difficult (provides hadoop jar).
   - anyone interested in running their own tests?
   - Lars to check at Cloudera.
   - Ryan to send out on the list
   - Wes built benchmarking fixtures in C++. TODO: write tests.
   - use some shareable dataset for validation (NY Taxi dataset?).

 - Logical type PR: https://github.com/apache/parquet-format/pull/51
   - TODO: feedback
   - reviewers: Julien

 - clarification of min/max storage:
   - https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L215
   - format of min and max values is the same as defined by the type.
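A minimal Python sketch of what "the same format as defined by the type" can mean for statistics, assuming the PLAIN encoding of each physical type (little-endian for INT32/INT64/FLOAT/DOUBLE, and BYTE_ARRAY as raw bytes with no length prefix), which is how we read the Statistics comments in parquet.thrift. The helpers `encode_stat`/`decode_stat` are hypothetical. The sketch picks min/max with Python's signed comparison; which value counts as the minimum for a real column depends on its sort order, which is not modeled here.

```python
import struct

# Little-endian struct formats for the fixed-width physical types.
_FIXED_WIDTH = {"INT32": "<i", "INT64": "<q", "FLOAT": "<f", "DOUBLE": "<d"}


def encode_stat(physical_type: str, value) -> bytes:
    """Encode a min/max value in the PLAIN form of its physical type.

    Hypothetical helper. BYTE_ARRAY values are returned as raw bytes,
    i.e. without the 4-byte length prefix that PLAIN data pages use.
    """
    if physical_type in _FIXED_WIDTH:
        return struct.pack(_FIXED_WIDTH[physical_type], value)
    if physical_type == "BYTE_ARRAY":
        return bytes(value)
    raise ValueError(f"unsupported physical type: {physical_type}")


def decode_stat(physical_type: str, raw: bytes):
    """Inverse of encode_stat."""
    if physical_type in _FIXED_WIDTH:
        return struct.unpack(_FIXED_WIDTH[physical_type], raw)[0]
    if physical_type == "BYTE_ARRAY":
        return raw
    raise ValueError(f"unsupported physical type: {physical_type}")


column = [7, -3, 42]
min_raw = encode_stat("INT32", min(column))
max_raw = encode_stat("INT32", max(column))
print(min_raw.hex(), max_raw.hex())  # fdffffff 2a000000
```

Because fixed-width values are stored as raw little-endian bytes, a reader must know the column's physical type to decode the statistics at all.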

 - making releases:
   - want a parquet-format release for:
     - logical types (not merged yet)
     - page indexes (not merged yet)
     - sort order (merged)
   - we won’t block on bloom filter. We can make another release as soon as
     it is ready.
   - Ryan to run the parquet-format release.
   - need a volunteer for the parquet-mr release.



On Wed, Sep 13, 2017 at 8:58 AM, Julien Le Dem <julien.le...@gmail.com>
wrote:

> The Parquet sync is starting now at:
> https://meet.google.com/ent-mvhf-twr
>
> On Tue, Sep 12, 2017 at 8:55 PM, Julien Le Dem <julien.le...@gmail.com>
> wrote:
>
>> +1
>>
>> On Mon, Sep 11, 2017 at 8:36 PM, Lars Volker <l...@cloudera.com> wrote:
>>
>>> There were no objections so I sent out a meeting invite to everyone who
>>> was
>>> on the last invite. If you'd like to participate, too, please reply to
>>> this
>>> email.
>>>
>>> Cheers, Lars
>>>
>>> On Mon, Sep 11, 2017 at 11:06 AM, Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>> > That works for me.
>>> >
>>> > On Mon, Sep 11, 2017 at 7:55 AM, Lars Volker <l...@cloudera.com> wrote:
>>> >
>>> > > Hi All,
>>> > >
>>> > > I'd like to propose to have the next Parquet Sync on Wednesday, Sep
>>> 13th,
>>> > > at 9am PST. Possible topics would be the pull request to add a page
>>> index
>>> > > to the format, ongoing work on bloom filters.
>>> > >
>>> > > If Wednesday does not work for you, please propose another date and
>>> time.
>>> > > Otherwise I'll send out a MR later today.
>>> > >
>>> > > Cheers, Lars
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Software Engineer
>>> > Netflix
>>> >
>>>
>>
>>
>


Re: Date and time for next Parquet Sync

2017-09-13 Thread Julien Le Dem
The Parquet sync is starting now at:
https://meet.google.com/ent-mvhf-twr

On Tue, Sep 12, 2017 at 8:55 PM, Julien Le Dem <julien.le...@gmail.com>
wrote:

> +1
>
> On Mon, Sep 11, 2017 at 8:36 PM, Lars Volker <l...@cloudera.com> wrote:
>
>> There were no objections so I sent out a meeting invite to everyone who
>> was
>> on the last invite. If you'd like to participate, too, please reply to
>> this
>> email.
>>
>> Cheers, Lars
>>
>> On Mon, Sep 11, 2017 at 11:06 AM, Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>> > That works for me.
>> >
>> > On Mon, Sep 11, 2017 at 7:55 AM, Lars Volker <l...@cloudera.com> wrote:
>> >
>> > > Hi All,
>> > >
>> > > I'd like to propose to have the next Parquet Sync on Wednesday, Sep
>> 13th,
>> > > at 9am PST. Possible topics would be the pull request to add a page
>> index
>> > > to the format, ongoing work on bloom filters.
>> > >
>> > > If Wednesday does not work for you, please propose another date and
>> time.
>> > > Otherwise I'll send out a MR later today.
>> > >
>> > > Cheers, Lars
>> > >
>> >
>> >
>> >
>> > --
>> > Ryan Blue
>> > Software Engineer
>> > Netflix
>> >
>>
>
>


Re: Date and time for next Parquet Sync

2017-09-12 Thread Julien Le Dem
+1

On Mon, Sep 11, 2017 at 8:36 PM, Lars Volker <l...@cloudera.com> wrote:

> There were no objections so I sent out a meeting invite to everyone who was
> on the last invite. If you'd like to participate, too, please reply to this
> email.
>
> Cheers, Lars
>
> On Mon, Sep 11, 2017 at 11:06 AM, Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
> > That works for me.
> >
> > On Mon, Sep 11, 2017 at 7:55 AM, Lars Volker <l...@cloudera.com> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to propose to have the next Parquet Sync on Wednesday, Sep
> 13th,
> > > at 9am PST. Possible topics would be the pull request to add a page
> index
> > > to the format, ongoing work on bloom filters.
> > >
> > > If Wednesday does not work for you, please propose another date and
> time.
> > > Otherwise I'll send out a MR later today.
> > >
> > > Cheers, Lars
> > >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
>


Re: Date and time for next Parquet Sync

2017-09-11 Thread Ryan Blue
That works for me.

On Mon, Sep 11, 2017 at 7:55 AM, Lars Volker <l...@cloudera.com> wrote:

> Hi All,
>
> I'd like to propose to have the next Parquet Sync on Wednesday, Sep 13th,
> at 9am PST. Possible topics would be the pull request to add a page index
> to the format, ongoing work on bloom filters.
>
> If Wednesday does not work for you, please propose another date and time.
> Otherwise I'll send out a MR later today.
>
> Cheers, Lars
>



-- 
Ryan Blue
Software Engineer
Netflix


Re: Next parquet sync

2017-07-19 Thread 俊杰陈
Hi,
People in Shanghai should be OK with 4PM or 5PM PST.

2017-07-20 2:40 GMT+08:00 Julien Le Dem <jul...@ledem.net>:

> The next Parquet sync will be Wednesday 8/2 at 9am PT on google hangout.
> We had attendees from Shanghai this time which makes it midnight their
> time.
> Reply to this email if you’d like a different day/time
> Julien




-- 
Thanks & Best Regards


Next parquet sync

2017-07-19 Thread Julien Le Dem
The next Parquet sync will be Wednesday 8/2 at 9am PT on google hangout.
We had attendees from Shanghai this time which makes it midnight their time.
Reply to this email if you’d like a different day/time
Julien

Next Parquet sync

2017-04-12 Thread Julien Le Dem
The next sync will be April 26th at 10am PT on google hangout
https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up

-- 
Julien


Re: Next Parquet sync

2017-03-24 Thread Julien Le Dem
Notes:
Attendees and agenda:
Julien (Dremio):
  - Adding page metadata in the footer.
  - Sort min/max
Zoltan (Cloudera)
Ryan (Netflix):
  - Sort min/max
Deepak (Vertica):
  - Timestamp: progress.
Lars (Impala):
  - Min/Max.
  - page level
Marcel (Impala):
  - timestamp: micros vs millis
  - index pages for sorted pages
  - augment process: use google docs for improved discussion through
comments.
Uwe (Blue Yonder):
Wes (Two Sigma):
 - parquet-cpp 1.0 release: improvement in arrow, update timestamp
integration
 - share code with impala team on metadata.

- proposal to use Google Docs to discuss specs:
  - link in JIRA description by default
  - open for comments to everyone.
  - edit rights added on demand
  - one person responsible for curating and resolving comments.

- page metadata in footer / index pages when files are sorted
  - page metadata: PARQUET-907
  - TODO: Julien to create a google doc for spec (linked in jira).
  - page index: PARQUET-922
  - ISAM
  - binary search on
  - single lookup or single range scan
  - TODO: Marcel to create a google doc to discuss the spec (linked in
JIRA)
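The page-index idea for sorted files (binary search over the pages, answering either a single lookup or a single range scan) can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory list of per-page (min, max, first row) entries for a column sorted in ascending order; the names are illustrative and this is not the layout proposed in PARQUET-922.

```python
import bisect
from typing import List, NamedTuple


class PageEntry(NamedTuple):
    # Hypothetical per-page index entry; first_row would let a reader seek
    # directly to the rows of a selected page.
    min_value: int
    max_value: int
    first_row: int


class SortedPageIndex:
    """Binary-searchable page index for a column sorted in ascending order.

    Assumes min_value and max_value are both non-decreasing across pages,
    which holds when the column is sorted.
    """

    def __init__(self, pages: List[PageEntry]):
        self.pages = list(pages)
        self._mins = [p.min_value for p in self.pages]
        self._maxes = [p.max_value for p in self.pages]

    def range_scan(self, lo: int, hi: int) -> range:
        """Positions of the pages that may hold values in [lo, hi]."""
        start = bisect.bisect_left(self._maxes, lo)  # first page with max >= lo
        end = bisect.bisect_right(self._mins, hi)    # pages [0, end) have min <= hi
        return range(start, max(start, end))

    def lookup(self, value: int) -> range:
        """Single lookup: the pages that may contain `value`."""
        return self.range_scan(value, value)


index = SortedPageIndex([
    PageEntry(1, 10, 0),
    PageEntry(11, 20, 100),
    PageEntry(21, 30, 200),
    PageEntry(31, 40, 300),
])
print(list(index.range_scan(15, 25)))  # [1, 2]
print(list(index.lookup(5)))           # [0]
```

Since the matching pages of a sorted column are contiguous, the answer is always one range of pages, which is what makes a single range scan sufficient.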

- min/max metadata PR: https://github.com/apache/parquet-format/pull/46
  - TODO: Ryan to update PR with Union approach

- Timestamp
  - micros vs millis:
    - micros covers a wide enough range that it could be the default and
      supersede millis
    - add in the doc that micros is the default for Timestamp.
    - enforce that timestamp millis can always be converted to micros by
      restricting the year range
    - TODO: PR to update doc specifying micros is preferred and millis is
      restricted.
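Both TIMESTAMP_MILLIS and TIMESTAMP_MICROS annotate INT64 values counted from the Unix epoch, so the year-range restriction is about overflow: multiplying a millisecond value by 1000 only fits in int64 if it lies within roughly ±292,000 years of 1970, while int64 milliseconds span about 1000 times that. A minimal Python sketch of a checked conversion; the function name is hypothetical.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
MICROS_PER_MILLI = 1_000


def millis_to_micros(millis: int) -> int:
    """Convert a TIMESTAMP_MILLIS value to TIMESTAMP_MICROS.

    Python ints never overflow, so we explicitly check that the result
    still fits in the signed 64-bit physical type.
    """
    micros = millis * MICROS_PER_MILLI
    if not INT64_MIN <= micros <= INT64_MAX:
        raise OverflowError(f"{millis} ms does not fit in int64 microseconds")
    return micros


# Approximate span (in years either side of 1970) each unit can represent.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600
micros_span_years = INT64_MAX / 1_000_000 / SECONDS_PER_YEAR
millis_span_years = INT64_MAX / 1_000 / SECONDS_PER_YEAR
print(f"micros: ~{micros_span_years:,.0f} years, millis: ~{millis_span_years:,.0f} years")

print(millis_to_micros(1_505_318_400_000))
```

Restricting millis values to the micros-representable range makes the conversion total, so readers can always widen millis to micros without an error path.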

- parquet-cpp 1.0 arrow integration regarding timestamps
  - once arrow 0.3 is out we should have a corresponding parquet-cpp
    release.
- c++ code sharing with impala
  - Impala team to discuss internally the best way to reuse code.










On Wed, Mar 8, 2017 at 11:53 AM, Julien Le Dem <jul...@dremio.com> wrote:

> The next Parquet sync will be Wednesday March 22nd at 10:00am PT on
> google hangout:
> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> --
> Julien
>



-- 
Julien


Next Parquet sync

2017-03-08 Thread Julien Le Dem
The next Parquet sync will be Wednesday March 22nd at 10:00am PT on google
hangout:
https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up

-- 
Julien


Re: Next parquet sync

2017-02-02 Thread Lars Volker
Thanks Uwe and Julien for the information. I'm looking forward to it!

On Thu, Feb 2, 2017 at 5:35 PM, Julien Le Dem <jul...@dremio.com> wrote:

> As Uwe mentioned everybody is welcome.
>
> The next Parquet sync will be Monday 2/6 10am PT on Google hangout:
> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> If there is more than one of you in the same location I'd recommend sharing
> the connection.
> The sync is every other week, lasts one hour and goes as follows:
>  - go around the "table" for everyone to quickly introduce themselves and
> state the agenda items they'd want discussed (if any). It could be letting
> others know what they're planning to work on, helping reach a
> consensus on a JIRA, reminding people to review something that's important
> to them...
>  - once the agenda is built from this first round we go over each item in
> order.
>  - at the end notes are sent to the list. They usually have a list of
> action items (follow up on jira, review PR #x, ...) and resolved/unresolved
> discussion points.
>
> Generally, discussions happen on the mailing list, JIRA or github PRs and
> the sync helps bring those to a conclusion faster.
>
> On Thu, Feb 2, 2017 at 6:50 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
>
> > Sure! Everyone interested in Parquet is welcome!
> >
> > On Thu, Feb 2, 2017, at 03:27 PM, Lars Volker wrote:
> > > I'm one of the developers currently working on statistics support in
> > > Impala. Would it be ok if I join the hangout, too?
> > >
> > > Best wishes, Lars
> > >
> > > On Tue, Jan 31, 2017 at 10:40 PM, Julien Le Dem <jul...@dremio.com>
> > > wrote:
> > >
> > > > Actually Friday 10am doesn't work for me.
> > > > I'll schedule Monday 2/6 10 am PT
> > > >
> > > >
> > > > On Tue, Jan 31, 2017 at 11:55 AM, Wes McKinney <wesmck...@gmail.com>
> > > > wrote:
> > > >
> > > > > Julien? Let us try to do Friday 2/3 at 10AM PT if possible.
> > > > >
> > > > > On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue
> > <rb...@netflix.com.invalid>
> > > > > wrote:
> > > > > > Both work for me.
> > > > > >
> > > > > > On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com>
> > > > wrote:
> > > > > >
> > > > > >> Both dates are fine for me, too
> > > > > >>
> > > > > >> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <
> > wesmck...@gmail.com>:
> > > > > >> >
> > > > > >> > Does Monday 2/6 work? We could also do this coming Friday 2/3
> > > > > >> >
> > > > > >> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <
> > jul...@ledem.net>
> > > > > >> wrote:
> > > > > >> >> Happy to move
> > > > > >> >> What day would work?
> > > > > >> >> Julien
> > > > > >> >>
> > > > > >> >>> On Jan 26, 2017, at 19:45, Wes McKinney <
> wesmck...@gmail.com>
> > > > > wrote:
> > > > > >> >>>
> > > > > >> >>> This falls during Spark Summit East -- not sure if anyone
> else
> > > > has a
> > > > > >> >>> conflict with this
> > > > > >> >>>
> > > > > >> >>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <
> > > > jul...@dremio.com>
> > > > > >> wrote:
> > > > > >> >>>> Next parquet sync will happen Thursday February 9th at 10am
> > PT on
> > > > > >> google
> > > > > >> >>>> hangout
> > > > > >> >>>> https://plus.google.com/hangouts/_/dremio.com/parquet-
> > sync-up
> > > > > >> >>>> notes will be sent on the list
> > > > > >> >>>>
> > > > > >> >>>> --
> > > > > >> >>>> Julien
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ryan Blue
> > > > > > Software Engineer
> > > > > > Netflix
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Julien
> > > >
> >
>
>
>
> --
> Julien
>


Re: Next parquet sync

2017-02-02 Thread Julien Le Dem
Hi Zoltan,
This is a one time change.
Sorry you won’t be able to join this one.
We can either keep the Thursday 10am PT slot in the future or start a new 
thread to find another time.
Unless someone starts a new thread I’ll keep it as is.

> On Feb 2, 2017, at 8:48 AM, Zoltan Ivanfi <z...@cloudera.com> wrote:
> 
> Hi,
> 
> Is this Monday timeslot a one-time change or a regular one? Sadly this
> timeslot does not work for me and I would be sad if I had to miss all
> future syncs.
> 
> Zoltan
> 
> On Thu, Feb 2, 2017 at 5:36 PM Julien Le Dem <jul...@dremio.com> wrote:
> 
>> As Uwe mentioned everybody is welcome.
>> 
>> The next Parquet sync will be Monday 2/6 10am PT on Google hangout:
>> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
>> 
>> If there is more than one of you in the same location I'd recommend sharing
>> the connection.
>> The sync is every other week, lasts one hour and goes as follows:
>> - go around the "table" for everyone to quickly introduce themselves and
>> state the agenda items they'd want discussed (if any). It could be letting
>> others know what they're planning to work on, helping reach a
>> consensus on a JIRA, reminding people to review something that's important
>> to them...
>> - once the agenda is built from this first round we go over each item in
>> order.
>> - at the end notes are sent to the list. They usually have a list of
>> action items (follow up on jira, review PR #x, ...) and resolved/unresolved
>> discussion points.
>> 
>> Generally, discussions happen on the mailing list, JIRA or github PRs and
>> the sync helps bring those to a conclusion faster.
>> 
>> On Thu, Feb 2, 2017 at 6:50 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
>> 
>>> Sure! Everyone interested in Parquet is welcome!
>>> 
>>> On Thu, Feb 2, 2017, at 03:27 PM, Lars Volker wrote:
>>>> I'm one of the developers currently working on statistics support in
>>>> Impala. Would it be ok if I join the hangout, too?
>>>> 
>>>> Best wishes, Lars
>>>> 
>>>> On Tue, Jan 31, 2017 at 10:40 PM, Julien Le Dem <jul...@dremio.com>
>>>> wrote:
>>>> 
>>>>> Actually Friday 10am doesn't work for me.
>>>>> I'll schedule Monday 2/6 10 am PT
>>>>> 
>>>>> 
>>>>> On Tue, Jan 31, 2017 at 11:55 AM, Wes McKinney <wesmck...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Julien? Let us try to do Friday 2/3 at 10AM PT if possible.
>>>>>> 
>>>>>> On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue
>>> <rb...@netflix.com.invalid>
>>>>>> wrote:
>>>>>>> Both work for me.
>>>>>>> 
>>>>>>> On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com>
>>>>> wrote:
>>>>>>> 
>>>>>>>> Both dates are fine for me, too
>>>>>>>> 
>>>>>>>>> Am 30.01.2017 um 04:15 schrieb Wes McKinney <
>>> wesmck...@gmail.com>:
>>>>>>>>> 
>>>>>>>>> Does Monday 2/6 work? We could also do this coming Friday 2/3
>>>>>>>>> 
>>>>>>>>>> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <
>>> jul...@ledem.net>
>>>>>>>> wrote:
>>>>>>>>>> Happy to move
>>>>>>>>>> What day would work?
>>>>>>>>>> Julien
>>>>>>>>>> 
>>>>>>>>>>> On Jan 26, 2017, at 19:45, Wes McKinney <
>> wesmck...@gmail.com>
>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> This falls during Spark Summit East -- not sure if anyone
>> else
>>>>> has a
>>>>>>>>>>> conflict with this
>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <
>>>>> jul...@dremio.com>
>>>>>>>> wrote:
>>>>>>>>>>>> Next parquet sync will happen Thursday February 9th at 10am
>>> PT on
>>>>>>>> google
>>>>>>>>>>>> hangout
>>>>>>>>>>>> https://plus.google.com/hangouts/_/dremio.com/parquet-
>>> sync-up
>>>>>>>>>>>> notes will be sent on the list
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Julien
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Software Engineer
>>>>>>> Netflix
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Julien
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Julien
>> 





Re: Next parquet sync

2017-02-02 Thread Zoltan Ivanfi
Hi,

Is this Monday timeslot a one-time change or a regular one? Sadly this
timeslot does not work for me and I would be sad if I had to miss all
future syncs.

Zoltan

On Thu, Feb 2, 2017 at 5:36 PM Julien Le Dem <jul...@dremio.com> wrote:

> As Uwe mentioned everybody is welcome.
>
> The next Parquet sync will be Monday 2/6 10am PT on Google hangout:
> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> If there is more than one of you in the same location I'd recommend sharing
> the connection.
> The sync is every other week, lasts one hour and goes as follows:
>  - go around the "table" for everyone to quickly introduce themselves and
> state the agenda items they'd want discussed (if any). It could be letting
> others know what they're planning to work on, helping reach a
> consensus on a JIRA, reminding people to review something that's important
> to them...
>  - once the agenda is built from this first round we go over each item in
> order.
>  - at the end notes are sent to the list. They usually have a list of
> action items (follow up on jira, review PR #x, ...) and resolved/unresolved
> discussion points.
>
> Generally, discussions happen on the mailing list, JIRA or github PRs and
> the sync helps bring those to a conclusion faster.
>
> On Thu, Feb 2, 2017 at 6:50 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
>
> > Sure! Everyone interested in Parquet is welcome!
> >
> > On Thu, Feb 2, 2017, at 03:27 PM, Lars Volker wrote:
> > > I'm one of the developers currently working on statistics support in
> > > Impala. Would it be ok if I join the hangout, too?
> > >
> > > Best wishes, Lars
> > >
> > > On Tue, Jan 31, 2017 at 10:40 PM, Julien Le Dem <jul...@dremio.com>
> > > wrote:
> > >
> > > > Actually Friday 10am doesn't work for me.
> > > > I'll schedule Monday 2/6 10 am PT
> > > >
> > > >
> > > > On Tue, Jan 31, 2017 at 11:55 AM, Wes McKinney <wesmck...@gmail.com>
> > > > wrote:
> > > >
> > > > > Julien? Let us try to do Friday 2/3 at 10AM PT if possible.
> > > > >
> > > > > On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue
> > <rb...@netflix.com.invalid>
> > > > > wrote:
> > > > > > Both work for me.
> > > > > >
> > > > > > On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com>
> > > > wrote:
> > > > > >
> > > > > >> Both dates are fine for me, too
> > > > > >>
> > > > > >> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <
> > wesmck...@gmail.com>:
> > > > > >> >
> > > > > >> > Does Monday 2/6 work? We could also do this coming Friday 2/3
> > > > > >> >
> > > > > >> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <
> > jul...@ledem.net>
> > > > > >> wrote:
> > > > > >> >> Happy to move
> > > > > >> >> What day would work?
> > > > > >> >> Julien
> > > > > >> >>
> > > > > >> >>> On Jan 26, 2017, at 19:45, Wes McKinney <
> wesmck...@gmail.com>
> > > > > wrote:
> > > > > >> >>>
> > > > > >> >>> This falls during Spark Summit East -- not sure if anyone
> else
> > > > has a
> > > > > >> >>> conflict with this
> > > > > >> >>>
> > > > > >> >>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <
> > > > jul...@dremio.com>
> > > > > >> wrote:
> > > > > >> >>>> Next parquet sync will happen Thursday February 9th at 10am
> > PT on
> > > > > >> google
> > > > > >> >>>> hangout
> > > > > >> >>>> https://plus.google.com/hangouts/_/dremio.com/parquet-
> > sync-up
> > > > > >> >>>> notes will be sent on the list
> > > > > >> >>>>
> > > > > >> >>>> --
> > > > > >> >>>> Julien
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ryan Blue
> > > > > > Software Engineer
> > > > > > Netflix
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Julien
> > > >
> >
>
>
>
> --
> Julien
>


Re: Next parquet sync

2017-02-02 Thread Uwe L. Korn
Sure! Everyone interested in Parquet is welcome!

On Thu, Feb 2, 2017, at 03:27 PM, Lars Volker wrote:
> I'm one of the developers currently working on statistics support in
> Impala. Would it be ok if I join the hangout, too?
> 
> Best wishes, Lars
> 
> On Tue, Jan 31, 2017 at 10:40 PM, Julien Le Dem <jul...@dremio.com>
> wrote:
> 
> > Actually Friday 10am doesn't work for me.
> > I'll schedule Monday 2/6 10 am PT
> >
> >
> > On Tue, Jan 31, 2017 at 11:55 AM, Wes McKinney <wesmck...@gmail.com>
> > wrote:
> >
> > > Julien? Let us try to do Friday 2/3 at 10AM PT if possible.
> > >
> > > On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue <rb...@netflix.com.invalid>
> > > wrote:
> > > > Both work for me.
> > > >
> > > > On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com>
> > wrote:
> > > >
> > > >> Both dates are fine for me, too
> > > >>
> > > >> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <wesmck...@gmail.com>:
> > > >> >
> > > >> > Does Monday 2/6 work? We could also do this coming Friday 2/3
> > > >> >
> > > >> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <jul...@ledem.net>
> > > >> wrote:
> > > >> >> Happy to move
> > > >> >> What day would work?
> > > >> >> Julien
> > > >> >>
> > > >> >>> On Jan 26, 2017, at 19:45, Wes McKinney <wesmck...@gmail.com>
> > > wrote:
> > > >> >>>
> > > >> >>> This falls during Spark Summit East -- not sure if anyone else
> > has a
> > > >> >>> conflict with this
> > > >> >>>
> > > >> >>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <
> > jul...@dremio.com>
> > > >> wrote:
> > > >> >>>> Next parquet sync will happen Thursday February 9th at 10am PT on
> > > >> google
> > > >> >>>> hangout
> > > >> >>>> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
> > > >> >>>> notes will be sent on the list
> > > >> >>>>
> > > >> >>>> --
> > > >> >>>> Julien
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > Ryan Blue
> > > > Software Engineer
> > > > Netflix
> > >
> >
> >
> >
> > --
> > Julien
> >


Re: Next parquet sync

2017-02-02 Thread Lars Volker
I'm one of the developers currently working on statistics support in
Impala. Would it be ok if I join the hangout, too?

Best wishes, Lars

On Tue, Jan 31, 2017 at 10:40 PM, Julien Le Dem <jul...@dremio.com> wrote:

> Actually Friday 10am doesn't work for me.
> I'll schedule Monday 2/6 10 am PT
>
>
> On Tue, Jan 31, 2017 at 11:55 AM, Wes McKinney <wesmck...@gmail.com>
> wrote:
>
> > Julien? Let us try to do Friday 2/3 at 10AM PT if possible.
> >
> > On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue <rb...@netflix.com.invalid>
> > wrote:
> > > Both work for me.
> > >
> > > On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com>
> wrote:
> > >
> > >> Both dates are fine for me, too
> > >>
> > >> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <wesmck...@gmail.com>:
> > >> >
> > >> > Does Monday 2/6 work? We could also do this coming Friday 2/3
> > >> >
> > >> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <jul...@ledem.net>
> > >> wrote:
> > >> >> Happy to move
> > >> >> What day would work?
> > >> >> Julien
> > >> >>
> > >> >>> On Jan 26, 2017, at 19:45, Wes McKinney <wesmck...@gmail.com>
> > wrote:
> > >> >>>
> > >> >>> This falls during Spark Summit East -- not sure if anyone else
> has a
> > >> >>> conflict with this
> > >> >>>
> > >> >>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <
> jul...@dremio.com>
> > >> wrote:
> > >> >>>> Next parquet sync will happen Thursday February 9th at 10am PT on
> > >> google
> > >> >>>> hangout
> > >> >>>> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
> > >> >>>> notes will be sent on the list
> > >> >>>>
> > >> >>>> --
> > >> >>>> Julien
> > >>
> > >>
> > >
> > >
> > > --
> > > Ryan Blue
> > > Software Engineer
> > > Netflix
> >
>
>
>
> --
> Julien
>


Re: Next parquet sync

2017-01-31 Thread Julien Le Dem
Actually Friday 10am doesn't work for me.
I'll schedule Monday 2/6 10 am PT


On Tue, Jan 31, 2017 at 11:55 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> Julien? Let us try to do Friday 2/3 at 10AM PT if possible.
>
> On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue <rb...@netflix.com.invalid>
> wrote:
> > Both work for me.
> >
> > On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> >> Both dates are fine for me, too
> >>
> >> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <wesmck...@gmail.com>:
> >> >
> >> > Does Monday 2/6 work? We could also do this coming Friday 2/3
> >> >
> >> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <jul...@ledem.net>
> >> wrote:
> >> >> Happy to move
> >> >> What day would work?
> >> >> Julien
> >> >>
> >> >>> On Jan 26, 2017, at 19:45, Wes McKinney <wesmck...@gmail.com>
> wrote:
> >> >>>
> >> >>> This falls during Spark Summit East -- not sure if anyone else has a
> >> >>> conflict with this
> >> >>>
> >> >>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <jul...@dremio.com>
> >> wrote:
> >> >>>> Next parquet sync will happen Thursday February 9th at 10am PT on
> >> google
> >> >>>> hangout
> >> >>>> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
> >> >>>> notes will be sent on the list
> >> >>>>
> >> >>>> --
> >> >>>> Julien
> >>
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>



-- 
Julien


Re: Next parquet sync

2017-01-31 Thread Wes McKinney
Julien? Let us try to do Friday 2/3 at 10AM PT if possible.

On Mon, Jan 30, 2017 at 12:34 PM, Ryan Blue <rb...@netflix.com.invalid> wrote:
> Both work for me.
>
> On Sun, Jan 29, 2017 at 11:38 PM, Uwe L. Korn <uw...@xhochy.com> wrote:
>
>> Both dates are fine for me, too
>>
>> > Am 30.01.2017 um 04:15 schrieb Wes McKinney <wesmck...@gmail.com>:
>> >
>> > Does Monday 2/6 work? We could also do this coming Friday 2/3
>> >
>> >> On Sat, Jan 28, 2017 at 1:30 AM, Julien Le Dem <jul...@ledem.net>
>> wrote:
>> >> Happy to move
>> >> What day would work?
>> >> Julien
>> >>
>> >>> On Jan 26, 2017, at 19:45, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>>
>> >>> This falls during Spark Summit East -- not sure if anyone else has a
>> >>> conflict with this
>> >>>
>> >>>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <jul...@dremio.com>
>> wrote:
>> >>>> Next parquet sync will happen Thursday February 9th at 10am PT on
>> google
>> >>>> hangout
>> >>>> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
>> >>>> notes will be sent on the list
>> >>>>
>> >>>> --
>> >>>> Julien
>>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix


Re: Next parquet sync

2017-01-27 Thread Julien Le Dem
Happy to move
What day would work?
Julien

> On Jan 26, 2017, at 19:45, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> This falls during Spark Summit East -- not sure if anyone else has a
> conflict with this
> 
>> On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <jul...@dremio.com> wrote:
>> Next parquet sync will happen Thursday February 9th at 10am PT on google
>> hangout
>> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
>> notes will be sent on the list
>> 
>> --
>> Julien


Re: Next parquet sync

2017-01-26 Thread Wes McKinney
This falls during Spark Summit East -- not sure if anyone else has a
conflict with this

On Thu, Jan 26, 2017 at 7:02 PM, Julien Le Dem <jul...@dremio.com> wrote:
> Next parquet sync will happen Thursday February 9th at 10am PT on google
> hangout
> https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
> notes will be sent on the list
>
> --
> Julien


Next parquet sync

2017-01-26 Thread Julien Le Dem
Next parquet sync will happen Thursday February 9th at 10am PT on google
hangout
https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
notes will be sent on the list

-- 
Julien


Next Parquet sync up

2016-09-02 Thread Julien Le Dem
The next parquet sync up will be on hangout:
https://plus.google.com/hangouts/_/dremio.com/parquet-sync-up
Thu, September 8, 10am – 11am PT
http://timesched.pocoo.org/?date=2016-09-08=pacific-standard-time!=600,660
reply to this email if you wish to be added to a google calendar invite

As usual the agenda will be set at the beginning of the meeting by the
attendees.
Notes will be sent on the list afterwards.

-- 
Julien


About the time for next Parquet sync up (Jan 27th)

2016-01-19 Thread Liwei Lin
hi Julien~

Next sync up is scheduled at 10am PT, but it seems a little inconvenient for
Asian attendees, since it's 2am UTC+8.

So I wonder if we could move it to 3pm or 4pm PT? It's still OK if we
can't change it -- I'll try to stay up until 2am :-)


Below is a time mapping from PST to UTC+8, FYI. Thanks!


PST         UTC+8
12:00 AM     4:00 PM
 1:00 AM     5:00 PM
 2:00 AM     6:00 PM
 3:00 AM     7:00 PM
 4:00 AM     8:00 PM
 5:00 AM     9:00 PM
 6:00 AM    10:00 PM
 7:00 AM    11:00 PM
 8:00 AM    12:00 AM (next day)
 9:00 AM     1:00 AM (next day)
10:00 AM     2:00 AM (next day)
11:00 AM     3:00 AM (next day)
12:00 PM     4:00 AM (next day)
 1:00 PM     5:00 AM (next day)
 2:00 PM     6:00 AM (next day)
 3:00 PM     7:00 AM (next day)
 4:00 PM     8:00 AM (next day)
 5:00 PM     9:00 AM (next day)
 6:00 PM    10:00 AM (next day)
 7:00 PM    11:00 AM (next day)
 8:00 PM    12:00 PM (next day)
 9:00 PM     1:00 PM (next day)
10:00 PM     2:00 PM (next day)
11:00 PM     3:00 PM (next day)
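The PST to UTC+8 mapping above is a fixed +16 hour shift. A minimal Python sketch that regenerates it with the standard `datetime` module; the reference date and the helper name `pst_to_utc8` are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets on purpose: the table is for standard time (PST), and
# UTC+8 (e.g. Shanghai) does not observe daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")
UTC_PLUS_8 = timezone(timedelta(hours=8), "UTC+8")
REFERENCE_DAY = (2016, 1, 27)  # the Jan 27th sync date from the subject line


def pst_to_utc8(hour: int) -> str:
    """Format `hour`:00 PST on the reference day as UTC+8 wall-clock time."""
    start = datetime(*REFERENCE_DAY, hour, tzinfo=PST)
    local = start.astimezone(UTC_PLUS_8)
    text = local.strftime("%I:%M %p").lstrip("0")
    if local.date() != start.date():
        text += " (next day)"
    return text


for h in range(24):
    label = datetime(*REFERENCE_DAY, h).strftime("%I:%M %p").lstrip("0")
    print(f"{label:>8} PST -> {pst_to_utc8(h)}")
```

Fixed offsets are used deliberately: with a named zone such as America/Los_Angeles (via `zoneinfo`), the shift drops to +15 hours during daylight saving time, which is why a summer sync at 9am PT lands at midnight in Shanghai.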


Re: Next Parquet sync up tomorrow Wednesday 10am PT on hangout

2015-09-02 Thread Julien Le Dem
Right!
Here are the notes:

Attendance: 
 - Daniel (Netflix)
 - Zhenxiao (Netflix)
 - Julien (Dremio)
 - Amit (Dremio)

Notes:
 - Using the dictionary for predicate pushdown: up to 40x faster for very
   selective queries, as observed at Netflix.
 - Vectorization (PARQUET-131)
   - Pull request #257 needs a review (Julien to review)
   - Netflix does the Presto integration
   - Dong Chen from Intel does the Hive integration.
 - ByteBuffer GSoC change
   - Jason has merged master into the ByteBuffer branch: PR #267
   - Julien gave a first review.

At first glance it does *not* look like the vectorization and the ByteBuffer 
changes will conflict.
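Dictionary-based predicate pushdown relies on the fact that, when every page of a column chunk is dictionary-encoded, the dictionary lists every distinct value in that chunk, so a reader can skip the row group if no dictionary entry satisfies the predicate. A minimal Python sketch of that idea; `can_skip_row_group` and the plain list standing in for a decoded dictionary page are hypothetical, null handling is ignored, and this is not parquet-mr's implementation.

```python
from typing import Callable, Iterable


def can_skip_row_group(
    dictionary: Iterable[object],
    predicate: Callable[[object], bool],
    all_pages_dictionary_encoded: bool,
) -> bool:
    """Return True only if no value in the column chunk can match `predicate`.

    `dictionary` stands in for the decoded dictionary page of one column
    chunk. Nulls are ignored in this sketch.
    """
    if not all_pages_dictionary_encoded:
        # Some pages fell back to another encoding, so the dictionary may
        # not list every value; nothing can be proven.
        return False
    return not any(predicate(value) for value in dictionary)


countries = ["US", "CA", "MX"]
print(can_skip_row_group(countries, lambda v: v == "FR", True))  # True
print(can_skip_row_group(countries, lambda v: v == "US", True))  # False
```

In this example, min/max statistics alone ("CA" to "US") could not rule out "FR", since it sorts between them; the dictionary can, which is one reason very selective equality filters benefit.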


> On Aug 31, 2015, at 10:38 PM, Jacques Nadeau  wrote:
> 
> By Wednesday, you mean the day after tomorrow, right? :)
> 
> On Mon, Aug 31, 2015 at 10:29 PM, Julien Le Dem  wrote:
> 
>> Wed, September 2, 10:00 AM PDT
>> https://plus.google.com/hangouts/_/event/cob1rrt1spt1f15qbsfeqv51cmc
>> 
>> --
>> Julien
>> 



Fwd: Next Parquet sync up tomorrow Wednesday 10am PT on hangout

2015-08-31 Thread Julien Le Dem
Wed, September 2, 10:00 AM PDT
https://plus.google.com/hangouts/_/event/cob1rrt1spt1f15qbsfeqv51cmc

-- 
Julien


Re: Next Parquet sync up tomorrow Wednesday 10am PT on hangout

2015-08-31 Thread Jacques Nadeau
By Wednesday, you mean the day after tomorrow, right? :)

On Mon, Aug 31, 2015 at 10:29 PM, Julien Le Dem  wrote:

> Wed, September 2, 10:00 AM PDT
> https://plus.google.com/hangouts/_/event/cob1rrt1spt1f15qbsfeqv51cmc
>
> --
> Julien
>


Re: Next Parquet Sync Up

2015-07-25 Thread Ryan Blue

+1 Wednesday

On 07/22/2015 04:58 PM, Julien Le Dem wrote:

+1 Wednesday


--
Ryan Blue
Software Engineer
Cloudera, Inc.


Re: Next Parquet Sync Up

2015-07-22 Thread Alex Levenson
+1 for Wednesday

On Wed, Jul 22, 2015 at 3:44 PM, Julien Le Dem jul...@twitter.com.invalid
wrote:

 Wednesday then?
 no more conflicts?


-- 
Alex Levenson
@THISWILLWORK


Re: Next Parquet Sync Up

2015-07-22 Thread Jacques Nadeau
+1 for Wed.

On Wed, Jul 22, 2015 at 3:45 PM, Alex Levenson 
alexleven...@twitter.com.invalid wrote:

 +1 for Wednesday

 --
 Alex Levenson
 @THISWILLWORK



Re: Next Parquet Sync Up

2015-07-22 Thread Julien Le Dem
+1 Wednesday

On Wed, Jul 22, 2015 at 4:02 PM, Jason Altekruse altekruseja...@gmail.com
wrote:

 +1 for wednesday




Re: Next Parquet Sync Up

2015-07-22 Thread Jason Altekruse
+1 for wednesday

On Wed, Jul 22, 2015 at 3:47 PM, Jacques Nadeau jacq...@apache.org wrote:

 +1 for Wed.

 



Re: Next Parquet Sync Up

2015-07-21 Thread Julien Le Dem
It's happening now:
https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up

On Tue, Jul 14, 2015 at 10:04 AM, Julien Le Dem jul...@twitter.com wrote:

 The next Parquet sync up will be held on Google Hangouts on 7/21/2015 at 10
 am PT
 https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up



Re: Next Parquet Sync Up

2015-07-21 Thread Julien Le Dem
Agenda
- Julien (Twitter):
   - interested in ByteBuffer status
- Ryan (by email):
   - interested in ByteBuffer status; did some work on bloom filters
   - PARQUET-251 and PARQUET-246: make sure 2.0 encodings and other new
     features are solid
- Daniel, Nezih, Zhenxiao (Netflix):
   - update on vectorized read path for Presto (Dong Chen for Hive)
   - PARQUET-99: OOM on write
- Ippokratis: Impala team
- Jason Altekruse (Drill/MapR):
   - update on Java direct memory representation (Hadoop 2.0 ByteBuffer)
   - currently uses a fork of Parquet that uses the GSoC work
- Tianshuo: 1.8.1 release
- Sanjeev (Twitter):
   - wants to hear updates about vectorized reads in Presto

Actions:
   - Zhenxiao: update vectorization PR
   - Jason: update ByteBuffer PR
   - Jason: open JIRA for dictionary encoding fallback pointer
   - Daniel: opened a PR for PARQUET-99, up for review

Notes:
- Vectorized read path for Presto (Dong Chen for Hive): PARQUET-131
   - batch read
   - lazy materialization
   - Netflix integrated with Presto, Dong Chen integrated with Hive
   - Nezih: micro/macro benchmark
      - micro, 2 read paths:
         - only primitives, no converters (3x faster with vectorized)
         - complex with converters (no performance difference)
      - macro, Presto:
         - complex types not better
         - 2x better for primitive types
   - Daniel: projection + predicate well optimized with Presto (lazy load,
     lazy materialization); predicate pushdown and using the dictionary in
     predicate evaluation
   - Ippokratis: fan out? = 100 values per collection; list/map
     materialization is expensive
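Batch reading with lazy materialization can be illustrated with a small sketch: the predicate is evaluated on one column for a whole batch, and the projected columns are decoded only if at least one row survives. This is a simplified Python illustration of the idea, not the PARQUET-131 code or Presto's API; `LazyColumn`, `scan_batch`, and the `decode` callback are hypothetical names, and the sketch says nothing about the 2x/3x figures above.

```python
from typing import Callable, Optional, Sequence


class LazyColumn:
    """Column whose values are decoded only on first access.

    `decode` is a hypothetical callback standing in for real page decoding.
    """

    def __init__(self, decode: Callable[[], list]) -> None:
        self._decode = decode
        self._values: Optional[list] = None

    def values(self) -> list:
        if self._values is None:  # decode at most once
            self._values = self._decode()
        return self._values


def scan_batch(filter_col: LazyColumn,
               predicate: Callable[[object], bool],
               projected: Sequence[LazyColumn]) -> list:
    """Evaluate the predicate for the whole batch, then decode the projected
    columns only if some row matched."""
    selected = [i for i, v in enumerate(filter_col.values()) if predicate(v)]
    if not selected:
        return []  # projected columns are never decoded for this batch
    columns = [col.values() for col in projected]
    return [tuple(col[i] for col in columns) for i in selected]


if __name__ == "__main__":
    decoded = []

    def column(name: str, data: list) -> LazyColumn:
        def decode() -> list:
            decoded.append(name)  # record which columns were decoded
            return data
        return LazyColumn(decode)

    age = column("age", [12, 45, 71])
    city = column("city", ["Oslo", "Lima", "Pune"])
    print(scan_batch(age, lambda v: v > 40, [city]))  # [('Lima',), ('Pune',)]
    print(decoded)  # ['age', 'city']
```

When a batch has no matching rows, only the filter column is ever decoded; that skipped work is where lazy materialization saves time on selective queries.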

- Dictionary encoding: because of the fallback mechanism, we don't know when
  the dictionary ends. Jason to open a JIRA.
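The fallback problem can be pictured with a toy writer: it dictionary-encodes pages until the dictionary would exceed a size budget, then writes that page and every later page of the chunk as plain values. This is a hypothetical Python sketch, not parquet-mr's writer. `ToyColumnWriter`, `max_dict_bytes`, and the "dict"/"plain" page labels are made up for illustration, and `fallback_page` only illustrates what a "fallback pointer" could record: the index of the first non-dictionary page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyColumnWriter:
    """Dictionary-encode pages until the dictionary would exceed
    `max_dict_bytes`, then write this page and all later pages as plain."""
    max_dict_bytes: int
    dictionary: dict = field(default_factory=dict)
    dict_bytes: int = 0
    pages: List[Tuple[str, list]] = field(default_factory=list)
    fallback_page: Optional[int] = None  # hypothetical "fallback pointer"

    def write_page(self, values: List[str]) -> None:
        if self.fallback_page is None:
            # Distinct values in this page that are not in the dictionary yet.
            new = [v for v in dict.fromkeys(values) if v not in self.dictionary]
            extra = sum(len(v.encode("utf-8")) for v in new)
            if self.dict_bytes + extra <= self.max_dict_bytes:
                for v in new:
                    self.dictionary[v] = len(self.dictionary)
                self.dict_bytes += extra
                self.pages.append(("dict", [self.dictionary[v] for v in values]))
                return
            # Dictionary would grow too large: fall back for the rest of the chunk.
            self.fallback_page = len(self.pages)
        self.pages.append(("plain", list(values)))


if __name__ == "__main__":
    w = ToyColumnWriter(max_dict_bytes=2)
    w.write_page(["a", "b", "a"])
    w.write_page(["c"])
    w.write_page(["a"])
    print(w.pages)          # [('dict', [0, 1, 0]), ('plain', ['c']), ('plain', ['a'])]
    print(w.fallback_page)  # 1
```

Without something like `fallback_page` in the metadata, a reader only learns that dictionary encoding stopped by reaching the first plain page.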

- PARQUET-99: OOM on write
   - all big rows (10MB per row): runs out of memory before the first check
   - big variability in size: small initial rows throw off the estimate, and
     the big rows that follow blow up memory
   - add settings for checking at a constant number of rows
   - we should experiment with simpler strategies
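The "big variability in size" failure mode can be reproduced with a small simulation of a writer that measures its buffer only at check points. This is a hypothetical Python sketch, not parquet-mr's writer logic. `estimate_based` extrapolates the next check from the average row size so far and schedules it halfway to the budget (an assumed heuristic). `every_n_rows` checks at a constant row interval, as suggested in the notes. Sizes are abstract units.

```python
from typing import Callable, Sequence


def peak_buffered(row_sizes: Sequence[int], budget: int,
                  next_check: Callable[[int, int], int]) -> int:
    """Largest buffer size reached when the buffer is only measured at check
    points. next_check(rows, used) returns the row count of the next check."""
    rows = used = peak = 0
    check_at = 1
    for size in row_sizes:
        rows += 1
        used += size
        peak = max(peak, used)
        if rows >= check_at:
            if used >= budget:
                rows = used = 0  # flush the row group
                check_at = 1
            else:
                check_at = next_check(rows, used)
    return peak


def estimate_based(budget: int) -> Callable[[int, int], int]:
    """Extrapolate from the average row size so far, aiming halfway to budget."""
    def nxt(rows: int, used: int) -> int:
        avg = used / rows
        return rows + max(1, int((budget - used) / avg / 2))
    return nxt


def every_n_rows(n: int) -> Callable[[int, int], int]:
    """Check at a constant row interval."""
    return lambda rows, used: rows + n


if __name__ == "__main__":
    # 100 tiny rows followed by 400 large ones; budget of 1000 units.
    sizes = [1] * 100 + [100] * 400
    print(peak_buffered(sizes, 1000, estimate_based(1000)))  # 40100
    print(peak_buffered(sizes, 1000, every_n_rows(10)))      # 1200
```

With tiny rows first, the estimate schedules the next check about 500 rows out, and the large rows that follow overshoot the budget about 40 times. With a fixed interval of N rows, the buffer is below budget at each check and at most N rows are added before the next one, so the peak stays under the budget plus N times the largest row. The cost is more frequent size checks.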

- ByteBuffer status:
   - Jason needs to rebase the PR
   - PARQUET-77


On Tue, Jul 21, 2015 at 10:05 AM, Julien Le Dem jul...@twitter.com wrote:

 It's happening now:
 https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up






Next Parquet Sync Up in August

2015-07-21 Thread Julien Le Dem
The next Parquet sync up will be held on Google Hangouts on 8/11/2015 at 10
am PT
https://plus.google.com/hangouts/_/twitter.com/parquet-sync-up


Re: Next Parquet Sync Up

2015-07-21 Thread Alex Levenson
Sorry to be difficult but, can I request any day other than Monday -- how
about Wednesday?

On Tue, Jul 21, 2015 at 7:19 PM, Julien Le Dem jul...@ledem.net wrote:

 There's no particular reason for Tuesdays.
 We could do the next one on a Monday.
 Does anybody object?

 Julien




-- 
Alex Levenson
@THISWILLWORK


Re: Next Parquet Sync Up

2015-07-21 Thread Jacques Nadeau
Any chance we can have these on either a different day or time?  The Drill
hangout is every Tuesday at 10am so I always have to pick one or the other.

On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi 
nyigitb...@netflix.com.invalid wrote:

 An update to the actions: I will create a PR for the vectorized read
 instead of Zhenxiao.

 Thanks,
 Nezih
