Invitation: Parquet Sync @ Thu Sep 19, 2019 9am - 10am (PDT) (dev@parquet.apache.org)

2019-09-06 Thread shangx
BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:REQUEST
BEGIN:VEVENT
DTSTART:20190919T16Z
DTEND:20190919T17Z
DTSTAMP:20190906T225855Z
ORGANIZER;CN=sha...@uber.com:mailto:sha...@uber.com
UID:3f2npc97o5cj0qnqnrlfjld...@google.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=sha...@uber.com;X-NUM-GUESTS=0:mailto:sha...@uber.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED;RSVP=TRUE
 ;CN=gg5...@gmail.com;X-NUM-GUESTS=0:mailto:gg5...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=Daniel Weeks;X-NUM-GUESTS=0:mailto:dwe...@netflix.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=aniket...@gmail.com;X-NUM-GUESTS=0:mailto:aniket...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=daniels...@gmail.com;X-NUM-GUESTS=0:mailto:daniels...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=altekruseja...@gmail.com;X-NUM-GUESTS=0:mailto:altekrusejason@gmail
 .com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=ippokra...@gmail.com;X-NUM-GUESTS=0:mailto:ippokra...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=Lars Volker;X-NUM-GUESTS=0:mailto:l...@cloudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=Mohit Sabharwal;X-NUM-GUESTS=0:mailto:mo...@cloudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=santlal.gu...@bitwiseglobal.com;X-NUM-GUESTS=0:mailto:santlal.gupta
 @bitwiseglobal.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=yumw...@ebay.com;X-NUM-GUESTS=0:mailto:yumw...@ebay.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=smanik...@gmail.com;X-NUM-GUESTS=0:mailto:smanik...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=szo...@cloudera.com;X-NUM-GUESTS=0:mailto:szo...@cloudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=Julien Le Dem;X-NUM-GUESTS=0:mailto:julien.le...@wework.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=j.cof...@criteo.com;X-NUM-GUESTS=0:mailto:j.cof...@criteo.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=dev@parquet.apache.org;X-NUM-GUESTS=0:mailto:dev@parquet.apache.org
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=m.lac...@criteo.com;X-NUM-GUESTS=0:mailto:m.lac...@criteo.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=non...@gmail.com;X-NUM-GUESTS=0:mailto:non...@gmail.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=jacq...@apache.org;X-NUM-GUESTS=0:mailto:jacq...@apache.org
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=fnoth...@berkeley.edu;X-NUM-GUESTS=0:mailto:fnoth...@berkeley.edu
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=ven...@uber.com;X-NUM-GUESTS=0:mailto:ven...@uber.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=borokna...@cloudera.com;X-NUM-GUESTS=0:mailto:boroknagyz@cloudera.c
 om
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN="Xu, Cheng A";X-NUM-GUESTS=0:mailto:cheng.a...@intel.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=majeti.dee...@gmail.com;X-NUM-GUESTS=0:mailto:majeti.deepak@gmail.c
 om
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=csringho...@cloudera.com;X-NUM-GUESTS=0:mailto:csringhofer@cloudera
 .com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=stak...@cloudera.com;X-NUM-GUESTS=0:mailto:stak...@cloudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=o.kaidan...@criteo.com;X-NUM-GUESTS=0:mailto:o.kaidan...@criteo.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=bikramjeet@cloudera.com;X-NUM-GUESTS=0:mailto:bikramjeet.vig@cl
 oudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=brian.bow...@sas.com;X-NUM-GUESTS=0:mailto:brian.bow...@sas.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=apha...@cloudera.com;X-NUM-GUESTS=0:mailto:apha...@cloudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 TRUE;CN=nkol...@cloudera.com;X-NUM-GUESTS=0:mailto:nkol...@cloudera.com
ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;PARTSTAT=NEEDS-ACTION;RSVP=
 

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-09-06 Thread Ryan Blue
+1 on the current spec. Is everyone else still +1?

Sorry for the delay, I didn't realize that everything had been addressed
and I didn't see the email from Jim in my inbox.

On Wed, Aug 28, 2019 at 10:13 AM Jim Apple wrote:

> We've got +1's from Zoltan and Gabor. Ryan, you've committed a few BF
> patches that were written in response to your feedback on this list. Are
> you in a position to vote +1 now, or do you have further concerns we could
> address?
>
> On 2019/07/31 02:17:15, 俊杰陈 wrote:
> > Dear Parquet developers
> >
> > We still need your vote!
> >
> >
> > On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 wrote:
> > >
> > > Hi @Ryan Blue @Wes McKinney
> > >
> > > We need your valuable votes; any feedback is welcome as well.
> > >
> > > On Tue, Jul 23, 2019 at 1:24 PM 俊杰陈 wrote:
> > > >
> > > > Call for voting again.
> > > >
> > > > On Fri, Jul 19, 2019 at 1:17 PM 俊杰陈 wrote:
> > > > >
> > > > > Dear Parquet developers
> > > > >
> > > > > We need more votes; please help by voting on this.
> > > > >
> > > > > > On Wed, Jul 17, 2019 at 3:42 PM Gabor Szadovszky wrote:
> > > > > >
> > > > > > > Now that PARQUET-1625 is in, I vote again for the bloom filter spec and
> > > > > > > the thrift file update as they are in parquet-format master.
> > > > > > > +1 (binding)
> > > > > >
> > > > > > > On Mon, Jul 15, 2019 at 3:23 PM 俊杰陈 wrote:
> > > > > >
> > > > > > > > Thanks Gabor. It's never too late to make it better. We don't have to
> > > > > > > > rush it; it has already been in development for a long time. :)
> > > > > > > >
> > > > > > > > The thrift file does indeed lag a bit behind the spec. As the spec
> > > > > > > > defines, the bloom filter data is stored near the footer, which means
> > > > > > > > we don't have to handle it like a page. Therefore, I just opened a
> > > > > > > > jira to remove bloom_filter_page_header from the PageHeader structure, while
> > > > > > > > the BloomFilterHeader is kept intentionally for convenience. Since the
> > > > > > > > spec and the thrift should eventually be aligned with each other,
> > > > > > > > the vote targets both of them.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Mon, Jul 15, 2019 at 7:48 PM Gabor Szadovszky wrote:
> > > > > > > >
> > > > > > > > Hi Junjie,
> > > > > > > >
> > > > > > > > > Sorry for bringing this up a bit late, but I have some problems with the
> > > > > > > > > format update. The parquet.thrift file is updated to have the bloom filters
> > > > > > > > > as a page (just like dictionaries and data pages). Meanwhile, the spec
> > > > > > > > > (BloomFilter.md) says that the bloom filter is stored near the footer. So,
> > > > > > > > > if the bloom filter is not part of the row-groups (like column indexes), I
> > > > > > > > > would not add it as a page. See the struct ColumnIndex in the thrift file.
> > > > > > > > > This struct is not referenced anywhere in it, only declared. It was done
> > > > > > > > > this way because we don't parse it in the same way as we parse the pages.
> > > > > > > > >
> > > > > > > > > Currently, I am not 100% sure about the target of this vote. If it is a
> > > > > > > > > vote about adding bloom filters in general, then it is a +1 (binding). If it
> > > > > > > > > is about adding the bloom filters to parquet-format as is, then it is a -1
> > > > > > > > > (binding) until we fix the issue above.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Gabor
> > > > > > > >
> > > > > > > > > On Mon, Jul 15, 2019 at 11:45 AM Gidon Gershinsky <gg5...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > > On Mon, Jul 15, 2019 at 12:08 PM Zoltan Ivanfi wrote:
> > > > > > > > >
> > > > > > > > > > +1 (binding)
> > > > > > > > > >
> > > > > > > > > > > On Mon, Jul 15, 2019 at 9:57 AM 俊杰陈 wrote:
> > > > > > > > > > >
> > > > > > > > > > > Dear Parquet developers
> > > > > > > > > > >
> > > > > > > > > > > > I'd like to resume this vote; you can start voting now. Thanks for
> > > > > > > > > > > > your time.
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 10, 2019 at 9:29 PM 俊杰陈 <cjjnj...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I see, I will resume this next week. Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jul 10, 2019 at 5:26 PM Zoltan Ivanfi wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Junjie,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Since there are ongoing improvements addressing review comments, I
> > > > > > > > > > > > > > would hold off with the vote for a few more days until the
> > > > > > > > > > > > > > specification settles.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Br,
> > > > > > > > > > > >