Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-14 Thread Micah Kornfield
Ok, anything else to discuss? Otherwise I'll plan on a new vote with the original language + an explicit call-out that dictionary replacement isn't supported for the file format in the PR. On Thursday, November 14, 2019, Antoine Pitrou wrote: > > Right. The dictionaries can be found from the fi

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-14 Thread Antoine Pitrou
Right. The dictionaries can be found from the file footer, so it seems ok. Thank you. Regards, Antoine. On 14/11/2019 at 07:11, Micah Kornfield wrote: > I'll add for: > > If so, how does this play with the fact that there potentially are delta >> dictionaries in the "stream"? > > That in

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-13 Thread Micah Kornfield
I'll add for: If so, how does this play with the fact that there potentially are delta > dictionaries in the "stream"? That in this case the important feature is the dictionary batches have an explicit ordering in the file format based on metadata. So their ordering in the "stream" is largely ir

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-12 Thread Wes McKinney
Hi Antoine, Each *record batch* is intended to be readable in random order. To read any record batch requires loading the dictionaries indicated in the schema, so appending the deltas as part of this process does not seem like it would introduce hardship given that such logic is needed to properly
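A minimal sketch of the point above, assuming a simplified in-memory model of a file: the footer is represented as an ordered list of (dictionary id, values) entries, where the first entry for an id is the initial dictionary and later entries are deltas. The names `load_dictionaries` and `decode_batch` are hypothetical and are not part of any Arrow library.

```python
from typing import Dict, List, Tuple

# Hypothetical model: dictionary batches as listed in the file footer,
# in footer order. Each entry is (dictionary id, values in that batch).
FooterDictionaries = List[Tuple[int, List[str]]]


def load_dictionaries(footer_dictionaries: FooterDictionaries) -> Dict[int, List[str]]:
    """Build the full dictionaries by appending batches in footer order."""
    dictionaries: Dict[int, List[str]] = {}
    for dict_id, values in footer_dictionaries:
        # Assumption: no replacement in the file format, so every batch
        # after the first for a given id is treated as a delta (append).
        dictionaries.setdefault(dict_id, []).extend(values)
    return dictionaries


def decode_batch(indices: List[int], dict_id: int,
                 dictionaries: Dict[int, List[str]]) -> List[str]:
    """Decode one dictionary-encoded column of a record batch."""
    values = dictionaries[dict_id]
    return [values[i] for i in indices]
```

Under this model any record batch can be decoded on its own once `load_dictionaries` has run, regardless of where the batch sits in the file.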

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-12 Thread Antoine Pitrou
Hi, Sorry for the delay. My high-level question is the following: is the file format intended to be readable in random order (rather than having to read through it in sequence as with the stream format)? If so, how does this play with the fact that there potentially are delta dictionaries in

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-07 Thread Micah Kornfield
I think the main sticking point was dictionaries in the file format. It seems like the use-case for delta dictionaries might be limited so I didn't feel strongly about it. Antoine, did you have more thoughts on this? Thanks, Micah On Wed, Nov 6, 2019 at 9:24 AM Wes McKinney wrote: > Just bum

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-11-06 Thread Wes McKinney
Just bumping this thread for more comments. On Wed, Oct 30, 2019 at 3:11 PM Wes McKinney wrote: > > Returning to this discussion as there seems to be a lack of consensus in the vote > thread > > Copying Micah's proposals in the VOTE thread here, I wanted to state > my opinions so we can discuss further a

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-30 Thread Wes McKinney
Returning to this discussion as there seems to be a lack of consensus in the vote thread. Copying Micah's proposals in the VOTE thread here, I wanted to state my opinions so we can discuss further and see where there is potential disagreement. 1. It is not required that all dictionary batches occur at the

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-16 Thread Micah Kornfield
I'll plan on starting a vote in the next day or two if there are no further objections/comments. On Sun, Oct 13, 2019 at 11:06 AM Micah Kornfield wrote: > I think the only point asked on the PR that I think is worth discussing is > assumptions about dictionaries at the beginning of streams. > >

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-13 Thread Micah Kornfield
I think the only point raised on the PR that is worth discussing is the assumptions about dictionaries at the beginning of streams. There are two options: 1. Based on the current wording, it does not seem that all dictionaries need to be at the beginning of the stream if they aren't made use o

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-07 Thread Micah Kornfield
> > > So, why would we allow dictionary replacement instead of having the > > emitter use a new dictionary id? Is it to optimize memory consumption > > on the receiver? > The dictionary ids are set in the schema, so it's not possible to > change the dictionary id after the schema has been sent.

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-06 Thread Wes McKinney
On Sun, Oct 6, 2019 at 4:30 AM Antoine Pitrou wrote: > > On Sat, 5 Oct 2019 17:01:27 -0600 > Micah Kornfield wrote: > > I've opened a pull request [1] to clarify some recent conversations about > > semantics/edge cases for dictionary encoding [2][3] around interleaved > > batches and when isDelta

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-06 Thread Antoine Pitrou
On Sat, 5 Oct 2019 17:01:27 -0600 Micah Kornfield wrote: > I've opened a pull request [1] to clarify some recent conversations about > semantics/edge cases for dictionary encoding [2][3] around interleaved > batches and when isDelta=False. > > Specifically, it proposes isDelta=False indicates dic

[DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-05 Thread Micah Kornfield
I've opened a pull request [1] to clarify some recent conversations about semantics/edge cases for dictionary encoding [2][3] around interleaved batches and when isDelta=False. Specifically, it proposes isDelta=False indicates dictionary replacement. For the file format, only one isDelta=False bat
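The proposed semantics can be illustrated with a small sketch of how a stream receiver might apply incoming dictionary batches: a batch with isDelta=False replaces the current dictionary for its id, and a batch with isDelta=True appends to it. The `DictionaryBatch` class and `apply_dictionary_batch` function are hypothetical stand-ins for illustration, not the Arrow API, and treating a delta with no prior dictionary as an initial dictionary is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for a dictionary batch message."""
    dict_id: int
    values: List[str]
    is_delta: bool


def apply_dictionary_batch(state: Dict[int, List[str]],
                           batch: DictionaryBatch) -> Dict[int, List[str]]:
    """Update the receiver's dictionaries in place and return them."""
    if batch.is_delta:
        # Delta: append new values to the existing dictionary.
        # Assumption: a delta with no prior dictionary starts a new one.
        state.setdefault(batch.dict_id, []).extend(batch.values)
    else:
        # isDelta=False: replace the dictionary for this id.
        state[batch.dict_id] = list(batch.values)
    return state
```

Because the dictionary id is fixed by the schema, replacement lets a sender change the dictionary contents mid-stream without sending a new schema.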