Ok, anything else to discuss? Otherwise I'll plan on a new vote with the
original language + an explicit call-out that dictionary replacement isn't
supported for the file format in the PR.
On Thursday, November 14, 2019, Antoine Pitrou wrote:
Right. The dictionaries can be found from the file footer, so it seems ok.
Thank you
Regards
Antoine.
On 14/11/2019 at 07:11, Micah Kornfield wrote:
I'll add for:
> If so, how does this play with the fact that there potentially are delta
> dictionaries in the "stream"?
That in this case the important feature is the dictionary batches have an
explicit ordering in the file format based on metadata. So their ordering
in the "stream" is largely ir
Hi Antoine,
Each *record batch* is intended to be readable in random order. To read any
record batch requires loading the dictionaries indicated in the schema, so
appending the deltas as part of this process does not seem like it would
introduce hardship given that such logic is needed to properly
Hi,
Sorry for the delay.
My high-level question is the following: is the file format intended to
be readable in random order (rather than having to read through it in
sequence as with the stream format)? If so, how does this play with the
fact that there potentially are delta dictionaries in
I think the main sticking point was dictionaries in the file format. It
seems like the use-case for delta dictionaries might be limited so I didn't
feel strongly about it.
Antoine, did you have more thoughts on this?
Thanks,
Micah
On Wed, Nov 6, 2019 at 9:24 AM Wes McKinney wrote:
Just bumping this thread for more comments
On Wed, Oct 30, 2019 at 3:11 PM Wes McKinney wrote:
Returning to this discussion, as there seems to be a lack of consensus in the vote thread.
Copying Micah's proposals in the VOTE thread here, I wanted to state
my opinions so we can discuss further and see where there is potential
disagreement
1. It is not required that all dictionary batches occur at the
I'll plan on starting a vote in the next day or two if there are no further
objections/comments.
On Sun, Oct 13, 2019 at 11:06 AM Micah Kornfield wrote:
The only point raised on the PR that I think is worth discussing is the
assumptions about dictionaries at the beginning of streams.
There are two options:
1. Based on the current wording, it does not seem that all dictionaries
need to be at the beginning of the stream if they aren't made use o
>
> > So, why would we allow dictionary replacement instead of have the
> > emitter use a new dictionary id? Is it to optimize memory consumption
> > on the receiver?
> The dictionary id's are set in the schema, so it's not possible to
> change the dictionary id after the schema has been sent.
On Sun, Oct 6, 2019 at 4:30 AM Antoine Pitrou wrote:
On Sat, 5 Oct 2019 17:01:27 -0600
Micah Kornfield wrote:
> I've opened a pull request [1] to clarify some recent conversations about
> semantics/edge cases for dictionary encoding [2][3] around interleaved
> batches and when isDelta=False.
>
> Specifically, it proposes isDelta=False indicates dic
I've opened a pull request [1] to clarify some recent conversations about
semantics/edge cases for dictionary encoding [2][3] around interleaved
batches and when isDelta=False.
Specifically, it proposes isDelta=False indicates dictionary replacement.
For the file format, only one isDelta=False bat