Re: Summary of RLE and other compression efforts?

2020-03-26 Thread Micah Kornfield
Hi Evan,
> Hope everyone is staying safe!
Thanks, you too.
> A fairly substantial amount of CPU is needed for translating from Parquet;
> main memory bandwidth becomes a factor. Thus, it seems speed and
> constraining factors vary widely by application
I agree performance is going to be applic

Re: Summary of RLE and other compression efforts?

2020-03-24 Thread Evan Chan
Hi Micah,
Hope everyone is staying safe!
> On Mar 16, 2020, at 9:41 PM, Micah Kornfield wrote:
>
> I feel a little uncomfortable in the fact that there isn't a more clearly
> defined dividing line for what belongs in Arrow and what doesn't. I suppose
> this is what discussions like these are

Re: Summary of RLE and other compression efforts?

2020-03-16 Thread Micah Kornfield
in
>> >> wrote:
>> >>>>
>> >>>> Hey Evan,
>> >>>>
>> >>>> thank you for the interest.
>> >>>>
>> >>>> There has been some effort for compressin

Re: Summary of RLE and other compression efforts?

2020-03-14 Thread Evan Chan
ast as large as >= 15 bits
> >> of entropy per element. I suppose the encoding might actually also make
> >> sense for high-entropy integer data but I am not super sure.
> >>>> For low-entropy data, the dictionary encoding is good though I suspect
> >> there can be room for perform

Re: Summary of RLE and other compression efforts?

2020-03-12 Thread Micah Kornfield
ight actually also make
> >> sense for high-entropy integer data but I am not super sure.
> >>>> For low-entropy data, the dictionary encoding is good though I suspect
> >> there can be room for performance improvements.
> >>>> This is my final report

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Wes McKinney
pressor, such as ZSTD, LZ4, etc, is used. It only works well for
> >> high-entropy floating-point data, somewhere at least as large as >= 15 bits
> >> of entropy per element. I suppose the encoding might actually also make
> >> sense for high-entropy integer data but

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Evan Chan
opy integer data but I am not super sure.
>>>> For low-entropy data, the dictionary encoding is good though I suspect
>> there can be room for performance improvements.
>>>> This is my final report for the encoding here:
>> https://github.com/martinradev/

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Antoine Pitrou
Hi,
On 11/03/2020 at 06:31, Micah Kornfield wrote:
>
> I still think we should be careful on what is added to the spec, in
> particular, we should be focused on encodings that can be used to improve
> computational efficiency rather than just smaller size. Also, it is
> important to note that

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Micah Kornfield
ion as the one in https://github.com/powturbo/Turbo-Transpose.
> > >
> > > Maybe the points I sent can be helpful.
> > >
> > > Kind regards,
> > > Martin
> > >
> > > __

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Wes McKinney
> > Maybe the points I sent can be helpful.
> >
> > Kind regards,
> > Martin
> >
> > From: evan_c...@apple.com on behalf of Evan Chan
> > Sent: Tuesday, March 10, 2020 5

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Evan Chan
igation turned out to be quite the same
>> solution as the one in https://github.com/powturbo/Turbo-Transpose.
>>
>> Maybe the points I sent can be helpful.
>>
>> Kind regards,
>> Martin
>>
>> _

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Evan Chan
__
> From: evan_c...@apple.com on behalf of Evan Chan
> Sent: Tuesday, March 10, 2020 5:15:48 AM
> To: dev@arrow.apache.org
> Subject: Summary of RLE and other compression efforts?
>
> Hi folks,
>
> I’m curious about the state of efforts for more compressed e

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Wes McKinney
nspose.
>
> Maybe the points I sent can be helpful.
>
> Kind regards,
> Martin
>
> From: evan_c...@apple.com on behalf of Evan Chan
> Sent: Tuesday, March 10, 2020 5:15:48 AM
> To: dev@arrow.apache.org
> Su

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Radev, Martin
ary of RLE and other compression efforts? Hi folks, I’m curious about the state of efforts for more compressed encodings in the Arrow columnar format. I saw discussions previously about RLE, but is there a place to summarize all of the different efforts that are ongoing to bring more compres

Summary of RLE and other compression efforts?

2020-03-09 Thread Evan Chan
Hi folks, I’m curious about the state of efforts for more compressed encodings in the Arrow columnar format. I saw discussions previously about RLE, but is there a place to summarize all of the different efforts that are ongoing to bring more compressed encodings? Is there an effort to compre