Dear all,
thank you for your work on the Apache Parquet format.
We are a group of students at the Technical University of Munich who would like
to extend the available compression and encoding options for 32-bit and 64-bit
floating point data in Apache Parquet.
The current encodings and
Regards,
Martin
From: Zoltan Ivanfi
Sent: Friday, July 5, 2019 4:21:39 PM
To: Radev, Martin
Cc: Parquet Dev; Raoofy, Amir; Karlstetter, Roman
Subject: Re: Floating point data compression for Apache Parquet
Hi Martin,
Thanks for the explanations, that makes sense. Nice work
adding the "BYTE_STREAM_SPLIT" encoding to
parquet-cpp within Apache Arrow.
How should we proceed?
It would be great to get feedback from other community members.
Regards,
Martin
____
From: Radev, Martin
Sent: Tuesday, July 9, 2019 1:01:25 AM
From: Zoltan Ivanfi
Sent: Monday, July 8, 2019 5:06:30 PM
To: Radev, Martin
Cc: Parquet Dev; Raoofy, Amir; Karlstetter, Roman
Subject: Re: Floating point data compression for Apache Parquet
Hi Martin,
I agree that bs_zstd would be a good place to start. Regarding the choice
es, though it might be somewhat more complex and I have not really
thought about this case.
Regards,
Martin
From: Zoltan Ivanfi
Sent: Wednesday, July 3, 2019 6:07:50 PM
To: Parquet Dev; Radev, Martin
Cc: Raoofy, Amir; Karlstetter, Roman
Subject: Re: Floating point da
n 12, 2019 at 11:10 PM Radev, Martin wrote:
>
> Dear all,
>
> thank you for your work on the Apache Parquet format.
>
> We are a group of students at the Technical University of Munich who would
> like to extend the available compression and encoding options for 32-bit and
> 64-bit floating point data in Apache Parquet.
Dear all,
there was some earlier discussion on adding a new encoding for better
compression of FP32 and FP64 data.
The pull request which extends the format is here:
https://github.com/apache/parquet-format/pull/144
The change already has one approval, from Zoltan.
The results from an
of the PMC.
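For readers skimming the archive, the encoding under discussion can be sketched in a few lines. This is only an illustration of the byte-splitting idea with a roundtrip check, not the normative parquet-format specification text; the function names are made up for this sketch:

```python
import struct

def byte_stream_split(values):
    """Encode a list of float32 values by scattering their bytes into
    4 separate streams (one stream per byte position), concatenated."""
    raw = b"".join(struct.pack("<f", v) for v in values)
    k = 4  # bytes per float32 element
    return b"".join(raw[i::k] for i in range(k))

def byte_stream_unsplit(encoded, n):
    """Decode: byte i of element j comes from position j of stream i."""
    k = 4
    streams = [encoded[i * n:(i + 1) * n] for i in range(k)]
    raw = bytes(streams[i][j] for j in range(n) for i in range(k))
    return [struct.unpack("<f", raw[k * j:k * j + k])[0] for j in range(n)]

# values chosen to be exactly representable in float32
data = [1.0, 2.5, -3.25, 0.125]
enc = byte_stream_split(data)
assert byte_stream_unsplit(enc, len(data)) == data
```

The point of the transform is that the sign/exponent bytes of typical FP data end up adjacent in one stream, where a general-purpose compressor such as zstd can exploit their redundancy.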
On Tue, Aug 27, 2019 at 5:30 AM Radev, Martin wrote:
> Dear all,
>
>
> there was some earlier discussion on adding a new encoding for better
> compression of FP32 and FP64 data.
>
>
> The pull request which extends the format is here:
> https://github.com/apa
the archives of this mailing
list.
Regards,
Gabor
On Thu, Jul 25, 2019 at 12:56 PM Radev, Martin wrote:
> Dear all,
>
>
> how should we proceed with this proposal?
>
>
> Would somebody like to offer feedback on the new encoding, change of
> specification, and patc
>
> On Tue, Sep 3, 2019, 2:17 PM Radev, Martin wrote:
>
> > Hello all,
> >
> >
> > thank you Julien for the interest.
> >
> >
> > Could other people, part of Apache Parquet, share their opinions?
> >
> > Do you have your own data whi
Dear all,
I am interested in comparing the available encodings and compression algorithms
in Parquet using the parquet-mr project.
The metrics I would like to collect are compression ratio and
compression/decompression speed.
Is there an available project which does something similar which I
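The two metrics mentioned above can be collected with a small harness like the following. This is a generic sketch using Python's stdlib zlib as a stand-in codec; the actual comparison in the thread targets parquet-mr, whose codecs would need their own bindings:

```python
import time
import zlib

def measure(compress, decompress, payload):
    """Return compression ratio and wall-clock compress/decompress
    times for one codec on one payload."""
    t0 = time.perf_counter()
    compressed = compress(payload)
    t1 = time.perf_counter()
    restored = decompress(compressed)
    t2 = time.perf_counter()
    assert restored == payload  # sanity: lossless roundtrip
    return {
        "ratio": len(payload) / len(compressed),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }

payload = b"floating point data " * 4096
stats = measure(zlib.compress, zlib.decompress, payload)
```

For a fair comparison one would run each codec several times on the same column chunks and report medians, since single timings are noisy.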
Dear Apache Parquet Devs,
I would like to propose extending the Apache Parquet specification
with a better encoding for FP data that improves compression ratio, and
also to raise the question of adding a lossy compression algorithm for FP data.
Contents:
1. Problem: FP data
to help me drive this?
Regards,
Martin
From: Radev, Martin
Sent: Tuesday, July 23, 2019 8:22:43 PM
To: dev@parquet.apache.org
Cc: Zoltan Ivanfi; wesmck...@gmail.com; fo...@driesprong.frl;
heue...@gmail.com; Karlstetter, Roman; Raoofy, Amir
Subject: [DISCUSS][JAVA][C
Hello Ryan,
we decided that it would be beneficial to try out your proposal.
I will look into it and provide measurements on the compression ratio and speed.
Regards,
Martin
From: Ryan Blue
Sent: Saturday, September 14, 2019 2:23:20 AM
To: Radev, Martin
Cc
what code and what settings to use.
>
> On Thu, Oct 31, 2019 at 3:51 AM Radev, Martin wrote:
> >
> > Dear all,
> >
> >
> > would there be any interest in reviewing the BYTE_STREAM_SPLIT encoding?
> >
> > Please feel free to contact me di
Hello Manik,
If the compression level is really propagated to the library, what compression
levels did you check?
Regards,
Martin
From: Manik Singla
Sent: Monday, October 21, 2019 10:11:36 PM
To: Parquet Dev
Cc: fa...@sumologic.com; Radev, Martin
Subject
Hi Falak,
I was one of the people who recently exposed this in Arrow, but this is not part
of the Parquet specification.
In particular, any implementation for writing parquet files can decide whether
to expose this or select a reasonable value internally.
If you're using Arrow, you would
at 11:22 PM Wes McKinney wrote:
>
> > +1 from me on adding the FP encoding
> >
> > On Sat, Nov 2, 2019 at 4:51 AM Radev, Martin wrote:
> > >
> > > Hello all,
> > >
> > >
> > > thanks for the vote Ryan and to Wes for the fee
and fp64 values. My early experiments also show that this
encoding+zstd performs better on average than any of the specialized
floating-point lossless compressors like fpc, spdp, zfp.
Regards,
Martin
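A quick way to see why the split helps a general-purpose compressor, as claimed above: on data of similar magnitude the high (sign/exponent) bytes are nearly constant, and grouping them into one stream turns them into long runs. A sketch using stdlib zlib as a stand-in for zstd, on synthetic data chosen for illustration (not the thread's benchmark data):

```python
import struct
import zlib

# float32 samples of similar magnitude: high bytes are highly repetitive
values = [100.0 + 0.01 * i for i in range(8192)]
raw = b"".join(struct.pack("<f", v) for v in values)

# BYTE_STREAM_SPLIT-style scatter: byte i of every element into stream i
split = b"".join(raw[i::4] for i in range(4))

ratio_raw = len(raw) / len(zlib.compress(raw))
ratio_split = len(raw) / len(zlib.compress(split))
```

On this data the split layout compresses noticeably better, because the exponent stream collapses to almost nothing while the interleaved layout breaks up those runs.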
From: Radev, Martin
Sent: Thursday, October 10, 2019 2:34:15
file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing
Regards,
Martin
From: Ryan Blue
Sent: Thursday, September 19, 2019 7:54 PM
To: Radev, Martin
Cc: Parquet Dev; Raoofy, Amir; Karlstetter, Roman
Subject: Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache P
Hi Xin,
thanks for the interest in extending Parquet. I suppose this is only about the
Parquet Writer/Reader implementation, not about changes to the Parquet
specification.
I would like to know whether offloading the task of compressing/decompressing
some data is really beneficial
Dear all,
in our project of using Parquet for streaming fp data with various entropy, we
definitely needed to treat the columns differently.
For fp data with low entropy, dictionary encoding provided good results. For fp
data with entropy >15 bits per element, the newly added encoding + zstd
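The per-column decision described above can be driven by a simple empirical entropy estimate. A minimal sketch (the 15-bits threshold is the one quoted in the message; the helper name is made up here):

```python
import math
from collections import Counter

def entropy_bits_per_element(values):
    """Empirical Shannon entropy of the value distribution, in bits per
    element. Few distinct values -> low entropy -> dictionary encoding
    pays off; near-unique values -> high entropy -> BYTE_STREAM_SPLIT
    plus a general-purpose compressor is the better fit."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

low = [1.5, 2.5] * 500            # two distinct values: 1 bit/element
high = [0.001 * i for i in range(1000)]  # all distinct: log2(1000) bits
assert entropy_bits_per_element(low) < 15 < 32
```

Note this measures whole-value entropy; real data may still have compressible per-byte structure even when whole values are unique, which is exactly what the byte split exposes.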