Re: ZSTD-JNI

2020-05-21 Thread Любен
Hi,

I don't know of any performance or correctness problems with Zstd-JNI. It
tracks the upstream (the native part) very closely and tries to expose most
of the functionality. Regarding streaming interfaces, assuming that you are
going to use them, there are currently two approaches:

- ZstdInputStream/ZstdOutputStream filters that decompress/compress
streams, similar to the Gzip implementation from the standard library.
- variants that work with direct buffers. If it fits with how your code is
structured, it may be slightly faster.
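
To illustrate the first approach, here is a minimal sketch of the
filter-stream pattern. It is written against the standard library's
GZIPOutputStream/GZIPInputStream so that it runs without extra dependencies.
The assumption is that ZstdOutputStream/ZstdInputStream from zstd-jni would be
wrapped around the underlying streams at the marked lines in the same way. The
class and method names (StreamFilterSketch, compress, decompress) are
hypothetical and only serve the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamFilterSketch {

    // Compress a byte array by writing it through a compressing filter stream.
    static byte[] compress(byte[] input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Assumption: with zstd-jni on the classpath, a ZstdOutputStream
        // would wrap `sink` here instead of GZIPOutputStream.
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the filter stream above flushes the trailer into `sink`.
        return sink.toByteArray();
    }

    // Decompress by reading through a decompressing filter stream.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        // Assumption: a ZstdInputStream would wrap the source stream here.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "parquet page bytes, repeated: abcabcabcabc"
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(data);
        byte[] unpacked = decompress(packed);
        System.out.println("round trip equal: " + Arrays.equals(data, unpacked));
    }
}
```

Because the callers only see OutputStream and InputStream, switching codecs
in this pattern is confined to the two constructor calls.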

If you have any specific questions, please let me know. You can also send
me your PR when it's ready, in case I have suggestions.

BTW, it's strange Hadoop decided to reimplement it their own way. The rest
of the ecosystem is using Zstd-JNI, e.g. Spark, Flink, Cassandra, etc.

Regards,
luben




On Thu, May 21, 2020 at 2:34 AM Xinli shang wrote:

> Hi all,
>
> I see parquet-mr has been using ZSTD-JNI for the parquet-cli project. It is
> a clean approach to use this JNI for testing ZSTD instead of using the Hadoop
> implementation, especially when testing on localhost. I am wondering whether
> we can promote it to the parquet-hadoop project, as ZSTD is becoming more and
> more popular. I have a prototype working, but I would like to ask if anybody
> knows of any issues (performance, reliability, etc.) with ZSTD-JNI. Any
> feedback on using this JNI is welcome.
>
> BTW, I am also trying out the AirCompressor approach, but it seems the
> ZSTD compression level is not adjustable.
>
> --
> Xinli Shang
>


Re: ZSTD-JNI

2020-05-21 Thread Xinli shang
Thank you so much, Luben! Here is the PR. Please have a look!

On Wed, May 20, 2020 at 6:51 PM Любен wrote:

> Hi,
>
> I don't know of any performance or correctness problems with Zstd-JNI. It
> tracks the upstream (the native part) very closely and tries to expose most
> of the functionality. Regarding streaming interfaces, assuming that you are
> going to use them, there are currently two approaches:
>
> - ZstdInputStream/ZstdOutputStream filters that decompress/compress
> streams, similar to the Gzip implementation from the standard library.
> - variants that work with direct buffers. If it fits with how your code is
> structured, it may be slightly faster.
>
> If you have any specific questions, please let me know. You can also send
> me your PR when it's ready, in case I have suggestions.
>
> BTW, it's strange Hadoop decided to reimplement it their own way. The rest
> of the ecosystem is using Zstd-JNI, e.g. Spark, Flink, Cassandra, etc.
>
> Regards,
> luben
>
>
>
>
> On Thu, May 21, 2020 at 2:34 AM Xinli shang wrote:
>
>> Hi all,
>>
>> I see parquet-mr has been using ZSTD-JNI for the parquet-cli project. It is
>> a clean approach to use this JNI for testing ZSTD instead of using the Hadoop
>> implementation, especially when testing on localhost. I am wondering whether
>> we can promote it to the parquet-hadoop project, as ZSTD is becoming more and
>> more popular. I have a prototype working, but I would like to ask if anybody
>> knows of any issues (performance, reliability, etc.) with ZSTD-JNI. Any
>> feedback on using this JNI is welcome.
>>
>> BTW, I am also trying out the AirCompressor approach, but it seems the
>> ZSTD compression level is not adjustable.
>>
>> --
>> Xinli Shang
>>
>

-- 
Xinli Shang