Hi Sid,

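Snappy itself is not splittable. But a container format such as Parquet,
which divides the data into row groups, can still use Snappy for
compression.
This works because the pages inside each Parquet column chunk are
compressed independently with Snappy, rather than the file being compressed
as one stream. A reader can therefore start at any row group, so a
*.snappy.parquet file stays splittable and can be read in parallel.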
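
To illustrate the idea, here is a rough sketch in plain Python. It is not
Parquet and not Snappy: zlib from the standard library stands in for the
codec, and the container layout, write_blocks and read_block are
hypothetical names made up for this example. It only shows why compressing
blocks independently lets a reader decompress one block without touching
the others.

```python
import zlib

# Toy container, NOT the Parquet format. zlib stands in for Snappy here
# (assumption: the principle is the same for any block-level codec).

def write_blocks(rows, rows_per_block):
    """Compress each block of rows on its own; record (offset, length) per block."""
    payload = bytearray()
    index = []
    for start in range(0, len(rows), rows_per_block):
        block = "\n".join(rows[start:start + rows_per_block]).encode("utf-8")
        compressed = zlib.compress(block)
        index.append((len(payload), len(compressed)))
        payload += compressed
    return bytes(payload), index

def read_block(payload, index, block_no):
    """Decompress a single block using only its own bytes."""
    offset, length = index[block_no]
    raw = zlib.decompress(payload[offset:offset + length])
    return raw.decode("utf-8").split("\n")

rows = [f"row-{i}" for i in range(10)]
payload, index = write_blocks(rows, rows_per_block=4)

# A reader assigned only the last block never decompresses the first two.
last_block = read_block(payload, index, 2)
```

If the whole payload were compressed as a single stream instead, a reader
would have to decompress from the beginning to reach the last rows, which
is the situation with a plain Gzip or Snappy file.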

Thanks
Amit

On Wed, Sep 14, 2022 at 8:14 PM Sid <flinkbyhe...@gmail.com> wrote:

> Hello experts,
>
> I know that Gzip and Snappy files are not splittable, i.e. the data won't be
> distributed across multiple blocks; instead, the whole file is loaded into a
> single partition/block.
>
> So, my question is: when I write Parquet data via Spark, it gets stored
> at the destination with a name like *part*.snappy.parquet*
>
> So, when I read this data will it affect my performance?
>
> Please correct me if I have misunderstood anything.
>
> Thanks,
> Sid
>
