Hi Vibhath,

Please make sure that your Impala is not affected by IMPALA-10310
<https://issues.apache.org/jira/browse/IMPALA-10310> (Impala 3.3 and 3.4
have this bug).
If your version has the bug, the workaround is to set
PARQUET_OBJECT_STORE_SPLIT_SIZE / fs.s3a.block.size to the row group size
used by your writer.
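The minimal Python sketch below gives some intuition for why split size and row group size interact. It uses a simplified, hypothetical model: the file is cut into fixed-size splits, and each row group is read by the split that contains its midpoint byte offset. That midpoint rule is an assumption (a common Parquet reader convention, not confirmed here for Impala). The helper names are made up for illustration. The model ignores the Parquet header and footer bytes and says nothing about the specifics of IMPALA-10310.

```python
"""Simplified, hypothetical model of how Parquet row groups map onto splits.

Assumption (not Impala's actual code): the file is cut into fixed-size
splits, and each row group is read by the split containing its midpoint.
"""
from typing import Dict, List, Tuple


def assign_row_groups(
    row_groups: List[Tuple[int, int]], file_size: int, split_size: int
) -> Dict[int, List[int]]:
    """Map each split's start offset to the indexes of the row groups it reads.

    row_groups: (start_offset, byte_length) pairs, one per row group.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    # One entry per split, keyed by the split's starting byte offset.
    assignment: Dict[int, List[int]] = {
        start: [] for start in range(0, file_size, split_size)
    }
    for index, (start, length) in enumerate(row_groups):
        midpoint = start + length // 2
        # The split whose [start, start + split_size) range holds the midpoint.
        split_start = (midpoint // split_size) * split_size
        assignment[split_start].append(index)
    return assignment


def idle_splits(assignment: Dict[int, List[int]]) -> int:
    """Count splits that end up with no row group to read."""
    return sum(1 for groups in assignment.values() if not groups)


if __name__ == "__main__":
    MB = 1024 * 1024
    # Hypothetical file: four 128 MB row groups written back to back.
    groups = [(i * 128 * MB, 128 * MB) for i in range(4)]
    size = 4 * 128 * MB
    for split in (32 * MB, 128 * MB, 256 * MB):
        a = assign_row_groups(groups, size, split)
        print(split // MB, "MB splits:", len(a), "splits,", idle_splits(a), "idle")
```

In this toy model, four 128 MB row groups split at 32 MB leave 12 of the 16 splits with nothing to read. Splitting at 128 MB gives each split exactly one row group, which matches the one-row-group-per-split alignment the workaround suggests.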

Cheers,
    Zoltan


On Mon, Mar 22, 2021 at 7:01 AM Tim Armstrong <tim.g.armstr...@gmail.com>
wrote:

> Impala can read files with multiple row groups fine; many other engines
> generate files like that, and it comes up all the time.
>
> I believe the column chunks end up being written in the order of the table
> schema, but maybe someone else knows for sure.
>
> Impala targets a 64 KB page size.
>
> On Sun, 21 Mar 2021 at 22:54, Vibhath Ileperuma <
> vibhatharunapr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I noticed that Impala-written Parquet files contain only one row group.
>> I'm using Apache NiFi to generate a set of Parquet files, and each of
>> those files might contain more than one row group. I would like to know
>> how Impala will be affected if I add these Parquet files to an Impala S3
>> table (by adding a new partition).
>> Further, I would like to know how the pages are arranged within a row group
>> in an Impala-written Parquet file.
>>
>> Thanks & Regards
>>
>> *Vibhath Ileperuma*
