Hi Vibhath,

Please make sure that your Impala version is not affected by IMPALA-10310 <https://issues.apache.org/jira/browse/IMPALA-10310> (Impala 3.3 and 3.4 have this bug). If your version has the bug, the workaround is to set PARQUET_OBJECT_STORE_SPLIT_SIZE / fs.s3a.block.size to the row group size used by your writer.
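
To illustrate the workaround, here is a minimal Python sketch that renders both settings from a row group size in bytes. The helper name and the 128 MiB value are only assumptions for the example; use the row group size your writer is actually configured with. The sketch does not decide which of the two settings applies to your deployment.

```python
def split_size_settings(row_group_bytes: int) -> dict:
    """Render the two settings named in the workaround for a given row group size.

    Hypothetical helper for illustration only; it just formats strings.
    """
    if row_group_bytes <= 0:
        raise ValueError("row group size must be a positive number of bytes")
    return {
        # Impala query option, e.g. issued in impala-shell before the query.
        "impala_set": f"SET PARQUET_OBJECT_STORE_SPLIT_SIZE={row_group_bytes};",
        # Hadoop S3A property in the usual <property> form.
        "s3a_property": (
            "<property>\n"
            "  <name>fs.s3a.block.size</name>\n"
            f"  <value>{row_group_bytes}</value>\n"
            "</property>"
        ),
    }


# Assumption: the writer produces 128 MiB row groups. Replace with your own value.
ROW_GROUP_BYTES = 128 * 1024 * 1024

settings = split_size_settings(ROW_GROUP_BYTES)
print(settings["impala_set"])
print(settings["s3a_property"])
```
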

Cheers,
Zoltan

On Mon, Mar 22, 2021 at 7:01 AM Tim Armstrong <tim.g.armstr...@gmail.com> wrote:

> Impala can read files with multiple row groups fine - many other engines
> generate files like that and it comes up all the time.
>
> I believe the column chunks end up being written in the order of the table
> schema, but maybe someone else knows for sure.
>
> Impala targets a 64 KB page size.
>
> On Sun, 21 Mar 2021 at 22:54, Vibhath Ileperuma <
> vibhatharunapr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I noticed that Impala-written parquet files contain only one row group.
>> I'm using Apache NiFi to generate a set of parquet files, and those
>> parquet files might contain more than one row group in one parquet file. I
>> would like to know how Impala will be affected if I add these parquet files
>> to an Impala S3 table (by adding a new partition).
>> Further, I would like to know how the pages are arranged in one row group
>> in an Impala-written parquet file.
>>
>> Thanks & Regards
>>
>> *Vibhath Ileperuma*