Hi Andrey,

Which version of Kudu and Impala are you using? That alone can make a huge
difference.

Apart from that, make sure Kudu has enough memory (no memory back
pressure), that you have enough maintenance manager threads (roughly 1/3 to
1/4 the number of data disks), and that your partitioning spreads the writes
evenly across all tablet servers.
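For example (just a sketch – the flag values, table names, and column names
below are made up, so adjust them to your own cluster and schema), the tablet
server side would look something like:

  # tserver gflagfile, illustrative values only
  --maintenance_manager_num_threads=4      # roughly 1/3 to 1/4 of the data disks
  --memory_limit_hard_bytes=17179869184    # enough headroom to avoid back pressure

and on the Impala side, hash-partition the Kudu table so the insert fans out
across all tablet servers:

  -- hypothetical table, pick columns and partition count for your own schema
  CREATE TABLE events_kudu (
    id BIGINT,
    name STRING,
    payload STRING,
    PRIMARY KEY (id)
  )
  PARTITION BY HASH (id) PARTITIONS 24
  STORED AS KUDU;

  INSERT INTO events_kudu SELECT * FROM events_csv;

A common rule of thumb is a few hash partitions per tablet server, so an 8TB
load gets spread evenly instead of hammering a handful of tablets.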

But TBH, writing to Parquet will remain faster than writing to Kudu, because
Kudu isn't just dropping rows into a file; it does quite a bit more work per
row (maintaining the primary key index, replicating writes, and compacting
data in the background).

Hope this helps,

J-D

On Wed, Aug 9, 2017 at 9:05 AM, Andrey Kuznetsov <andrey_kuznet...@epam.com>
wrote:

> Hi folks,
>
> I have a problem with HDFS-to-Kudu performance. I created an external
> table over CSV data and ran an “insert as select” from it into a Kudu table
> and into a Parquet table:
>
> Importing into the Parquet table is 3x faster than into Kudu – do you know
> any tips/tricks to increase the import performance?
>
> I am actually importing 8 TB of data, so this is critical for me.
>
>
>
> Best regards,
>
> ANDREY KUZNETSOV
>
> Software Engineering Team Leader, Assessment Global Discipline Head
> (Java)
>
>
>
> Office: +7 482 263 00 70 x 42766   Cell: +7 920 154 05 72
> Email: andrey_kuznet...@epam.com
>
> Tver, Russia   epam.com <http://www.epam.com/>
>
>
>
>
>
>
