Hi Jean-Daniel,

Nice to hear from you :) I use Kudu 1.3. I hope Kudu has enough memory (about 256 GB on each node). I have played with the threads parameter, but it makes little difference: the import is still extremely slow…
Best regards,
ANDREY KUZNETSOV
Software Engineering Team Leader, Assessment Global Discipline Head (Java)
Office: +7 482 263 00 70 x 42766
Cell: +7 920 154 05 72
Email: [email protected]
Tver, Russia
epam.com <http://www.epam.com/>

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or entity(ies) to which it is addressed and contains information that is legally privileged and confidential. If you are not the intended recipient, or the person responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. All unintended recipients are obliged to delete this message and destroy any printed copies.

From: Jean-Daniel Cryans [mailto:[email protected]]
Sent: Wednesday, August 9, 2017 10:52 PM
To: [email protected]
Cc: Special SBER-BPOC Team <[email protected]>
Subject: Re: [kudu] import from hdfs

Hi Andrey,

Which version of Kudu and Impala are you using? Just that can make a huge difference.

Apart from that, make sure Kudu has enough memory (no memory back pressure), you have enough maintenance manager threads (1/3 or 1/4 the number of disks), and that your partitioning favors good load distribution.

But TBH writing to Parquet will remain faster than writing to Kudu, because Kudu isn't just dropping the rows into a file and has to do more than that.

Hope this helps,
J-D

On Wed, Aug 9, 2017 at 9:05 AM, Andrey Kuznetsov <[email protected]> wrote:

Hi folk,

I have a problem with hdfs to kudu performance, I have created external table with CSV data and ran “insert as select” from it to kudu-table and to parquet-table: Importing to parquet-table is 3x faster than to kudu – do you know some tips/tricks to increase performance of import?
Actually, I am importing 8 TB of data, so it is critical for me.

Best regards,
ANDREY KUZNETSOV
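
For reference, J-D's rule of thumb of sizing the maintenance manager threads at 1/3 or 1/4 of the number of disks can be worked out with a few lines of Python. This is only a rough sketch: the function name and the rounding (floor for the 1/4 bound, ceiling for the 1/3 bound, at least one thread) are assumptions made for illustration and are not stated in the thread. The resulting number would go into the tablet server's `--maintenance_manager_num_threads` flag.

```python
import math
from typing import Tuple


def maintenance_thread_range(num_disks: int) -> Tuple[int, int]:
    """Return (low, high) thread counts for the "1/4 to 1/3 of the disks" rule.

    Rounding is an assumption: floor for the 1/4 bound, ceiling for the
    1/3 bound, and never fewer than one thread.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    low = max(1, num_disks // 4)
    high = max(low, math.ceil(num_disks / 3))
    return low, high


if __name__ == "__main__":
    # Hypothetical disk counts per tablet server, for illustration only.
    for disks in (4, 12, 24):
        low, high = maintenance_thread_range(disks)
        print(f"{disks} disks: between {low} and {high} maintenance threads")
```

With 12 data disks per node this suggests 3 to 4 threads. Whether that helps an 8 TB import still depends on memory pressure and on how evenly the partitioning spreads the writes.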
