RE: [kudu] import from hdfs

Andrey Kuznetsov Thu, 10 Aug 2017 04:47:48 -0700

Hi Jean-Daniel,
Nice to hear from you :)

I use Kudu 1.3, and I think Kudu has enough memory (about 256 GB on each node).
I have played with the threads parameter, but it makes little difference:
the import is still extremely slow.


Best regards,
ANDREY KUZNETSOV
Software Engineering Team Leader, Assessment Global Discipline Head (Java)

Office: +7 482 263 00 70 x 42766
Cell: +7 920 154 05 72   Email: [email protected]
Tver, Russia   epam.com <http://www.epam.com/>

CONFIDENTIALITY CAUTION AND DISCLAIMER
This message is intended only for the use of the individual(s) or entity(ies) 
to which it is addressed and contains information that is legally privileged 
and confidential. If you are not the intended recipient, or the person 
responsible for delivering the message to the intended recipient, you are 
hereby notified that any dissemination, distribution or copying of this 
communication is strictly prohibited. All unintended recipients are obliged to 
delete this message and destroy any printed copies.

From: Jean-Daniel Cryans [mailto:[email protected]]
Sent: Wednesday, August 9, 2017 10:52 PM
To: [email protected]
Cc: Special SBER-BPOC Team <[email protected]>
Subject: Re: [kudu] import from hdfs

Hi Andrey,

Which versions of Kudu and Impala are you using? That alone can make a huge 
difference.

Apart from that, make sure Kudu has enough memory (no memory back pressure), 
you have enough maintenance manager threads (1/3 or 1/4 the number of disks), 
and that your partitioning favors good load distribution.
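
A minimal sketch of the thread-count rule of thumb above, assuming the 
setting in question is the tablet server flag 
--maintenance_manager_num_threads (check the flag name against the docs for 
your Kudu version). The function name and the 12-disk figure are hypothetical 
and only there for illustration:

```python
from typing import Tuple


def suggested_maintenance_threads(num_data_disks: int) -> Tuple[int, int]:
    """Return (low, high) thread counts for the 1/4 and 1/3 rules of thumb.

    Both values are rounded down and clamped to at least 1, since a tablet
    server needs at least one maintenance thread.
    """
    if num_data_disks < 1:
        raise ValueError("num_data_disks must be at least 1")
    low = max(1, num_data_disks // 4)     # 1/4 of the disks
    high = max(low, num_data_disks // 3)  # 1/3 of the disks
    return low, high


if __name__ == "__main__":
    disks = 12  # hypothetical number of data disks per tablet server
    low, high = suggested_maintenance_threads(disks)
    # Assumed flag name; verify it for the Kudu version in use.
    print(f"--maintenance_manager_num_threads={low}   (1/4 of {disks} disks)")
    print(f"--maintenance_manager_num_threads={high}   (1/3 of {disks} disks)")
```

Treat the result as a starting range to test within, not as a tuned value.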

But TBH writing to Parquet will remain faster than writing to Kudu, because 
Kudu has to do more than just drop the rows into a file.

Hope this helps,

J-D

On Wed, Aug 9, 2017 at 9:05 AM, Andrey Kuznetsov <[email protected]> wrote:
Hi folks,
I have a problem with HDFS-to-Kudu import performance. I created an external 
table over CSV data and ran "INSERT INTO ... SELECT" from it into a Kudu table 
and into a Parquet table.
The import into the Parquet table is 3x faster than into Kudu. Do you know any 
tips or tricks to speed up the import?
I am actually importing 8 TB of data, so this is critical for me.

Best regards,
ANDREY KUZNETSOV
