[kudu] import from hdfs

2017-08-09 Thread Andrey Kuznetsov
Hi folks, I have a problem with HDFS-to-Kudu import performance. I created an external table over CSV data and ran an "insert as select" from it into a Kudu table and into a Parquet table. Importing into the Parquet table is 3x faster than into Kudu. Do you know any tips or tricks to speed up the import? actu

[KUDU] Rebalancing

2017-08-09 Thread Andrey Kuznetsov
Hi folks, I have a problem with Kudu: I need to rebalance 8 TB of data after expanding the cluster with 3 new worker nodes, but it looks like Kudu does not support this. Is that right? Best regards, Andrey Kuznetsov

Re: [kudu] import from hdfs

2017-08-09 Thread Jean-Daniel Cryans
Hi Andrey, Which versions of Kudu and Impala are you using? That alone can make a huge difference. Apart from that, make sure Kudu has enough memory (no memory back pressure), that you have enough maintenance manager threads (1/3 to 1/4 of the number of disks), and that your partitioning favors good load d
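
As a rough illustration of the thread-count rule of thumb in the reply above (1/4 to 1/3 of the number of disks), here is a minimal Python sketch. The helper names and the round-up behavior are assumptions made for this example, not something stated in the thread; the flag it prints is the tablet server's --maintenance_manager_num_threads option.

```python
import math
from typing import Tuple


def maintenance_thread_range(num_data_disks: int) -> Tuple[int, int]:
    """Return (low, high) thread counts for 1/4 and 1/3 of the data disks.

    Rounding up and the floor of 1 thread are assumptions for this sketch.
    """
    if num_data_disks < 1:
        raise ValueError("num_data_disks must be at least 1")
    low = max(1, math.ceil(num_data_disks / 4))
    high = max(low, math.ceil(num_data_disks / 3))
    return low, high


def as_flag(num_threads: int) -> str:
    """Format a thread count as a tablet server command-line flag."""
    return f"--maintenance_manager_num_threads={num_threads}"


if __name__ == "__main__":
    # Hypothetical node with 12 data disks.
    low, high = maintenance_thread_range(12)
    print(as_flag(low), "to", as_flag(high))
```

Treat the result as a starting point only; the right value depends on the workload and on how much memory pressure the tablet servers are under.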

Re: [KUDU] Rebalancing

2017-08-09 Thread Jean-Daniel Cryans
(moving dev@ to bcc, this is not a question about how to implement something in the Kudu codebase) Right, there's no data balancing right now unless you add more tablets/create new tables. If you can afford it, dump the table to Parquet, drop the table, create a new one (that will get the distribu