I found the reason in the query profiles: it is the exchange again. Adding
noshuffle helped a lot. When you run create table parq as select * from kudu180M,
Impala scans Kudu and writes directly to HDFS. When you run insert into parq
partition (year) select * from kudu180M where partition=2018, it reads only 45M
rows, but the exchange hash-partitions those rows before the write, so it is
slower.
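
For reference, a minimal sketch of the hinted insert, assuming the standard
Impala hint placement directly before the select keyword. The filter column
name (year) is an assumption; above I wrote the predicate loosely as
partition=2018.

```sql
-- Sketch only: table names are taken from this thread, the filter column
-- name (year) is assumed. For a dynamic partition insert, the partition
-- column must be the last column produced by the select list.
insert into parq partition (year) /* +noshuffle */
select * from kudu180M
where year = 2018;
```

With noshuffle, each node writes the rows it scanned instead of sending them
through the hash exchange first. The usual trade-off is that a node may write
to several partitions at once and produce more, smaller files; with a single
target partition that should matter less.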

On 2018/07/31 20:59:28, Mike Percy <[email protected]> wrote: 
> Can you post a query profile from Impala for one of the slow insert jobs?
> 
> Mike
> 
> On Tue, Jul 31, 2018 at 12:56 PM Tomas Farkas <[email protected]> wrote:
> 
> > Hi,
> > I wanted to share with you the preliminary results of my Kudu testing on AWS.
> > I created a set of performance tests to evaluate different instance
> > types in AWS, different configurations (Kudu separated from Impala, Kudu
> > and Impala on the same nodes), and different drive (st1 and gp2) settings.
> > Here are my results:
> >
> > I was quite disappointed by the inserts in Step3; see the attached SQLs.
> >
> > Any hints or ideas why this does not scale?
> > Thanks
> >
> >
> >
> 
