Why do you want to do single inserts? Hive is designed for bulk loads rather than single-row writes. That said, newer versions of Hive 2 using Tez + LLAP improve performance significantly (also for bulk analysis). Nevertheless, it is good practice not to use single inserts in an analytics system; try to combine the rows and bulk-load them instead.
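For example (a sketch only; the extra rows and the file path are hypothetical), you can batch several rows into one multi-row INSERT, or stage them in a file and bulk-load it:

```sql
-- Batch the rows into a single INSERT: one job in total,
-- instead of one job per row as in your session below.
INSERT INTO TABLE people VALUES
  (1, 'Tom A', 20),
  (2, 'Ann B', 31),
  (3, 'Joe C', 45);

-- Or stage the rows in a delimited file on HDFS and bulk-load it;
-- this is essentially a file move, with no job launched at all.
-- '/tmp/people.csv' is a hypothetical path.
LOAD DATA INPATH '/tmp/people.csv' INTO TABLE people;
```

Note that LOAD DATA assumes the file's format and delimiters match the table definition (e.g. a textfile table with a matching field separator).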
> On 11. Sep 2017, at 21:01, Jinhui Qin <qin.jin...@gmail.com> wrote:
>
> Hi,
> I am new to Hive. I just created a simple table in hive and inserted two
> records, the first insertion took 16.4 sec, while the second took 14.3 sec.
> Why is that very slow? is this the normal performance you get in Hive using
> INSERT ? Is there a way to improve the performance of a single "insert" in
> Hive? Any help would be really appreciated. Thanks!
>
> Here is the record from a terminal in Hive shell:
>
> =========================
>
> hive> show tables;
> OK
> Time taken: 2.758 seconds
> hive> create table people(id int, name string, age int);
> OK
> Time taken: 0.283 seconds
> hive> insert into table people(1,'Tom A', 20);
> Query ID = hive_20170911134052_04680c79-432a-43e0-827b-29a4212fbbc0
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1505146047428_0098, Tracking URL =
> http://iop-hadoop-bi.novalocal:8088/proxy/application_1505146047428_0098/
> Kill Command = /usr/iop/4.1.0.0/hadoop/bin/hadoop job -kill
> job_1505146047428_0098
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2017-09-11 13:41:01,492 Stage-1 map = 0%, reduce = 0%
> 2017-09-11 13:41:06,940 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.7 sec
> MapReduce Total cumulative CPU time: 2 seconds 700 msec
> Ended Job = job_1505146047428_0098
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to:
> hdfs://iop-hadoop-bi.novalocal:8020/apps/hive/warehouse/people/.hive-staging_hive_2017-09-11_13-40-52_106_4621567581104615441-1/-ext-10000
> Loading data to table default.people
> Table default.people stats: [numFiles=1, numRows=1, totalSize=11, rawDataSize=10]
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1  Cumulative CPU: 2.7 sec  HDFS Read: 3836  HDFS Write: 81  SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 700 msec
> OK
> Time taken: 16.417 seconds
> hive> insert into table people values(1,'Tom A', 20);
> Query ID = hive_20170911134128_c8f46977-7718-4496-9a98-cce0f89ced79
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1505146047428_0099, Tracking URL =
> http://iop-hadoop-bi.novalocal:8088/proxy/application_1505146047428_0099/
> Kill Command = /usr/iop/4.1.0.0/hadoop/bin/hadoop job -kill
> job_1505146047428_0099
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2017-09-11 13:41:36,289 Stage-1 map = 0%, reduce = 0%
> 2017-09-11 13:41:40,721 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.28 sec
> MapReduce Total cumulative CPU time: 2 seconds 280 msec
> Ended Job = job_1505146047428_0099
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to:
> hdfs://iop-hadoop-bi.novalocal:8020/apps/hive/warehouse/people/.hive-staging_hive_2017-09-11_13-41-28_757_4458472522071240567-1/-ext-10000
> Loading data to table default.people
> Table default.people stats: [numFiles=2, numRows=2, totalSize=22, rawDataSize=20]
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1  Cumulative CPU: 2.28 sec  HDFS Read: 3924  HDFS Write: 81  SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 280 msec
> OK
> Time taken: 14.288 seconds
> hive> exit;
> =================
>
>
> Jinhui