[ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438281#comment-15438281
 ] 

Prasanth Jayachandran edited comment on HIVE-14627 at 8/26/16 1:46 AM:
-----------------------------------------------------------------------

I looked at all minimr tests and they seem to use only 3 of the initial tables 
(src, srcpart and part). Also, none of the tests seem to require column stats. 
So I created another q_test_init.sql file that loads only these tables. With 
that, I repeated the experiment mentioned in the description, and the entire 
test took only 2m33s. 

The tests that are deleted do not have corresponding qfiles (the qfiles must 
have been deleted without the tests being removed from the properties file).

[~sseth] Can you please take a look? Also we can now increase the batch size 
for minimr tests (maybe 10? There are 50 tests now, so 5 batches). 
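The batch count above is a ceiling division: 50 qfiles split into batches of 10 gives ceil(50/10) = 5 batches. A minimal sketch, assuming the runner simply cuts the qfile list into consecutive fixed-size chunks (the `make_batches` helper and the qfile names are hypothetical, not part of the Hive test framework):

```python
import math


def make_batches(qfiles, batch_size):
    """Split a list of qfile names into consecutive chunks of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [qfiles[i:i + batch_size] for i in range(0, len(qfiles), batch_size)]


# Hypothetical qfile names standing in for the 50 MiniMr tests.
qfiles = [f"test_{n:02d}.q" for n in range(50)]
batches = make_batches(qfiles, 10)

print(len(batches))                 # 5
print(math.ceil(len(qfiles) / 10))  # 5, the same count via ceiling division
```

If the test count is not a multiple of the batch size, the last batch is simply smaller (51 tests would give 6 batches, the last holding one test).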




> Improvements to MiniMr tests
> ----------------------------
>
>                 Key: HIVE-14627
>                 URL: https://issues.apache.org/jira/browse/HIVE-14627
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.2.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> execution time breaks down as follows:
> Total time - 13m59s
> Junit reported time for testcase - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime. 
> I ran the same test without the init script:
> Total time - 2m17s
> Junit reported time for testcase - 52s
> I also noticed some tests that don't have to run on MiniMr (like 
> udf_using.q, which does not require MiniMr; it just reads and writes to HDFS, 
> which we can do in MiniTez/MiniLlap, and those are much faster). Most tests 
> access only a few initial tables to read a few rows from them. We can fix 
> those tests to load just the tables they require instead of all initial 
> tables. Also, we can remove the q_init_script.sql initialization for MiniMr 
> after rewriting and moving over the unwanted tests, which should cut down the 
> runtime a lot.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
