[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-29 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews!

> Improvements to MiniMr tests
> 
>
> Key: HIVE-14627
> URL: https://issues.apache.org/jira/browse/HIVE-14627
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 2.2.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-14627.1.patch, HIVE-14627.2.patch, 
> HIVE-14627.3.patch
>
>
> Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
> execution time breakdown is as follows:
> Total time - 13m59s
> Junit reported time for testcase - 50s
> Most of the time is spent in creating/loading/analyzing initial tables - ~12m
> Cleanup - ~1m
> There is a huge overhead for running MiniMr tests compared to the actual 
> test runtime. 
> I ran the same test without the init script:
> Total time - 2m17s
> Junit reported time for testcase - 52s
> I also noticed that some tests do not have to run on MiniMr. For example, 
> udf_using.q does not require MiniMr: it just reads from and writes to HDFS, 
> which we can do in MiniTez/MiniLlap, both of which are much faster. Most 
> tests access only a few of the initial tables and read only a few rows from 
> them. We can fix those tests to load just the tables they require instead of 
> all initial tables. We can also remove the q_init_script.sql initialization 
> for MiniMr after rewriting the tests and moving over the ones that do not 
> need MiniMr, which should cut the runtime considerably.
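
To put the quoted timings in proportion, here is a minimal sketch of the arithmetic. It only restates the figures from the description above; the helper names (to_seconds, overhead_share) are illustrative and are not part of Hive or its test framework.

```python
# Timings copied from the issue description; helper names are hypothetical.

def to_seconds(minutes, seconds):
    """Convert a 'XmYs' style duration to whole seconds."""
    return minutes * 60 + seconds


def overhead_share(total_s, test_s):
    """Fraction of the total wall-clock time not spent in the test case itself."""
    return (total_s - test_s) / total_s


# (total wall-clock time, JUnit-reported test case time), both in seconds
runs = {
    "with init script": (to_seconds(13, 59), 50),
    "without init script": (to_seconds(2, 17), 52),
}

for label, (total_s, test_s) in runs.items():
    share = overhead_share(total_s, test_s)
    print(f"{label}: total {total_s}s, test {test_s}s, overhead {share:.0%}")

# Wall-clock time saved by skipping the init script
saved_s = runs["with init script"][0] - runs["without init script"][0]
print(f"saved: {saved_s // 60}m{saved_s % 60}s")
```

With these inputs the overhead works out to roughly 94% of the run with the init script and roughly 62% without it, and the saving of 11m42s is in line with the ~12m attributed to creating, loading, and analyzing the initial tables.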



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-26 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
Attachment: HIVE-14627.3.patch

All test diffs are just statistics differences caused by the missing analyze 
commands in the init scripts.

infer_bucket_sort_reducers_power_two.q shows a different column for bucketing. 
That is because join reordering does not happen once CBO is disabled (no column 
statistics). I believe this test is not intended to test CBO, so the diff is 
expected and safe. 



[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-25 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
Attachment: HIVE-14627.2.patch

orc_mr_pathalias.q has been removed from MiniMr because it does not need to run 
there. It only tests prefix matching across different tables in joins, which is 
a test of HiveInputFormat. Running it in TestCliDriver alone is sufficient. 


[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-25 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
Attachment: HIVE-14627.1.patch



[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-25 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-14627) Improvements to MiniMr tests

2016-08-24 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14627:
-
Description: 
Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
execution time breakdown is as follows:

Total time - 13m59s
Junit reported time for testcase - 50s
Most of the time is spent in creating/loading/analyzing initial tables - ~12m
Cleanup - ~1m

There is a huge overhead for running MiniMr tests compared to the actual 
test runtime. 

I ran the same test without the init script:
Total time - 2m17s
Junit reported time for testcase - 52s

I also noticed that some tests do not have to run on MiniMr. For example, 
udf_using.q does not require MiniMr: it just reads from and writes to HDFS, 
which we can do in MiniTez/MiniLlap, both of which are much faster. Most tests 
access only a few of the initial tables and read only a few rows from them. We 
can fix those tests to load just the tables they require instead of all initial 
tables. We can also remove the q_init_script.sql initialization for MiniMr 
after rewriting the tests and moving over the ones that do not need MiniMr, 
which should cut the runtime considerably.


  was:
Currently MiniMr is extremely slow. I ran udf_using.q on MiniMr, and the 
execution time breakdown is as follows:

Total time - 13m59s
Junit reported time for testcase - 50s
Most of the time is spent in creating/loading/analyzing initial tables - ~12m
Cleanup - ~1m

There is a huge overhead for running MiniMr tests compared to the actual 
test runtime. 

I also noticed that some tests do not have to run on MiniMr. For example, 
udf_using.q does not require MiniMr: it just reads from and writes to HDFS, 
which we can do in MiniTez/MiniLlap, both of which are much faster. Most tests 
access only a few of the initial tables and read only a few rows from them. We 
can fix those tests to load just the tables they require instead of all initial 
tables. We can also remove the q_init_script.sql initialization for MiniMr 
after rewriting the tests and moving over the ones that do not need MiniMr, 
which should cut the runtime considerably.


