[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-06-02 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015383#comment-14015383
 ] 

Hive QA commented on HIVE-7052:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12647870/HIVE-7052.7.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5510 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/368/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/368/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-368/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12647870

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
 HIVE-7052-v3.patch, HIVE-7052-v7.patch, HIVE-7052.7.patch


 When running a TPC-DS query (query_27), a significant amount of time was spent 
 in split computation on a 200 GB dataset (ORC format).
 Profiling revealed that:
 1. A lot of time was spent in Config's substitutevar (regex) in the 
 HiveInputFormat.getSplits() method.
 2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo().
 I will attach the profiler snapshots soon.
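
 The two hot spots above share one shape: an expensive step is repeated for
 every input path when its result could be computed once and reused. The
 following is a minimal sketch of that idea only, not the code in the attached
 patches. FsHandle, SplitPlanner, expand(), and handleFor() are hypothetical
 names invented for illustration; no Hadoop or Hive classes are used.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a handle that is expensive to create. */
final class FsHandle {
    final String key;
    FsHandle(String key) { this.key = key; }
}

/** Illustrative sketch: do expensive work once and reuse it per path. */
final class SplitPlanner {
    // Compiled once for the class instead of once per call.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    private final Map<String, FsHandle> handles = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    /** Replaces ${name} with vars.get(name); unknown names are left as-is. */
    static String expand(String raw, Map<String, String> vars) {
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the cached handle for the path's scheme and authority. */
    FsHandle handleFor(URI path) {
        String key = path.getScheme() + "://" + path.getAuthority();
        return handles.computeIfAbsent(key, k -> {
            creations.incrementAndGet(); // counts how often a handle is built
            return new FsHandle(k);
        });
    }

    /** Number of handles actually created so far. */
    int creations() { return creations.get(); }

    /** Resolves a handle for every path and returns the number of paths seen. */
    int plan(List<URI> paths) {
        int seen = 0;
        for (URI p : paths) {
            handleFor(p); // reuses the handle after the first path per key
            seen++;
        }
        return seen;
    }
}
```

 In this sketch the handle count grows with the number of distinct
 scheme/authority pairs rather than with the number of paths, and a caller
 would call expand() once before the loop rather than once per path.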



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-06-02 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015997#comment-14015997
 ] 

Rajesh Balamohan commented on HIVE-7052:


Failures are not related to this patch.



[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-06-02 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016015#comment-14016015
 ] 

Prasanth J commented on HIVE-7052:
--

Patch committed to trunk. Thanks [~rajesh.balamohan] for the contribution.

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Fix For: 0.14.0

 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
 HIVE-7052-v3.patch, HIVE-7052-v7.patch, HIVE-7052.7.patch




[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-05-30 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013349#comment-14013349
 ] 

Prasanth J commented on HIVE-7052:
--

+1

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
 HIVE-7052-v3.patch, HIVE-7052-v7.patch




[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-05-30 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013350#comment-14013350
 ] 

Prasanth J commented on HIVE-7052:
--

I think the patch is not in the proper format. Hive QA does not seem to have 
picked up this patch. Can you re-upload it with the HIVE-.7.patch name?

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
 HIVE-7052-v3.patch, HIVE-7052-v7.patch




[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-05-30 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013355#comment-14013355
 ] 

Prasanth J commented on HIVE-7052:
--

Looks like the patch is already queued up. 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/
Renaming is not required. 

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, 
 HIVE-7052-v3.patch, HIVE-7052-v7.patch




[jira] [Commented] (HIVE-7052) Optimize split calculation time

2014-05-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995994#comment-13995994
 ] 

Rajesh Balamohan commented on HIVE-7052:


https://reviews.apache.org/r/21357/diff/#index_header

 Optimize split calculation time
 ---

 Key: HIVE-7052
 URL: https://issues.apache.org/jira/browse/HIVE-7052
 Project: Hive
  Issue Type: Bug
 Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: performance
 Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png

