[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015383#comment-14015383 ]

Hive QA commented on HIVE-7052:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12647870/HIVE-7052.7.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5510 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/368/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/368/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-368/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12647870

Optimize split calculation time
-------------------------------

Key: HIVE-7052
URL: https://issues.apache.org/jira/browse/HIVE-7052
Project: Hive
Issue Type: Bug
Environment: hive + tez
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Labels: performance
Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png, HIVE-7052-v3.patch, HIVE-7052-v7.patch, HIVE-7052.7.patch

When running a TPC-DS query (query_27) on a 200 GB dataset in ORC format, a significant amount of time was spent in split computation. Profiling revealed that:
1. A lot of time was spent in Config's substituteVars (regex-based variable substitution), called from the HiveInputFormat.getSplits() method.
2. A FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo().

I will attach the profiler snapshots soon.

--
This message was sent by Atlassian JIRA (v6.2#6252)
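The two findings in the description (a regex-based substitution repeated on every lookup, and an expensive object re-created on every call) are both instances of repeated work that can be cached. The following is only a minimal sketch of that general idea, not the committed patch and not Hive or Hadoop code: the class `SplitPlanningSketch`, its methods, and the `${name}` pattern handling are all hypothetical and use only the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; none of these names exist in Hive or Hadoop. */
final class SplitPlanningSketch {
    // Compiled once and shared, rather than recompiled on every lookup.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    private final Map<String, String> variables;
    // Raw value -> expanded value. Assumes variables do not change afterwards.
    private final Map<String, String> substituted = new ConcurrentHashMap<>();
    // Key (for example a URI scheme) -> one shared, expensive-to-build handle.
    private final Map<String, Object> handles = new ConcurrentHashMap<>();
    // Counts how often the regex pass actually runs, for demonstration.
    final AtomicInteger regexPasses = new AtomicInteger();

    SplitPlanningSketch(Map<String, String> variables) {
        this.variables = variables;
    }

    /** Expands ${name} references; the regex runs once per distinct raw value. */
    String substitute(String raw) {
        return substituted.computeIfAbsent(raw, this::expand);
    }

    // Single-pass expansion; unknown variables are left untouched.
    private String expand(String raw) {
        regexPasses.incrementAndGet();
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns one shared handle per key; the factory runs only on first use. */
    @SuppressWarnings("unchecked")
    <T> T handleFor(String key, Function<String, T> factory) {
        return (T) handles.computeIfAbsent(key, factory);
    }

    public static void main(String[] args) {
        SplitPlanningSketch sketch = new SplitPlanningSketch(Map.of("user", "hive"));
        // Simulates many splits asking for the same configured path.
        for (int i = 0; i < 1000; i++) {
            sketch.substitute("/warehouse/${user}/store_sales");
        }
        System.out.println(sketch.substitute("/warehouse/${user}/store_sales"));
        System.out.println("regex passes: " + sketch.regexPasses.get());
    }
}
```

The trade-off in a sketch like this is staleness: a cached expansion or handle is only valid while the underlying variables and connection settings stay unchanged, so any real fix would need to decide when the cache may be reused.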
[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015997#comment-14015997 ]

Rajesh Balamohan commented on HIVE-7052:
----------------------------------------

Failures are not related to this patch.
[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016015#comment-14016015 ]

Prasanth J commented on HIVE-7052:
----------------------------------

Patch committed to trunk. Thanks [~rajesh.balamohan] for the contribution.

Fix For: 0.14.0
[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013349#comment-14013349 ]

Prasanth J commented on HIVE-7052:
----------------------------------

+1
[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013350#comment-14013350 ]

Prasanth J commented on HIVE-7052:
----------------------------------

I think the patch is not in proper format. Hive QA does not seem to pick up this patch. Can you reupload with HIVE-.7.patch name?
[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013355#comment-14013355 ]

Prasanth J commented on HIVE-7052:
----------------------------------

Looks like the patch is already queued up.
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/
Renaming is not required.
[jira] [Commented] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995994#comment-13995994 ]

Rajesh Balamohan commented on HIVE-7052:
----------------------------------------

https://reviews.apache.org/r/21357/diff/#index_header