subject:"\[jira\] \[Commented\] \(HIVE\-20512\) Improve record and memory usage logging in SparkRecordHandler"

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-13 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684929#comment-16684929
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947932/HIVE-20512.92.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15539 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14906/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14906/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14906/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947932 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, 
> HIVE-20512.9.patch, HIVE-20512.91.patch, HIVE-20512.92.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-13 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684883#comment-16684883
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
45s{color} | {color:blue} ql in master has 2316 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
47s{color} | {color:red} ql generated 1 new + 2316 unchanged - 0 fixed = 2317 
total (was 2316) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 115] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14906/dev-support/hive-personality.sh
 |
| git revision | master / af40170 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14906/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14906/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, 
> HIVE-20512.9.patch, HIVE-20512.91.patch, HIVE-20512.92.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
>

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-12 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684609#comment-16684609
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947894/HIVE-20512.91.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15504 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=189)

[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=128)

[load_dyn_part15.q,explaindenpendencydiffengs.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,alter_merge_stats_orc.q,subquery_scalar.q,union_remove_2.q,groupby_position.q,join12.q,smb_mapjoin_8.q,subquery_select.q,join21.q,auto_join16.q]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14897/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14897/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14897/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947894 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, 
> HIVE-20512.9.patch, HIVE-20512.91.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-12 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684565#comment-16684565
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
48s{color} | {color:blue} ql in master has 2316 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
51s{color} | {color:red} ql generated 1 new + 2316 unchanged - 0 fixed = 2317 
total (was 2316) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 113] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14897/dev-support/hive-personality.sh
 |
| git revision | master / af40170 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14897/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14897/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, 
> HIVE-20512.9.patch, HIVE-20512.91.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-10 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682498#comment-16682498
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947614/HIVE-20512.9.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15502 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=189)

[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=128)

[load_dyn_part15.q,explaindenpendencydiffengs.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,alter_merge_stats_orc.q,subquery_scalar.q,union_remove_2.q,groupby_position.q,join12.q,smb_mapjoin_8.q,subquery_select.q,join21.q,auto_join16.q]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14859/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14859/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14859/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947614 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-10 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682462#comment-16682462
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
37s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
49s{color} | {color:red} ql generated 1 new + 2315 unchanged - 0 fixed = 2316 
total (was 2315) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 113] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14859/dev-support/hive-personality.sh
 |
| git revision | master / 7ae4a2c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14859/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14859/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-09 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681769#comment-16681769
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947442/HIVE-20512.9.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15501 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=189)

[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=128)

[load_dyn_part15.q,explaindenpendencydiffengs.q,transform2.q,groupby5.q,cbo_semijoin.q,bucketmapjoin13.q,alter_merge_stats_orc.q,subquery_scalar.q,union_remove_2.q,groupby_position.q,join12.q,smb_mapjoin_8.q,subquery_select.q,join21.q,auto_join16.q]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14836/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14836/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14836/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947442 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-09 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681698#comment-16681698
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
40s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
51s{color} | {color:red} ql generated 1 new + 2315 unchanged - 0 fixed = 2316 
total (was 2315) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 113] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14836/dev-support/hive-personality.sh
 |
| git revision | master / 01cef92 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14836/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14836/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-08 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679734#comment-16679734
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
43s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
47s{color} | {color:red} ql generated 1 new + 2315 unchanged - 0 fixed = 2316 
total (was 2315) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 113] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14809/dev-support/hive-personality.sh
 |
| git revision | master / ad597b0 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14809/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14809/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-08 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679785#comment-16679785
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947270/HIVE-20512.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15526 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14809/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14809/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14809/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12947270 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-08 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680039#comment-16680039
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

Adding awaitTermination() and shutDownNow() after canceling the thread in 
close().

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-08 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680044#comment-16680044
 ] 

Sahil Takiar commented on HIVE-20512:
-

+1 pending tests.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch, HIVE-20512.9.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-07 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678684#comment-16678684
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

AwaitTermination causes delay by blocking the main thread. Since we actually 
don't care about whether loggerThread is executed during close, avoiding usage 
of awaitTermination by canceling the scheduledFuture after shutDown as 
suggested by Sahil.
Attaching HIVE-20512.8.patch with this change to see if the tests pass.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch, HIVE-20512.8.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-06 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677025#comment-16677025
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12947008/HIVE-20512.7.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 15060 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=189)

[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=190)

[infer_bucket_sort_num_buckets.q,gen_udf_example_add10.q,spark_explainuser_1.q,spark_use_ts_stats_for_mapjoin.q,orc_merge6.q,orc_merge5.q,bucketmapjoin6.q,spark_opt_shuffle_serde.q,temp_table_external.q,spark_dynamic_partition_pruning_6.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,vector_outer_join3.q,spark_dynamic_partition_pruning_7.q,schemeAuthority.q,parallel_orderby.q,vector_outer_join1.q,load_hdfs_file_with_space_in_the_name.q,spark_dynamic_partition_pruning_recursive_mapjoin.q,spark_dynamic_partition_pruning_mapjoin_only.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=191)

[insert_overwrite_directory2.q,spark_dynamic_partition_pruning_4.q,import_exported_table.q,vector_outer_join0.q,bucket4.q,orc_merge4.q,infer_bucket_sort_merge.q,orc_merge_incompat1.q,root_dir_external_table.q,constprog_partitioner.q,constprog_semijoin.q,external_table_with_space_in_location_path.q,spark_constprog_dpp.q,spark_dynamic_partition_pruning_3.q,load_fs2.q,infer_bucket_sort_map_operators.q,spark_dynamic_partition_pruning_2.q,vector_inner_join.q,spark_multi_insert_parallel_orderby.q,remote_script.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=192)

[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,input16_cc.q,bucket5.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,spark_dynamic_partition_pruning_5.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,orc_merge7.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=110)

[bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=111)

[join_cond_pushdown_unqual4.q,union_remove_7.q,join13.q,join_vc.q,groupby_cube1.q,parquet_vectorization_2.q,bucket_map_join_spark2.q,sample3.q,smb_mapjoin_19.q,union23.q,union.q,union31.q,cbo_udf_udaf.q,ptf_decimal.q,bucketmapjoin2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=112)

[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,bucketsortoptimize_insert_8.q,stats16.q,union20.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=114)

[groupby_map_ppr.q,nullgroup4_multi_distinct.q,join_rc.q,union14.q,order2.q,smb_mapjoin_12.q,vector_cast_constant.q,union_remove_4.q,parquet_vectorization_1.q,auto_join11.q,udaf_collect_set.q,vectorization_12.q,groupby_sort_skew_1_23.q,smb_mapjoin_25.q,skewjoinopt12.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=115)

[skewjoinopt15.q,auto_join18.q,list_bucket_dml_2.q,input1_limit.q,load_dyn_part3.q,union_remove_14.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,union10.q,bucket_map_join_tez2.q,groupby5_map_skew.q,load_dyn_part7.q,join_reorder.q,bucketmapjoin8.q,union34.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=116)

[avro_joins.q,parquet_vectorization_8.q,auto_join14.q,vectorization_14.q,auto_join26.q,stats1.q,cbo_stats.q,union22.q,union_view.q,subquery_views.q,smb_mapjoin_22.q,stats15.q,ptf_matchpath.q,transform_ppr1.q,sample1.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=117)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-06 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676913#comment-16676913
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
39s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
40s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
52s{color} | {color:red} ql generated 1 new + 2315 unchanged - 0 fixed = 2316 
total (was 2315) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 110] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14770/dev-support/hive-personality.sh
 |
| git revision | master / 353c55e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14770/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14770/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
>

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-05 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676147#comment-16676147
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

I am adding back shutDown and awaitTermination according to Javadoc it looks 
cleaner, although I felt shutDownNow will be enough for this specific 
situation. I think tests timed out because of awaitTermination which blocks 
main thread until timeout. I have reduced the timeout to 15 seconds from 30 for 
awaitTimeout, let's see if tests still fail.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, 
> HIVE-20512.6.patch, HIVE-20512.7.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-05 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675889#comment-16675889
 ] 

Sahil Takiar commented on HIVE-20512:
-

I'm not sure why {{awaitTermination}} would be causing the tests to timeout. Do 
they hang locally? If not, it could have just been a temporary test infra 
issue. The problem with calling {{shutdownNow}} directly is that is cancels any 
in progress tasks by interrupting any in progress threads. This can lead to 
spurious errors in the task logs, which can be confusing. It's generally 
recommended to follow the shutdown pattern outlined in 
[https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html]

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-02 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673291#comment-16673291
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

Tests run successfully. [~stakiar] , can you please push this patch to master 
if there are no further comments.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-02 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672786#comment-16672786
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12946581/HIVE-20512.6.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15520 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14701/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14701/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14701/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12946581 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-11-02 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672716#comment-16672716
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
44s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
52s{color} | {color:red} ql generated 1 new + 2315 unchanged - 0 fixed = 2316 
total (was 2315) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 110] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14701/dev-support/hive-personality.sh
 |
| git revision | master / f3fab45 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14701/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14701/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 *

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-31 Thread Antal Sinkovits (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670892#comment-16670892
 ] 

Antal Sinkovits commented on HIVE-20512:


LGTM

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-29 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667884#comment-16667884
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12946097/HIVE-20512.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15518 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14684/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14684/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14684/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12946097 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-29 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667853#comment-16667853
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
33s{color} | {color:blue} ql in master has 2315 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
55s{color} | {color:red} ql generated 1 new + 2315 unchanged - 0 fixed = 2316 
total (was 2315) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 109] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14684/dev-support/hive-personality.sh
 |
| git revision | master / 64bea03 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14684/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14684/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-29 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667698#comment-16667698
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

Tests run locally but here it ails with "did not produce a TEST-*.xml file 
(likely timed out". Attaching HIVE-20512.5.patch with no awaitTermination to 
see if the tests are passing.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-29 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667405#comment-16667405
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

Thanks [~stakiar] for the review.
The test failures are all related to Hive-On-spark so I definitely want to 
check if those are working. I will follow up on that.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-29 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667403#comment-16667403
 ] 

Sahil Takiar commented on HIVE-20512:
-

+1 are the test failures related? Given there are no new unit tests for this 
patch, I'm assuming it was at least manually verified to work?

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-28 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1751#comment-1751
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945976/HIVE-20512.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 15043 tests 
executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=189)

[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=190)

[infer_bucket_sort_num_buckets.q,gen_udf_example_add10.q,spark_explainuser_1.q,spark_use_ts_stats_for_mapjoin.q,orc_merge6.q,orc_merge5.q,bucketmapjoin6.q,spark_opt_shuffle_serde.q,temp_table_external.q,spark_dynamic_partition_pruning_6.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,vector_outer_join3.q,spark_dynamic_partition_pruning_7.q,schemeAuthority.q,parallel_orderby.q,vector_outer_join1.q,load_hdfs_file_with_space_in_the_name.q,spark_dynamic_partition_pruning_recursive_mapjoin.q,spark_dynamic_partition_pruning_mapjoin_only.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=191)

[insert_overwrite_directory2.q,spark_dynamic_partition_pruning_4.q,import_exported_table.q,vector_outer_join0.q,bucket4.q,orc_merge4.q,infer_bucket_sort_merge.q,orc_merge_incompat1.q,root_dir_external_table.q,constprog_partitioner.q,constprog_semijoin.q,external_table_with_space_in_location_path.q,spark_constprog_dpp.q,spark_dynamic_partition_pruning_3.q,load_fs2.q,infer_bucket_sort_map_operators.q,spark_dynamic_partition_pruning_2.q,vector_inner_join.q,spark_multi_insert_parallel_orderby.q,remote_script.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=192)

[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,input16_cc.q,bucket5.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,spark_dynamic_partition_pruning_5.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,orc_merge7.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=110)

[bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=111)

[join_cond_pushdown_unqual4.q,union_remove_7.q,join13.q,join_vc.q,groupby_cube1.q,parquet_vectorization_2.q,bucket_map_join_spark2.q,sample3.q,smb_mapjoin_19.q,union23.q,union.q,union31.q,cbo_udf_udaf.q,ptf_decimal.q,bucketmapjoin2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=112)

[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,bucketsortoptimize_insert_8.q,stats16.q,union20.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=114)

[groupby_map_ppr.q,nullgroup4_multi_distinct.q,join_rc.q,union14.q,order2.q,smb_mapjoin_12.q,vector_cast_constant.q,union_remove_4.q,parquet_vectorization_1.q,auto_join11.q,udaf_collect_set.q,vectorization_12.q,groupby_sort_skew_1_23.q,smb_mapjoin_25.q,skewjoinopt12.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=115)

[skewjoinopt15.q,auto_join18.q,list_bucket_dml_2.q,input1_limit.q,load_dyn_part3.q,union_remove_14.q,auto_sortmerge_join_14.q,auto_sortmerge_join_15.q,union10.q,bucket_map_join_tez2.q,groupby5_map_skew.q,load_dyn_part7.q,join_reorder.q,bucketmapjoin8.q,union34.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=116)

[avro_joins.q,parquet_vectorization_8.q,auto_join14.q,vectorization_14.q,auto_join26.q,stats1.q,cbo_stats.q,union22.q,union_view.q,subquery_views.q,smb_mapjoin_22.q,stats15.q,ptf_matchpath.q,transform_ppr1.q,sample1.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=117)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-28 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1701#comment-1701
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
42s{color} | {color:blue} ql in master has 2317 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
50s{color} | {color:red} ql generated 1 new + 2317 unchanged - 0 fixed = 2318 
total (was 2317) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 109] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14676/dev-support/hive-personality.sh
 |
| git revision | master / 54bba9c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14676/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14676/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-28 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1610#comment-1610
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945839/HIVE-20512.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 15026 tests 
executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=196)

[druidmini_masking.q,druidmini_test1.q,druidkafkamini_basic.q,druidmini_joins.q,druid_timestamptz.q]
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=154)

[intersect_all.q,unionDistinct_1.q,table_nonprintable.q,orc_llap_counters1.q,mm_cttas.q,whroot_external1.q,global_limit.q,cte_2.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,results_cache_diff_fs.q,cttl.q,parallel_colstats.q,load_hdfs_file_with_space_in_the_name.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=189)

[infer_bucket_sort_reducers_power_two.q,list_bucket_dml_10.q,orc_merge9.q,leftsemijoin_mr.q,bucket6.q,bucketmapjoin7.q,uber_reduce.q,empty_dir_in_table.q,vector_outer_join2.q,spark_explain_groupbyshuffle.q,spark_dynamic_partition_pruning.q,spark_combine_equivalent_work.q,orc_merge1.q,spark_use_op_stats.q,orc_merge_diff_fs.q,quotedid_smb.q,truncate_column_buckets.q,spark_vectorized_dynamic_partition_pruning.q,spark_in_process_launcher.q,orc_merge3.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=190)

[infer_bucket_sort_num_buckets.q,gen_udf_example_add10.q,spark_explainuser_1.q,spark_use_ts_stats_for_mapjoin.q,orc_merge6.q,orc_merge5.q,bucketmapjoin6.q,spark_opt_shuffle_serde.q,temp_table_external.q,spark_dynamic_partition_pruning_6.q,dynamic_rdd_cache.q,auto_sortmerge_join_16.q,vector_outer_join3.q,spark_dynamic_partition_pruning_7.q,schemeAuthority.q,parallel_orderby.q,vector_outer_join1.q,load_hdfs_file_with_space_in_the_name.q,spark_dynamic_partition_pruning_recursive_mapjoin.q,spark_dynamic_partition_pruning_mapjoin_only.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=191)

[insert_overwrite_directory2.q,spark_dynamic_partition_pruning_4.q,import_exported_table.q,vector_outer_join0.q,bucket4.q,orc_merge4.q,infer_bucket_sort_merge.q,orc_merge_incompat1.q,root_dir_external_table.q,constprog_partitioner.q,constprog_semijoin.q,external_table_with_space_in_location_path.q,spark_constprog_dpp.q,spark_dynamic_partition_pruning_3.q,load_fs2.q,infer_bucket_sort_map_operators.q,spark_dynamic_partition_pruning_2.q,vector_inner_join.q,spark_multi_insert_parallel_orderby.q,remote_script.q]
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=192)

[scriptfile1.q,vector_outer_join5.q,file_with_header_footer.q,input16_cc.q,bucket5.q,orc_merge2.q,reduce_deduplicate.q,schemeAuthority2.q,spark_dynamic_partition_pruning_5.q,orc_merge8.q,orc_merge_incompat2.q,infer_bucket_sort_bucketed_table.q,vector_outer_join4.q,disable_merge_for_bucketing.q,orc_merge7.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=110)

[bucketmapjoin4.q,bucket_map_join_spark4.q,union21.q,groupby2_noskew.q,timestamp_2.q,date_join1.q,mergejoins.q,smb_mapjoin_11.q,auto_sortmerge_join_3.q,mapjoin_test_outer.q,vectorization_9.q,merge2.q,groupby6_noskew.q,auto_join_without_localtask.q,multi_join_union.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=111)

[join_cond_pushdown_unqual4.q,union_remove_7.q,join13.q,join_vc.q,groupby_cube1.q,parquet_vectorization_2.q,bucket_map_join_spark2.q,sample3.q,smb_mapjoin_19.q,union23.q,union.q,union31.q,cbo_udf_udaf.q,ptf_decimal.q,bucketmapjoin2.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=112)

[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,bucketsortoptimize_insert_8.q,stats16.q,union20.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=114)

[groupby_map_ppr.q,nullgroup4_multi_distinct.q,join_rc.q,union14.q,order2.q,smb_mapjoin_12.q,vector_cast_constant.q,union_remove_4.q,parquet_vectorization_1.q,auto_join11.q,udaf_collect_set.q,vectorization_12.q,groupby_sort_skew_1_23.q,smb_mapjoin_25.q,skewjoinopt12.q]
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=115)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-28 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1593#comment-1593
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
42s{color} | {color:blue} ql in master has 2317 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
45s{color} | {color:red} ql generated 1 new + 2317 unchanged - 0 fixed = 2318 
total (was 2317) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Increment of volatile field 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.rowNumber in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:in 
org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler.incrementRowNumber()  
At SparkRecordHandler.java:[line 109] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14672/dev-support/hive-personality.sh
 |
| git revision | master / 54bba9c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14672/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14672/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-24 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663077#comment-16663077
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12945471/HIVE-20512.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15506 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14630/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14630/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14630/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12945471 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-24 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663036#comment-16663036
 ] 

Hive QA commented on HIVE-20512:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
8s{color} | {color:blue} ql in master has 2317 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
13s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14630/dev-support/hive-personality.sh
 |
| git revision | master / 3cbc13e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14630/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14630/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-20 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657987#comment-16657987
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12944872/HIVE-20512.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15111 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_aggregate]
 (batchId=162)
org.apache.hive.jdbc.miniHS2.TestMiniHS2.testConfInSession (batchId=261)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14598/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14598/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14598/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12944872 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-20 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657981#comment-16657981
 ] 

Hive QA commented on HIVE-20512:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
47s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
36s{color} | {color:blue} ql in master has 2318 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 5 
fixed = 4 total (was 9) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14598/dev-support/hive-personality.sh
 |
| git revision | master / 7ed8bf4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14598/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-20 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16657965#comment-16657965
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

Added review board link with updated patch addressing the above comments.
[~stakiar] 

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-18 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655439#comment-16655439
 ] 

Sahil Takiar commented on HIVE-20512:
-

A few comments:
(1) I think the logging should be done in a separate thread so that we don't 
have to invoke {{logMemoryInfo()}} for each record, which can add significant 
overhead to per-record processing
(2) I think we should start with a lower interval, something like 15 seconds

You could try and add a unit test that logs to a string buffers, and then parse 
that string buffer in a unit test. However, I don't think its necessary.

CC: [~asinkovits]

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-17 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653949#comment-16653949
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12944264/HIVE-20512.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15096 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14524/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14524/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14524/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12944264 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-17 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653885#comment-16653885
 ] 

Hive QA commented on HIVE-20512:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
56s{color} | {color:blue} ql in master has 2318 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 2 
fixed = 4 total (was 6) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14524/dev-support/hive-personality.sh
 |
| git revision | master / fa97c67 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14524/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-16 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652888#comment-16652888
 ] 

Hive QA commented on HIVE-20512:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12944186/HIVE-20512.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15096 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.TestTxnCommands.testMergeOnTezEdges (batchId=318)
org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow.testComplexQuery (batchId=253)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/14512/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14512/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14512/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12944186 - PreCommit-HIVE-Build

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-16 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652858#comment-16652858
 ] 

Hive QA commented on HIVE-20512:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
38s{color} | {color:blue} ql in master has 2318 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} ql: The patch generated 0 new + 4 unchanged - 2 
fixed = 4 total (was 6) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-14512/dev-support/hive-personality.sh
 |
| git revision | master / fa97c67 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-14512/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-20512.1.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-12 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648296#comment-16648296
 ] 

Sahil Takiar commented on HIVE-20512:
-

I mean time interval. So log the # of rows processed every few minutes (or 
something like that). Perhaps an exponentially increasing interval would work 
best so the logs don't explode.

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

2018-10-12 Thread Bharathkrishna Guruvayoor Murali (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648292#comment-16648292
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-20512:
-

Hi Sahil,



What do you mean by logging at a given interval? Do you mean to log at given 
intervals of time instead of the number of rows?

> Improve record and memory usage logging in SparkRecordHandler
> -
>
> Key: HIVE-20512
> URL: https://issues.apache.org/jira/browse/HIVE-20512
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Priority: Major
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
> // A very simple counter to keep track of number of rows processed by the
> // reducer. It dumps
> // every 1 million times, and quickly before that
> if (currentThreshold >= 100) {
>   return currentThreshold + 100;
> }
> return 10 * currentThreshold;
>   }
> {code}
> The issue is that after a while, the increase by 10x factor means that you 
> have to process a huge # of records before this gets triggered.
> A better approach would be to log this info at a given interval. This would 
> help in debugging tasks that are seemingly hung.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

42 matches

Mail list logo