[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-08-03 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568535#comment-16568535
 ] 

Vihang Karajgaonkar commented on HIVE-19937:


HI [~stakiar] Sorry for the delay in responding. Can you please add a comment 
mentioning that set method interns the duplicate strings, just before the 
{{mapWork.setPathToPartitionInfo}} and {{partitionDesc.setBaseFileName}} in the 
customer deserializer's read method so that its more obvious why we are calling 
the set method. I feel it is easier to understand that way.

Rest looks good. Thanks for the patch. +1

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-08-02 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567402#comment-16567402
 ] 

Sahil Takiar commented on HIVE-19937:
-

Ping [~vihangk1]

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-24 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554477#comment-16554477
 ] 

Sahil Takiar commented on HIVE-19937:
-

[~vihangk1] could you take a look?

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-23 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553538#comment-16553538
 ] 

Sahil Takiar commented on HIVE-19937:
-

Thanks Misha, yes I will keep this in mind. I don't think this part of the code 
is CPU bound, so it shouldn't be an issue.

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-23 Thread Misha Dmitriev (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553279#comment-16553279
 ] 

Misha Dmitriev commented on HIVE-19937:
---

The last patch looks good to me.

The only slight concern that I got from looking at it once again is the 
following: in one or two places you switched from passing around Strings to 
passing around Paths, and subsequently switched someĀ  HashMap(s) to HashMap. Note that a lookup in a map where keys 
are complex objects is slower, because Path.equals(Path) is slower than 
String.equals(String) - it may involve comparison of many strings, etc. I 
haven't seen any reports on Hive CPU performance problems, and I hope this code 
is not on a critical path, and/or that GC-related savings would offset the 
potential hashmap lookup slowdown... but anyway, I guess it's worth remembering 
about this.

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-22 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552113#comment-16552113
 ] 

Hive QA commented on HIVE-19937:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12932601/HIVE-19937.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 14675 tests 
executed
*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=192)

[druidmini_dynamic_partition.q,druidmini_expressions.q,druidmini_test_alter.q,druidmini_test1.q,druidmini_test_insert.q]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_timestamptz]
 (batchId=193)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_joins]
 (batchId=193)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_masking]
 (batchId=193)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12780/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12780/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12780/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12932601 - PreCommit-HIVE-Build

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-22 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552067#comment-16552067
 ] 

Hive QA commented on HIVE-19937:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 3s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
2s{color} | {color:blue} ql in master has 2280 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 5 new + 298 unchanged - 4 
fixed = 303 total (was 302) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
18s{color} | {color:red} ql generated 3 new + 2279 unchanged - 1 fixed = 2282 
total (was 2280) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Redundant nullcheck of nominal which is known to be null in 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(Path)  
Redundant null check at AbstractMapOperator.java:is known to be null in 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(Path)  
Redundant null check at AbstractMapOperator.java:[line 113] |
|  |  org.apache.hadoop.hive.ql.exec.SerializationUtilities$MapWorkSerializer 
implements Comparator but not Serializable  At 
SerializationUtilities.java:Serializable  At SerializationUtilities.java:[lines 
551-562] |
|  |  
org.apache.hadoop.hive.ql.exec.SerializationUtilities$PartitionDescSerializer 
implements Comparator but not Serializable  At 
SerializationUtilities.java:Serializable  At SerializationUtilities.java:[lines 
572-585] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-12780/dev-support/hive-personality.sh
 |
| git revision | master / cce3a05 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12780/yetus/diff-checkstyle-ql.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12780/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12780/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, HIVE-19937.5.patch, 
> post-patch-report.html, report.html
>
>
> When 

[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550341#comment-16550341
 ] 

Hive QA commented on HIVE-19937:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12932285/HIVE-19937.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12721/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12721/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12721/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12932285/HIVE-19937.4.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12932285 - PreCommit-HIVE-Build

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550340#comment-16550340
 ] 

Hive QA commented on HIVE-19937:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12932285/HIVE-19937.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 14679 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testTokenAuth (batchId=263)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/12720/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12720/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12720/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12932285 - PreCommit-HIVE-Build

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-20 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550312#comment-16550312
 ] 

Hive QA commented on HIVE-19937:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
8s{color} | {color:blue} ql in master has 2281 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
43s{color} | {color:red} ql: The patch generated 5 new + 298 unchanged - 4 
fixed = 303 total (was 302) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
18s{color} | {color:red} ql generated 3 new + 2280 unchanged - 1 fixed = 2283 
total (was 2281) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Redundant nullcheck of nominal which is known to be null in 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(Path)  
Redundant null check at AbstractMapOperator.java:is known to be null in 
org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(Path)  
Redundant null check at AbstractMapOperator.java:[line 113] |
|  |  org.apache.hadoop.hive.ql.exec.SerializationUtilities$MapWorkSerializer 
implements Comparator but not Serializable  At 
SerializationUtilities.java:Serializable  At SerializationUtilities.java:[lines 
551-562] |
|  |  
org.apache.hadoop.hive.ql.exec.SerializationUtilities$PartitionDescSerializer 
implements Comparator but not Serializable  At 
SerializationUtilities.java:Serializable  At SerializationUtilities.java:[lines 
572-585] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-12720/dev-support/hive-personality.sh
 |
| git revision | master / 851c8ab |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12720/yetus/diff-checkstyle-ql.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12720/yetus/new-findbugs-ql.html
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-12720/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we 

[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-19 Thread Misha Dmitriev (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549768#comment-16549768
 ] 

Misha Dmitriev commented on HIVE-19937:
---

+1

One small comment: instead of
 {{this.baseFileName = baseFileName == null ? null : baseFileName.intern();}}
you can just use
 {{this.baseFileName = StringUtils.internIfNotNull(baseFileName);}}

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, HIVE-19937.4.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-19 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549704#comment-16549704
 ] 

Sahil Takiar commented on HIVE-19937:
-

The overheads probably won't grow in proportion to the heap, but the goal is to 
allow users to run Hive-on-Spark successfully even with low heap settings (e.g. 
1g).

The overheads are more of a function of the workload. In this case, the 
workload is TPC-DS (a standard SQL benchmarks). Hive users to run queries that 
scan more partitions can expect the overheads to increase.

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-19 Thread Misha Dmitriev (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549667#comment-16549667
 ] 

Misha Dmitriev commented on HIVE-19937:
---

Thank you for publishing the updated jxray report, [~stakiar] Looks like things 
work as expected. One small question/concern: I see that this test uses a 
pretty small heap, less than 1GB. Is it expected that all the overheads that 
you are fixing will grow proportionally with a bigger heap?

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19937) Intern fields in MapWork on deserialization

2018-07-19 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549601#comment-16549601
 ] 

Sahil Takiar commented on HIVE-19937:
-

Attached an jxray report after applying the patch on a cluster and running the 
same test. When comparing the two reports, the overhead from fields in 
{{MapWork}} is now gone. There is still a lot of overhead due to duplicate 
entires in {{Properties}} objects, I need to fix that. I noticed there is a bit 
of overhead in {{ProjectionPusher}} that I plan to fix in the next version of 
this patch.

> Intern fields in MapWork on deserialization
> ---
>
> Key: HIVE-19937
> URL: https://issues.apache.org/jira/browse/HIVE-19937
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-19937.1.patch, HIVE-19937.2.patch, 
> HIVE-19937.3.patch, post-patch-report.html, report.html
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the 
> {{JobConf}} object to prevent any {{ConcurrentModificationException}} from 
> being thrown. However, setting this variable comes at a cost of storing a 
> duplicate {{JobConf}} object for each Spark task. These objects can take up a 
> significant amount of memory, we should intern them so that Spark tasks 
> running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)