[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-10-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213456#comment-16213456
 ] 

Prasanth Jayachandran commented on HIVE-17304:
--

No. This needs further testing. I need to do some in-depth analysis to see 
which one is more accurate to the actual value based on heapdumps. So it's not 
ready for commit yet.

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch, HIVE-17304.2.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-10-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213445#comment-16213445
 ] 

Ashutosh Chauhan commented on HIVE-17304:
-

[~prasanth_j] Shall we commit this?

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch, HIVE-17304.2.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-08-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146556#comment-16146556
 ] 

Hive QA commented on HIVE-17304:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12884351/HIVE-17304.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests 
executed
*Failed tests:*
{noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) 
(batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=61)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] 
(batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6593/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6593/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6593/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12884351 - PreCommit-HIVE-Build

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch, HIVE-17304.2.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-08-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146111#comment-16146111
 ] 

Sergey Shelukhin commented on HIVE-17304:
-

+1 with some testing

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-08-29 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146107#comment-16146107
 ] 

Prasanth Jayachandran commented on HIVE-17304:
--

The config changed because we are often very close to estimates in most of the 
cases (vectorized atleast). I have seen some heapdumps with 2GB hash tables and 
estimates from log lines are also very close to 2GB (<5%). Initial 2x factor 
was added earlier primarily for non-vectorized cases + object overhead + 
key/value size misestimation. 
Also 2x factor is after memory overscription which already gives some more room 
for hash tables. With this patch even in non-vectorized case we are pretty 
close when ThreadMXBean info is used. The idea is to get close to noconditional 
task size + oversubscribed memory. So relaxed it to 1.5x :)


> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-08-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146089#comment-16146089
 ] 

Sergey Shelukhin commented on HIVE-17304:
-

Why did the config change? Otherwise looks good. Might need some realistic 
testing.

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124653#comment-16124653
 ] 

Hive QA commented on HIVE-17304:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881589/HIVE-17304.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11004 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6368/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6368/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6368/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881589 - PreCommit-HIVE-Build

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17304) ThreadMXBean based memory allocation monitory for hash table loader

2017-08-11 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124382#comment-16124382
 ] 

Prasanth Jayachandran commented on HIVE-17304:
--

This provides a better estimates and also the estimates are pretty close the 
actual object size (observed this from heapdumps) atleast for vectorized case. 
Also bringing down the inflation factor from 2.0 to 1.5 as a result. Still 
testing this patch on larger dataset. 

> ThreadMXBean based memory allocation monitory for hash table loader
> ---
>
> Key: HIVE-17304
> URL: https://issues.apache.org/jira/browse/HIVE-17304
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17304.1.patch
>
>
> Hash table memory monitoring is based on java data model which can be 
> unreliable because of various reasons (wrong object size estimation, adding 
> new variables to any class without accounting its size for memory monitoring, 
> etc.). We can use allocation size per thread that is provided by ThreadMXBean 
> and fallback to DataModel in case if JDK doesn't support thread based 
> allocations. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)