[jira] [Commented] (HIVE-22198) Execute unoin-all with childs Join in parallel

2019-09-22 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935564#comment-16935564
 ] 

Aditya Shah commented on HIVE-22198:


[~luguangming], can we repair parents only in the case of the conditional task 
instead of doing the same for all tasks? 

> Execute unoin-all with childs Join in parallel
> --
>
> Key: HIVE-22198
> URL: https://issues.apache.org/jira/browse/HIVE-22198
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: LuGuangMing
>Assignee: LuGuangMing
>Priority: Major
> Attachments: HIVE-22198.patch, image-2019-09-20-11-38-37-433.png, 
> image-2019-09-20-11-39-30-347.png, test-parallel.sql
>
>
> With parallel set to true, skewjoin set to false, and auto convert join set 
> to false, run a union all. There is no error message, but some result data is 
> missing. For details, check the attachment [^test-parallel.sql]
> create table tab1(tid int, com string) row format delimited fields terminated 
> by '\t' stored as textfile;
>  create table tab2(tid int, com string) row format delimited fields 
> terminated by '\t' stored as textfile;
>  create table tab3(tid int, com string) row format delimited fields 
> terminated by '\t' stored as textfile;
>  create table tab4(tid int, com string) row format delimited fields 
> terminated by '\t' stored as textfile;
> insert into tab1 values(1,'abc'),(2,'bcd'),(3,'cde');
>  insert into tab2 values(1,'abc'),(2,'bcd'),(3,'cde');
>  insert into tab3 values(1,'abc'),(2,'bcd'),(3,'cde');
>  insert into tab4 values(1,'abc'),(2,'bcd'),(3,'cde');
> set hive.auto.convert.join=false;
>  set hive.optimize.skewjoin=true;
>  set hive.exec.parallel=true;
> SELECT sum(1) as a 
>  FROM tab1 t1 
>  INNER JOIN tab2 t2 
>  ON t1.com = t2.com
>  UNION ALL
>  SELECT sum(1) as a 
>  FROM tab3 t3 
>  INNER JOIN tab4 t4 
>  ON t3.com = t4.com;
> create table test_parallel stored as orcfile as 
>  SELECT sum(1) as a 
>  FROM tab1 t1 
>  INNER JOIN tab2 t2 
>  ON t1.com = t2.com
>  UNION ALL
>  SELECT sum(1) as a 
>  FROM tab3 t3 
>  INNER JOIN tab4 t4 
>  ON t3.com = t4.com;
> select * from test_parallel;
> The result should contain two rows, but only one is returned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22222) Clean up the error handling in Driver - get rid of global variables

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22222:
--
Attachment: HIVE-22222.02.patch

> Clean up the error handling in Driver - get rid of global variables
> ---
>
> Key: HIVE-22222
> URL: https://issues.apache.org/jira/browse/HIVE-22222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22222.01.patch, HIVE-22222.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error handling in Hive is done with some global variables for no apparent 
> reason, as all the data that is gathered to describe an exception is 
> produced and used at the point where the exception occurred. Thus having 
> global variables is misleading. 
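
A minimal sketch of the idea described above, assuming hypothetical class names (CommandFailedException, DriverSketch) and made-up error values; it is not Hive's Driver code. The failure details travel inside the exception created at the point of failure, so no instance-level or global error fields are needed.

```java
// Hypothetical classes for illustration only; these are not Hive's classes.
final class CommandFailedException extends Exception {
    private static final long serialVersionUID = 1L;

    private final int responseCode;
    private final String sqlState;

    CommandFailedException(int responseCode, String sqlState, String message) {
        super(message);
        this.responseCode = responseCode;
        this.sqlState = sqlState;
    }

    int getResponseCode() {
        return responseCode;
    }

    String getSqlState() {
        return sqlState;
    }
}

final class DriverSketch {
    private DriverSketch() {
    }

    // The error code, SQL state, and message exist only in the exception that
    // is thrown where the problem is detected. The values used here are
    // invented for the example.
    static void compile(String command) throws CommandFailedException {
        if (command == null || command.trim().isEmpty()) {
            throw new CommandFailedException(40000, "42000", "Empty command");
        }
        // Real compilation work would go here.
    }
}
```

A caller catches CommandFailedException and reads the details from it, rather than inspecting fields that were set as a side effect.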





[jira] [Commented] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935557#comment-16935557
 ] 

Hive QA commented on HIVE-22221:


Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981024/HIVE-22221.4.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 16840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=81)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18685/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18685/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981024 - PreCommit-HIVE-Build

> Llap external client - Need to reduce LlapBaseInputFormat#getSplits() 
> footprint  
> -
>
> Key: HIVE-22221
> URL: https://issues.apache.org/jira/browse/HIVE-22221
> Project: Hive
>  Issue Type: Bug
>  Components: llap, UDF
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22221.1.patch, HIVE-22221.2.patch, 
> HIVE-22221.3.patch, HIVE-22221.4.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While querying through the LLAP external client, LlapBaseInputFormat#getSplits() 
> invokes the get_splits() UDTF (GenericUDTFGetSplits) under the hood.
> GenericUDTFGetSplits returns LlapInputSplit objects in which planBytes[] 
> occupies around 90% of the split size.
> Depending on data size/partitions and plan, an LlapInputSplit can grow up to 
> 1 MB, with planBytes[] being common to all the splits and occupying more than 
> 850 KB. This sometimes causes OOM on HS2, depending on the HS2 heap size.
> This can be resolved by separating the common parts from the actual splits and 
> reassembling them on the client side. 
> We can also provide an option that lets the client say it does not want them 
> reassembled, so that it can take control of reassembly into its own hands.
> Splits can be broken up like this:
> 1) schema split
> 2) plan split
> 3) actual split 1
> 4) actual split 2, and so on.
> This greatly reduces memory use on the server side (in my case from 5 GB for 
> ~5000 splits to around 15 MB) and hence the data transfer. It also eliminates 
> the OOM on the HS2 side.
> cc [~jdere] [~sankarh] [~thejas]
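
The separation described above can be sketched roughly as follows. All type names here (SplitEnvelopeSketch, LightSplit, FullSplit, Envelope) are hypothetical and are not Hive's LlapInputSplit API. The schema split is left out for brevity. The sketch only shows the shared plan bytes being sent once and re-attached to each split on the client.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration only; these are not Hive's classes.
final class SplitEnvelopeSketch {

    private SplitEnvelopeSketch() {
    }

    /** Per-split payload that no longer carries the shared plan bytes. */
    static final class LightSplit {
        final int splitNum;
        final byte[] fragmentBytes;

        LightSplit(int splitNum, byte[] fragmentBytes) {
            this.splitNum = splitNum;
            this.fragmentBytes = fragmentBytes.clone(); // defensive copy
        }
    }

    /** A split as the client consumes it: shared plan plus its own fragment. */
    static final class FullSplit {
        final int splitNum;
        final byte[] planBytes;
        final byte[] fragmentBytes;

        FullSplit(int splitNum, byte[] planBytes, byte[] fragmentBytes) {
            this.splitNum = splitNum;
            this.planBytes = planBytes;
            this.fragmentBytes = fragmentBytes;
        }
    }

    /** What the server would send: the plan once, then the light splits. */
    static final class Envelope {
        final byte[] planBytes;
        final List<LightSplit> splits;

        Envelope(byte[] planBytes, List<LightSplit> splits) {
            this.planBytes = planBytes.clone();
            this.splits = Collections.unmodifiableList(new ArrayList<>(splits));
        }
    }

    /** Client-side reassembly: every full split references the same plan array. */
    static List<FullSplit> reassemble(Envelope envelope) {
        List<FullSplit> result = new ArrayList<>(envelope.splits.size());
        for (LightSplit split : envelope.splits) {
            result.add(new FullSplit(split.splitNum, envelope.planBytes, split.fragmentBytes));
        }
        return result;
    }
}
```

Because the reassembled splits share one plan array instead of holding a copy each, the plan is held once rather than once per split. A client that opts out of reassembly would simply work with the Envelope directly.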





[jira] [Commented] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935548#comment-16935548
 ] 

Hive QA commented on HIVE-22221:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
25s{color} | {color:blue} llap-client in master has 26 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
6s{color} | {color:blue} ql in master has 1570 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
27s{color} | {color:blue} llap-ext-client in master has 1 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
38s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} The patch llap-client passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} ql: The patch generated 0 new + 96 unchanged - 4 
fixed = 96 total (was 100) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} llap-ext-client: The patch generated 0 new + 36 
unchanged - 2 fixed = 36 total (was 38) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
17s{color} | {color:red} itests/hive-unit: The patch generated 3 new + 53 
unchanged - 5 fixed = 56 total (was 58) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
32s{color} | {color:red} llap-client generated 1 new + 26 unchanged - 0 fixed = 
27 total (was 26) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m  
4s{color} | {color:red} ql generated 1 new + 1569 unchanged - 1 fixed = 1570 
total (was 1570) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
32s{color} | {color:red} llap-ext-client generated 1 new + 1 unchanged - 0 
fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 30s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:llap-client |
|  |  org.apache.hadoop.hive.llap.LlapInputSplit.setPlanBytes(byte[]) may 
expose internal representation by storing an externally mutable object into 
LlapInputSplit.planBytes  At LlapInputSplit.java:by storing an externally 
mutable object into LlapInputSplit.planBytes  At LlapInputSplit.java:[line 95] |
| FindBugs | module:ql |
|  |  Redundant nullcheck of driverCleanup, which is known to be non-null in 
org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(String,
 

[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint

2019-09-22 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-22221:
-
Attachment: HIVE-22221.4.patch

> Llap external client - Need to reduce LlapBaseInputFormat#getSplits() 
> footprint  
> -
>
> Key: HIVE-22221
> URL: https://issues.apache.org/jira/browse/HIVE-22221
> Project: Hive
>  Issue Type: Bug
>  Components: llap, UDF
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22221.1.patch, HIVE-22221.2.patch, 
> HIVE-22221.3.patch, HIVE-22221.4.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While querying through the LLAP external client, LlapBaseInputFormat#getSplits() 
> invokes the get_splits() UDTF (GenericUDTFGetSplits) under the hood.
> GenericUDTFGetSplits returns LlapInputSplit objects in which planBytes[] 
> occupies around 90% of the split size.
> Depending on data size/partitions and plan, an LlapInputSplit can grow up to 
> 1 MB, with planBytes[] being common to all the splits and occupying more than 
> 850 KB. This sometimes causes OOM on HS2, depending on the HS2 heap size.
> This can be resolved by separating the common parts from the actual splits and 
> reassembling them on the client side. 
> We can also provide an option that lets the client say it does not want them 
> reassembled, so that it can take control of reassembly into its own hands.
> Splits can be broken up like this:
> 1) schema split
> 2) plan split
> 3) actual split 1
> 4) actual split 2, and so on.
> This greatly reduces memory use on the server side (in my case from 5 GB for 
> ~5000 splits to around 15 MB) and hence the data transfer. It also eliminates 
> the OOM on the HS2 side.
> cc [~jdere] [~sankarh] [~thejas]





[jira] [Commented] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935438#comment-16935438
 ] 

Hive QA commented on HIVE-22228:


Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981020/HIVE-22228.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16833 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18684/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18684/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18684/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981020 - PreCommit-HIVE-Build

> SemanticAnalyzer cleanup - visibility + types
> -
>
> Key: HIVE-22228
> URL: https://issues.apache.org/jira/browse/HIVE-22228
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22228.01.patch
>
>
> Cleaning up SemanticAnalyzer:
>  * reduce the visibility of those functions/variables that are too wide, so 
> their scope is clearer
>  * modify the type of data structures, use interface instead of actual 
> implementation (e.g. HashMap -> Map in variable declaration)
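
As an illustration of the two cleanup points above, here is a small sketch using a hypothetical class (AliasRegistrySketch) that is not taken from SemanticAnalyzer. The field is made private and is declared against the Map and List interfaces rather than the concrete HashMap and ArrayList types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only; not code from SemanticAnalyzer.
final class AliasRegistrySketch {

    // Before (sketch):
    //   public HashMap<String, ArrayList<String>> aliasToColumns = new HashMap<>();
    // After: narrowest workable visibility, and interface types in the declaration.
    private final Map<String, List<String>> aliasToColumns = new HashMap<>();

    void addColumn(String alias, String column) {
        aliasToColumns.computeIfAbsent(alias, key -> new ArrayList<>()).add(column);
    }

    /** Returns a read-only snapshot so callers cannot mutate internal state. */
    List<String> columnsOf(String alias) {
        List<String> columns = aliasToColumns.getOrDefault(alias, Collections.emptyList());
        return Collections.unmodifiableList(new ArrayList<>(columns));
    }
}
```

With the interface type in the declaration, the backing implementation can later be swapped (for example to a LinkedHashMap) without touching any code that uses the field.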





[jira] [Commented] (HIVE-22208) Column name with reserved keyword is unescaped when query including join on table with mask column is re-written

2019-09-22 Thread Bo soon Park (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935430#comment-16935430
 ] 

Bo soon Park commented on HIVE-22208:
-

[~jcamachorodriguez] Could you consider treating this issue as critical?

The customer needs to use a reserved word as a column name in their table.

> Column name with reserved keyword is unescaped when query including join on 
> table with mask column is re-written
> 
>
> Key: HIVE-22208
> URL: https://issues.apache.org/jira/browse/HIVE-22208
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Riju Trivedi
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
>
> A join query involving a table with a mask column and another table that has 
> a reserved keyword as a column name fails with a SemanticException while 
> parsing the re-written query:
> Original Query :
> {code:java}
> select a.`date`, b.nm
> from sample_keyword a
> join sample_mask b
> on b.id = a.id;
> {code}
> Re-written Query :
>   
> {code:java}
> select a.date, b.nm
> from sample_keyword a
> join (SELECT `id`, CAST(mask_hash(nm) AS string) AS `nm`, 
> BLOCK__OFFSET__INSIDE__FILE, INPUT__FILE__NAME, ROW__ID FROM 
> `default`.`sample_mask` )`b`
> on b.id = a.id;
> {code}
> The re-written query does not have escape quotes around the date column, which 
> causes a SemanticException while parsing:
> {code:java}
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:9 cannot recognize 
> input near 'a' '.' 'date' in selection target 
>
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter( 
> SemanticAnalyzer.java:12084)  
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal( 
> SemanticAnalyzer.java:12298)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal( 
> CalcitePlanner.java:360)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze( 
> BaseSemanticAnalyzer.java:289)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869)
> {code}
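
To make the missing escaping concrete, here is a minimal sketch of backtick-quoting a column name when query text is regenerated. The helper names (IdentifierQuotingSketch, quote, qualifiedColumn) are hypothetical and this is not Hive's rewrite code. Doubling an embedded backtick is an assumption about the escaping rule, not something stated in this issue.

```java
// Hypothetical helper for illustration only; not Hive's actual rewrite code.
final class IdentifierQuotingSketch {

    private IdentifierQuotingSketch() {
    }

    /**
     * Wraps an identifier in backticks. Doubling an embedded backtick is an
     * assumed escaping rule for this sketch.
     */
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    /** Builds alias.`column`, leaving the table alias as given. */
    static String qualifiedColumn(String tableAlias, String column) {
        return tableAlias + "." + quote(column);
    }
}
```

For the query in this issue, qualifiedColumn("a", "date") yields a.`date`, which matches the original query rather than the unescaped a.date in the re-written one.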





[jira] [Updated] (HIVE-22222) Clean up the error handling in Driver - get rid of global variables

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22222:
--
Attachment: (was: HIVE-22222.02.patch)

> Clean up the error handling in Driver - get rid of global variables
> ---
>
> Key: HIVE-22222
> URL: https://issues.apache.org/jira/browse/HIVE-22222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22222.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error handling in Hive is done with some global variables for no apparent 
> reason, as all the data that is gathered to describe an exception is 
> produced and used at the point where the exception occurred. Thus having 
> global variables is misleading. 





[jira] [Commented] (HIVE-22222) Clean up the error handling in Driver - get rid of global variables

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935427#comment-16935427
 ] 

Hive QA commented on HIVE-22222:


Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981018/HIVE-22222.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 16801 tests 
executed
*Failed tests:*
{noformat}
TestDataSourceProviderFactory - did not produce a TEST-*.xml file (likely timed 
out) (batchId=233)
TestObjectStore - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18683/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18683/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18683/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981018 - PreCommit-HIVE-Build

> Clean up the error handling in Driver - get rid of global variables
> ---
>
> Key: HIVE-22222
> URL: https://issues.apache.org/jira/browse/HIVE-22222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22222.01.patch, HIVE-22222.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error handling in Hive is done with some global variables for no apparent 
> reason, as all the data that is gathered to describe an exception is 
> produced and used at the point where the exception occurred. Thus having 
> global variables is misleading. 





[jira] [Updated] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22228:
--
Attachment: HIVE-22228.01.patch

> SemanticAnalyzer cleanup - visibility + types
> -
>
> Key: HIVE-22228
> URL: https://issues.apache.org/jira/browse/HIVE-22228
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22228.01.patch
>
>
> Cleaning up SemanticAnalyzer:
>  * reduce the visibility of those functions/variables that are too wide, so 
> their scope is clearer
>  * modify the type of data structures, use interface instead of actual 
> implementation (e.g. HashMap -> Map in variable declaration)





[jira] [Updated] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22228:
--
Attachment: (was: HIVE-22228.01.patch)

> SemanticAnalyzer cleanup - visibility + types
> -
>
> Key: HIVE-22228
> URL: https://issues.apache.org/jira/browse/HIVE-22228
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
>
> Cleaning up SemanticAnalyzer:
>  * reduce the visibility of those functions/variables that are too wide, so 
> their scope is clearer
>  * modify the type of data structures, use interface instead of actual 
> implementation (e.g. HashMap -> Map in variable declaration)





[jira] [Commented] (HIVE-22222) Clean up the error handling in Driver - get rid of global variables

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935422#comment-16935422
 ] 

Hive QA commented on HIVE-22222:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
5s{color} | {color:blue} ql in master has 1570 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} ql: The patch generated 0 new + 86 unchanged - 47 
fixed = 86 total (was 133) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-18683/dev-support/hive-personality.sh
 |
| git revision | master / 25f0fb4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-18683/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Clean up the error handling in Driver - get rid of global variables
> ---
>
> Key: HIVE-22222
> URL: https://issues.apache.org/jira/browse/HIVE-22222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22222.01.patch, HIVE-22222.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error handling in Hive is done with some global variables for no apparent 
> reason, as all the data that is gathered to describe an exception is 
> produced and used at the point where the exception occurred. Thus having 
> global variables is misleading. 





[jira] [Commented] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935416#comment-16935416
 ] 

Hive QA commented on HIVE-22228:


Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981017/HIVE-22228.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 16833 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestPartitionManagement.testPartitionDiscoveryTransactionalTable
 (batchId=223)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18682/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18682/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18682/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981017 - PreCommit-HIVE-Build

> SemanticAnalyzer cleanup - visibility + types
> -
>
> Key: HIVE-22228
> URL: https://issues.apache.org/jira/browse/HIVE-22228
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22228.01.patch
>
>
> Cleaning up SemanticAnalyzer:
>  * reduce the visibility of those functions/variables that are too wide, so 
> their scope is clearer
>  * modify the type of data structures, use interface instead of actual 
> implementation (e.g. HashMap -> Map in variable declaration)





[jira] [Commented] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935412#comment-16935412
 ] 

Hive QA commented on HIVE-22228:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 53s{color} | {color:blue} ql in master has 1570 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 49s{color} | {color:blue} itests/util in master has 51 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 14s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 25s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 53s{color} | {color:red} ql: The patch generated 40 new + 1889 unchanged - 269 fixed = 1929 total (was 2158) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m  7s{color} | {color:red} ql generated 7 new + 1560 unchanged - 10 fixed = 1567 total (was 1570) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 15s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:ql |
|  |  Possible null pointer dereference of RowSchema.signature on branch that might be infeasible in org.apache.hadoop.hive.ql.exec.RowSchema.equals(Object)  Dereferenced at RowSchema.java:[line 132] |
|  |  The field org.apache.hadoop.hive.ql.parse.QBJoinTree.rhsSemijoin is transient but isn't set by deserialization  In QBJoinTree.java |
|  |  tmp could be null and is guaranteed to be dereferenced in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genColListRegex(String, String, ASTNode, List, Set, RowResolver, RowResolver, Integer, RowResolver, List, boolean)  Dereferenced at SemanticAnalyzer.java:[line 3749] |
|  |  Nullcheck of tmp at line 3707 of value previously dereferenced in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genColListRegex(String, String, ASTNode, List, Set, RowResolver, RowResolver, Integer, RowResolver, List, boolean)  At SemanticAnalyzer.java:[line 3707] |
|  |  Nullcheck of sampleExprs at line 11483 of value previously dereferenced 
in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genTablePlan(String, QB)  
At SemanticAnalyzer.java:11483 of value 

[jira] [Updated] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22228:
--
Attachment: HIVE-22228.01.patch

> SemanticAnalyzer cleanup - visibility + types
> -
>
> Key: HIVE-22228
> URL: https://issues.apache.org/jira/browse/HIVE-22228
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22228.01.patch
>
>
> Cleaning up SemanticAnalyzer:
>  * reduce the visibility of those functions/variables that are too wide, so 
> their scope is clearer
>  * modify the type of data structures, use interface instead of actual 
> implementation (e.g. HashMap -> Map in variable declaration)
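
The two cleanup rules above can be illustrated with a minimal sketch. The class and member names below are hypothetical and are not taken from SemanticAnalyzer; the sketch only shows the general pattern of narrowing visibility and declaring a field by its interface type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only; not part of Hive.
final class AliasRegistry {
  // Before the cleanup this might have read:
  //   public HashMap<String, String> aliasToTable = new HashMap<>();
  // After: the field is private and declared as Map, so callers cannot
  // depend on the concrete HashMap implementation.
  private final Map<String, String> aliasToTable = new HashMap<>();

  // Package-private: only code in the same package registers aliases.
  void register(String alias, String table) {
    aliasToTable.put(alias, table);
  }

  // The read path exposes a single lookup instead of the whole map.
  String tableFor(String alias) {
    return aliasToTable.get(alias);
  }
}
```

With the field declared as `Map`, the backing implementation could later be swapped (for example to a `LinkedHashMap`) without touching any caller.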



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22222) Clean up the error handling in Driver - get rid of global variables

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22222:
--
Attachment: (was: HIVE-22222.02.patch)

> Clean up the error handling in Driver - get rid of global variables
> ---
>
> Key: HIVE-22222
> URL: https://issues.apache.org/jira/browse/HIVE-22222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22222.01.patch, HIVE-22222.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error handling in Hive is done with some global variables for no apparent 
> reason, as all the data that is gathered to describe an exception is 
> produced and used at the point where the exception occurred. Thus having 
> global variables is misleading. 
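
As a rough illustration of the idea in the description, the sketch below keeps the error details inside the exception instead of in fields on the driver object. All names (`CommandFailure`, `MiniDriver`) and the numeric codes are hypothetical; this is not Hive's Driver code or the content of the attached patch.

```java
// Hypothetical exception type: it carries everything needed to report the
// failure, so no shared mutable error state is required elsewhere.
final class CommandFailure extends Exception {
  private final int responseCode;
  private final String sqlState;

  CommandFailure(int responseCode, String message, String sqlState, Throwable cause) {
    super(message, cause);
    this.responseCode = responseCode;
    this.sqlState = sqlState;
  }

  int getResponseCode() {
    return responseCode;
  }

  String getSqlState() {
    return sqlState;
  }
}

// Hypothetical driver: note the absence of errorMessage/sqlState fields.
final class MiniDriver {
  void compile(String command) throws CommandFailure {
    if (command == null || command.trim().isEmpty()) {
      // The data is produced at the point of failure and travels with the exception.
      throw new CommandFailure(40000, "Empty command", "42000", null);
    }
  }
}
```

A caller reads the details from the caught exception, which keeps the data next to the place where it was produced.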





[jira] [Updated] (HIVE-22222) Clean up the error handling in Driver - get rid of global variables

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22222:
--
Attachment: HIVE-22222.02.patch

> Clean up the error handling in Driver - get rid of global variables
> ---
>
> Key: HIVE-22222
> URL: https://issues.apache.org/jira/browse/HIVE-22222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22222.01.patch, HIVE-22222.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The error handling in Hive is done with some global variables for no apparent 
> reason, as all the data that is gathered to describe an exception is 
> produced and used at the point where the exception occurred. Thus having 
> global variables is misleading. 





[jira] [Updated] (HIVE-22228) SemanticAnalyzer cleanup - visibility + types

2019-09-22 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-22228:
--
Attachment: (was: HIVE-22228.01.patch)

> SemanticAnalyzer cleanup - visibility + types
> -
>
> Key: HIVE-22228
> URL: https://issues.apache.org/jira/browse/HIVE-22228
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22228.01.patch
>
>
> Cleaning up SemanticAnalyzer:
>  * reduce the visibility of those functions/variables that are too wide, so 
> their scope is clearer
>  * modify the type of data structures, use interface instead of actual 
> implementation (e.g. HashMap -> Map in variable declaration)





[jira] [Commented] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint

2019-09-22 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935303#comment-16935303
 ] 

Hive QA commented on HIVE-22221:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12981003/HIVE-22221.3.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/18681/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18681/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18681/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-09-22 12:56:37.348
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-18681/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-09-22 12:56:37.350
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 25f0fb4 HIVE-20113: Shuffle avoidance: Disable 1-1 edges for 
sorted shuffle (Vineet Garg, Gopal V reviewed by Jesus Camacho Rodriguez)
+ git clean -f -d
Removing ${project.basedir}/
Removing itests/${project.basedir}/
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 25f0fb4 HIVE-20113: Shuffle avoidance: Disable 1-1 edges for 
sorted shuffle (Vineet Garg, Gopal V reviewed by Jesus Camacho Rodriguez)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-09-22 12:56:38.268
+ rm -rf ../yetus_PreCommit-HIVE-Build-18681
+ mkdir ../yetus_PreCommit-HIVE-Build-18681
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-18681
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-18681/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc2293711469475654133.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc2293711469475654133.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) on project hive-pre-upgrade: Execution 
process-resource-bundles of goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process failed. 
ConcurrentModificationException -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-pre-upgrade
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-18681
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12981003 - 

[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint

2019-09-22 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-22221:
-
Attachment: HIVE-22221.3.patch

> Llap external client - Need to reduce LlapBaseInputFormat#getSplits() 
> footprint  
> -
>
> Key: HIVE-22221
> URL: https://issues.apache.org/jira/browse/HIVE-22221
> Project: Hive
>  Issue Type: Bug
>  Components: llap, UDF
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22221.1.patch, HIVE-22221.2.patch, 
> HIVE-22221.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While querying through the LLAP external client, LlapBaseInputFormat#getSplits() 
> invokes the get_splits() UDTF (GenericUDTFGetSplits) under the hood.
> GenericUDTFGetSplits returns LlapInputSplit instances, in which planBytes[] 
> occupies around 90% of the split size.
> Depending on data size/partitions and plan, a LlapInputSplit can grow up to 
> 1 MB, with planBytes[] being common to all the splits and occupying more than 
> 850 KB. It also sometimes causes OOM on HS2, depending on the HS2 heap size.
> This can be resolved by separating the common parts from the actual splits and 
> reassembling them on the client side. 
> We can also provide an option that lets the client decline automatic 
> reassembly and take control of reassembling the splits itself.
> Splits can be broken up like this:
> 1) schema split
> 2) plan split
> 3) actual split 1
> 4) actual split 2, and so on.
> This greatly reduces memory use on the server side (in my case from 5 GB for 
> ~5000 splits to around 15 MB), and hence the data transferred. It also 
> eliminates the OOM on the HS2 side.
> cc [~jdere] [~sankarh] [~thejas]
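
The split layout described above (schema first, plan second, then the actual splits) can be sketched as follows. The types and method names are hypothetical and do not reflect the real LlapInputSplit or GenericUDTFGetSplits API; the sketch only shows how sending the shared parts once and reattaching them on the client might look.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Hive code.
final class SplitEnvelope {
  // A reassembled split: the per-split fragment plus references to the shared parts.
  static final class FullSplit {
    final byte[] schema;
    final byte[] plan;
    final byte[] fragment;

    FullSplit(byte[] schema, byte[] plan, byte[] fragment) {
      this.schema = schema;
      this.plan = plan;
      this.fragment = fragment;
    }
  }

  // Server side: emit the schema and plan once, then one entry per actual split.
  static List<byte[]> pack(byte[] schema, byte[] plan, List<byte[]> fragments) {
    List<byte[]> wire = new ArrayList<>(fragments.size() + 2);
    wire.add(schema);
    wire.add(plan);
    wire.addAll(fragments);
    return wire;
  }

  // Client side: reattach the shared schema and plan to every fragment.
  // The shared arrays are referenced, not copied, so memory stays proportional
  // to one plan plus the fragments.
  static List<FullSplit> reassemble(List<byte[]> wire) {
    if (wire.size() < 2) {
      throw new IllegalArgumentException("expected schema and plan entries");
    }
    byte[] schema = wire.get(0);
    byte[] plan = wire.get(1);
    List<FullSplit> out = new ArrayList<>(wire.size() - 2);
    for (int i = 2; i < wire.size(); i++) {
      out.add(new FullSplit(schema, plan, wire.get(i)));
    }
    return out;
  }
}
```

A client that wants to control reassembly itself, as the description proposes, would simply consume the list returned by `pack` directly and skip `reassemble`.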


