[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-07 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/8/22 6:25 AM:


null check support for `context.os` would be useful.

(a null check for a struct column that does not exist in the file)
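The requested behavior (surfacing NULL instead of an exception when a projected struct column is absent from a file's schema) can be sketched as follows. This is purely illustrative and is not Hive's actual Parquet reader code; the function and data names are made up for the example:

```python
# Hypothetical sketch of the requested behavior: resolve a dotted nested-column
# path against a record, and yield None when any intermediate struct (e.g. the
# whole 'context.os' struct) is missing, instead of raising an exception.

def resolve_nested(row, path):
    """Walk dict-based 'struct' values along a dotted path; return None
    when any level of the path is absent from this record."""
    current = row
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # column missing from this file: surface NULL
        current = current[part]
    return current

# A record from a file whose schema lacks the context.os struct:
record = {"context": {"app": {"name": "shop", "version": "1.2"}}}
print(resolve_nested(record, "context.os.name"))      # -> None
print(resolve_nested(record, "context.app.version"))  # -> 1.2
```

The same idea, applied during schema projection in the reader, would let Hive fill NULLs for the missing struct rather than fail the fetch.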


was (Author: JIRAUSER292443):
It would be useful if null check for context.os works.

(null check for struct column that not exists in file)

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable, hence the Parquet files read by the table don't 
> always contain all columns. If any file in a partition lacks the 
> "context.os" struct and "context.os.name" is queried, Hive throws the 
> exception below. The same happens for "context.screen".
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read

[jira] [Work logged] (HIVE-22193) Graceful Shutdown HiveServer2

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22193?focusedWorklogId=788805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788805
 ]

ASF GitHub Bot logged work on HIVE-22193:
-

Author: ASF GitHub Bot
Created on: 08/Jul/22 00:34
Start Date: 08/Jul/22 00:34
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request, #3386:
URL: https://github.com/apache/hive/pull/3386

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   1. added unit test
   2. started HS2 locally and invoked `bin/hive --service hiveserver2 --graceful_stop` 
to stop it.
   The graceful_stop command can also take two parameters: the process id and 
the maximum time (default: 1800s) allowed for killing the process. If absent, 
the process id is read from the content of 
$HIVESERVER2_PID_DIR/hiveserver2.pid.
   HIVESERVER2_PID_DIR defaults to HIVE_CONF_DIR if not specified.
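The stop sequence described above (signal the process, wait up to a maximum time, then force-kill) can be sketched as follows. This is an illustrative model of the graceful-stop pattern, not the actual bin/hive script:

```python
# Illustrative graceful-stop pattern: send SIGTERM, poll until the process
# exits or a deadline passes, then fall back to SIGKILL (POSIX only).
import os
import signal
import time

def graceful_stop(pid, max_wait_seconds=1800, poll_interval=0.1):
    """Return True if the process exited after SIGTERM, False if SIGKILL
    was required."""
    os.kill(pid, signal.SIGTERM)            # ask the server to drain work
    deadline = time.monotonic() + max_wait_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)                 # probe: raises once pid is gone
        except ProcessLookupError:
            return True
        time.sleep(poll_interval)
    os.kill(pid, signal.SIGKILL)            # grace period exhausted
    return False
```

The 1800-second default mirrors the maximum-time parameter described for the CLI; a real server would hook SIGTERM to stop accepting new sessions and let running queries finish before exiting.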
   
   




Issue Time Tracking
---

Worklog Id: (was: 788805)
Time Spent: 2h 10m  (was: 2h)

> Graceful Shutdown HiveServer2
> -
>
> Key: HIVE-22193
> URL: https://issues.apache.org/jira/browse/HIVE-22193
> Project: Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: chenshiyun
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have a lot of HiveServer2 servers deployed in our production environment 
> (about 10 nodes). 
> However, if we want to change configuration or apply patches, we have to 
> restart all of them one by one, so every Hive SQL job running on a server 
> is killed, and errors occasionally surface on the JDBC client.
> The proposed change adds a graceful-shutdown method to HiveServer2 
> to avoid affecting production jobs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-22193) Graceful Shutdown HiveServer2

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22193?focusedWorklogId=788803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788803
 ]

ASF GitHub Bot logged work on HIVE-22193:
-

Author: ASF GitHub Bot
Created on: 08/Jul/22 00:32
Start Date: 08/Jul/22 00:32
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #3386: HIVE-22193: 
Graceful Shutdown HiveServer2
URL: https://github.com/apache/hive/pull/3386




Issue Time Tracking
---

Worklog Id: (was: 788803)
Time Spent: 2h  (was: 1h 50m)

> Graceful Shutdown HiveServer2
> -
>
> Key: HIVE-22193
> URL: https://issues.apache.org/jira/browse/HIVE-22193
> Project: Hive
>  Issue Type: Improvement
>  Components: Server Infrastructure
>Reporter: chenshiyun
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We have a lot of HiveServer2 servers deployed in our production environment 
> (about 10 nodes). 
> However, if we want to change configuration or apply patches, we have to 
> restart all of them one by one, so every Hive SQL job running on a server 
> is killed, and errors occasionally surface on the JDBC client.
> The proposed change adds a graceful-shutdown method to HiveServer2 
> to avoid affecting production jobs.





[jira] [Updated] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26378:
--
Labels: pull-request-available  (was: )

> Improve error message for masking over complex data types
> -
>
> Key: HIVE-26378
> URL: https://issues.apache.org/jira/browse/HIVE-26378
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Security
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current error when applying column masking over (unsupported) complex 
> data types could be improved to be more explicit.
> Currently, the thrown error is as follows:
> {noformat}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
> {noformat}





[jira] [Work logged] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26378?focusedWorklogId=788669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788669
 ]

ASF GitHub Bot logged work on HIVE-26378:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 15:55
Start Date: 07/Jul/22 15:55
Worklog Time Spent: 10m 
  Work Description: asolimando opened a new pull request, #3421:
URL: https://github.com/apache/hive/pull/3421

   
   
   ### What changes were proposed in this pull request?
   
   
   Improve the current error message for column masking over complex data 
types; currently Hive throws a `ParseException` that is confusing to the 
end user. 
   
   ### Why are the changes needed?
   
   
   Users should get a clear error saying that masking is not supported for 
complex data types.
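One way to realize this (an assumption about the approach for illustration, not necessarily what the patch does) is a fail-fast pre-check on the masked column's type, so the type string never reaches a parser that only understands primitive types:

```python
# Hedged sketch: detect a complex column type up front and raise a targeted
# error instead of a confusing ParseException. Names are illustrative only.

COMPLEX_PREFIXES = ("map<", "array<", "struct<", "uniontype<")

def check_maskable(column, type_name):
    lowered = type_name.strip().lower()
    if lowered.startswith(COMPLEX_PREFIXES):
        raise ValueError(
            f"Masking is not supported for column '{column}' of complex type "
            f"'{type_name}'"
        )

check_maskable("key", "string")  # primitive type: accepted silently
try:
    check_maskable("value", "map<string,string>")
except ValueError as e:
    print(e)
```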
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   ### How was this patch tested?
   
   
   `mvn test -Dtest=TestNegativeLlapLocalCliDriver 
-Dqfile="masking_complex_type.q" -Dtest.output.overwrite -pl itests/qtest 
-Pitests`
   




Issue Time Tracking
---

Worklog Id: (was: 788669)
Remaining Estimate: 0h
Time Spent: 10m

> Improve error message for masking over complex data types
> -
>
> Key: HIVE-26378
> URL: https://issues.apache.org/jira/browse/HIVE-26378
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Security
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current error when applying column masking over (unsupported) complex 
> data types could be improved to be more explicit.
> Currently, the thrown error is as follows:
> {noformat}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
> {noformat}





[jira] [Resolved] (HIVE-20628) Parsing error when using a complex map data type under dynamic column masking

2022-07-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando resolved HIVE-20628.
-
Resolution: Invalid

This is neither a bug nor a regression, since masking complex data types has 
never been supported by Hive.

> Parsing error when using a complex map data type under dynamic column masking
> -
>
> Key: HIVE-20628
> URL: https://issues.apache.org/jira/browse/HIVE-20628
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Parser, Security
>Affects Versions: 2.1.0
> Environment: The error can be simulated using HDP 2.6.4 sandbox
>Reporter: Darryl Dutton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When trying to use the map complex data type as part of a dynamic column 
> mask, Hive throws a parsing error because it expects a primitive type (see 
> the trace pasted below). The use case is applying masking to elements within 
> a map type via a custom Hive UDF (which applies the mask) managed by Ranger. 
> Hive is expected to support complex data types for masking in addition to 
> primitive types. The exception occurs when Hive needs to evaluate the UDF or 
> apply a standard mask (pass-through works as expected). You can recreate the 
> problem by creating a simple table with a map-typed column, then applying 
> masking to that column through a Ranger resource-based policy and a 
> custom function (you can use the standard Hive UDF str_to_map('F4','') to 
> simulate returning a map). 
> CREATE TABLE `mask_test`(
>  `key` string, 
>  `value` map<string,string>)
> STORED AS INPUTFORMAT 
>  'org.apache.hadoop.mapred.TextInputFormat'
>  
> INSERT INTO TABLE mask_test
> SELECT 'AAA' as key, 
> map('F1','2022','F2','','F3','333') as value
> FROM (select 1 ) as temp;
>  
>  
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
>  ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
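For reference, the reproduction above relies on str_to_map to build a map value. Its splitting behavior can be roughly mimicked as follows; this is a sketch of the UDF's semantics, not Hive's implementation, and the default delimiters (',' for pairs, ':' for key/value) are an assumption stated here for the example:

```python
# Rough Python mimic of Hive's str_to_map UDF: split text into pairs, then
# split each pair into key and value, returning a map<string,string>-like dict.
# Shown only to illustrate why a mask expression built with it has a map type.

def str_to_map(text, pair_delim=",", kv_delim=":"):
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        result[key] = value if sep else None  # no key/value delimiter -> null
    return result

print(str_to_map("F1:2022,F2:x"))  # -> {'F1': '2022', 'F2': 'x'}
```

A mask expression returning such a map is what trips the primitive-type parser in the trace above.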





[jira] [Updated] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated HIVE-26378:

Description: 
The current error when applying column masking over (unsupported) complex data 
types could be improved to be more explicit.

Currently, the thrown error is as follows:
{noformat}
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
 line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
specification
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
... 15 more
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize input 
near 'map' '<' 'string' in primitive type specification
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
{noformat}


> Improve error message for masking over complex data types
> -
>
> Key: HIVE-26378
> URL: https://issues.apache.org/jira/browse/HIVE-26378
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Security
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The current error when applying column masking over (unsupported) complex 
> data types could be improved to be more explicit.
> Currently, the thrown error is as follows:
> {noformat}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
> {noformat}





[jira] [Work started] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26378 started by Alessandro Solimando.
---
> Improve error message for masking over complex data types
> -
>
> Key: HIVE-26378
> URL: https://issues.apache.org/jira/browse/HIVE-26378
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Security
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>
> The current error when applying column masking over (unsupported) complex 
> data types could be improved to be more explicit.
> Currently, the thrown error is as follows:
> {noformat}
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
> at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
> {noformat}





[jira] [Work logged] (HIVE-20628) Parsing error when using a complex map data type under dynamic column masking

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20628?focusedWorklogId=788657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788657
 ]

ASF GitHub Bot logged work on HIVE-20628:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 15:21
Start Date: 07/Jul/22 15:21
Worklog Time Spent: 10m 
  Work Description: asolimando closed pull request #3417: HIVE-20628: 
Parsing error when using a complex map data type under dy…
URL: https://github.com/apache/hive/pull/3417




Issue Time Tracking
---

Worklog Id: (was: 788657)
Time Spent: 20m  (was: 10m)

> Parsing error when using a complex map data type under dynamic column masking
> -
>
> Key: HIVE-20628
> URL: https://issues.apache.org/jira/browse/HIVE-20628
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Parser, Security
>Affects Versions: 2.1.0
> Environment: The error can be simulated using HDP 2.6.4 sandbox
>Reporter: Darryl Dutton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When trying to use the map complex data type as part of a dynamic column 
> mask, Hive throws a parsing error because it expects a primitive type (see 
> the trace pasted below). The use case is applying masking to elements within 
> a map type via a custom Hive UDF (which applies the mask) managed by Ranger. 
> Hive is expected to support complex data types for masking in addition to 
> primitive types. The exception occurs when Hive needs to evaluate the UDF or 
> apply a standard mask (pass-through works as expected). You can recreate the 
> problem by creating a simple table with a map-typed column, then applying 
> masking to that column through a Ranger resource-based policy and a 
> custom function (you can use the standard Hive UDF str_to_map('F4','') to 
> simulate returning a map). 
> CREATE TABLE `mask_test`(
>  `key` string, 
>  `value` map<string,string>)
> STORED AS INPUTFORMAT 
>  'org.apache.hadoop.mapred.TextInputFormat'
>  
> INSERT INTO TABLE mask_test
> SELECT 'AAA' as key, 
> map('F1','2022','F2','','F3','333') as value
> FROM (select 1 ) as temp;
>  
>  
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
>  ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)





[jira] [Assigned] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread Alessandro Solimando (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando reassigned HIVE-26378:
---


> Improve error message for masking over complex data types
> -
>
> Key: HIVE-26378
> URL: https://issues.apache.org/jira/browse/HIVE-26378
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Security
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Alessandro Solimando
>Priority: Major
>






[jira] [Comment Edited] (HIVE-20628) Parsing error when using a complex map data type under dynamic column masking

2022-07-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563830#comment-17563830
 ] 

Alessandro Solimando edited comment on HIVE-20628 at 7/7/22 3:17 PM:
-

Masking complex types has never been supported by Hive; the current error, 
though, could be improved to be more explicit about what is happening.


was (Author: asolimando):
Complex types are not supported by Hive, the current error could be improved 
and be more explicit about what is happening.

> Parsing error when using a complex map data type under dynamic column masking
> -
>
> Key: HIVE-20628
> URL: https://issues.apache.org/jira/browse/HIVE-20628
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Parser, Security
>Affects Versions: 2.1.0
> Environment: The error can be simulated using HDP 2.6.4 sandbox
>Reporter: Darryl Dutton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When trying to use the map complex data type as part of a dynamic column 
> mask, Hive throws a parsing error because it expects a primitive type (see 
> the trace pasted below). The use case is applying masking to elements within 
> a map type via a custom Hive UDF (which applies the mask) managed by Ranger. 
> Hive is expected to support complex data types for masking in addition to 
> primitive types. The exception occurs when Hive needs to evaluate the UDF or 
> apply a standard mask (pass-through works as expected). You can recreate the 
> problem by creating a simple table with a map-typed column, then applying 
> masking to that column through a Ranger resource-based policy and a 
> custom function (you can use the standard Hive UDF str_to_map('F4','') to 
> simulate returning a map). 
> CREATE TABLE `mask_test`(
>  `key` string, 
>  `value` map<string,string>)
> STORED AS INPUTFORMAT 
>  'org.apache.hadoop.mapred.TextInputFormat'
>  
> INSERT INTO TABLE mask_test
> SELECT 'AAA' as key, 
> map('F1','2022','F2','','F3','333') as value
> FROM (select 1 ) as temp;
>  
>  
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException:
>  line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type 
> specification
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
>  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
>  ... 15 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
>  at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-20628) Parsing error when using a complex map data type under dynamic column masking

2022-07-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563830#comment-17563830
 ] 

Alessandro Solimando commented on HIVE-20628:
-

Complex types are not supported by Hive for column masking; the current error 
could be improved to be more explicit about what is happening.

> Parsing error when using a complex map data type under dynamic column masking
> -
>
> Key: HIVE-20628
> URL: https://issues.apache.org/jira/browse/HIVE-20628
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, Parser, Security
>Affects Versions: 2.1.0
> Environment: The error can be simulated using HDP 2.6.4 sandbox
>Reporter: Darryl Dutton
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788628
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:29
Start Date: 07/Jul/22 13:29
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915874809


##
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+ A masked pattern was here 
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+ A masked pattern was here 
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+ A masked pattern was here 
+1970-01-19 20:16:19.2

Review Comment:
   The resolution of this is discussed here: 
https://github.com/apache/hive/pull/3418#issuecomment-1177583388





Issue Time Tracking
---

Worklog Id: (was: 788628)
Time Spent: 1h  (was: 50m)

> ClassCastException while inserting Avro data into Hbase table for nested 
> struct with Timestamp
> --
>
> Key: HIVE-26373
> URL: https://issues.apache.org/jira/browse/HIVE-26373
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> For Avro data where the schema has nested struct with a Timestamp datatype, 
> we get the following ClassCastException:
> {code:java}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.type.Timestamp cannot be cast to 
> org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(Lazy

[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788627
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:27
Start Date: 07/Jul/22 13:27
Worklog Time Spent: 10m 
  Work Description: zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915873512



Review Comment:
   Copying here the offline follow-up by Soumyakanti:
   
   It's because `"logicalType": "timestamp-millis"` is defined in the avsc.
   
   I had to make this change 
   ```java
   dateRecord.put("value", 
LocalDate.of(2022,7,5).atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli());
   ```
   
   However, right now the result I am getting for this is: 2022-07-04 17:00:00
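The shifted wall-clock value in the follow-up is consistent with a UTC-to-local conversion on read. A standalone check (plain `java.time`, not Hive code; the UTC-7 session zone is an assumption, e.g. US Pacific in July):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class AvroTimestampCheck {
    // Epoch millis as written by the fixture in the PR discussion:
    // midnight 2022-07-05, interpreted in UTC.
    static long storedMillis() {
        return LocalDate.of(2022, 7, 5)
                .atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    // UTC-to-local conversion on read: render the stored instant as a
    // wall-clock value in the given session zone.
    static LocalDateTime readInZone(long millis, String zoneId) {
        return Instant.ofEpochMilli(millis).atZone(ZoneId.of(zoneId)).toLocalDateTime();
    }

    public static void main(String[] args) {
        // In a UTC-7 zone the stored instant reads back shifted by 7 hours,
        // matching the 2022-07-04 17:00:00 result reported in the comment.
        System.out.println(readInZone(storedMillis(), "America/Los_Angeles"));
        // 2022-07-04T17:00
    }
}
```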





Issue Time Tracking
---

Worklog Id: (was: 788627)
Time Spent: 50m  (was: 40m)

> ClassCastException while inserting Avro data into Hbase table for nested 
> struct with Timestamp
> --
>
> Key: HIVE-26373
> URL: https://issues.apache.org/jira/browse/HIVE-26373
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788618
 ]

ASF GitHub Bot logged work on HIVE-26373:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 13:05
Start Date: 07/Jul/22 13:05
Worklog Time Spent: 10m 
  Work Description: zabetak commented on PR #3418:
URL: https://github.com/apache/hive/pull/3418#issuecomment-1177583388

   Hive has always converted data from the local time zone to UTC when 
writing, and from UTC to the local time zone when reading. I updated the way the 
timestamp is stored in HBase 
(https://github.com/apache/hive/pull/3418/commits/fc9bc94be427a02485b089c2aeb6b494644beb05)
 to make it coherent with the way it is read by the query. 
   
   There are properties and Avro file metadata which can control whether we 
perform the conversion (e.g., `hive.avro.timestamp.skip.conversion`), but these 
do not currently work for HBase (or basically anything that relies on 
`AvroLazyObjectInspector`). That is a bug that should be fixed, but it is out of 
scope for this PR.
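The local-to-UTC-on-write, UTC-to-local-on-read policy described above can be sketched in isolation (a minimal illustration with `java.time`; method names are illustrative, not Hive APIs):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimestampConversionSketch {
    // Write path: interpret the wall-clock timestamp in the session's
    // local zone and store the corresponding UTC instant as epoch millis.
    static long toStorageMillis(LocalDateTime wallClock, ZoneId sessionZone) {
        return wallClock.atZone(sessionZone).toInstant().toEpochMilli();
    }

    // Read path: convert the stored UTC instant back into a wall-clock
    // value in the reader's session zone.
    static LocalDateTime fromStorageMillis(long epochMillis, ZoneId sessionZone) {
        return Instant.ofEpochMilli(epochMillis).atZone(sessionZone).toLocalDateTime();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/Los_Angeles");
        LocalDateTime ts = LocalDateTime.of(2022, 7, 5, 0, 0);
        long stored = toStorageMillis(ts, zone);
        // Writing and reading in the same zone round-trips the wall clock;
        // the HIVE-26373 bug was that write and read paths disagreed.
        System.out.println(fromStorageMillis(stored, zone).equals(ts)); // true
    }
}
```

Reading in a different zone than the one used for writing reproduces the kind of shifted result discussed in the review thread.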




Issue Time Tracking
---

Worklog Id: (was: 788618)
Time Spent: 40m  (was: 0.5h)

> ClassCastException while inserting Avro data into Hbase table for nested 
> struct with Timestamp
> --
>
> Key: HIVE-26373
> URL: https://issues.apache.org/jira/browse/HIVE-26373
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For Avro data where the schema has nested struct with a Timestamp datatype, 
> we get the following ClassCastException:
> {code:java}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.type.Timestamp cannot be cast to 
> org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
> ... 11 more {code}
> The problem starts in {{toLazyObject
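The failure pattern in the trace — an object inspector that unconditionally casts its input to a lazy wrapper, but receives an eagerly materialized value — can be illustrated in isolation (hypothetical minimal types, not Hive's actual classes):

```java
public class LazyCastSketch {
    // Stand-in for Hive's LazyPrimitive wrapper around a writable value.
    static class LazyWrapper {
        final Object writable;
        LazyWrapper(Object writable) { this.writable = writable; }
    }

    // Mirrors the shape of AbstractPrimitiveLazyObjectInspector
    // .getPrimitiveWritableObject: it assumes every input is lazy-wrapped.
    static Object getWritable(Object data) {
        return ((LazyWrapper) data).writable; // ClassCastException if data is raw
    }

    public static void main(String[] args) {
        // Wrapped input works as the inspector expects.
        System.out.println(getWritable(new LazyWrapper("ok"))); // ok
        try {
            // A raw (already-deserialized) value, analogous to the Timestamp
            // in the HIVE-26373 trace, fails the cast.
            getWritable(java.time.Instant.EPOCH);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the trace");
        }
    }
}
```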

[jira] [Resolved] (HIVE-22822) Column masking policies on complex column will cause unrelative query failures

2022-07-07 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved HIVE-22822.
---
Resolution: Duplicate

> Column masking policies on complex column will cause unrelative query failures
> --
>
> Key: HIVE-22822
> URL: https://issues.apache.org/jira/browse/HIVE-22822
> Project: Hive
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Major
>
>  Create a table with complex types columns:
> {code:sql}
> CREATE TABLE customers(
>   id int, 
>   name string, 
>   email_preferences 
> struct>,
>  
>  addresses 
> map>,
>  
>   orders 
> array>>>
> ) stored as ORC;
> {code}
> In Ranger, add a column masking policy on the 'addresses' column to nullify 
> the values. Then run "select id from customers" in Hive. Hit the error:
> {code:java}
> Error while compiling statement: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.parse.ParseException: line 1:101 cannot recognize 
> input near 'map' '<' 'string' in primitive type specification
> {code}
> The query only reads the "id" column, yet the failure appears to be related to 
> the masked "addresses" column. It should not fail.
> I used Hive 3 in testing.





[jira] [Commented] (HIVE-22822) Column masking policies on complex column will cause unrelative query failures

2022-07-07 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563782#comment-17563782
 ] 

Quanlong Huang commented on HIVE-22822:
---

Resolving this as it duplicates HIVE-20628.

> Column masking policies on complex column will cause unrelative query failures
> --
>
> Key: HIVE-22822
> URL: https://issues.apache.org/jira/browse/HIVE-22822
> Project: Hive
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Priority: Major
>





[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-07 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/7/22 11:52 AM:
-

It would be useful if the null check for context.os worked.

(i.e., a null check for a struct column that does not exist in the file)


was (Author: JIRAUSER292443):
It seems the null check for context.os (a struct column that does not exist in 
the file) should work.

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable hence the parquet files read by the table don't 
> always contain all columns. If any file in a partition doesn't have 
> "context.os" struct and if "context.os.name" is queried, Hive throws an 
> exception as below. Same for "context.screen" as well.
>   
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadS
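The behavior the issue requests — projecting a requested struct field as NULL when it is absent from the file schema, instead of throwing — can be sketched in plain Java (a minimal illustration with a nested-`Map` schema stand-in; these are hypothetical types, not Hive's Parquet reader):

```java
import java.util.Map;

public class MissingColumnProjectionSketch {
    // Resolve a dotted path like "context.os.name" against a nested record;
    // return null as soon as any segment is missing, instead of raising the
    // "Primitive type ... doesn't match type ..." error from the report.
    static Object project(Map<String, Object> record, String path) {
        Object node = record;
        for (String segment : path.split("\\.")) {
            if (!(node instanceof Map) || !((Map<?, ?>) node).containsKey(segment)) {
                return null; // column absent in this file: surface NULL
            }
            node = ((Map<?, ?>) node).get(segment);
        }
        return node;
    }

    public static void main(String[] args) {
        // A file written without the "os" struct inside "context".
        Map<String, Object> row =
                Map.of("context", Map.of("app", Map.of("name", "events-app")));

        System.out.println(project(row, "context.app.name")); // events-app
        System.out.println(project(row, "context.os.name"));  // null
    }
}
```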

[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-07 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720
 ] 

Seonguk Kim edited comment on HIVE-24066 at 7/7/22 11:33 AM:
-

It seems the null check for context.os (a struct column that does not exist in 
the file) should work.


was (Author: JIRAUSER292443):
It seems the null check for context.os (a column that does not exist in the file) should work.

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>

[jira] [Commented] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception

2022-07-07 Thread Seonguk Kim (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720
 ] 

Seonguk Kim commented on HIVE-24066:


It seems the null check for context.os (a column that does not exist in the file) should work.

> Hive query on parquet data should identify if column is not present in file 
> schema and show NULL value instead of Exception
> ---
>
> Key: HIVE-24066
> URL: https://issues.apache.org/jira/browse/HIVE-24066
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 3.1.2
>Reporter: Jainik Vora
>Priority: Major
> Attachments: day_01.snappy.parquet
>
>
> I created a hive table containing columns with struct data type 
>   
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
> `app`: struct<
> `build`: string,
> `name`: string,
> `namespace`: string,
> `version`: string
> >,
> `device`: struct<
> `adtrackingenabled`: boolean,
> `advertisingid`: string,
> `id`: string,
> `manufacturer`: string,
> `model`: string,
> `type`: string
> >,
> `locale`: string,
> `library`: struct<
> `name`: string,
> `version`: string
> >,
> `os`: struct<
> `name`: string,
> `version`: string
> >,
> `screen`: struct<
> `height`: bigint,
> `width`: bigint
> >,
> `network`: struct<
> `carrier`: string,
> `cellular`: boolean,
> `wifi`: boolean
>  >,
> `timezone`: string,
> `userAgent`: string
> >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>  
>  All columns are nullable, so the parquet files read by the table don't 
> always contain every column. If any file in a partition lacks the 
> "context.os" struct and "context.os.name" is queried, Hive throws the 
> exception below. The same happens for "context.screen".
>   
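> For example, a query like the following (partition value hypothetical) triggers 
> the failure during schema projection, before any per-row null handling runs:
> {code:sql}
> SELECT context.os.name FROM test_dwh.sample_parquet_table WHERE day = '2020-10-23';
> {code}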
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 
> main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with 
> exception java.io.IOException:java.lang.RuntimeException: Primitive type 
> osshould not doesn't match typeos[name]java.io.IOException: 
> java.lang.RuntimeException: Primitive type osshould not doesn't match 
> typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't 
> match typeos[name] 
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
>  
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getProjectedSchema(DataWritableReadSupport.java:249)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:379)
>  
>   at 
> org.apache.

[jira] [Commented] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

2022-07-07 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563716#comment-17563716
 ] 

Stamatis Zampetakis commented on HIVE-26373:


This bug has probably existed since Hive first added support for Avro data in 
HBase (HIVE-6147).

> ClassCastException while inserting Avro data into Hbase table for nested 
> struct with Timestamp
> --
>
> Key: HIVE-26373
> URL: https://issues.apache.org/jira/browse/HIVE-26373
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For Avro data where the schema has nested struct with a Timestamp datatype, 
> we get the following ClassCastException:
> {code:java}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.type.Timestamp cannot be cast to 
> org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
> ... 11 more {code}
> The problem starts in {{toLazyObject}} method of 
> {*}AvroLazyObjectInspector.java{*}, when 
> [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347]
>  condition returns false for {*}Timestamp{*}, preventing the conversion of 
> *Timestamp* to *LazyTimestamp* 
> [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132].
> The solution is to return {{true}} for Timestamps in the {{isPrimitive}} 
> method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26372) QTests depend on mysql docker image are fail

2022-07-07 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-26372.
---
Resolution: Fixed

It seems mysql prints fewer details to stderr in 5.7.38; I don't know whether 
follow-up releases keep this behavior.
I hardcoded the version 5.7.37 to unblock the PTest infra.

Pushed to master. Thanks [~zabetak] for review.
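The readiness probe described in the issue amounts to grepping the container's stderr for a marker line; a minimal sketch (the log line is invented for illustration, modeled on mysql 5.7.37's startup output):

```shell
# Simulated stderr line from a mysql 5.7.37 startup; 5.7.38 reportedly no
# longer prints this marker to stderr, so a probe like this never matches
# and the framework keeps polling until the 300 sec timeout.
log='2022-07-07T08:56:00.000000Z 0 [Note] mysqld: ready for connections.'
if printf '%s\n' "$log" | grep -q 'ready for connections'; then
  echo 'mysql ready'   # prints "mysql ready"
fi
```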

> QTests depend on mysql docker image are fail
> 
>
> Key: HIVE-26372
> URL: https://issues.apache.org/jira/browse/HIVE-26372
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the QTest framework launches a mysql docker container, it checks whether 
> the mysql instance is ready to receive connections by searching for the text 
> {code}
> ready for connections
> {code}
>  in stderr:
> https://github.com/apache/hive/blob/2f619988f69a569bfcdc2bef5d35a9ecabb2ef13/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/MySQLExternalDB.java#L56
> It seems this behavior changed on the MySQL side, so the QTest framework enters 
> an infinite loop and then times out after 300 sec.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26372) QTests depend on mysql docker image are fail

2022-07-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26372?focusedWorklogId=788542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788542
 ]

ASF GitHub Bot logged work on HIVE-26372:
-

Author: ASF GitHub Bot
Created on: 07/Jul/22 08:56
Start Date: 07/Jul/22 08:56
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged PR #3416:
URL: https://github.com/apache/hive/pull/3416




Issue Time Tracking
---

Worklog Id: (was: 788542)
Time Spent: 20m  (was: 10m)

> QTests depend on mysql docker image are fail
> 
>
> Key: HIVE-26372
> URL: https://issues.apache.org/jira/browse/HIVE-26372
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the QTest framework launches a mysql docker container, it checks whether 
> the mysql instance is ready to receive connections by searching for the text 
> {code}
> ready for connections
> {code}
>  in stderr:
> https://github.com/apache/hive/blob/2f619988f69a569bfcdc2bef5d35a9ecabb2ef13/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/MySQLExternalDB.java#L56
> It seems this behavior changed on the MySQL side, so the QTest framework enters 
> an infinite loop and then times out after 300 sec.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)

2022-07-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563607#comment-17563607
 ] 

Ayush Saxena commented on HIVE-26376:
-

Wild guess from the trace and the problem: the config in question disables the 
FileSystem cache.

{{fs.hdfs.impl.disable.cache}} disables caching of FileSystem instances for all 
HDFS filesystems; similar configs exist for the other FileSystem implementations.

So, with no cached FileSystem instances, there is no memory build-up and no OOM. 
Note that this config is on the Hadoop side.
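A sketch of what that could look like in core-site.xml (the property name follows Hadoop's fs.<scheme>.impl.disable.cache pattern, shown here for the hdfs scheme):

{code:xml}
<!-- Disables the shared FileSystem cache for hdfs:// URIs. Each
     FileSystem.get() call then returns a fresh instance that the caller
     must close, trading cache hits for bounded memory. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
{code}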

> Hive Metastore connection leak (OOM Error)
> --
>
> Key: HIVE-26376
> URL: https://issues.apache.org/jira/browse/HIVE-26376
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: !Screenshot 2022-07-07 at 11.52.33 AM.png!
>Reporter: Ranith Sardar
>Priority: Major
> Attachments: Screenshot 2022-07-07 at 11.52.33 AM.png
>
>
> Hive version: 3.1.2
> The Hive metastore heap size is 14 GB. A memory leak appears after 4-5 days, 
> and the metastore throws an OOM error.
> If we disable the configuration, the memory leak disappears.
> In a 3.5 GB heap dump, a large number of FileSystem objects (> 9k instances) 
> are retained, occupying most of the heap space. A snapshot from Eclipse MAT 
> is attached.
> Below is part of the stack trace for the OOM error:
> {code:java}
> at 
> org.apache.hadoop.hive.common.FileUtils.getFileStatusOrNull(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FileStatus;
>  (FileUtils.java:801)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.checkPermissions(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/fs/Path;Ljava/util/EnumSet;)V
>  (StorageBasedAuthorizationProvider.java:371)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V
>  (StorageBasedAuthorizationProvider.java:346)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/hive/metastore/api/Database;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V
>  (StorageBasedAuthorizationProvider.java:154)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.authorizeReadDatabase(Lorg/apache/hadoop/hive/metastore/events/PreReadDatabaseEvent;)V
>  (AuthorizationPreEventListener.java:208)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.onEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V
>  (AuthorizationPreEventListener.java:153)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.firePreEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V
>  (HiveMetaStore.java:3221)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/api/Database;
>  (HiveMetaStore.java:1352){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)

2022-07-07 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563602#comment-17563602
 ] 

Alessandro Solimando commented on HIVE-26376:
-

Disabling which configuration exactly solves the problem?

> Hive Metastore connection leak (OOM Error)
> --
>
> Key: HIVE-26376
> URL: https://issues.apache.org/jira/browse/HIVE-26376
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.2
> Environment: !Screenshot 2022-07-07 at 11.52.33 AM.png!
>Reporter: Ranith Sardar
>Priority: Major
> Attachments: Screenshot 2022-07-07 at 11.52.33 AM.png
>
>
> Hive version: 3.1.2
> The Hive metastore heap size is 14 GB. A memory leak appears after 4-5 days, 
> and the metastore throws an OOM error.
> If we disable the configuration, the memory leak disappears.
> In a 3.5 GB heap dump, a large number of FileSystem objects (> 9k instances) 
> are retained, occupying most of the heap space. A snapshot from Eclipse MAT 
> is attached.
> Below is part of the stack trace for the OOM error:
> {code:java}
> at 
> org.apache.hadoop.hive.common.FileUtils.getFileStatusOrNull(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FileStatus;
>  (FileUtils.java:801)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.checkPermissions(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/fs/Path;Ljava/util/EnumSet;)V
>  (StorageBasedAuthorizationProvider.java:371)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V
>  (StorageBasedAuthorizationProvider.java:346)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/hive/metastore/api/Database;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V
>  (StorageBasedAuthorizationProvider.java:154)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.authorizeReadDatabase(Lorg/apache/hadoop/hive/metastore/events/PreReadDatabaseEvent;)V
>  (AuthorizationPreEventListener.java:208)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.onEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V
>  (AuthorizationPreEventListener.java:153)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.firePreEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V
>  (HiveMetaStore.java:3221)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/api/Database;
>  (HiveMetaStore.java:1352){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)