[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-24 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271106#comment-17271106
 ] 

Zhihua Deng commented on HIVE-24666:


Thanks much for the reply.

> So this might need a generic wrap PROJECTION with SelectColumnIsTrue as a 
>general case (not just for cast or just that one cast boolean).

I'm not sure I fully understand this, but the approach here is to put the cast 
for all non-boolean filter expressions in the logical plan and to move the 
vectorized UDFToBoolean handling into a standalone method. Vectorization 
already wraps constants, user-defined functions, and columns with 
SelectColumnIsTrue when they are used to filter rows. Could you please 
elaborate a little in case I am wrong? 

> but it fixes only the specific issue by wrapping it with the filter (that 
>modifies the .selected vector) - the real issue is hiding somewhere else.

I think the cause is that the vectorized expressions of UDFToBoolean only have 
PROJECTION mode in them, as you have explained, so when we use it to filter 
rows, we should evaluate SelectColumnIsTrue on the results of cast before 
forwarding the batch to the next operation.
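
The difference between the two modes can be illustrated with a small, self-contained sketch. It is not Hive's code: `Batch`, `projectCastToBoolean`, and `selectColumnIsTrue` are hypothetical stand-ins for a vectorized row batch, a PROJECTION-mode cast, and a FILTER-mode expression, and the set of strings treated as false is an assumption taken from the reproduction data in this issue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class FilterModeSketch {
  // Assumption for illustration: these literals cast to false, anything else to true.
  static final Set<String> FALSE_LITERALS =
      new HashSet<>(Arrays.asList("0", "false", "off", "no"));

  /** Toy stand-in for a vectorized row batch (hypothetical, not Hive's class). */
  static final class Batch {
    final String[] keys;      // input string column
    final long[] castResult;  // output column written by the projection (1 = true, 0 = false)
    final int[] selected;     // indices of the rows that are still live
    int size;                 // number of live rows

    Batch(String... keys) {
      this.keys = keys;
      this.castResult = new long[keys.length];
      this.selected = new int[keys.length];
      for (int i = 0; i < keys.length; i++) {
        selected[i] = i;
      }
      this.size = keys.length;
    }
  }

  /** PROJECTION mode: writes one value per live row and never drops a row. */
  static void projectCastToBoolean(Batch b) {
    for (int j = 0; j < b.size; j++) {
      int row = b.selected[j];
      boolean isFalse = FALSE_LITERALS.contains(b.keys[row].toLowerCase(Locale.ROOT));
      b.castResult[row] = isFalse ? 0L : 1L;
    }
  }

  /** FILTER mode: rewrites the selected vector so only rows with a true value remain. */
  static void selectColumnIsTrue(Batch b) {
    int newSize = 0;
    for (int j = 0; j < b.size; j++) {
      int row = b.selected[j];
      if (b.castResult[row] != 0L) {
        b.selected[newSize++] = row;
      }
    }
    b.size = newSize;
  }

  public static void main(String[] args) {
    Batch b = new Batch("0", "false", "off", "no", "vk");
    projectCastToBoolean(b);
    System.out.println("live rows after projection only: " + b.size);
    selectColumnIsTrue(b);
    System.out.println("live rows after filter: " + b.size);
  }
}
```

In this toy model the projection alone leaves all five rows live. The batch shrinks to the single `vk` row only after the filter step rewrites the selected vector, which is the step that appears to be missing for the string cast.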

 

> Vectorized UDFToBoolean may unable to filter rows if input is string
> 
>
> Key: HIVE-24666
> URL: https://issues.apache.org/jira/browse/HIVE-24666
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-24666.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If we use a cast to boolean in a WHERE condition to filter rows, the filter 
> fails to filter the rows under vectorized execution. Steps to reproduce:
> {code:java}
> create table vtb (key string, value string);
> insert into table vtb values('0', 'val0'), ('false', 'valfalse'),('off', 
> 'valoff'),('no','valno'),('vk', 'valvk');
> select distinct value from vtb where cast(key as boolean); {code}
> It seems we don't generate a SelectColumnIsTrue to filter the rows if the 
> type being cast is string:
>  
> https://github.com/apache/hive/blob/ff6f3565e50148b7bcfbcf19b970379f2bd59290/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2995-L2996



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?focusedWorklogId=540896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540896
 ]

ASF GitHub Bot logged work on HIVE-24675:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 06:58
Start Date: 25/Jan/21 06:58
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1898:
URL: https://github.com/apache/hive/pull/1898#discussion_r563496522



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -183,6 +184,22 @@ private void dirLocationToCopy(FileList fileList, Path sourcePath, HiveConf conf
 throws HiveException {
   Path basePath = getExternalTableBaseDir(conf);
   Path targetPath = externalTableDataPath(conf, basePath, sourcePath);
+  //Here, when src and target are HA clusters with same NS, then sourcePath would have the correct host
+  //whereas the targetPath would have an host that refers to the target cluster. This is fine for
+  //data-copy running during dump as the correct logical locations would be used. But if data-copy runs during
+  //load, then the remote location needs to point to the src cluster from where the data would be copied and
+  //the common original NS would suffice for targetPath.
+  if(hiveConf.getBoolVar(HiveConf.ConfVars.REPL_HA_DATAPATH_REPLACE_REMOTE_NAMESERVICE) &&
+  hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET)) {
+String remoteNS = hiveConf.get(HiveConf.ConfVars.REPL_HA_DATAPATH_REPLACE_REMOTE_NAMESERVICE_NAME.varname);
+if (StringUtils.isEmpty(remoteNS)) {
+  throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE

Review comment:
   Yes, it is a non-recoverable error if the remote NS config is not 
specified. There needs to be a valid remote NS name, and the same name should be 
used in both the dump and load commands. 
   
   The same exception is being thrown for managed tables.
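
As a rough illustration of the check discussed in this review, the sketch below rewrites the authority of a location to a remote nameservice name and fails fast when that name is missing. It is only a sketch under stated assumptions: `withRemoteNameservice`, the `IllegalStateException`, and the example names `ns1` and `nsRemote` are hypothetical and are not taken from the pull request, which throws a `SemanticException` instead.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RemoteNameserviceSketch {
  /**
   * Returns a copy of {@code location} whose authority is replaced by {@code remoteNs}.
   * Fails fast when no remote nameservice name is configured.
   */
  static URI withRemoteNameservice(URI location, String remoteNs) {
    if (remoteNs == null || remoteNs.isEmpty()) {
      // Hypothetical stand-in for the non-recoverable configuration error.
      throw new IllegalStateException("remote nameservice name is not configured");
    }
    try {
      return new URI(location.getScheme(), remoteNs, location.getPath(),
          location.getQuery(), location.getFragment());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("cannot rewrite " + location, e);
    }
  }

  public static void main(String[] args) {
    URI source = URI.create("hdfs://ns1/warehouse/ext/t1");
    System.out.println(withRemoteNameservice(source, "nsRemote"));
  }
}
```

Failing before any copy task is scheduled keeps a misconfigured dump or load from silently copying from the wrong cluster, which matches the intent stated in the comment above.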





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 540896)
Time Spent: 50m  (was: 40m)

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch, HIVE-24675.02.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?focusedWorklogId=540895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540895
 ]

ASF GitHub Bot logged work on HIVE-24675:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 06:49
Start Date: 25/Jan/21 06:49
Worklog Time Spent: 10m 
  Work Description: ArkoSharma commented on a change in pull request #1898:
URL: https://github.com/apache/hive/pull/1898#discussion_r563493498



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -2122,4 +2221,21 @@ private void setupUDFJarOnHDFS(Path identityUdfLocalPath, Path identityUdfHdfsPa
 + NS_REMOTE + "'");
 return withClause;
   }
+
+  /*
+   * Method used from TestReplicationScenariosExclusiveReplica
+   */
+  private void assertExternalFileInfo(List expected, String dumplocation, boolean isIncremental,
+  WarehouseInstance warehouseInstance)
+  throws IOException {
+Path hivePath = new Path(dumplocation, ReplUtils.REPL_HIVE_BASE_DIR);
+Path metadataPath = new Path(hivePath, EximUtil.METADATA_PATH_NAME);
+Path externalTableInfoFile;
+if (isIncremental) {
+  externalTableInfoFile = new Path(hivePath, FILE_NAME);

Review comment:
   This is the deprecated one used in existing external table tests. 
Changing these tests to use the new file is done as part of a separate issue that 
cleans up the deprecated file.







Issue Time Tracking
---

Worklog Id: (was: 540895)
Time Spent: 40m  (was: 0.5h)

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch, HIVE-24675.02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24675) Handle external table replication for HA with same NS and lazy copy.

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24675?focusedWorklogId=540885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540885
 ]

ASF GitHub Bot logged work on HIVE-24675:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 06:18
Start Date: 25/Jan/21 06:18
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1898:
URL: https://github.com/apache/hive/pull/1898#discussion_r563482079



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1707,6 +1738,50 @@ public void testHdfsNameserviceLazyCopyIncr() throws Throwable {
 }
   }
 
+  @Test
+  public void testHdfsNSLazyCopyIncrExtTbls() throws Throwable {

Review comment:
   both tests can be combined?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -2122,4 +2221,21 @@ private void setupUDFJarOnHDFS(Path identityUdfLocalPath, Path identityUdfHdfsPa
 + NS_REMOTE + "'");
 return withClause;
   }
+
+  /*
+   * Method used from TestReplicationScenariosExclusiveReplica
+   */
+  private void assertExternalFileInfo(List expected, String dumplocation, boolean isIncremental,
+  WarehouseInstance warehouseInstance)
+  throws IOException {
+Path hivePath = new Path(dumplocation, ReplUtils.REPL_HIVE_BASE_DIR);
+Path metadataPath = new Path(hivePath, EximUtil.METADATA_PATH_NAME);
+Path externalTableInfoFile;
+if (isIncremental) {
+  externalTableInfoFile = new Path(hivePath, FILE_NAME);

Review comment:
   Is this the actual file used by code or the deprecated one?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1668,6 +1671,34 @@ public void testHdfsNameserviceLazyCopy() throws Throwable {
 }
   }
 
+  @Test
+  public void testHdfsNSLazyCopyBootStrapExtTbls() throws Throwable {
+List clause = getHdfsNameserviceClause();
+clause.add("'" + HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY_FOR_EXTERNAL_TABLE.varname + "'='false'");
+clause.add("'" + HiveConf.ConfVars.REPL_INCLUDE_EXTERNAL_TABLES.varname + "'='true'");

Review comment:
   this is true by default

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplExternalTables.java
##
@@ -183,6 +184,22 @@ private void dirLocationToCopy(FileList fileList, Path sourcePath, HiveConf conf
 throws HiveException {
   Path basePath = getExternalTableBaseDir(conf);
   Path targetPath = externalTableDataPath(conf, basePath, sourcePath);
+  //Here, when src and target are HA clusters with same NS, then sourcePath would have the correct host
+  //whereas the targetPath would have an host that refers to the target cluster. This is fine for
+  //data-copy running during dump as the correct logical locations would be used. But if data-copy runs during
+  //load, then the remote location needs to point to the src cluster from where the data would be copied and
+  //the common original NS would suffice for targetPath.
+  if(hiveConf.getBoolVar(HiveConf.ConfVars.REPL_HA_DATAPATH_REPLACE_REMOTE_NAMESERVICE) &&
+  hiveConf.getBoolVar(HiveConf.ConfVars.REPL_RUN_DATA_COPY_TASKS_ON_TARGET)) {
+String remoteNS = hiveConf.get(HiveConf.ConfVars.REPL_HA_DATAPATH_REPLACE_REMOTE_NAMESERVICE_NAME.varname);
+if (StringUtils.isEmpty(remoteNS)) {
+  throw new SemanticException(ErrorMsg.REPL_INVALID_CONFIG_FOR_SERVICE

Review comment:
   non recoverable error?







Issue Time Tracking
---

Worklog Id: (was: 540885)
Time Spent: 0.5h  (was: 20m)

> Handle external table replication for HA with same NS and lazy copy.
> 
>
> Key: HIVE-24675
> URL: https://issues.apache.org/jira/browse/HIVE-24675
> Project: Hive
>  Issue Type: Bug
>Reporter: Arko Sharma
>Assignee: Arko Sharma
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24675.01.patch, HIVE-24675.02.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24386?focusedWorklogId=540833=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540833
 ]

ASF GitHub Bot logged work on HIVE-24386:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 04:07
Start Date: 25/Jan/21 04:07
Worklog Time Spent: 10m 
  Work Description: vnhive closed pull request #1694:
URL: https://github.com/apache/hive/pull/1694


   





Issue Time Tracking
---

Worklog Id: (was: 540833)
Time Spent: 1h 50m  (was: 1h 40m)

> Add builder methods for GetTablesRequest and GetPartitionsRequest to 
> HiveMetaStoreClient
> 
>
> Key: HIVE-24386
> URL: https://issues.apache.org/jira/browse/HIVE-24386
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Narayanan Venkateswaran
>Assignee: Narayanan Venkateswaran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
> to the HiveMetaStoreClient class.





[jira] [Work logged] (HIVE-19253) HMS ignores tableType property for external tables

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19253?focusedWorklogId=540804=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540804
 ]

ASF GitHub Bot logged work on HIVE-19253:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 01:32
Start Date: 25/Jan/21 01:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1537:
URL: https://github.com/apache/hive/pull/1537#issuecomment-766484617


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 540804)
Time Spent: 2h 20m  (was: 2h 10m)

> HMS ignores tableType property for external tables
> --
>
> Key: HIVE-19253
> URL: https://issues.apache.org/jira/browse/HIVE-19253
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.1.0, 3.0.0, 4.0.0
>Reporter: Alex Kolbasov
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-19253.01.patch, HIVE-19253.02.patch, 
> HIVE-19253.03.patch, HIVE-19253.03.patch, HIVE-19253.04.patch, 
> HIVE-19253.05.patch, HIVE-19253.06.patch, HIVE-19253.07.patch, 
> HIVE-19253.08.patch, HIVE-19253.09.patch, HIVE-19253.10.patch, 
> HIVE-19253.11.patch, HIVE-19253.12.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When someone creates a table using the Thrift API, they may think that setting 
> tableType to {{EXTERNAL_TABLE}} creates an external table. And boom - their 
> table is gone later, because HMS silently changes it to a managed table.
> Here is the offending code:
> {code:java}
>   private MTable convertToMTable(Table tbl) throws InvalidObjectException,
>       MetaException {
>     ...
>     // If the table has property EXTERNAL set, update table type
>     // accordingly
>     String tableType = tbl.getTableType();
>     boolean isExternal = Boolean.parseBoolean(tbl.getParameters().get("EXTERNAL"));
>     if (TableType.MANAGED_TABLE.toString().equals(tableType)) {
>       if (isExternal) {
>         tableType = TableType.EXTERNAL_TABLE.toString();
>       }
>     }
>     if (TableType.EXTERNAL_TABLE.toString().equals(tableType)) {
>       if (!isExternal) { // Here!
>         tableType = TableType.MANAGED_TABLE.toString();
>       }
>     }
> {code}
> So if the EXTERNAL parameter is not set, the table type is changed to managed 
> even if it was external in the first place, which is wrong.
> Moreover, some places in the code look at the table property to decide the 
> table type and others look at the parameter. HMS should really make up its 
> mind which one to use.
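
One way to picture the behaviour the report asks for is the sketch below. It is an assumption, not the committed fix: the `resolve` helper and the local `TableType` enum are hypothetical, and the policy shown (an explicit `EXTERNAL` parameter wins, an absent parameter leaves the declared type alone) is just one possible resolution of the ambiguity described above.

```java
import java.util.Collections;
import java.util.Map;

class TableTypeSketch {
  /** Local stand-in for the two table types discussed in the issue. */
  enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

  /**
   * Assumed policy: if the EXTERNAL parameter is present it decides the type,
   * otherwise the type declared by the caller is kept unchanged.
   */
  static TableType resolve(TableType declared, Map<String, String> params) {
    String external = params.get("EXTERNAL");
    if (external == null) {
      return declared; // parameter absent: do not silently flip the declared type
    }
    return Boolean.parseBoolean(external) ? TableType.EXTERNAL_TABLE : TableType.MANAGED_TABLE;
  }

  public static void main(String[] args) {
    Map<String, String> noParams = Collections.emptyMap();
    System.out.println(resolve(TableType.EXTERNAL_TABLE, noParams));
    System.out.println(resolve(TableType.MANAGED_TABLE, Collections.singletonMap("EXTERNAL", "TRUE")));
  }
}
```

The only difference from the quoted snippet is the `null` check: a missing parameter is treated as "no opinion" rather than as `false`.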





[jira] [Work logged] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24313?focusedWorklogId=540803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540803
 ]

ASF GitHub Bot logged work on HIVE-24313:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 01:32
Start Date: 25/Jan/21 01:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1636:
URL: https://github.com/apache/hive/pull/1636


   





Issue Time Tracking
---

Worklog Id: (was: 540803)
Time Spent: 50m  (was: 40m)

> Optimise stats collection for file sizes on cloud storage
> -
>
> Key: HIVE-24313
> URL: https://issues.apache.org/jira/browse/HIVE-24313
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When stats information is not present (e.g. for an external table), RelOptHiveTable 
> computes basic stats at runtime.
> The code path is as follows.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
> {code:java}
> Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
>     hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats, colStatsCached,
>     nonPartColNamesThatRqrStats, true);
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
> {code:java}
> for (Partition p : partList.getNotDeniedPartns()) {
>   BasicStats basicStats = basicStatsFactory.build(Partish.buildFor(table, p));
>   partStats.add(basicStats);
> }
>  {code}
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
>  
> {code:java}
> try {
>   ds = getFileSizeForPath(path);
> } catch (IOException e) {
>   ds = 0L;
> }
>  {code}
>  
> For a table and query with a large number of partitions, computing statistics 
> takes a long time and increases compilation time. It would be good to fix 
> it with "ForkJoinPool" ( 
> partList.getNotDeniedPartns().parallelStream().forEach((p) )
>  
>  
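
The suggestion above could look roughly like the following sketch. It is not the Hive change itself: `sizesInParallel` and `sizeOrZero` are hypothetical helpers, plain strings stand in for partition paths, and the size lookup is passed in as a function so the example stays self-contained. Running a parallel stream from inside a dedicated `ForkJoinPool` is a commonly used idiom that relies on current JDK behaviour rather than on a documented guarantee.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

class ParallelStatsSketch {
  /** Looks up one size, falling back to 0 on failure as the quoted snippet does. */
  static long sizeOrZero(String path, ToLongFunction<String> sizeLookup) {
    try {
      return sizeLookup.applyAsLong(path);
    } catch (RuntimeException e) {
      return 0L;
    }
  }

  /** Computes all sizes on a dedicated pool; the result keeps the input order. */
  static List<Long> sizesInParallel(List<String> paths, ToLongFunction<String> sizeLookup,
      int parallelism) {
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      return pool.submit(() -> paths.parallelStream()
          .map(p -> sizeOrZero(p, sizeLookup))
          .collect(Collectors.toList())).join();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // String length stands in for a (slow) file-size lookup on cloud storage.
    List<String> partitions = Arrays.asList("p=1", "p=22", "p=333");
    System.out.println(sizesInParallel(partitions, String::length, 2));
  }
}
```

A dedicated pool with a bounded parallelism keeps the lookups from competing with other work on the common pool, which matters when each lookup is a remote call.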





[jira] [Work logged] (HIVE-24430) DiskRangeInfo should make use of DiskRangeList

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24430?focusedWorklogId=540801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540801
 ]

ASF GitHub Bot logged work on HIVE-24430:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 01:32
Start Date: 25/Jan/21 01:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1709:
URL: https://github.com/apache/hive/pull/1709#issuecomment-766484603


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 540801)
Time Spent: 40m  (was: 0.5h)

> DiskRangeInfo should make use of DiskRangeList
> --
>
> Key: HIVE-24430
> URL: https://issues.apache.org/jira/browse/HIVE-24430
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> DiskRangeInfo should make use of DiskRangeList instead of List – 
> this will help us transition to ORC 1.6.





[jira] [Work logged] (HIVE-24324) Remove deprecated API usage from Avro

2021-01-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24324?focusedWorklogId=540802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-540802
 ]

ASF GitHub Bot logged work on HIVE-24324:
-

Author: ASF GitHub Bot
Created on: 25/Jan/21 01:32
Start Date: 25/Jan/21 01:32
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1711:
URL: https://github.com/apache/hive/pull/1711#issuecomment-766484598


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 540802)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove deprecated API usage from Avro
> -
>
> Key: HIVE-24324
> URL: https://issues.apache.org/jira/browse/HIVE-24324
> Project: Hive
>  Issue Type: Improvement
>  Components: Avro
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8, 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {{JsonProperties#getJsonProp}} has been marked as deprecated in Avro 1.8 and 
> removed since Avro 1.9. This replaces the API usage for this with 
> {{getObjectProp}} which doesn't leak Json node from jackson. This will help 
> downstream apps to depend on Hive while using higher version of Avro, and 
> also help Hive to upgrade Avro version itself.





[jira] [Updated] (HIVE-24456) Column masking/hashing function in hive should use SH512 if FIPS mode is enabled

2021-01-24 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala updated HIVE-24456:
-
Issue Type: Improvement  (was: Wish)

> Column masking/hashing function in hive should use SH512 if FIPS mode is 
> enabled
> 
>
> Key: HIVE-24456
> URL: https://issues.apache.org/jira/browse/HIVE-24456
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive-site.xml should have the following property to indicate that FIPS mode 
> is enabled.
> <property>
>     <name>hive.masking.algo</name>
>     <value>sha512</value>
> </property>
> If this property is present, then GenericUDFMaskHash should use SHA512 
> instead of SHA256 encoding for column masking.
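
A minimal sketch of the proposed behaviour, assuming the configured value is simply handed to a helper: `jcaName` and `maskHash` are hypothetical names, not `GenericUDFMaskHash` internals, and only the property value `sha512` comes from the description above. `SHA-256` and `SHA-512` are the standard JCA algorithm names accepted by `MessageDigest`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class MaskHashSketch {
  /** Maps the configured value to a JCA algorithm name; anything but "sha512" keeps SHA-256. */
  static String jcaName(String configuredAlgo) {
    return "sha512".equalsIgnoreCase(configuredAlgo) ? "SHA-512" : "SHA-256";
  }

  /** Returns the lowercase hex digest of the value using the configured algorithm. */
  static String maskHash(String value, String configuredAlgo) {
    try {
      MessageDigest md = MessageDigest.getInstance(jcaName(configuredAlgo));
      byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("digest algorithm unavailable", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(maskHash("abc", null).length());     // SHA-256 digest as hex
    System.out.println(maskHash("abc", "sha512").length()); // SHA-512 digest as hex
  }
}
```

Note that switching algorithms changes the length of the masked value from 64 to 128 hex characters, which may matter for downstream column widths.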





[jira] [Comment Edited] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-24 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270899#comment-17270899
 ] 

Gopal Vijayaraghavan edited comment on HIVE-24666 at 1/24/21, 2:22 PM:
---

I think for an uncasted column, this is how it gets placed for decimal64.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L1942

{code}
  } else if (childExpr instanceof ExprNodeColumnDesc) {
    int colIndex = getInputColumnIndex((ExprNodeColumnDesc) childExpr);
    if (childrenMode == VectorExpressionDescriptor.Mode.FILTER) {

      VectorExpression filterExpr =
          getFilterOnBooleanColumnExpression((ExprNodeColumnDesc) childExpr, colIndex);
      if (filterExpr == null) {
        return null;
      }

      children.add(filterExpr);
    }
    arguments[i] = colIndex;
  }
{code}


was (Author: gopalv):
I think for an uncasted column, this is how it gets placed.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L1942

{code}
  } else if (childExpr instanceof ExprNodeColumnDesc) {
    int colIndex = getInputColumnIndex((ExprNodeColumnDesc) childExpr);
    if (childrenMode == VectorExpressionDescriptor.Mode.FILTER) {

      VectorExpression filterExpr =
          getFilterOnBooleanColumnExpression((ExprNodeColumnDesc) childExpr, colIndex);
      if (filterExpr == null) {
        return null;
      }

      children.add(filterExpr);
    }
    arguments[i] = colIndex;
  }
{code}



[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-24 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270899#comment-17270899
 ] 

Gopal Vijayaraghavan commented on HIVE-24666:
-

I think for an uncasted column, this is how it gets placed.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L1942

{code}
  } else if (childExpr instanceof ExprNodeColumnDesc) {
    int colIndex = getInputColumnIndex((ExprNodeColumnDesc) childExpr);
    if (childrenMode == VectorExpressionDescriptor.Mode.FILTER) {

      VectorExpression filterExpr =
          getFilterOnBooleanColumnExpression((ExprNodeColumnDesc) childExpr, colIndex);
      if (filterExpr == null) {
        return null;
      }

      children.add(filterExpr);
    }
    arguments[i] = colIndex;
  }
{code}



[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-24 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270881#comment-17270881
 ] 

Gopal Vijayaraghavan commented on HIVE-24666:
-

So this fix is good, but it fixes only the specific issue by wrapping it with 
the filter (that modifies the .selected vector) - the real issue is hiding 
somewhere else.


[jira] [Commented] (HIVE-24666) Vectorized UDFToBoolean may unable to filter rows if input is string

2021-01-24 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270879#comment-17270879
 ] 

Gopal Vijayaraghavan commented on HIVE-24666:
-

I see your fix, but I think this is a class of problems: it is pretty clear 
that CastStringToBoolean only has a PROJECTION mode, whereas 
SelectColumnIsTrue is what provides the FILTER.

So this might need a generic wrap PROJECTION with SelectColumnIsTrue as a 
general case (not just for cast or just that one cast boolean).
