[jira] [Updated] (HIVE-27638) Preparing for 4.0.0-beta-2 development

2023-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27638:
--
Labels: pull-request-available  (was: )

> Preparing for 4.0.0-beta-2 development
> --
>
> Key: HIVE-27638
> URL: https://issues.apache.org/jira/browse/HIVE-27638
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The main goal of this ticket is to increment the version and add the 
> necessary metastore upgrade scripts so we don't lose track of what changed 
> after the beta-1 release.
> If later we decide to use another name (other than beta-2) that would be 
> completely fine (and hopefully a simple rename would do). The most important 
> thing in this change is to have the scripts in place so we don't mess up when 
> we push changes to the metastore schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27638) Preparing for 4.0.0-beta-2 development

2023-08-22 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27638:
--

 Summary: Preparing for 4.0.0-beta-2 development
 Key: HIVE-27638
 URL: https://issues.apache.org/jira/browse/HIVE-27638
 Project: Hive
  Issue Type: Task
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The main goal of this ticket is to increment the version and add the necessary 
metastore upgrade scripts so we don't lose track of what changed after the 
beta-1 release.

If later we decide to use another name (other than beta-2) that would be 
completely fine (and hopefully a simple rename would do). The most important 
thing in this change is to have the scripts in place so we don't mess up when 
we push changes to the metastore schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27637) Compare highest write ID of compaction records when trying to perform abort cleanup

2023-08-22 Thread Zsolt Miskolczi (Jira)
Zsolt Miskolczi created HIVE-27637:
--

 Summary: Compare highest write ID of compaction records when 
trying to perform abort cleanup
 Key: HIVE-27637
 URL: https://issues.apache.org/jira/browse/HIVE-27637
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: Zsolt Miskolczi
Assignee: Zsolt Miskolczi


Compare highest write ID of compaction records when trying to get the potential 
table/partitions for abort cleanup.

Idea: If there exists a highest write ID of a record in COMPACTION_QUEUE for a 
table/partition which is greater than the max(aborted write ID) for that 
table/partition, then we can potentially ignore abort cleanup for such 
tables/partitions. This is because compaction will perform cleanup of obsolete 
deltas and aborted deltas, so a separate abort cleanup is redundant here.

This is more of an optimisation since it can potentially save some filesystem 
operations (mainly file-listing during construction of Acid state).
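Below is a minimal, self-contained sketch of the proposed guard. The class and method names are hypothetical, not the actual Hive cleaner code; it only illustrates the write-ID comparison described above.
{code:java}
// Hypothetical illustration of the proposed check; names are assumptions,
// not the actual Hive abort-cleanup implementation.
public final class AbortCleanupGuard {

  /**
   * @param maxCompactionWriteId highest write ID recorded in COMPACTION_QUEUE
   *                             for the table/partition, or null if no record exists
   * @param maxAbortedWriteId    max(aborted write ID) for the same table/partition
   * @return true if abort cleanup can be skipped because a compaction already
   *         covers the aborted deltas and will clean them up
   */
  public static boolean canSkipAbortCleanup(Long maxCompactionWriteId, long maxAbortedWriteId) {
    return maxCompactionWriteId != null && maxCompactionWriteId > maxAbortedWriteId;
  }

  public static void main(String[] args) {
    System.out.println(canSkipAbortCleanup(120L, 100L)); // true: compaction already covers the aborts
    System.out.println(canSkipAbortCleanup(null, 100L)); // false: no compaction record, cleanup still needed
  }
}
{code}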



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27631) Fix CCE when set fs.hdfs.impl other than DistributedFileSystem

2023-08-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27631.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Fix CCE when set fs.hdfs.impl other than DistributedFileSystem
> --
>
> Key: HIVE-27631
> URL: https://issues.apache.org/jira/browse/HIVE-27631
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 2.3.7, 3.1.3
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2023-08-17-17-01-15-753.png
>
>
> !image-2023-08-17-17-01-15-753.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27631) Fix CCE when set fs.hdfs.impl other than DistributedFileSystem

2023-08-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757469#comment-17757469
 ] 

Ayush Saxena commented on HIVE-27631:
-

Committed to master. 
Thanx [~maobaolong] for the contribution!!!

> Fix CCE when set fs.hdfs.impl other than DistributedFileSystem
> --
>
> Key: HIVE-27631
> URL: https://issues.apache.org/jira/browse/HIVE-27631
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 2.3.7, 3.1.3
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-08-17-17-01-15-753.png
>
>
> !image-2023-08-17-17-01-15-753.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27631) Fix CCE when set fs.hdfs.impl other than DistributedFileSystem

2023-08-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27631:

Summary: Fix CCE when set fs.hdfs.impl other than DistributedFileSystem  
(was: CCE while use Alluxio ShimFilesystem to adapt hdfs)

> Fix CCE when set fs.hdfs.impl other than DistributedFileSystem
> --
>
> Key: HIVE-27631
> URL: https://issues.apache.org/jira/browse/HIVE-27631
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.5, 2.3.7, 3.1.3
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-08-17-17-01-15-753.png
>
>
> !image-2023-08-17-17-01-15-753.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27585) Upgrade kryo serialization lib to latest version

2023-08-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27585.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Upgrade kryo serialization lib to latest version
> 
>
> Key: HIVE-27585
> URL: https://issues.apache.org/jira/browse/HIVE-27585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0-beta-1
>Reporter: Suprith Chandrashekharachar
>Assignee: Suprith Chandrashekharachar
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Some performance improvements and other bug fixes that could be useful 
> [https://github.com/EsotericSoftware/kryo/compare/kryo-parent-5.2.0...kryo-parent-5.5.0.]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27585) Upgrade kryo serialization lib to latest version

2023-08-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757466#comment-17757466
 ] 

Ayush Saxena commented on HIVE-27585:
-

Committed to master.
Thanx [~Chandrashekharachar] for the contribution, [~aturoczy] & [~simhadri-g] 
for the reviews!!!

> Upgrade kryo serialization lib to latest version
> 
>
> Key: HIVE-27585
> URL: https://issues.apache.org/jira/browse/HIVE-27585
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0-beta-1
>Reporter: Suprith Chandrashekharachar
>Assignee: Suprith Chandrashekharachar
>Priority: Minor
>  Labels: pull-request-available
>
> Some performance improvements and other bug fixes that could be useful 
> [https://github.com/EsotericSoftware/kryo/compare/kryo-parent-5.2.0...kryo-parent-5.5.0.]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27113) Increasing default for hive.thrift.client.max.message.size to 2 GB

2023-08-22 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27113:

Description: 
 
{code:java}
HIVE_THRIFT_CLIENT_MAX_MESSAGE_SIZE("hive.thrift.client.max.message.size", 
"1gb",
new SizeValidator(-1L, true, (long) Integer.MAX_VALUE, true),
"Thrift client configuration for max message size. 0 or -1 will use the default 
defined in the Thrift " +
"library. The upper limit is 2147483648 bytes (or 2gb).")
 
{code}
The documentation in the help text suggests setting 2147483648, while Integer.MAX_VALUE is 
2147483647. So the value actually becomes -1 and gets set to the Thrift default limit 
(100 MB).

  was:
HIVE_THRIFT_CLIENT_MAX_MESSAGE_SIZE("hive.thrift.client.max.message.size", 
"1gb",
new SizeValidator(-1L, true, (long) Integer.MAX_VALUE, true),
"Thrift client configuration for max message size. 0 or -1 will use the default 
defined in the Thrift " +
"library. The upper limit is 2147483648 bytes (or 2gb).")


Documentation on the help suggests setting 2147483648 while Integer Max is 
2147483647. So, it actually becomes -1 and gets set to thrift default limit 
(100 MB)


> Increasing default for hive.thrift.client.max.message.size to 2 GB
> --
>
> Key: HIVE-27113
> URL: https://issues.apache.org/jira/browse/HIVE-27113
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
>  
> {code:java}
> HIVE_THRIFT_CLIENT_MAX_MESSAGE_SIZE("hive.thrift.client.max.message.size", 
> "1gb",
> new SizeValidator(-1L, true, (long) Integer.MAX_VALUE, true),
> "Thrift client configuration for max message size. 0 or -1 will use the 
> default defined in the Thrift " +
> "library. The upper limit is 2147483648 bytes (or 2gb).")
>  
> {code}
> The documentation in the help text suggests setting 2147483648, while Integer.MAX_VALUE is 
> 2147483647. So the value actually becomes -1 and gets set to the Thrift default limit 
> (100 MB).
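A standalone sketch of the arithmetic behind this report (this is not Hive's SizeValidator, only an illustration of the range check it performs): the documented 2147483648 is one larger than Integer.MAX_VALUE, so it cannot pass a validator whose upper bound is Integer.MAX_VALUE, and the client ends up on the Thrift default.
{code:java}
// Illustration only: shows why 2147483648 is rejected by an upper bound of Integer.MAX_VALUE.
public final class MaxMessageSizeCheck {
  public static void main(String[] args) {
    long documentedValue = 2_147_483_648L;       // value suggested by the config help text (2 GB)
    long upperBound = (long) Integer.MAX_VALUE;  // 2147483647, the validator's upper limit

    System.out.println(documentedValue <= upperBound); // false -> out of the allowed range
    System.out.println((int) documentedValue);         // -2147483648 if narrowed to int (overflow)
  }
}
{code}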



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27113) Increasing default for hive.thrift.client.max.message.size to 2 GB

2023-08-22 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27113:

Description: 
HIVE_THRIFT_CLIENT_MAX_MESSAGE_SIZE("hive.thrift.client.max.message.size", 
"1gb",
new SizeValidator(-1L, true, (long) Integer.MAX_VALUE, true),
"Thrift client configuration for max message size. 0 or -1 will use the default 
defined in the Thrift " +
"library. The upper limit is 2147483648 bytes (or 2gb).")


The documentation in the help text suggests setting 2147483648, while Integer.MAX_VALUE is 
2147483647. So the value actually becomes -1 and gets set to the Thrift default limit 
(100 MB).

  was:
HIVE_THRIFT_CLIENT_MAX_MESSAGE_SIZE("hive.thrift.client.max.message.size", 
"1gb",
new SizeValidator(-1L, true, (long) Integer.MAX_VALUE, true),
"Thrift client configuration for max message size. 0 or -1 will use the 
default defined in the Thrift " +
"library. The upper limit is 2147483648 bytes (or 2gb).")
Documentation on the help suggests setting 2147483648 while Integer Max is 
2147483647. So, it actually becomes -1 and gets set to thrift default limit 
(100 MB)


> Increasing default for hive.thrift.client.max.message.size to 2 GB
> --
>
> Key: HIVE-27113
> URL: https://issues.apache.org/jira/browse/HIVE-27113
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Riju Trivedi
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HIVE_THRIFT_CLIENT_MAX_MESSAGE_SIZE("hive.thrift.client.max.message.size", 
> "1gb",
> new SizeValidator(-1L, true, (long) Integer.MAX_VALUE, true),
> "Thrift client configuration for max message size. 0 or -1 will use the 
> default defined in the Thrift " +
> "library. The upper limit is 2147483648 bytes (or 2gb).")
> The documentation in the help text suggests setting 2147483648, while Integer.MAX_VALUE is 
> 2147483647. So the value actually becomes -1 and gets set to the Thrift default limit 
> (100 MB).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27636) Exception in HiveMaterializedViewsRegistry is leaving staging directories behind

2023-08-22 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27636:

Description: 
If any exception occurs while parsing the query in 
`HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there is 
no HDFS dir cleanup until JVM exit. This leaves the staging directories behind. 
For a long-running HS2, these staging directories keep on increasing and can 
cause a limit-reached exception.
{code:java}
Error: Error while compiling statement: FAILED: RuntimeException Cannot create 
staging directory 
'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
 The directory item limit of 
/warehouse/tablespace/managed/hive/test.db/testTable is exceeded: limit=1048576 
items=1048576 {code}
We should do hdfs directory cleanup for `HiveMaterializedViewsRegistry` thread 
[here|https://github.infra.cloudera.com/CDH/hive/blob/39b9e39e5167c8fcd35683f8e9e2c9a89fe86555/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java#L226]

  was:
In case of any exception while query parsing in 
`HiveMaterializedViewsRegistry.createMaterialization`, we bailout and there is 
no hdfs dir cleanup until JVM exit. This leaves behind the staging directories. 
For a long-running HS2, these staging  directories keeps on increasing and can 
cause limit reached exception.
{code:java}
Error: Error while compiling statement: FAILED: RuntimeException Cannot create 
staging directory 
'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
 The directory item limit of 
/warehouse/tablespace/managed/hive/test.db/testTable is exceeded: limit=1048576 
items=1048576 {code}


> Exception in HiveMaterializedViewsRegistry is leaving staging directories 
> behind
> 
>
> Key: HIVE-27636
> URL: https://issues.apache.org/jira/browse/HIVE-27636
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Riju Trivedi
>Priority: Major
>
> If any exception occurs while parsing the query in 
> `HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there 
> is no HDFS dir cleanup until JVM exit. This leaves the staging 
> directories behind. For a long-running HS2, these staging directories keep on 
> increasing and can cause a limit-reached exception.
> {code:java}
> Error: Error while compiling statement: FAILED: RuntimeException Cannot 
> create staging directory 
> 'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
>  The directory item limit of 
> /warehouse/tablespace/managed/hive/test.db/testTable is exceeded: 
> limit=1048576 items=1048576 {code}
> We should do hdfs directory cleanup for `HiveMaterializedViewsRegistry` 
> thread 
> [here|https://github.infra.cloudera.com/CDH/hive/blob/39b9e39e5167c8fcd35683f8e9e2c9a89fe86555/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMaterializedViewsRegistry.java#L226]
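A hedged sketch of the kind of cleanup the report asks for: delete the staging directory in a finally block instead of waiting for JVM exit. Method and variable names are illustrative, not the real HiveMaterializedViewsRegistry code; only the Hadoop FileSystem calls are real APIs.
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class StagingDirCleanupSketch {

  // Illustrative wrapper: whatever parsing/registration happens, remove the staging dir afterwards.
  static void parseAndRegister(Path stagingDir, Configuration conf) throws IOException {
    try {
      // ... parse the materialized view definition; this may throw at any point ...
    } finally {
      // Best-effort removal of the staging directory so it does not accumulate in HDFS.
      FileSystem fs = stagingDir.getFileSystem(conf);
      if (fs.exists(stagingDir)) {
        fs.delete(stagingDir, true /* recursive */);
      }
    }
  }
}
{code}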



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27636) Exception in HiveMaterializedViewsRegistry is leaving staging directories behind

2023-08-22 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi updated HIVE-27636:

Description: 
If any exception occurs while parsing the query in 
`HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there is 
no HDFS dir cleanup until JVM exit. This leaves the staging directories behind. 
For a long-running HS2, these staging directories keep on increasing and can 
cause a limit-reached exception.
{code:java}
Error: Error while compiling statement: FAILED: RuntimeException Cannot create 
staging directory 
'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
 The directory item limit of 
/warehouse/tablespace/managed/hive/test.db/testTable is exceeded: limit=1048576 
items=1048576 {code}

> Exception in HiveMaterializedViewsRegistry is leaving staging directories 
> behind
> 
>
> Key: HIVE-27636
> URL: https://issues.apache.org/jira/browse/HIVE-27636
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Reporter: Riju Trivedi
>Priority: Major
>
> If any exception occurs while parsing the query in 
> `HiveMaterializedViewsRegistry.createMaterialization`, we bail out and there 
> is no HDFS dir cleanup until JVM exit. This leaves the staging 
> directories behind. For a long-running HS2, these staging directories keep on 
> increasing and can cause a limit-reached exception.
> {code:java}
> Error: Error while compiling statement: FAILED: RuntimeException Cannot 
> create staging directory 
> 'hdfs://aidaprd01/warehouse/tablespace/managed/hive/test.db/testTable/.hive-staging_hive_2023-08-05_06-17-06_711_5516272990801215078-168329:
>  The directory item limit of 
> /warehouse/tablespace/managed/hive/test.db/testTable is exceeded: 
> limit=1048576 items=1048576 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27636) Exception in HiveMaterializedViewsRegistry is leaving staging directories behind

2023-08-22 Thread Riju Trivedi (Jira)
Riju Trivedi created HIVE-27636:
---

 Summary: Exception in HiveMaterializedViewsRegistry is leaving 
staging directories behind
 Key: HIVE-27636
 URL: https://issues.apache.org/jira/browse/HIVE-27636
 Project: Hive
  Issue Type: Bug
  Components: Materialized views
Reporter: Riju Trivedi






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24078:
--
Labels: check mapreduce obsolete?  (was: Obsolete)

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: check, mapreduce, obsolete?
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL gets 76742 rows in MR, but 76681 rows in Tez. How to fix it?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27170) facing issues while using tez 0.9.2 as execution engine to hive 2.3.9

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757306#comment-17757306
 ] 

Attila Turoczy commented on HIVE-27170:
---

I strongly recommend using the latest version of Hive. Hive 2 is pretty old.

> facing issues while using tez 0.9.2 as execution engine to hive 2.3.9
> -
>
> Key: HIVE-27170
> URL: https://issues.apache.org/jira/browse/HIVE-27170
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 2.3.9
>Reporter: vikran
>Priority: Critical
> Attachments: hive-site.txt, hive_error_in_yarn.txt, tez-site.txt
>
>
> Hi Team,
> I am using the versions below:
> hive 2.3.9
> tez 0.9.2
> spark 3.3.2
>  hive-site.xml(attached)
>  tez-site.xml(attached)
> I have added the Tez jars and files as well as the Hive jars into the /apps/tez 
> directory in HDFS.
> When I try to start Hive in the CLI, I get the error below:
> hive> INSERT INTO emp1.employee values(7,'scott',23,'M');
> Query ID = azureuser_20230324061903_97928963-410d-44a0-aa47-a83cdc24ce88
> Total jobs = 1
> Launching Job 1 out of 1
> *FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask*
> I have attached the complete error log from the AppMaster.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27322) Iceberg: metadata location overrides can cause data breach

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-27322:
--
Labels: check  (was: )

> Iceberg: metadata location overrides can cause data breach
> --
>
> Key: HIVE-27322
> URL: https://issues.apache.org/jira/browse/HIVE-27322
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Priority: Blocker
>  Labels: check
>
> Set to bug/blocker instead of enhancement due to its security-related nature; 
> Hive 4 should not be released without a fix for this. Please reset if needed.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end-user can only access the masked form of the sensitive data as 
> expected...
>  * The users also have privilege to create new tables in their own sandbox 
> databases - let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> non-masked leading to a possible data breach.
>  * Hive runs with doAs=false to be able to enforce FGAC and prevent end-user 
> direct file-system access needs
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see 
> this.','You are NOT allowed to see this!');
> "
> 
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> | icebergsecured.txt                 | icebergsecured.secret          |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +------------------------------------+--------------------------------+
> {noformat}
>  * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> 
> SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null |grep 
> metadata_location  |grep -v previous_metadata_location | awk '{print $5}')
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorse;
> CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED 
> BY ICEBERG
> TBLPROPERTIES (
>   'metadata_location'='${SECURED_META_LOCATION}');
> SELECT * FROM default.trojanhorse;
> "
> +------------------------------------+-----------------------------------+
> | trojanhorse.txt                    | trojanhorse.secret                |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> +------------------------------------+-----------------------------------+
> {noformat}
>  
> Currently - after HIVE-26707 - the rwstorage authorization only has either 
> the dummy path or the explicit path set for uri:  
> {noformat}
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F1-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
>  
> {noformat}
> This can be used only to decide whether a user is allowed to create 
> iceberg tables in certain databases with certain names, but controlling its 
> metadata location is hard in that form:  
>  * it does not provide a variable of "default table location" so a rule needs 
> to know the per-database table location or per-catalog warehouse location to 
> be able to construct it
>  * it does not provide a rich regex to filter out `/../` style directory 
> references
>  *** but basically there should also be a flag indicating whether an explicit metadata 
> location is provided or not instead of the dummy reference, which again 
> needs explicit matching in the policy to handle
>  
> Proposed enhancement:
>  * The URL for the iceberg table's rwstorage authorization should be changed 
> in the following way (a small decoding sketch follows this list):
>  ** the /? is good but
>  *** the location should not be url encoded, or at least the authorizer 
> should check the policy against the decoded url
>  *** the separator between the table and location should be "/" 
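A small, self-contained decoding sketch (an illustration of the point above, not the proposed Hive change): the authorizer would match policies against the decoded snapshot location rather than its URL-encoded form. The encoded path below is taken from this report.
{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public final class SnapshotUriDecodeSketch {
  public static void main(String[] args) throws UnsupportedEncodingException {
    // URL-encoded metadata location as it currently appears in the rwstorage authorization URI.
    String encoded = "%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured"
        + "%2Fmetadata%2F1-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json";
    String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8.name());
    System.out.println(decoded);
    // A policy could then reject explicit metadata locations that point outside the
    // table's own directory, e.g. anything not under .../hive/trojanhorse/metadata/.
  }
}
{code}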

[jira] [Updated] (HIVE-27323) Iceberg: malformed manifest file or list can cause data breach

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-27323:
--
Labels: check  (was: )

> Iceberg: malformed manifest file or list can cause data breach
> --
>
> Key: HIVE-27323
> URL: https://issues.apache.org/jira/browse/HIVE-27323
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-alpha-2
>Reporter: Janos Kovacs
>Priority: Blocker
>  Labels: check
>
> Set to bug/blocker instead of enhancement due to its security-related nature; 
> Hive 4 should not be released without a fix for this. Please reset if needed.
>  
> Fyi: it's similar to HIVE-27322 but this is more based on Iceberg's internals 
> and can't just be fixed via the storagehandler authorizer.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * An end-user can only access the masked form of the sensitive data as 
> expected...
>  * The users also have privilege to create new tables in their own sandbox 
> databases - let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> non-masked leading to a possible data breach.
>  * Hive runs with doAs=false to be able to enforce FGAC and prevent end-user 
> direct file-system access needs
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see 
> this.','You are NOT allowed to see this!');
> "
> 
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> | icebergsecured.txt                 | icebergsecured.secret          |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +------------------------------------+--------------------------------+
> {noformat}
>  * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> 
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorseviadata;
> CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string) 
> STORED BY ICEBERG
> LOCATION '/some-user-writeable-location/trojanhorseviadata';
> INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder');
> "
> SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>   beeline --outputformat=csv2 --showHeader=false --verbose=false 
> --showWarnings=false --silent=true --report=false -e "SELECT file_path FROM 
> default.icebergsecured.files;" 2>/dev/null)
> TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
>  beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null 
> |grep metadata_location  |grep -v previous_metadata_location | awk '{print 
> $5}')
> TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat $TROJAN_META_LOCATION |grep 
> "manifest-list"  |cut -f4 -d\")
> hdfs dfs -get $TROJAN_MANIFESTLIST_LOCATION
> TROJAN_MANIFESTLIST=$(basename $TROJAN_MANIFESTLIST_LOCATION)
> TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson $TROJAN_MANIFESTLIST |jq 
> '.manifest_path' |tr -d \")
> hdfs dfs -get $TROJAN_MANIFESTFILE_LOCATION
> TROJAN_MANIFESTFILE=$(basename $TROJAN_MANIFESTFILE_LOCATION)
> mv ${TROJAN_MANIFESTFILE} ${TROJAN_MANIFESTFILE}.orig
> avro-tools tojson ${TROJAN_MANIFESTFILE}.orig |jq --arg fp 
> "$SECURE_DATA_FILE" '.data_file.file_path = $fp' > ${TROJAN_MANIFESTFILE}.json
> avro-tools getschema ${TROJAN_MANIFESTFILE}.orig > 
> ${TROJAN_MANIFESTFILE}.schema
> avro-tools fromjson --codec deflate --schema-file 
> ${TROJAN_MANIFESTFILE}.schema ${TROJAN_MANIFESTFILE}.json > 
> ${TROJAN_MANIFESTFILE}.new
> hdfs dfs -put -f ${TROJAN_MANIFESTFILE}.new $TROJAN_MANIFESTFILE_LOCATION
> beeline -e "SELECT * FROM default.trojanhorseviadata;"
> +------------------------------------+-----------------------------------+
> | trojanhorseviadata.txt             | trojanhorseviadata.secret         |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> +------------------------------------+-----------------------------------+
> {noformat}
>  
> There are actually multiple options to create such table and modify the 
> manifest/list like reuse parts of the iceberg code or just use spark which 

[jira] [Updated] (HIVE-27420) No such method exception with shaded codahale metric pattern

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-27420:
--
Labels: check  (was: )

> No such method exception with shaded codahale metric pattern
> 
>
> Key: HIVE-27420
> URL: https://issues.apache.org/jira/browse/HIVE-27420
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0-alpha-1
>Reporter: Shubham Sharma
>Assignee: Shubham Sharma
>Priority: Major
>  Labels: check
> Fix For: 4.0.0-alpha-2
>
>
> HIVE-22059 causes failure to start the Hive server and metastore, failing 
> due to a missing-method error:
> {code:java}
> java.lang.NoSuchMethodError: 
> com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter.forRegistry(Lorg/apache/hive/com/codahale/metrics/MetricRegistry;)Lcom/github/joshelser/dropwizard/metrics/hadoop/HadoopMetrics2Reporter$Builder;
>     at 
> org.apache.hadoop.hive.metastore.metrics.Metrics.(Metrics.java:286) 
> ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
>     at 
> org.apache.hadoop.hive.metastore.metrics.Metrics.initialize(Metrics.java:71) 
> ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:306) 
> [hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_362]
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_362]
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_362]
>     at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:323) 
> [hadoop-common-3.2.3.3.2.2.0-1.jar:?]
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:236) 
> [hadoop-common-3.2.3.3.2.2.0-1.jar:?]{code}
> This is caused because the class pattern is shaded and relocated, causing the 
> NoSuchMethodError above:
> {code:java}
> com.codahale.metrics
> org.apache.hive.com.codahale.metrics {code}
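A diagnostic sketch (an assumption about how one might confirm the mismatch, not part of the Hive fix): use reflection to print which MetricRegistry type the reporter's forRegistry method expects on the running classpath, i.e. the original com.codahale.metrics class or the shaded org.apache.hive.com.codahale.metrics one. It assumes the dropwizard-metrics-hadoop-metrics2-reporter jar is on the classpath.
{code:java}
import java.lang.reflect.Method;
import java.util.Arrays;

public final class ForRegistrySignatureCheck {
  public static void main(String[] args) throws ClassNotFoundException {
    Class<?> reporter =
        Class.forName("com.github.joshelser.dropwizard.metrics.hadoop.HadoopMetrics2Reporter");
    for (Method m : reporter.getMethods()) {
      if (m.getName().equals("forRegistry")) {
        // Prints the parameter types actually linked at runtime.
        System.out.println("forRegistry" + Arrays.toString(m.getParameterTypes()));
      }
    }
  }
}
{code}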



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-13157) MetaStoreEventListener.onAlter triggered for INSERT and SELECT

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-13157:
--
Labels: obsolete?  (was: )

> MetaStoreEventListener.onAlter triggered for INSERT and SELECT
> --
>
> Key: HIVE-13157
> URL: https://issues.apache.org/jira/browse/HIVE-13157
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1, 4.0.0
>Reporter: Eugen Stoianovici
>Priority: Critical
>  Labels: obsolete?
>
> The event onAlter from 
> org.apache.hadoop.hive.metastore.MetaStoreEventListener is triggered when 
> INSERT or SELECT statements are executed on the target table.
> Furthermore, the value of transient_lastDdl is updated in table properties 
> for INSERT statements.
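A minimal listener sketch for observing the reported behavior: log every alter-table event so it is visible that the hook also fires for plain INSERT/SELECT statements. It would be registered via hive.metastore.event.listeners; the constructor and method signatures below are from the metastore API as commonly seen, so double-check them against your Hive version.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.AlterTableEvent;

public class LoggingAlterListener extends MetaStoreEventListener {

  public LoggingAlterListener(Configuration config) {
    super(config);
  }

  @Override
  public void onAlterTable(AlterTableEvent event) throws MetaException {
    // Logs the table name every time an alter event is delivered, including the
    // spurious ones triggered by INSERT/SELECT described in this issue.
    System.out.println("onAlterTable fired for table: "
        + event.getOldTable().getTableName());
  }
}
{code}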



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-21750) INSERT OVERWRITE with empty result set does not clear transactional table

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy resolved HIVE-21750.
---
Resolution: Duplicate

https://issues.apache.org/jira/browse/HIVE-16051

> INSERT OVERWRITE with empty result set does not clear transactional table
> -
>
> Key: HIVE-21750
> URL: https://issues.apache.org/jira/browse/HIVE-21750
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Todd Lipcon
>Priority: Critical
>
> The following query:
> {code}
> INSERT OVERWRITE TABLE t SELECT 1 WHERE FALSE
> {code}
> should serve to truncate a table by producing an empty base data directory. 
> In fact no new base directory is created, so the table is not cleared. (at 
> least with an insert_only table, I didn't test full-ACID)
> This bug does not seem to happen with non-transactional tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HIVE-22636) Data loss on skewjoin for ACID tables.

2023-08-22 Thread Attila Turoczy (Jira)


[ https://issues.apache.org/jira/browse/HIVE-22636 ]


Attila Turoczy deleted comment on HIVE-22636:
---

was (Author: JIRAUSER300479):
https://issues.apache.org/jira/browse/HIVE-16051

> Data loss on skewjoin for ACID tables.
> --
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Aditya Shah
>Priority: Blocker
>  Labels: check, hive-4.0.0-must
>
> I am trying to do a skew join and write the result into a FullAcid table. 
> The results are incorrect. The issue is similar to the one seen for MM tables in 
> HIVE-16051, where the fix was to skip the skew join for MM tables. 
> Steps to reproduce:
> Used a qtest similar to HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties 
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE 
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result for the count was 309 but got 173. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HIVE-22636) Data loss on skewjoin for ACID tables.

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy reopened HIVE-22636:
---

> Data loss on skewjoin for ACID tables.
> --
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Aditya Shah
>Priority: Blocker
>  Labels: check, hive-4.0.0-must
>
> I am trying to do a skew join and write the result into a FullAcid table. 
> The results are incorrect. The issue is similar to the one seen for MM tables in 
> HIVE-16051, where the fix was to skip the skew join for MM tables. 
> Steps to reproduce:
> Used a qtest similar to HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties 
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE 
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result for the count was 309 but got 173. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-22636) Data loss on skewjoin for ACID tables.

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy resolved HIVE-22636.
---
Resolution: Duplicate

https://issues.apache.org/jira/browse/HIVE-16051

> Data loss on skewjoin for ACID tables.
> --
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Aditya Shah
>Priority: Blocker
>  Labels: check, hive-4.0.0-must
>
> I am trying to do a skew join and write the result into a FullAcid table. 
> The results are incorrect. The issue is similar to the one seen for MM tables in 
> HIVE-16051, where the fix was to skip the skew join for MM tables. 
> Steps to reproduce:
> Used a qtest similar to HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties 
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE 
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result for the count was 309 but got 173. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-23586) load data overwrite into bucket table failed

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-23586:
--
Labels: hive-4.0.0-must pull-request-available  (was: 
pull-request-available)

> load data overwrite into bucket table failed
> 
>
> Key: HIVE-23586
> URL: https://issues.apache.org/jira/browse/HIVE-23586
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.2, 4.0.0
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Critical
>  Labels: hive-4.0.0-must, pull-request-available
> Attachments: HIVE-23586.01.patch, image-2020-06-01-21-40-21-726.png, 
> image-2020-06-01-21-41-28-732.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> load data overwrite into a bucketed table fails if the filename is not like 
> 00_0; instead it inserts new data into the table.
>  
> for example:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_hive2 (name string,account string) 
> PARTITIONED BY (logdate string) CLUSTERED BY (account) INTO 4 BUCKETS row 
> format delimited fields terminated by '|' STORED AS textfile;
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');
>  !image-2020-06-01-21-40-21-726.png!
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');// should overwrite but 
> insert new data
>  !image-2020-06-01-21-41-28-732.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-23586) load data overwrite into bucket table failed

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757295#comment-17757295
 ] 

Attila Turoczy commented on HIVE-23586:
---

[~fsilent] Thanks for your contribution. Can we revive this PR?
cc: [~ayushsaxena]

> load data overwrite into bucket table failed
> 
>
> Key: HIVE-23586
> URL: https://issues.apache.org/jira/browse/HIVE-23586
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.2, 4.0.0
>Reporter: zhaolong
>Assignee: zhaolong
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-23586.01.patch, image-2020-06-01-21-40-21-726.png, 
> image-2020-06-01-21-41-28-732.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> load data overwrite into a bucketed table fails if the filename is not like 
> 00_0; instead it inserts new data into the table.
>  
> for example:
> CREATE EXTERNAL TABLE IF NOT EXISTS test_hive2 (name string,account string) 
> PARTITIONED BY (logdate string) CLUSTERED BY (account) INTO 4 BUCKETS row 
> format delimited fields terminated by '|' STORED AS textfile;
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');
>  !image-2020-06-01-21-40-21-726.png!
>  load data inpath 'hdfs://hacluster/tmp/zltest' overwrite into table 
> default.test_hive2 partition (logdate='20200508');// should overwrite but 
> insert new data
>  !image-2020-06-01-21-41-28-732.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-26505:
--
Labels: check hive-4.0.0-must  (was: check)

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Priority: Critical
>  Labels: check, hive-4.0.0-must
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | b            | 2022-08-23   |
> | a            | 2022-08-23   |
> | c            | 2022-08-24   |
> | d            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0830.id  | test0830.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24621) TEXT and varchar datatype does not support unicode encoding in MSSQL

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24621:
--
Labels: check hive-4.0.0-must  (was: )

> TEXT and varchar datatype does not support unicode encoding in MSSQL
> 
>
> Key: HIVE-24621
> URL: https://issues.apache.org/jira/browse/HIVE-24621
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Critical
>  Labels: check, hive-4.0.0-must
>
> Why is Unicode required?
> In the following example, the Chinese characters cannot be properly interpreted. 
> {noformat}
> CREATE VIEW `test_view` AS select `test_tbl_char`.`col1` from 
> `test_db5`.`test_tbl_char` where `test_tbl_char`.`col1`='你好'; 
> show create table test_view;
> +----------------------------------------------------+
> |                   createtab_stmt                   |
> +----------------------------------------------------+
> | CREATE VIEW `test_view` AS select `test_tbl_char`.`col1` from 
> `test_db5`.`test_tbl_char` where `test_tbl_char`.`col1`='??' |
> +----------------------------------------------------+ {noformat}
>  
> This issue comes because TBLS is defined as follows:
>  
> CREATE TABLE TBLS
> (
>  TBL_ID bigint NOT NULL,
>  CREATE_TIME int NOT NULL,
>  DB_ID bigint NULL,
>  LAST_ACCESS_TIME int NOT NULL,
>  OWNER nvarchar(767) NULL,
>  OWNER_TYPE nvarchar(10) NULL,
>  RETENTION int NOT NULL,
>  SD_ID bigint NULL,
>  TBL_NAME nvarchar(256) NULL,
>  TBL_TYPE nvarchar(128) NULL,
>  VIEW_EXPANDED_TEXT text NULL,
>  VIEW_ORIGINAL_TEXT text NULL,
>  IS_REWRITE_ENABLED bit NOT NULL DEFAULT 0,
>  WRITE_ID bigint NOT NULL DEFAULT 0
> );
> The TEXT data type does not support Unicode encoding irrespective of collation.
> The varchar data type does not support Unicode encoding prior to SQL Server 2019. 
> Also, a UTF-8-enabled collation needs to be defined to use Unicode characters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-26505:
--
Labels: check  (was: )

> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Priority: Critical
>  Labels: check
>
> {code:java}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--------------+--------------+
> | test0831.id  | test0831.cp  |
> +--------------+--------------+
> | a            | 2022-08-23   |
> | b            | 2022-08-23   |
> | a            | 2022-08-23   |
> | c            | 2022-08-24   |
> | d            | 2022-08-24   |
> +--------------+--------------+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--------------+--------------+
> | test0830.id  | test0830.cp  |
> +--------------+--------------+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--------------+--------------+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-22636) Data loss on skewjoin for ACID tables.

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-22636:
--
Labels: check hive-4.0.0-must  (was: )

> Data loss on skewjoin for ACID tables.
> --
>
> Key: HIVE-22636
> URL: https://issues.apache.org/jira/browse/HIVE-22636
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Aditya Shah
>Priority: Blocker
>  Labels: check, hive-4.0.0-must
>
> I am trying to do a skew join and write the result into a FullAcid table. 
> The results are incorrect. The issue is similar to the one seen for MM tables in 
> HIVE-16051, where the fix was to skip the skew join for MM tables. 
> Steps to reproduce:
> Used a qtest similar to HIVE-16051:
> {code:java}
> --! qt:dataset:src1
> --! qt:dataset:src
> -- MASK_LINEAGE
> set hive.mapred.mode=nonstrict;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.optimize.skewjoin=true;
> set hive.skewjoin.key=2;
> set hive.optimize.metadataonly=false;
> CREATE TABLE skewjoin_acid(key INT, value STRING) STORED AS ORC tblproperties 
> ("transactional"="true");
> FROM src src1 JOIN src src2 ON (src1.key = src2.key) INSERT into TABLE 
> skewjoin_acid SELECT src1.key, src2.value;
> select count(distinct key) from skewjoin_acid;
> drop table skewjoin_acid;
> {code}
> The expected result for the count was 309 but got 173. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-11117) Hive external table - skip header and trailer property issue

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757294#comment-17757294
 ] 

Attila Turoczy commented on HIVE-7:
---

Why don't you want to use TEZ?

> Hive external table - skip header and trailer property issue
> 
>
> Key: HIVE-7
> URL: https://issues.apache.org/jira/browse/HIVE-7
> Project: Hive
>  Issue Type: Bug
> Environment: Production
>Reporter: Janarthanan
>Priority: Critical
>  Labels: check
>
> I am using an external hive table pointing to a HDFS location. The external 
> table is partitioned on year/mm/dd folders. When there is more than one 
> partition folder (ex: /2015/01/02/file.txt & /2015/01/03/file2.txt), the 
> select on the external table skips the DATA RECORD instead of skipping the 
> header/trailer record from one of the files. 
> tblproperties ("skip.header.line.count"="1");
> Resolution: Enabling hive.input format instead of the text input format and 
> executing with the Tez engine instead of MapReduce resolved the issue. 
> How can the problem be resolved without setting these parameters? I don't want to 
> run the hive query using TEZ.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-11117) Hive external table - skip header and trailer property issue

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-7:
--
Labels: check mapreduce  (was: check)

> Hive external table - skip header and trailer property issue
> 
>
> Key: HIVE-7
> URL: https://issues.apache.org/jira/browse/HIVE-7
> Project: Hive
>  Issue Type: Bug
> Environment: Production
>Reporter: Janarthanan
>Priority: Critical
>  Labels: check, mapreduce
>
> I am using an external hive table pointing to a HDFS location. The external 
> table is partitioned on year/mm/dd folders. When there is more than one 
> partition folder (ex: /2015/01/02/file.txt & /2015/01/03/file2.txt), the 
> select on the external table skips the DATA RECORD instead of skipping the 
> header/trailer record from one of the files. 
> tblproperties ("skip.header.line.count"="1");
> Resolution: Enabling hive.input format instead of the text input format and 
> executing with the Tez engine instead of MapReduce resolved the issue. 
> How can the problem be resolved without setting these parameters? I don't want to 
> run the hive query using TEZ.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-11117) Hive external table - skip header and trailer property issue

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-7:
--
Labels: check  (was: )

> Hive external table - skip header and trailer property issue
> 
>
> Key: HIVE-7
> URL: https://issues.apache.org/jira/browse/HIVE-7
> Project: Hive
>  Issue Type: Bug
> Environment: Production
>Reporter: Janarthanan
>Priority: Critical
>  Labels: check
>
> I am using an external hive table pointing to a HDFS location. The external 
> table is partitioned on year/mm/dd folders. When there is more than one 
> partition folder (ex: /2015/01/02/file.txt & /2015/01/03/file2.txt), the 
> select on the external table skips the DATA RECORD instead of skipping the 
> header/trailer record from one of the files. 
> tblproperties ("skip.header.line.count"="1");
> Resolution: Enabling hive.input format instead of the text input format and 
> executing with the Tez engine instead of MapReduce resolved the issue. 
> How can the problem be resolved without setting these parameters? I don't want to 
> run the hive query using TEZ.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24078:
--
Affects Version/s: (was: 4.0.0)

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: Obsolete
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL gets 76742 rows in MR, but 76681 rows in Tez. How to fix it?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24078:
--
Labels: Obsolete  (was: check obsolete?)

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2, 4.0.0
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: Obsolete
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL returns 76742 rows in MR but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757287#comment-17757287
 ] 

Attila Turoczy commented on HIVE-24078:
---

I do not think we should work on this anymore. MR is too old and no longer as 
relevant as Tez or LLAP. Also, since the ticket was created, Tez has had 
several new releases. 

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2, 4.0.0
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: check, obsolete?
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL returns 76742 rows in MR but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24078:
--
Labels: check obsolete?  (was: obsolete?)

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2, 4.0.0
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: check, obsolete?
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL returns 76742 rows in MR but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24078:
--
Labels: obsolete?  (was: )

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: obsolete?
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL returns 76742 rows in MR but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24078) result rows not equal in mr and tez

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24078:
--
Affects Version/s: 4.0.0

> result rows not equal in mr and tez
> ---
>
> Key: HIVE-24078
> URL: https://issues.apache.org/jira/browse/HIVE-24078
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Tez
>Affects Versions: 3.1.2, 4.0.0
>Reporter: kuqiqi
>Assignee: shubhangi priya
>Priority: Blocker
>  Labels: obsolete?
>
> select
> rank_num,
> province_name,
> programset_id,
> programset_name,
> programset_type,
> cv,
> uv,
> pt,
> rank_num2,
> rank_num3,
> city_name,
> level,
> cp_code,
> cp_name,
> version_type,
> zz.city_code,
> zz.province_alias,
> '20200815' dt
> from 
> (SELECT row_number() over(partition BY 
> a1.province_alias,a1.city_code,a1.version_type
>  ORDER BY cast(a1.cv AS bigint) DESC) AS rank_num,
>  province_name(a1.province_alias) AS province_name,
>  a1.program_set_id AS programset_id,
>  a2.programset_name,
>  a2.type_name AS programset_type,
>  a1.cv,
>  a1.uv,
>  cast(a1.pt/360 as decimal(20,2)) pt,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.uv as bigint) 
> desc ) as rank_num2,
>  row_number() over (partition by 
> a1.province_alias,a1.city_code,a1.version_type order by cast(a1.pt as bigint) 
> desc ) as rank_num3,
>  a1.city_code,
>  a1.city_name,
>  '3' as level,
>  a2.cp_code,
>  a2.cp_name,
>  '20200815'as dt,
>  a1.province_alias,
>  a1.version_type
> FROM temp.dmp_device_vod_valid_day_v1_20200815_hn a1
> LEFT JOIN temp.dmp_device_vod_valid_day_v2_20200815_hn a2 ON 
> a1.program_set_id=a2.programset_id
> WHERE a2.programset_name IS NOT NULL ) zz
> where rank_num<1000 or rank_num2<1000 or rank_num3<1000
> ;
>  
> This SQL returns 76742 rows in MR but 76681 rows in Tez. How can this be fixed?
> I think the problem may lie in row_number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24200) MSCK repair table is not working

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-24200:
--
Labels: obsolete?  (was: )

> MSCK repair table is not working
> 
>
> Key: HIVE-24200
> URL: https://issues.apache.org/jira/browse/HIVE-24200
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Hive, HiveServer2
>Affects Versions: 3.1.0
>Reporter: stephbat
>Priority: Critical
>  Labels: obsolete?
>
> *+steps to reproduce :+*
> create external table test_sync_part (name string) partitioned by (id int) 
> location '/projects/PTEST/dev/hive/test_sync_part';
> insert into table test_sync_part values ('nom1',1),('nom2',2);
> delete the sub-folder of one partition under the table location 
> /projects/PTEST/dev/hive/test_sync_part
> msck repair table test_sync_part drop partitions;
> {code:java}
> 2020-09-24T14:45:57,419 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> metastore.Msck (:()) - Tables not in metastore: []
> 2020-09-24T14:45:57,419 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> metastore.Msck (:()) - Tables missing on filesystem: []
> 2020-09-24T14:45:57,419 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> metastore.Msck (:()) - Partitions not in metastore: []
> 2020-09-24T14:45:57,419 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> metastore.Msck (:()) - Partitions missing from filesystem: 
> [test_sync_part:id=2]
> 2020-09-24T14:45:57,419 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> metastore.Msck (:()) - Expired partitions: []
> 2020-09-24T14:45:57,420 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> metastore.HiveMetaStoreClient (:()) - Closed a connection to metastore, 
> current connections: 8
> 2020-09-24T14:45:57,420 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> reexec.ReOptimizePlugin (:()) - ReOptimization: retryPossible: false
> 2020-09-24T14:45:57,420 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> hooks.HiveProtoLoggingHook (:()) - Received post-hook notification for: 
> hive_20200924144557_3e164203-720a-4e4a-bbdd-b65f53901e15
> 2020-09-24T14:45:57,421 ERROR [HiveServer2-Background-Pool: Thread-208]: 
> ql.Driver (:()) - FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> 2020-09-24T14:45:57,421 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> ql.Driver (:()) - Completed executing 
> command(queryId=hive_20200924144557_3e164203-720a-4e4a-bbdd-b65f53901e15); 
> Time taken: 0.289 seconds
> 2020-09-24T14:45:57,421 INFO  [HiveServer2-Background-Pool: Thread-208]: 
> lockmgr.DbTxnManager (:()) - Stopped heartbeat for query: 
> hive_20200924144557_3e164203-720a-4e4a-bbdd-b65f53901e15
> 2020-09-24T14:45:57,458 ERROR [HiveServer2-Background-Pool: Thread-208]: 
> operation.Operation (:()) - Error running hive query:
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:348)
>  ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:228)
>  ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:324)
>  ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
> at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_112]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  ~[hadoop-common-3.1.1.3.1.4.0-315.jar:?]
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:342)
>  ~[hive-service-3.1.0.3.1.4.0-315.jar:3.1.0.3.1.4.0-315]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_112]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_112]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[?:1.8.0_112]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[?:1.8.0_112]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[?:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {code}
>  
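For readability, the reproduction above consolidated into one runnable HiveQL
sketch (paths are the reporter's; the dfs line is one way to remove the
partition directory from within the Hive session, and the dynamic-partition
setting is an assumption, not part of the original steps):

{code:sql}
-- Assumption: may be required for the VALUES insert into a partitioned table
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE EXTERNAL TABLE test_sync_part (name STRING)
PARTITIONED BY (id INT)
LOCATION '/projects/PTEST/dev/hive/test_sync_part';

INSERT INTO TABLE test_sync_part VALUES ('nom1', 1), ('nom2', 2);

-- Remove one partition directory directly on the filesystem
dfs -rm -r /projects/PTEST/dev/hive/test_sync_part/id=2;

-- Expected to drop the now-missing partition; instead it fails with
-- "Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask"
MSCK REPAIR TABLE test_sync_part DROP PARTITIONS;
{code}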



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Resolved] (HIVE-25924) CLONE - CLONE - SchemaTool error: Unknown version specified for initialization: 3.1.0

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy resolved HIVE-25924.
---
Resolution: Duplicate

HIVE-25923

> CLONE - CLONE - SchemaTool error: Unknown version specified for 
> initialization: 3.1.0
> -
>
> Key: HIVE-25924
> URL: https://issues.apache.org/jira/browse/HIVE-25924
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 3.1.1
>Reporter: MOHAMMAD AAMIR
>Assignee: sushant waghmode
>Priority: Critical
>
> While trying to initialise the schema using SchemaTool in Hive 3.1.1, it 
> was failing with this message:
> {code:java}
> Unknown version specified for initialization: 3.1.0
> {code}
> It looks like a bug to me. I had to use Apache Hive 3.0.0.
> {code:java}
> // ./schematool  -dbType Derby -initSchema --verbose
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/rhel/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/rhel/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
> Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
> Metastore connection User: APP
> Starting metastore schema initialization to 3.1.0
> org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified 
> for initialization: 3.1.0
> org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified 
> for initialization: 3.1.0
> at 
> org.apache.hadoop.hive.metastore.MetaStoreSchemaInfo.generateInitFileName(MetaStoreSchemaInfo.java:137)
> at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:585)
> at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:567)
> at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1517)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> *** schemaTool failed ***
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25924) CLONE - CLONE - SchemaTool error: Unknown version specified for initialization: 3.1.0

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757271#comment-17757271
 ] 

Attila Turoczy commented on HIVE-25924:
---

As far as I can see, this is already resolved. 

> CLONE - CLONE - SchemaTool error: Unknown version specified for 
> initialization: 3.1.0
> -
>
> Key: HIVE-25924
> URL: https://issues.apache.org/jira/browse/HIVE-25924
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 3.1.1
>Reporter: MOHAMMAD AAMIR
>Assignee: sushant waghmode
>Priority: Critical
>
> While trying to initialise the schema using SchemaTool in Hive 3.1.1, it 
> was failing with this message:
> {code:java}
> Unknown version specified for initialization: 3.1.0
> {code}
> It looks like a bug to me. I had to use Apache Hive 3.0.0.
> {code:java}
> // ./schematool  -dbType Derby -initSchema --verbose
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/rhel/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/rhel/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
> Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
> Metastore connection User: APP
> Starting metastore schema initialization to 3.1.0
> org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified 
> for initialization: 3.1.0
> org.apache.hadoop.hive.metastore.HiveMetaException: Unknown version specified 
> for initialization: 3.1.0
> at 
> org.apache.hadoop.hive.metastore.MetaStoreSchemaInfo.generateInitFileName(MetaStoreSchemaInfo.java:137)
> at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:585)
> at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:567)
> at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1517)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
> *** schemaTool failed ***
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-25487) Caused by :org.apache.hive.com.esotericsoftware.kryo.KryoException:Unable to find class :S_4

2023-08-22 Thread Attila Turoczy (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Turoczy updated HIVE-25487:
--
Labels: obsolete? patch pull-request-available  (was: patch 
pull-request-available)

> Caused by :org.apache.hive.com.esotericsoftware.kryo.KryoException:Unable  to 
> find class :S_4
> -
>
> Key: HIVE-25487
> URL: https://issues.apache.org/jira/browse/HIVE-25487
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 3.1.1
>Reporter: chengxinpeng
>Assignee: Ashutosh Chauhan
>Priority: Blocker
>  Labels: obsolete?, patch, pull-request-available
> Attachments: 微信图片_20210827223829.jpg
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351) 
>  at java.lang.Class.forName0(Native Method) 
>  at java.lang.Class.forName(Class.java:348)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>  ... 63 more
> 2021-08-26 09:27:57,158 [INFO] [App Shared Pool - #1] 
> dag.RootInputInitializerManager: Failed InputInitializer for Input: 
> _dummy_table on vertex vertex_1627745521112_1545_1_00 [Map 1]
> 2021-08-26 09:27:57,159 [ERROR] [Dispatcher thread (Central)] impl.VertexImpl: 
> Vertex Input: _dummy_table initializer failed, 
> vertex=vertex_1627745521112_1545_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://nameservicetenant/tmp/hive/hive/8fblf9db-f922-4e31-af4a-12abb4ba405/hive_2021-08-26
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:158)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$1(RootInputInitializerManager.java:132)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>  at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>  at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Failed to load plan: 
> hdfs://nameservicetenant/tmp/hive/hive/8fblf9db-f922-4e31-af4a-12abb4ba405/hive_2021-08-26
>  at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:528) 
>  at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:359)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:442)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:508)
>  at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:489)
>  at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:338)
>  at 
> org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:121)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$2(RootInputInitializerManager.java:173)
>  at java.security.AccessController.doPrivileged(Native Method) 
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:166)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:147)
>  ... 8 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to 
> find class: S_4 
> Serialization trace:
> parentOperators (org.apache.hadoop.hive.ql.exec.FileSinkOperator) 
> childOperators (org.apache.hadoop.hive.ql.exec.UDTFOperator) 
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) 
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) 
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:133)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:156)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass 

[jira] [Commented] (HIVE-25487) Caused by :org.apache.hive.com.esotericsoftware.kryo.KryoException:Unable to find class :S_4

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757261#comment-17757261
 ] 

Attila Turoczy commented on HIVE-25487:
---

Is this still an issue? [~abstractdog] 

> Caused by :org.apache.hive.com.esotericsoftware.kryo.KryoException:Unable  to 
> find class :S_4
> -
>
> Key: HIVE-25487
> URL: https://issues.apache.org/jira/browse/HIVE-25487
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Tez
>Affects Versions: 3.1.1
>Reporter: chengxinpeng
>Assignee: Ashutosh Chauhan
>Priority: Blocker
>  Labels: patch, pull-request-available
> Attachments: 微信图片_20210827223829.jpg
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351) 
>  at java.lang.Class.forName0(Native Method) 
>  at java.lang.Class.forName(Class.java:348)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
>  ... 63 more
> 2021-08-26 09:27:57,158 [INFO] [App Shared Pool - #1] 
> dag.RootInputInitializerManager: Failed InputInitializer for Input: 
> _dummy_table on vertex vertex_1627745521112_1545_1_00 [Map 1]
> 2021-08-26 09:27:57,159 [ERROR] [Dispatcher thread (Central)] impl.VertexImpl: 
> Vertex Input: _dummy_table initializer failed, 
> vertex=vertex_1627745521112_1545_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://nameservicetenant/tmp/hive/hive/8fblf9db-f922-4e31-af4a-12abb4ba405/hive_2021-08-26
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:158)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$createAndStartInitializing$1(RootInputInitializerManager.java:132)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>  at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>  at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Failed to load plan: 
> hdfs://nameservicetenant/tmp/hive/hive/8fblf9db-f922-4e31-af4a-12abb4ba405/hive_2021-08-26
>  at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:528) 
>  at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:359)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:442)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:508)
>  at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:489)
>  at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:338)
>  at 
> org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:121)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda$runInitializer$2(RootInputInitializerManager.java:173)
>  at java.security.AccessController.doPrivileged(Native Method) 
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:166)
>  at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:147)
>  ... 8 more
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to 
> find class: S_4 
> Serialization trace:
> parentOperators (org.apache.hadoop.hive.ql.exec.FileSinkOperator) 
> childOperators (org.apache.hadoop.hive.ql.exec.UDTFOperator) 
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) 
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) 
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:133)
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:156)
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-26537) Deprecate older APIs in the HMS

2023-08-22 Thread Attila Turoczy (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757254#comment-17757254
 ] 

Attila Turoczy commented on HIVE-26537:
---

[~hemanth619] Can we move it to in-progress? Also, do you need any help with 
this? We should either land or abandon it for Hive 4. 

> Deprecate older APIs in the HMS
> ---
>
> Key: HIVE-26537
> URL: https://issues.apache.org/jira/browse/HIVE-26537
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> This Jira is to track the clean-up work (deprecating older APIs and pointing 
> the HMS client to the newer APIs) in the Hive metastore server.
> More details will be added here soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27618) Backport of HIVE-25446: Wrong execption thrown if capacity<=0

2023-08-22 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-27618.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Backport of HIVE-25446: Wrong execption thrown if capacity<=0
> -
>
> Key: HIVE-27618
> URL: https://issues.apache.org/jira/browse/HIVE-27618
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27615) Backport of HIVE-21280 : Null pointer exception on running compaction against a MM table.

2023-08-22 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-27615.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Backport of HIVE-21280 : Null pointer exception on running compaction against 
> a MM table.
> -
>
> Key: HIVE-27615
> URL: https://issues.apache.org/jira/browse/HIVE-27615
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27552) Backport of HIVE-22360, HIVE-20619 : MultiDelimitSerDe returns wrong results in last column when the loaded file has more columns than those in table schema

2023-08-22 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-27552.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Backport of HIVE-22360, HIVE-20619 : MultiDelimitSerDe returns wrong results 
> in last column when the loaded file has more columns than those in table 
> schema
> 
>
> Key: HIVE-27552
> URL: https://issues.apache.org/jira/browse/HIVE-27552
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Aman Raj
>Assignee: Aman Raj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27632) ClassCast Exception in Vectorization converting decimal64 to decimal

2023-08-22 Thread Riju Trivedi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Riju Trivedi reassigned HIVE-27632:
---

Assignee: Stephen Carlin

> ClassCast Exception in Vectorization converting decimal64 to decimal
> 
>
> Key: HIVE-27632
> URL: https://issues.apache.org/jira/browse/HIVE-27632
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Riju Trivedi
>Assignee: Stephen Carlin
>Priority: Major
>  Labels: pull-request-available
> Attachments: vectortest.q
>
>
> Attached [^vectortest.q], which fails with the ClassCastException below:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.Decimal64ColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDecimalColEqualDecimalScalar.evaluate(FilterDecimalColEqualDecimalScalar.java:64)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:125)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:171)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:878)
>  {code}
> This seems related to HIVE-26208, which avoids Decimal64 to Decimal 
> conversion for vector expressions that explicitly handle decimal64 types. 
> However, in this scenario the exception comes from 
> `FilterDecimalColEqualDecimalScalar`. 
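The attached vectortest.q is not inlined in this notification; the following is a
hypothetical HiveQL sketch of the kind of query that exercises a decimal column
vs. decimal scalar filter under vectorization (table and column names are made
up, not taken from the attachment):

{code:sql}
SET hive.vectorized.execution.enabled=true;

-- Small-precision decimals in ORC are read as Decimal64ColumnVector
CREATE TABLE dec_tbl (amount DECIMAL(10,2)) STORED AS ORC;
INSERT INTO dec_tbl VALUES (1.00), (2.50);

-- A decimal-scalar equality filter; per the report, the chosen expression is
-- FilterDecimalColEqualDecimalScalar while the input batch still holds a
-- Decimal64ColumnVector, triggering the ClassCastException.
SELECT COUNT(*) FROM dec_tbl WHERE amount = 1.00;
{code}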



--
This message was sent by Atlassian Jira
(v8.20.10#820010)