[jira] [Comment Edited] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2022-12-01 Thread Arnaud Linz (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641987#comment-17641987
 ] 

Arnaud Linz edited comment on HIVE-21100 at 12/1/22 5:54 PM:
-

The workaround does not always work, as the merge step is sometimes skipped 
despite hive.merge.tezfiles=true being set (the merge only happens when the 
files are smaller than {{hive.merge.size.per.task}} / 
{{hive.merge.smallfiles.avgsize}}).

So, to be safe, we need to add a hand-made HDFS move after each query 
containing unions, to keep the flat directory structure that many tools (such 
as Dataiku) require.

Since this post-processing runs outside any Hive lock, with direct HDFS 
access, it is a fragile step, and a very cumbersome one.

This case is not minor for us: it was discovered during a Hive 2 
(MR/Spark) -> Hive 3 (Tez) migration and has led to numerous production issues.
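
For illustration, a minimal sketch of such a hand-made flattening step using 
the Hadoop FileSystem API. This is only a sketch under assumptions: the 
HIVE_UNION_SUBDIR_ prefix and the collision-avoiding rename scheme are 
illustrative, and all the locking and error handling that make the real step 
fragile are omitted.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlattenUnionDirs {
    // Moves every file found in a union subdirectory up into the table
    // directory, prefixing file names with the subdirectory name to avoid
    // collisions between union branches, then deletes the empty subdirectory.
    public static void flatten(FileSystem fs, Path tableDir) throws IOException {
        for (FileStatus sub : fs.listStatus(tableDir)) {
            if (sub.isDirectory()
                    && sub.getPath().getName().startsWith("HIVE_UNION_SUBDIR_")) {
                for (FileStatus file : fs.listStatus(sub.getPath())) {
                    Path target = new Path(tableDir,
                            sub.getPath().getName() + "_" + file.getPath().getName());
                    fs.rename(file.getPath(), target);
                }
                fs.delete(sub.getPath(), false); // subdirectory is now empty
            }
        }
    }
}
{code}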


was (Author: arnaudl):
The workaround does not always work, as the merge step is sometimes skipped 
despite hive.merge.tezfiles=true being set (the merge only happens when the 
files are smaller than {{hive.merge.size.per.task}} / 
{{hive.merge.smallfiles.avgsize}}).

So, to be safe, we need to add a hand-made HDFS move after each query 
containing unions, to keep the flat directory structure that many tools (such 
as Dataiku) require.

Since this post-processing runs outside any Hive lock, with direct HDFS 
access, it is a fragile step, and a very cumbersome one.

This case is not minor for us: it was discovered during a Hive 2 
(MR/Spark) -> Hive 3 (Tez) migration and has led to numerous production issues.

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and a UNION 
> ALL clause as the last step of the query, Hive on Tez creates a 
> subdirectory for each branch of the UNION ALL.
> With this patch, the subdirectories are removed and the files are renamed 
> and moved to the parent directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2022-12-01 Thread Arnaud Linz (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641987#comment-17641987
 ] 

Arnaud Linz edited comment on HIVE-21100 at 12/1/22 5:53 PM:
-

The workaround does not always work, as the merge step is sometimes skipped 
despite hive.merge.tezfiles=true being set (the merge only happens when the 
files are smaller than {{hive.merge.size.per.task}} / 
{{hive.merge.smallfiles.avgsize}}).

So, to be safe, we need to add a hand-made HDFS move after each query 
containing unions, to keep the flat directory structure that many tools (such 
as Dataiku) require.

Since this post-processing runs outside any Hive lock, with direct HDFS 
access, it is a fragile step, and a very cumbersome one.

This case is not minor for us: it was discovered during a Hive 2 
(MR/Spark) -> Hive 3 (Tez) migration and has led to numerous production issues.


was (Author: arnaudl):
The workaround does not always work, as the merge step is sometimes skipped 
despite hive.merge.tezfiles=true being set.
So, to be safe, we need to add a hand-made HDFS move after each query 
containing unions, to keep the flat directory structure that many tools (such 
as Dataiku) require. Very cumbersome in my opinion.

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and a UNION 
> ALL clause as the last step of the query, Hive on Tez creates a 
> subdirectory for each branch of the UNION ALL.
> With this patch, the subdirectories are removed and the files are renamed 
> and moved to the parent directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2022-12-01 Thread Arnaud Linz (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641987#comment-17641987
 ] 

Arnaud Linz commented on HIVE-21100:


The workaround does not always work, as the merge step is sometimes skipped 
despite hive.merge.tezfiles=true being set.
So, to be safe, we need to add a hand-made HDFS move after each query 
containing unions, to keep the flat directory structure that many tools (such 
as Dataiku) require. Very cumbersome in my opinion.

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and a UNION 
> ALL clause as the last step of the query, Hive on Tez creates a 
> subdirectory for each branch of the UNION ALL.
> With this patch, the subdirectories are removed and the files are renamed 
> and moved to the parent directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories

2020-12-08 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-24507:
---
Description: 
The purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of the 
jars contained in hive.reloadable.aux.jars.path and the actual use of those 
files when they are uploaded to the job's YARN resources may lead to query 
failures, even when the failing query uses no jar/UDF (because it is a global 
parameter).

Stack trace sample:
{code:java}
 File file:/XXX.jar does not exist
   at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
   at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444)
   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
   at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
   at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
   at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
{code}
 

It is probably not possible to achieve atomicity, but this lack of atomicity 
should be taken into account and this error should be downgraded to a warning. 
If a jar has been removed, it is most likely because no query uses it any 
longer; and if it was really needed, a later ClassNotFound error will surface 
the problem, which together with the warning log is enough to diagnose it.
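
A hypothetical sketch of that suggested behaviour (illustrative names, not the 
actual Hive or Hadoop upload code): catch the missing-file error when copying 
a listed jar and log a warning instead of failing the query.
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AuxJarUploadSketch {
    private static final Logger LOG = LoggerFactory.getLogger(AuxJarUploadSketch.class);

    // Copies each jar that was found when the directory was listed; a jar
    // that disappeared between the listing and the copy is logged and
    // skipped rather than failing the whole job submission.
    static void uploadAuxJars(FileSystem srcFs, List<Path> listedJars,
                              FileSystem dstFs, Path stagingDir,
                              Configuration conf) throws IOException {
        for (Path jar : listedJars) {
            try {
                FileUtil.copy(srcFs, jar, dstFs,
                        new Path(stagingDir, jar.getName()),
                        false /* deleteSource */, conf);
            } catch (FileNotFoundException e) {
                // The jar vanished after the listing: most likely no query
                // uses it any longer. If one still does, it will fail later
                // with a ClassNotFound error, which this warning helps diagnose.
                LOG.warn("Reloadable aux jar {} disappeared before upload, skipping", jar, e);
            }
        }
    }
}
{code}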


  was:
The purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of 

[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directories

2020-12-08 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-24507:
---
Summary: "File file:XXX.jar does not exist" when changing content of 
"hive.reloadable.aux.jars.path" directories  (was: "File file:XXX.jar does not 
exist" when changing content of "hive.reloadable.aux.jars.path" directory 
content)

> "File file:XXX.jar does not exist" when changing content of 
> "hive.reloadable.aux.jars.path" directories
> ---
>
> Key: HIVE-24507
> URL: https://issues.apache.org/jira/browse/HIVE-24507
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.1
>Reporter: Arnaud Linz
>Priority: Major
>
> The purpose of hive.reloadable.aux.jars.path, introduced by 
> https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
> maintenance window for every jar change, but it is not enough.
> On a large system, the lack of atomicity between the directory listing of 
> the jars contained in hive.reloadable.aux.jars.path and the actual use of 
> those files when they are uploaded to the job's YARN resources may lead to 
> query failures, even when the failing query uses no jar/UDF (because it is a 
> global parameter).
> Stack trace sample:
> {code:java}
>  File file:/XXX.jar does not exist
>at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
>at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
>at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
>at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378)
>at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207)
>at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
>at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
>at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
>at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
>at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
>at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
>at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
>at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444)
>at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
>at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
>at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
>at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
>at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
>at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
>at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
>at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
>at 
> org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
>at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>at 

[jira] [Updated] (HIVE-24507) "File file:XXX.jar does not exist" when changing content of "hive.reloadable.aux.jars.path" directory content

2020-12-08 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-24507:
---
Description: 
The purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of the 
jars contained in hive.reloadable.aux.jars.path and the actual use of those 
files when they are uploaded to the job's YARN resources may lead to query 
failures, even when the failing query uses no jar/UDF (because it is a global 
parameter).

Stack trace sample:
{code:java}
 File file:/XXX.jar does not exist
   at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:867)
   at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
   at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:378)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:703)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:315)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:207)
   at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:135)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99)
   at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:444)
   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
   at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
   at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
   at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
   at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
{code}
 

It is probably not possible to achieve atomicity, but this lack of atomicity 
should be taken into account and this error should be downgraded to a warning. 
If a jar has been removed, it is most likely because no query uses it any 
longer; and if it was really needed, a later ClassNotFound error will surface 
the problem, which together with the warning log is enough to diagnose it.


  was:
The purpose of hive.reloadable.aux.jars.path, introduced by 
https://issues.apache.org/jira/browse/HIVE-7553, was to avoid scheduling a 
maintenance window for every jar change, but it is not enough.

On a large system, the lack of atomicity between the directory listing of jars 

[jira] [Updated] (HIVE-23105) JDBC regression breaks getUpdateCount / getMoreResult API contract

2020-03-30 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-23105:
---
Summary: JDBC regression breaks getUpdateCount / getMoreResult API contract 
 (was: HiveServer2 regression breaks getUpdateCount / getMoreResult API 
contract)

> JDBC regression breaks getUpdateCount / getMoreResult API contract
> --
>
> Key: HIVE-23105
> URL: https://issues.apache.org/jira/browse/HIVE-23105
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 2.1.1
>Reporter: Arnaud Linz
>Priority: Major
>
> Migrating from CDH 5.16 (Hive 1.1.0+) to CDH 6.3 (Hive 2.1.1+) introduced a 
> regression in the JDBC driver.
> It was detected in an "agnostic" JDBC handling service which works with 
> several DBMSs, including Teradata, Impala, and the former Hive driver.
>  
> The java.sql.Statement JDBC interface method:
> {code:java}
>  /** 
>  *  Retrieves the current result as an update count; 
>  *  if the result is a ResultSet object or there are no more 
> results, -1 
>  *  is returned. This method should be called only once per result. 
>  * 
>  * @return the current result as an update count; -1 if the current 
> result is a 
>  * ResultSet object or there are no more results 
>  * @exception SQLException if a database access error occurs or 
>  * this method is called on a closed Statement 
>  * @see #execute 
>  */ 
> int getUpdateCount() throws SQLException; {code}
>     does not return -1 when it should; instead, it throws:
>  
> {code:java}
> Caused by: java.sql.SQLException: 
> org.apache.thrift.protocol.TProtocolException: Required field 
> 'operationHandle' is unset! 
> Struct:TGetOperationStatusReq(operationHandle:null) 
> at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:395)
>  
> at 
> org.apache.hive.jdbc.HiveStatement.getUpdateCount(HiveStatement.java:688) 
> ... 30 more 
> Caused by: org.apache.thrift.protocol.TProtocolException: Required field 
> 'operationHandle' is unset! 
> Struct:TGetOperationStatusReq(operationHandle:null) 
> at 
> org.apache.hive.service.rpc.thrift.TGetOperationStatusReq.validate(TGetOperationStatusReq.java:294)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args.validate(TCLIService.java:12587)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args$GetOperationStatus_argsStandardScheme.write(TCLIService.java:12644)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args$GetOperationStatus_argsStandardScheme.write(TCLIService.java:12613)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args.write(TCLIService.java:12564)
>  
> at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:71) 
> at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) 
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Client.send_GetOperationStatus(TCLIService.java:461)
>  
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:453)
>  
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:498) 
> at 
> org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1415)
>  
> at com.sun.proxy.$Proxy20.GetOperationStatus(Unknown Source) 
> at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:364)
>  
> ... 33 more {code}
>  
> And the method:
> {code:java}
>     /** 
>  * Moves to this Statement object's next result, returns 
>  * true if it is a ResultSet object, and 
>  * implicitly closes any current ResultSet 
>  * object(s) obtained with the method getResultSet. 
>  * 
>  * There are no more results when the following is true: 
>  * {@code 
>  * // stmt is a Statement object 
>  * ((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == 
> -1)) 
>  * } 
>  * 
>  * @return true if the next result is a 
> ResultSet 
>  * object; false if it is an update count or there 
> are 
>  * no more results 
>  * @exception SQLException if a database access error occurs or 
>  * this method is called on a closed Statement 
>  * @see #execute 
>  */ 
> boolean getMoreResults() throws SQLException; 
> {code}
> always returns true when the statement is not a result set, whereas false 
> is expected.
>  
>  
>  

[jira] [Updated] (HIVE-23105) HiveServer2 regression breaks getUpdateCount / getMoreResult API contract

2020-03-30 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-23105:
---
Description: 
Migrating from CDH 5.16 (Hive 1.1.0+) to CDH 6.3 (Hive 2.1.1+) introduced a 
regression in the JDBC driver.

It was detected in an "agnostic" JDBC handling service which works with 
several DBMSs, including Teradata, Impala, and the former Hive driver.

 

The java.sql.Statement JDBC interface method:
{code:java}
 /** 
 *  Retrieves the current result as an update count; 
 *  if the result is a ResultSet object or there are no more 
results, -1 
 *  is returned. This method should be called only once per result. 
 * 
 * @return the current result as an update count; -1 if the current result 
is a 
 * ResultSet object or there are no more results 
 * @exception SQLException if a database access error occurs or 
 * this method is called on a closed Statement 
 * @see #execute 
 */ 
int getUpdateCount() throws SQLException; {code}
    does not return -1 when it should; instead, it throws:

 
{code:java}
Caused by: java.sql.SQLException: 
org.apache.thrift.protocol.TProtocolException: Required field 'operationHandle' 
is unset! Struct:TGetOperationStatusReq(operationHandle:null) 
at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:395)
 
at 
org.apache.hive.jdbc.HiveStatement.getUpdateCount(HiveStatement.java:688) 
... 30 more 
Caused by: org.apache.thrift.protocol.TProtocolException: Required field 
'operationHandle' is unset! Struct:TGetOperationStatusReq(operationHandle:null) 
at 
org.apache.hive.service.rpc.thrift.TGetOperationStatusReq.validate(TGetOperationStatusReq.java:294)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args.validate(TCLIService.java:12587)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args$GetOperationStatus_argsStandardScheme.write(TCLIService.java:12644)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args$GetOperationStatus_argsStandardScheme.write(TCLIService.java:12613)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args.write(TCLIService.java:12564)
 
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:71) 
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) 
at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.send_GetOperationStatus(TCLIService.java:461)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:453)
 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
at java.lang.reflect.Method.invoke(Method.java:498) 
at 
org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1415)
 
at com.sun.proxy.$Proxy20.GetOperationStatus(Unknown Source) 
at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:364)
 
... 33 more {code}
 

And the method:
{code:java}
    /** 
 * Moves to this Statement object's next result, returns 
 * true if it is a ResultSet object, and 
 * implicitly closes any current ResultSet 
 * object(s) obtained with the method getResultSet. 
 * 
 * There are no more results when the following is true: 
 * {@code 
 * // stmt is a Statement object 
 * ((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1)) 
 * } 
 * 
 * @return true if the next result is a ResultSet 
 * object; false if it is an update count or there are 
 * no more results 
 * @exception SQLException if a database access error occurs or 
 * this method is called on a closed Statement 
 * @see #execute 
 */ 
boolean getMoreResults() throws SQLException; 
{code}
always returns true when the statement is not a result set, whereas false is 
expected.
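
For reference, here is the canonical iteration loop over mixed statement 
results that an agnostic JDBC client builds on (plain java.sql, nothing 
Hive-specific); the regression breaks both of its termination conditions.
{code:java}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ResultLoop {
    // Standard JDBC pattern: the loop ends only when getMoreResults()
    // returns false AND getUpdateCount() returns -1. With the regression,
    // getUpdateCount() throws instead of returning -1 and getMoreResults()
    // keeps returning true, so the loop never terminates cleanly.
    static void processAll(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            boolean isResultSet = stmt.execute(sql);
            while (true) {
                if (isResultSet) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        while (rs.next()) {
                            // consume the current row...
                        }
                    }
                } else if (stmt.getUpdateCount() == -1) {
                    break; // no more results
                }
                isResultSet = stmt.getMoreResults();
            }
        }
    }
}
{code}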


  was:
Migrating from CDH 5.16 (Hive 1.1.0+) to CDH 6.3 (Hive 2.1.1+) introduced a 
regression in the JDBC driver.

It was detected in an "agnostic" JDBC handling service which works with 
several DBMSs, including Teradata, Impala, and the former Hive driver.

 

java.sql.Statement JDBC Interface method :
{code:java}
 /** 
 *  Retrieves the current result as an update count; 
 *  if the result is a ResultSet object or there are no more 
results, -1 
 *  is returned. This method should be called only once per result. 
 * 
 * @return the current result as an update count; -1 if the current result 
is a 
 * ResultSet object or there are no more results 
 * @exception SQLException if a database access error occurs or 
 * this method is 

[jira] [Updated] (HIVE-23105) HiveServer2 regression breaks getUpdateCount / getMoreResult API contract

2020-03-30 Thread Arnaud Linz (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arnaud Linz updated HIVE-23105:
---
Description: 
Migrating from CDH 5.16 (Hive 1.1.0+) to CDH 6.3 (Hive 2.1.1+) introduced a 
regression in the JDBC driver.

It was detected in an "agnostic" JDBC handling service which works with 
several DBMSs, including Teradata, Impala, and the former Hive driver.

 

The java.sql.Statement JDBC interface method:
{code:java}
 /** 
 *  Retrieves the current result as an update count; 
 *  if the result is a ResultSet object or there are no more 
results, -1 
 *  is returned. This method should be called only once per result. 
 * 
 * @return the current result as an update count; -1 if the current result 
is a 
 * ResultSet object or there are no more results 
 * @exception SQLException if a database access error occurs or 
 * this method is called on a closed Statement 
 * @see #execute 
 */ 
int getUpdateCount() throws SQLException; {code}
    does not return -1 when it should; instead, it throws:

 
{code:java}
Caused by: java.sql.SQLException: 
org.apache.thrift.protocol.TProtocolException: Required field 'operationHandle' 
is unset! Struct:TGetOperationStatusReq(operationHandle:null) 
at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:395)
 
at 
org.apache.hive.jdbc.HiveStatement.getUpdateCount(HiveStatement.java:688) 
... 30 more 
Caused by: org.apache.thrift.protocol.TProtocolException: Required field 
'operationHandle' is unset! Struct:TGetOperationStatusReq(operationHandle:null) 
at 
org.apache.hive.service.rpc.thrift.TGetOperationStatusReq.validate(TGetOperationStatusReq.java:294)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args.validate(TCLIService.java:12587)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args$GetOperationStatus_argsStandardScheme.write(TCLIService.java:12644)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args$GetOperationStatus_argsStandardScheme.write(TCLIService.java:12613)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$GetOperationStatus_args.write(TCLIService.java:12564)
 
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:71) 
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62) 
at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.send_GetOperationStatus(TCLIService.java:461)
 
at 
org.apache.hive.service.rpc.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:453)
 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
at java.lang.reflect.Method.invoke(Method.java:498) 
at 
org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1415)
 
at com.sun.proxy.$Proxy20.GetOperationStatus(Unknown Source) 
at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:364)
 
... 33 more {code}
 

And the method:
{code:java}
    /** 
 * Moves to this Statement object's next result, returns 
 * true if it is a ResultSet object, and 
 * implicitly closes any current ResultSet 
 * object(s) obtained with the method getResultSet. 
 * 
 * There are no more results when the following is true: 
 * {@code 
 * // stmt is a Statement object 
 * ((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1)) 
 * } 
 * 
 * @return true if the next result is a ResultSet 
 * object; false if it is an update count or there are 
 * no more results 
 * @exception SQLException if a database access error occurs or 
 * this method is called on a closed Statement 
 * @see #execute 
 */ 
boolean getMoreResults() throws SQLException; 
{code}
always returns true when the statement is not a result set, whereas false is 
expected (especially problematic since the javadoc's termination test 
((stmt.getMoreResults() == false) && (stmt.getUpdateCount() == -1)) throws an 
exception...).


  was:
Migrating from CDH 5.16 (Hive 1.1.0+) to CDH 6.3 (Hive 2.1.1+) introduced a 
regression in the JDBC driver.

It was detected in an "agnostic" JDBC handling service which works with 
several DBMSs, including Teradata, Impala, and the former Hive driver.

 

 

The Statement JDBC method:
{code:java}
 /** 
 *  Retrieves the current result as an update count; 
 *  if the result is a ResultSet object or there are no more 
results, -1 
 *  is returned. This method should be called only once per result. 
 * 
 * @return the current result as an update count; -1 if the current result 
is a 
 * ResultSet object or there are no 

[jira] [Commented] (HIVE-11339) org.apache.hadoop.hive.serde2.io.TimestampWritable.write(DataOutput out) makes incorrect cast

2016-04-28 Thread Arnaud Linz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262393#comment-15262393
 ] 

Arnaud Linz commented on HIVE-11339:


Ok for me.

> org.apache.hadoop.hive.serde2.io.TimestampWritable.write(DataOutput out) 
> makes incorrect cast
> -
>
> Key: HIVE-11339
> URL: https://issues.apache.org/jira/browse/HIVE-11339
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.14.0
>Reporter: Arnaud Linz
>Assignee: Zoltan Haindrich
>  Labels: easyfix, newbie
> Attachments: HIVE-11339.patch
>
>
> Hi, it's my first Jira and I don't know how to make patches, so I'll explain 
> the issue in the description, as it is rather simple.
> I have a problem serializing "DefaultHCatRecord" records with Apache Flink 
> when they include Timestamps, because of an incorrect class cast in 
> org.apache.hadoop.hive.serde2.io.TimestampWritable.write(DataOutput out). It 
> is implemented using a cast to OutputStream:
> public void write(DataOutput out) throws IOException {
>   write((OutputStream) out);
> }
> But nothing says that a DataOutput object is an OutputStream (and it is not 
> the case in Flink); it should rather be implemented using the same code as 
> write(OutputStream):
> {
>   checkBytes();
>   out.write(currentBytes, offset, getTotalLength());
> }
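
A minimal sketch of the fix described above, shown as a fragment of 
TimestampWritable (the field and helper names are taken from the snippet in 
the description; this is illustrative, not a drop-in patch):
{code:java}
// DataOutput itself declares write(byte[] b, int off, int len), so no cast
// to OutputStream is needed; this works for any DataOutput implementation,
// including Flink's, which is not an OutputStream.
public void write(DataOutput out) throws IOException {
    checkBytes();
    out.write(currentBytes, offset, getTotalLength());
}
{code}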



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)