[jira] [Updated] (HIVE-23801) TestMiniLlapLocalCliDriver[replication_metrics_ingest] is flaky

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23801:
--
Labels: pull-request-available  (was: )

> TestMiniLlapLocalCliDriver[replication_metrics_ingest] is flaky
> ---
>
> Key: HIVE-23801
> URL: https://issues.apache.org/jira/browse/HIVE-23801
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Peter Vary
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This test is flaky. See: 
> [http://ci.hive.apache.org/job/hive-flaky-check/62/console]
> {code:java}
> 21:59:19  [INFO] ---
> 21:59:19  [INFO]  T E S T S
> 21:59:19  [INFO] ---
> 21:59:19  [INFO] Running org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> 22:01:56  [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time 
> elapsed: 144.366 s <<< FAILURE! - in 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
> 22:01:56  [ERROR] 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[replication_metrics_ingest]
>   Time elapsed: 124.174 s  <<< FAILURE!
> 22:01:56  java.lang.AssertionError: 
> 22:01:56  Client Execution succeeded but contained differences (error code = 
> 1) after executing replication_metrics_ingest.q 
> 22:01:56  76c76
> 22:01:56  < 3 repl2   1
> 22:01:56  ---
> 22:01:56  > 2 repl2   1
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23801) TestMiniLlapLocalCliDriver[replication_metrics_ingest] is flaky

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23801?focusedWorklogId=617394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617394
 ]

ASF GitHub Bot logged work on HIVE-23801:
-

Author: ASF GitHub Bot
Created on: 01/Jul/21 03:50
Start Date: 01/Jul/21 03:50
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2443:
URL: https://github.com/apache/hive/pull/2443


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 617394)
Remaining Estimate: 0h
Time Spent: 10m

> TestMiniLlapLocalCliDriver[replication_metrics_ingest] is flaky
> ---
>
> Key: HIVE-23801
> URL: https://issues.apache.org/jira/browse/HIVE-23801
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Peter Vary
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (HIVE-23801) TestMiniLlapLocalCliDriver[replication_metrics_ingest] is flaky

2021-06-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-23801:
---

Assignee: Ayush Saxena

> TestMiniLlapLocalCliDriver[replication_metrics_ingest] is flaky
> ---
>
> Key: HIVE-23801
> URL: https://issues.apache.org/jira/browse/HIVE-23801
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Reporter: Peter Vary
>Assignee: Ayush Saxena
>Priority: Major
>





[jira] [Work logged] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?focusedWorklogId=617372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617372
 ]

ASF GitHub Bot logged work on HIVE-25303:
-

Author: ASF GitHub Bot
Created on: 01/Jul/21 02:01
Start Date: 01/Jul/21 02:01
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #2442:
URL: https://github.com/apache/hive/pull/2442


   …ble path to external path when CTAS hive.create.as.external.legacy is set
   
   This reverts commit 5430dda6f259635d12f6ffee34455c24c73a7082.
   
   
   
   ### What changes were proposed in this pull request?
   Revert Hive-24625 and update task compiler code to set table to external 
when doing CTAS and hive.create.as.external.legacy is set.
   
   
   
   ### Why are the changes needed?
   Otherwise, when hive.create.as.external.legacy is set, the table is 
created under the managed path.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local and remote machine.
   
   




Issue Time Tracking
---

Worklog Id: (was: 617372)
Remaining Estimate: 0h
Time Spent: 10m

> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, in a session where that 
> database is USEd, tables created using
> CREATE TABLE  AS SELECT 
> should inherit the HDFS path from the database's location.
> Instead, Hive is trying to write the table data into 
> /warehouse/tablespace/managed/hive//





[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25303:
--
Labels: pull-request-available  (was: )

> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, in a session where that 
> database is USEd, tables created using
> CREATE TABLE  AS SELECT 
> should inherit the HDFS path from the database's location.
> Instead, Hive is trying to write the table data into 
> /warehouse/tablespace/managed/hive//





[jira] [Assigned] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2021-06-30 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25303:



> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, in a session where that 
> database is USEd, tables created using
> CREATE TABLE  AS SELECT 
> should inherit the HDFS path from the database's location.
> Instead, Hive is trying to write the table data into 
> /warehouse/tablespace/managed/hive//





[jira] [Work logged] (HIVE-24483) Bump protobuf version to 3.12.0

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24483?focusedWorklogId=617353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617353
 ]

ASF GitHub Bot logged work on HIVE-24483:
-

Author: ASF GitHub Bot
Created on: 01/Jul/21 00:08
Start Date: 01/Jul/21 00:08
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1975:
URL: https://github.com/apache/hive/pull/1975#issuecomment-871806524


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 617353)
Time Spent: 2h 10m  (was: 2h)

> Bump protobuf version to 3.12.0
> ---
>
> Key: HIVE-24483
> URL: https://issues.apache.org/jira/browse/HIVE-24483
> Project: Hive
>  Issue Type: Improvement
>Reporter: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The protoc version used in Hive, 2.5.0, is very old: 
> [https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/]. 
> v2.5.0 does not have AArch64 support; AArch64 support started from 
> v3.5.0 onwards in Google's protobuf project releases. 





[jira] [Work logged] (HIVE-25048) Refine the start/end functions in HMSHandler

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25048?focusedWorklogId=617346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617346
 ]

ASF GitHub Bot logged work on HIVE-25048:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 23:46
Start Date: 30/Jun/21 23:46
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2441:
URL: https://github.com/apache/hive/pull/2441


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 617346)
Time Spent: 50m  (was: 40m)

> Refine the start/end functions in HMSHandler
> 
>
> Key: HIVE-25048
> URL: https://issues.apache.org/jira/browse/HIVE-25048
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Some start/end functions in the HMSHandler are incomplete. These functions 
> can audit user actions, monitor performance, and notify the listeners.





[jira] [Commented] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-06-30 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372162#comment-17372162
 ] 

Pravin Sinha commented on HIVE-25164:
-

Committed to master.

Thanks for the review, [~aasha]  !!

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-06-30 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-25164:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25164?focusedWorklogId=617231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617231
 ]

ASF GitHub Bot logged work on HIVE-25164:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 19:08
Start Date: 30/Jun/21 19:08
Worklog Time Spent: 10m 
  Work Description: pkumarsinha merged pull request #2320:
URL: https://github.com/apache/hive/pull/2320


   




Issue Time Tracking
---

Worklog Id: (was: 617231)
Time Spent: 0.5h  (was: 20m)

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24849) Create external table socket timeout when location has large number of files

2021-06-30 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372145#comment-17372145
 ] 

Steve Loughran commented on HIVE-24849:
---


Something like this:

* The existence check is integrated into the LIST call, which saves 1 request 
against all stores and 2 for S3.
* The listing is incremental, so it can determine that a dir is not empty 
without processing all the output, at least on those stores which do listings 
incrementally (hdfs, webhdfs, s3a, abfs). On the others it is no slower than 
listStatus, which is what it calls internally.

{code}
  public boolean isEmpty() throws HiveException {
    Preconditions.checkNotNull(getPath());
    try {
      FileSystem fs = FileSystem.get(getPath().toUri(),
          SessionState.getSessionConf());
      // listStatusIterator() raises FileNotFoundException for a missing path,
      // so no separate exists() probe is needed
      RemoteIterator<FileStatus> it = fs.listStatusIterator(getPath());
      while (it.hasNext()) {
        FileStatus status = it.next();
        if (FileUtils.HIDDEN_FILES_PATH_FILTER.accept(status.getPath())) {
          // a visible (non-hidden) entry exists, so the dir is not empty
          return false;
        }
      }
      return true;
    } catch (FileNotFoundException e) {
      // list failed: the path does not exist, so treat it as empty
      return true;
    } catch (IOException e) {
      throw new HiveException(e);
    }
  }
{code}

For isDir(), I'd just call FileSystem.isDir(path) and ignore the deprecation 
warning, which is there to make people look at their uses and wonder if there 
are more efficient ways. (Too often app code has isFile, isDir or exists() 
before some API call which would just raise a FileNotFoundException anyway.) 

What I would recommend doing is looking at uses of isDir() and wondering if 
they could be eliminated entirely.
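
The incremental-listing idea can be tried without a Hadoop cluster. The sketch below is a local-filesystem analogue, not the Hive patch: it swaps Hadoop's RemoteIterator for java.nio's DirectoryStream (which typically reads entries lazily) and FileNotFoundException for NoSuchFileException. The class name LocalDirEmptiness, the helper demo(), and the hidden-name rule (names starting with "_" or ".") are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;

final class LocalDirEmptiness {
  /** Assumed hidden-file rule: names starting with '_' or '.'. */
  static boolean isHidden(Path p) {
    String name = p.getFileName().toString();
    return name.startsWith("_") || name.startsWith(".");
  }

  /**
   * True if the directory has no visible entries or does not exist.
   * There is no exists() probe: the listing itself reports a missing path,
   * and iteration stops at the first visible entry.
   */
  static boolean isEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        if (!isHidden(entry)) {
          return false; // first visible entry is enough
        }
      }
      return true;
    } catch (NoSuchFileException e) {
      return true; // listing failed because the path is absent
    }
  }

  /** Exercises isEmpty() on a temp dir: fresh, hidden-only, with data, missing. */
  static boolean[] demo() {
    try {
      Path dir = Files.createTempDirectory("emptiness-sketch");
      boolean fresh = isEmpty(dir);
      Files.createFile(dir.resolve("_SUCCESS"));
      boolean onlyHidden = isEmpty(dir);
      Files.createFile(dir.resolve("part-00000"));
      boolean withData = isEmpty(dir);
      boolean missing = isEmpty(dir.resolve("no-such-dir"));
      return new boolean[] {fresh, onlyHidden, withData, missing};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(demo())); // [true, true, false, true]
  }
}
```

The same shape applies to the "drop the up-front check" advice: the operation is attempted directly and the not-found exception is handled, instead of paying for a separate existence or directory probe first.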

> Create external table socket timeout when location has large number of files
> 
>
> Key: HIVE-24849
> URL: https://issues.apache.org/jira/browse/HIVE-24849
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4
> Environment: AWS EMR 5.23 with default Hive metastore and external 
> location S3
>  
>Reporter: Mithun Antony
>Priority: Major
>
> # The create table API call times out during external table creation when 
> the S3 location contains a large number of files (i.e. ~10K objects).
> The default `hive.metastore.client.socket.timeout` is `600s`; the current 
> workaround is to increase the timeout to a higher value.
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c 
> main([])]: exec.DDLTask (DDLTask.java:failed(639)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  at 

[jira] [Commented] (HIVE-25164) Execute Bootstrap REPL load DDL tasks in parallel

2021-06-30 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372133#comment-17372133
 ] 

Aasha Medhi commented on HIVE-25164:


+1

> Execute Bootstrap REPL load DDL tasks in parallel
> -
>
> Key: HIVE-25164
> URL: https://issues.apache.org/jira/browse/HIVE-25164
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25301) Expose notification log table through sys db

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25301?focusedWorklogId=617079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617079
 ]

ASF GitHub Bot logged work on HIVE-25301:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 14:49
Start Date: 30/Jun/21 14:49
Worklog Time Spent: 10m 
  Work Description: ayushtkn opened a new pull request #2440:
URL: https://github.com/apache/hive/pull/2440


   




Issue Time Tracking
---

Worklog Id: (was: 617079)
Remaining Estimate: 0h
Time Spent: 10m

> Expose notification log table through sys db
> 
>
> Key: HIVE-25301
> URL: https://issues.apache.org/jira/browse/HIVE-25301
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Expose the notification_log table in RDBMS through Hive sys database





[jira] [Updated] (HIVE-25301) Expose notification log table through sys db

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25301:
--
Labels: pull-request-available  (was: )

> Expose notification log table through sys db
> 
>
> Key: HIVE-25301
> URL: https://issues.apache.org/jira/browse/HIVE-25301
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Expose the notification_log table in RDBMS through Hive sys database





[jira] [Assigned] (HIVE-25301) Expose notification log table through sys db

2021-06-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25301:
---


> Expose notification log table through sys db
> 
>
> Key: HIVE-25301
> URL: https://issues.apache.org/jira/browse/HIVE-25301
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Expose the notification_log table in RDBMS through Hive sys database





[jira] [Work logged] (HIVE-19616) Enable TestAutoPurgeTables test

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19616?focusedWorklogId=617041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617041
 ]

ASF GitHub Bot logged work on HIVE-19616:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 13:33
Start Date: 30/Jun/21 13:33
Worklog Time Spent: 10m 
  Work Description: sankarh merged pull request #2438:
URL: https://github.com/apache/hive/pull/2438


   




Issue Time Tracking
---

Worklog Id: (was: 617041)
Time Spent: 0.5h  (was: 20m)

> Enable TestAutoPurgeTables test
> ---
>
> Key: HIVE-19616
> URL: https://issues.apache.org/jira/browse/HIVE-19616
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Disabled by HIVE-19589.





[jira] [Resolved] (HIVE-19616) Enable TestAutoPurgeTables test

2021-06-30 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-19616.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Enable TestAutoPurgeTables test
> ---
>
> Key: HIVE-19616
> URL: https://issues.apache.org/jira/browse/HIVE-19616
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Disabled by HIVE-19589.





[jira] [Work logged] (HIVE-19707) Enable TestJdbcWithMiniHS2#testHttpRetryOnServerIdleTimeout

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19707?focusedWorklogId=617039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617039
 ]

ASF GitHub Bot logged work on HIVE-19707:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 13:29
Start Date: 30/Jun/21 13:29
Worklog Time Spent: 10m 
  Work Description: sankarh merged pull request #2431:
URL: https://github.com/apache/hive/pull/2431


   




Issue Time Tracking
---

Worklog Id: (was: 617039)
Time Spent: 20m  (was: 10m)

> Enable TestJdbcWithMiniHS2#testHttpRetryOnServerIdleTimeout
> ---
>
> Key: HIVE-19707
> URL: https://issues.apache.org/jira/browse/HIVE-19707
> Project: Hive
>  Issue Type: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-19707) Enable TestJdbcWithMiniHS2#testHttpRetryOnServerIdleTimeout

2021-06-30 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-19707.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Enable TestJdbcWithMiniHS2#testHttpRetryOnServerIdleTimeout
> ---
>
> Key: HIVE-19707
> URL: https://issues.apache.org/jira/browse/HIVE-19707
> Project: Hive
>  Issue Type: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-25219) Backward incompatible timestamp serialization in Avro for certain timezones

2021-06-30 Thread Nikhil Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil Gupta updated HIVE-25219:

Labels: compatibility pull-request-available timestamp  (was: 
pull-request-available)

> Backward incompatible timestamp serialization in Avro for certain timezones
> ---
>
> Key: HIVE-25219
> URL: https://issues.apache.org/jira/browse/HIVE-25219
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 3.1.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: compatibility, pull-request-available, timestamp
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are 
> performed and to some extent how timestamps are serialized and deserialized 
> in files (Parquet, Avro).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in Avro 
> files is not backwards compatible. In other words writing timestamps with a 
> version of Hive that includes HIVE-12192/HIVE-20007 and reading them with 
> another (not including the previous issues) may lead to different results 
> depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to 
> US/Pacific.
> At apache/master commit eedcd82bc2d61861a27205f925ba0ffab9b6bca8
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO
>  LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS AVRO
>  LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
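
The eid=1 row is off by 7 minutes 2 seconds, while the 1884 and 1990 rows agree. One plausible reading, which is an inference and not something stated in the issue text above, is that the two versions disagree on whether to apply the zone's historical local-mean-time offset for dates before standard time was adopted. The sketch below only shows what java.time reports for US/Pacific at the three dates used in the scenario; it does not reproduce Hive's serialization code, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class PacificOffsetSketch {
  /** UTC offset that the JDK's time-zone rules give for a local date-time in a zone. */
  static ZoneOffset offsetAt(String zone, LocalDateTime local) {
    return ZoneId.of(zone).getRules().getOffset(local);
  }

  /** Difference between the zone's offset at 'local' and a fixed reference offset. */
  static Duration shiftFrom(ZoneOffset reference, String zone, LocalDateTime local) {
    int zoneSeconds = offsetAt(zone, local).getTotalSeconds();
    return Duration.ofSeconds(zoneSeconds - reference.getTotalSeconds());
  }

  public static void main(String[] args) {
    String zone = "US/Pacific";
    ZoneOffset pst = ZoneOffset.ofHours(-8); // modern Pacific Standard Time
    LocalDateTime[] births = {
        LocalDateTime.of(1880, 1, 1, 0, 0),
        LocalDateTime.of(1884, 1, 1, 0, 0),
        LocalDateTime.of(1990, 1, 1, 0, 0),
    };
    for (LocalDateTime birth : births) {
      Duration shift = shiftFrom(pst, zone, birth);
      // Print the offset in force, the gap to -08:00, and the wall-clock
      // value obtained if that gap is subtracted from the original timestamp.
      System.out.println(birth + " offset=" + offsetAt(zone, birth)
          + " shift=" + shift.getSeconds() + "s"
          + " shifted=" + birth.minus(shift));
    }
  }
}
```

With the time-zone data bundled in current JDKs, only the 1880 date has a non-zero gap to -08:00, and subtracting that gap from 1880-01-01 00:00:00 lands on the value that branch-2.3 returned for eid=1.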





[jira] [Commented] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2021-06-30 Thread Aditya Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371967#comment-17371967
 ] 

Aditya Shah commented on HIVE-24322:


[~kuczoram] I was confused as to how the biggest task attempt id ensures that 
the particular attempt was completed successfully. In case of speculative 
execution the previous task may have finished first and the speculative task 
might have been killed after this while it is writing commit paths or partition 
specs (multi-stmt IOW with DP). Am I missing something?
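
For readers following the question, the selection rule being discussed (keep, per task, the manifest with the biggest attempt id) can be sketched roughly as follows. This is an illustration only and not the code from the patch: the file naming `<taskId>_<attemptId>.manifest` and all class and method names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ManifestPicker {
    private static final String SUFFIX = ".manifest";

    // Returns, per task id, the manifest file name carrying the highest attempt id.
    // Assumed (hypothetical) naming: <taskId>_<attemptId>.manifest
    static Map<String, String> latestPerTask(List<String> names) {
        Map<String, String> best = new TreeMap<>();
        Map<String, Integer> bestAttempt = new HashMap<>();
        for (String name : names) {
            if (!name.endsWith(SUFFIX)) {
                continue; // not a manifest file
            }
            String base = name.substring(0, name.length() - SUFFIX.length());
            int idx = base.lastIndexOf('_');
            if (idx < 0) {
                continue; // no attempt id in the name
            }
            String task = base.substring(0, idx);
            int attempt;
            try {
                attempt = Integer.parseInt(base.substring(idx + 1));
            } catch (NumberFormatException e) {
                continue; // attempt id is not numeric
            }
            Integer prev = bestAttempt.get(task);
            if (prev == null || attempt > prev) {
                bestAttempt.put(task, attempt);
                best.put(task, name);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "000000_0.manifest", "000000_1.manifest", "000001_0.manifest");
        System.out.println(latestPerTask(names));
    }
}
```

Note that this rule looks only at the attempt number. It does not check that the chosen attempt finished writing its manifest, which is the concern raised in the comment above.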

> In case of direct insert, the attempt ID has to be checked when reading the 
> manifest files
> --
>
> Key: HIVE-24322
> URL: https://issues.apache.org/jira/browse/HIVE-24322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In IMPALA-10247 there was an exception from Hive when trying to load the data:
> {noformat}
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>  ... 29 more
> {noformat}
> The reason for the exception was that Hive was trying to read an empty 
> manifest file. Manifest files are used in case of direct insert to determine 
> which files need to be kept and which ones need to be cleaned up. They are 
> created by the tasks and they use the task attempt ID as postfix. In this 
> particular test what happened is that one of the containers ran out of memory, 
> so Tez decided to kill it right after the manifest file got created but 
> before the paths got written into the manifest file. This was the manifest 
> file for the task attempt 0. Then Tez assigned a new container to the task, 
> so a new attempt was made with attemptId=1. This one was successful, and 
> wrote the manifest file correctly. But Hive didn't know about this, since 
> this out of memory issue got 

[jira] [Commented] (HIVE-24849) Create external table socket timeout when location has large number of files

2021-06-30 Thread Sungwoo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371947#comment-17371947
 ] 

Sungwoo commented on HIVE-24849:


{code:java}
  public boolean isEmpty() throws HiveException {
    Preconditions.checkNotNull(getPath());
    try {
      FileSystem fs = FileSystem.get(getPath().toUri(),
          SessionState.getSessionConf());
      return !fs.exists(getPath()) || fs.listStatus(getPath(),
          FileUtils.HIDDEN_FILES_PATH_FILTER).length == 0;
    } catch (IOException e) {
      throw new HiveException(e);
    }
  }
{code}

So, isEmpty() calls listStatus(). So, isEmpty() can be rewritten using 
listStatusIterator/listFiles() as you suggest.
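
To make the idea concrete: an emptiness check only needs to see the first visible entry, so an iterator-based listing can stop early instead of materializing the whole directory. The sketch below shows the pattern with the JDK's java.nio.file API as a stand-in for the Hadoop FileSystem API, so it is an analogy rather than a drop-in patch. The hidden-file rule (names starting with "_" or ".") is an assumption about what HIDDEN_FILES_PATH_FILTER excludes, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class LazyEmptyCheck {
    // Assumed equivalent of a hidden-file filter: skip names starting with '_' or '.'.
    static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // True if the directory does not exist or has no visible entries.
    // Stops as soon as the first visible entry is found.
    static boolean isEmpty(Path dir) {
        if (!Files.exists(dir)) {
            return true;
        }
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(dir, LazyEmptyCheck::isVisible)) {
            return !entries.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-empty-check");
        System.out.println("fresh dir empty: " + isEmpty(dir));
        Files.createFile(dir.resolve(".hidden"));
        System.out.println("only hidden file, empty: " + isEmpty(dir));
        Files.createFile(dir.resolve("data"));
        System.out.println("with data file, empty: " + isEmpty(dir));
    }
}
```

With Hadoop, the same shape would use an iterator-returning listing call such as listStatusIterator() and apply the filter to each returned entry by hand.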

Do you have an idea how to rewrite isDir()?

{code:java}
  public boolean isDir(Path f) throws MetaException {
    FileSystem fs;
    try {
      fs = getFs(f);
      FileStatus fstatus = fs.getFileStatus(f);
      if (!fstatus.isDir()) {
        return false;
      }
    } catch (FileNotFoundException e) {
      return false;
    } catch (IOException e) {
      MetaStoreUtils.logAndThrowMetaException(e);
    }
    return true;
  }
{code}


> Create external table socket timeout when location has large number of files
> 
>
> Key: HIVE-24849
> URL: https://issues.apache.org/jira/browse/HIVE-24849
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4
> Environment: AWS EMR 5.23 with default Hive metastore and external 
> location S3
>  
>Reporter: Mithun Antony
>Priority: Major
>
> # The create table API call times out during external table creation on 
> a location where the number of files in the S3 location is large (i.e. ~10K 
> objects).
> The default timeout `hive.metastore.client.socket.timeout` is `600s`; the current 
> workaround is to increase the timeout to a higher value.
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c 
> main([])]: exec.DDLTask (DDLTask.java:failed(639)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
>  at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
>  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
>  at 
> 

[jira] [Comment Edited] (HIVE-24849) Create external table socket timeout when location has large number of files

2021-06-30 Thread Sungwoo (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371947#comment-17371947
 ] 

Sungwoo edited comment on HIVE-24849 at 6/30/21, 11:10 AM:
---

{code:java}
  public boolean isEmpty() throws HiveException {
    Preconditions.checkNotNull(getPath());
    try {
      FileSystem fs = FileSystem.get(getPath().toUri(),
          SessionState.getSessionConf());
      return !fs.exists(getPath()) || fs.listStatus(getPath(),
          FileUtils.HIDDEN_FILES_PATH_FILTER).length == 0;
    } catch (IOException e) {
      throw new HiveException(e);
    }
  }
{code}

isEmpty() calls listStatus(). So, isEmpty() can be rewritten using 
listStatusIterator/listFiles() as you suggest.

Do you have an idea how to rewrite isDir()?

{code:java}
  public boolean isDir(Path f) throws MetaException {
    FileSystem fs;
    try {
      fs = getFs(f);
      FileStatus fstatus = fs.getFileStatus(f);
      if (!fstatus.isDir()) {
        return false;
      }
    } catch (FileNotFoundException e) {
      return false;
    } catch (IOException e) {
      MetaStoreUtils.logAndThrowMetaException(e);
    }
    return true;
  }
{code}



was (Author: glapark):
{code:java}
  public boolean isEmpty() throws HiveException {
    Preconditions.checkNotNull(getPath());
    try {
      FileSystem fs = FileSystem.get(getPath().toUri(),
          SessionState.getSessionConf());
      return !fs.exists(getPath()) || fs.listStatus(getPath(),
          FileUtils.HIDDEN_FILES_PATH_FILTER).length == 0;
    } catch (IOException e) {
      throw new HiveException(e);
    }
  }
{code}

So, isEmpty() calls listStatus(). So, isEmpty() can be rewritten using 
listStatusIterator/listFiles() as you suggest.

Do you have an idea how to rewrite isDir()?

{code:java}
  public boolean isDir(Path f) throws MetaException {
    FileSystem fs;
    try {
      fs = getFs(f);
      FileStatus fstatus = fs.getFileStatus(f);
      if (!fstatus.isDir()) {
        return false;
      }
    } catch (FileNotFoundException e) {
      return false;
    } catch (IOException e) {
      MetaStoreUtils.logAndThrowMetaException(e);
    }
    return true;
  }
{code}


> Create external table socket timeout when location has large number of files
> 
>
> Key: HIVE-24849
> URL: https://issues.apache.org/jira/browse/HIVE-24849
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.4
> Environment: AWS EMR 5.23 with default Hive metastore and external 
> location S3
>  
>Reporter: Mithun Antony
>Priority: Major
>
> # The create table API call times out during external table creation on 
> a location where the number of files in the S3 location is large (i.e. ~10K 
> objects).
> The default timeout `hive.metastore.client.socket.timeout` is `600s`; the current 
> workaround is to increase the timeout to a higher value.
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c 
> main([])]: exec.DDLTask (DDLTask.java:failed(639)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketTimeoutException: Read timed out
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
>  at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>  at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
>  at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
>  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> 

[jira] [Work logged] (HIVE-25297) Refactor GenericUDFDateDiff

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25297?focusedWorklogId=616978=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616978
 ]

ASF GitHub Bot logged work on HIVE-25297:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 09:27
Start Date: 30/Jun/21 09:27
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on pull request #2437:
URL: https://github.com/apache/hive/pull/2437#issuecomment-871242515


   @zabetak Could you please review the PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 616978)
Time Spent: 40m  (was: 0.5h)

> Refactor GenericUDFDateDiff
> ---
>
> Key: HIVE-25297
> URL: https://issues.apache.org/jira/browse/HIVE-25297
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: All Versions
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Description
> Remove redundant code and refactor entire GenericUDFDateDiff.class code





[jira] [Work logged] (HIVE-19616) Enable TestAutoPurgeTables test

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19616?focusedWorklogId=616972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616972
 ]

ASF GitHub Bot logged work on HIVE-19616:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 09:04
Start Date: 30/Jun/21 09:04
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on pull request #2438:
URL: https://github.com/apache/hive/pull/2438#issuecomment-871225800


   @kgyrtkirk Could you please review the PR?




Issue Time Tracking
---

Worklog Id: (was: 616972)
Time Spent: 20m  (was: 10m)

> Enable TestAutoPurgeTables test
> ---
>
> Key: HIVE-19616
> URL: https://issues.apache.org/jira/browse/HIVE-19616
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 3.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Disabled by HIVE-19589.





[jira] [Work logged] (HIVE-25293) Alter partitioned table with "cascade" option create too many columns records.

2021-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25293?focusedWorklogId=616957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-616957
 ]

ASF GitHub Bot logged work on HIVE-25293:
-

Author: ASF GitHub Bot
Created on: 30/Jun/21 08:11
Start Date: 30/Jun/21 08:11
Worklog Time Spent: 10m 
  Work Description: liaoyt commented on pull request #2434:
URL: https://github.com/apache/hive/pull/2434#issuecomment-871190443


   @kgyrtkirk Could you please review the PR?




Issue Time Tracking
---

Worklog Id: (was: 616957)
Time Spent: 40m  (was: 0.5h)

> Alter partitioned table with "cascade" option create too many columns records.
> --
>
> Key: HIVE-25293
> URL: https://issues.apache.org/jira/browse/HIVE-25293
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.3.3, 3.1.2
>Reporter: yongtaoliao
>Assignee: yongtaoliao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When altering a partitioned table with the "cascade" option, all partitions are 
> supposed to be updated. Currently, a CD_ID will be created for each partition, 
> associated with a set of columns, which will cause a large amount of 
> redundant data in the metadata database.
> The following DDL statements can reproduce this scenario:
>  
> {code:java}
> create table test_table (f1 int) partitioned by (p string);
> alter table test_table add partition(p='a');
> alter table test_table add partition(p='b');
> alter table test_table add partition(p='c');
> alter table test_table add columns (f2 int) cascade;{code}
> All partitions use the table's `CD_ID` before adding columns, while each 
> partition uses its own `CD_ID` after adding columns.
>  
> My proposal is that all partitions should use the same `CD_ID` when the table 
> is altered with the "cascade" option.
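
The difference between the described behaviour and the proposal can be pictured with a toy model that counts how many distinct column descriptor ids exist after a cascading alter. This is an in-memory illustration under assumed names only; it does not use the metastore schema or API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CdIdModel {
    // Behaviour as described in the issue: each partition gets its own fresh descriptor id.
    static Map<String, Long> cascadePerPartition(List<String> partitions, long firstFreeId) {
        Map<String, Long> cdIds = new LinkedHashMap<>();
        long next = firstFreeId;
        for (String p : partitions) {
            cdIds.put(p, next++);
        }
        return cdIds;
    }

    // Proposed behaviour: one fresh descriptor id shared by every partition.
    static Map<String, Long> cascadeShared(List<String> partitions, long firstFreeId) {
        Map<String, Long> cdIds = new LinkedHashMap<>();
        for (String p : partitions) {
            cdIds.put(p, firstFreeId);
        }
        return cdIds;
    }

    // Number of distinct descriptor ids referenced by the partitions.
    static int distinctIds(Map<String, Long> cdIds) {
        return new HashSet<>(cdIds.values()).size();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("p=a", "p=b", "p=c");
        System.out.println("per partition: " + distinctIds(cascadePerPartition(parts, 2L)));
        System.out.println("shared:        " + distinctIds(cascadeShared(parts, 2L)));
    }
}
```

In this model the number of descriptors grows with the partition count in the first case and stays at one in the second, which is the redundancy the proposal aims to remove.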





[jira] [Updated] (HIVE-25299) Casting timestamp to numeric data types is incorrect for non-UTC timezones

2021-06-30 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-25299:
---
Description: 
*Hive 1.2.1*
{noformat}
Connected to: Apache Hive (version 1.2.1000.2.6.5.3033-1)
Driver: Hive JDBC (version 1.2.1000.2.6.5.3033-1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.5.3033-1 by Apache Hive
0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as int);
+-+--+
| _c0 |
+-+--+
| 1615658400  |
+-+--+
1 row selected (0.387 seconds)
0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as bigint);
+-+--+
| _c0 |
+-+--+
| 1615658400  |
+-+--+
1 row selected (0.369 seconds)
0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as double);
+--+--+
| _c0  |
+--+--+
| 1.6156584E9  |
+--+--+
{noformat}
*Hive 3.1, 4.0*
{noformat}
Connected to: Apache Hive (version 3.1.0.3.1.6.1-6)
Driver: Hive JDBC (version 3.1.4.4.1.4.8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.4.4.1.4.8 by Apache Hive
0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as int);
+-+
| _c0 |
+-+
| 1615683600  |
+-+
1 row selected (0.666 seconds)
0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as bigint);
+-+
| _c0 |
+-+
| 1615683600  |
+-+
1 row selected (0.536 seconds)
0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as double);
+--+
| _c0  |
+--+
| 1.6156836E9  |
+--+
1 row selected (0.696 seconds)
{noformat}
 

The issue occurs for non-UTC timezones (the VM timezone is set to 'Asia/Bangkok').

  was:
*Hive 1.2.1*
{noformat}
Connected to: Apache Hive (version 1.2.1000.2.6.5.3033-1)
Driver: Hive JDBC (version 1.2.1000.2.6.5.3033-1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.5.3033-1 by Apache Hive
0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as int);
+-+--+
| _c0 |
+-+--+
| 1615658400  |
+-+--+
1 row selected (0.387 seconds)
0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as bigint);
+-+--+
| _c0 |
+-+--+
| 1615658400  |
+-+--+
1 row selected (0.369 seconds)
0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as double);
+--+--+
| _c0  |
+--+--+
| 1.6156584E9  |
+--+--+
{noformat}
*Hive 3.1, 4.0*
{noformat}
Connected to: Apache Hive (version 3.1.0.3.1.6.1-6)
Driver: Hive JDBC (version 3.1.4.4.1.4.8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.4.4.1.4.8 by Apache Hive
0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as int);
+-+
| _c0 |
+-+
| 1615683600  |
+-+
1 row selected (0.666 seconds)
0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as bigint);
+-+
| _c0 |
+-+
| 1615683600  |
+-+
1 row selected (0.536 seconds)
0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast ("2021-03-14 
01:00:00" as timestamp) as double);
+--+
| _c0  |
+--+
| 1.6156836E9  |
+--+
1 row selected (0.696 seconds)
{noformat}
 

The issue occurs for non-UTC timezone.


> Casting timestamp to numeric data types is incorrect for non-UTC timezones
> --
>
> Key: HIVE-25299
> URL: https://issues.apache.org/jira/browse/HIVE-25299
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>
> *Hive 1.2.1*
> {noformat}
> Connected to: Apache Hive (version 1.2.1000.2.6.5.3033-1)
> Driver: Hive JDBC (version 1.2.1000.2.6.5.3033-1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.2.1000.2.6.5.3033-1 by Apache Hive
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as int);
> +-+--+
> | _c0 |
> +-+--+
> | 1615658400  |
> 

[jira] [Updated] (HIVE-25299) Casting timestamp to numeric data types is incorrect for non-UTC timezones

2021-06-30 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-25299:
---
Summary: Casting timestamp to numeric data types is incorrect for non-UTC 
timezones  (was: Casting timestamp to numeric data types is incorrect)

> Casting timestamp to numeric data types is incorrect for non-UTC timezones
> --
>
> Key: HIVE-25299
> URL: https://issues.apache.org/jira/browse/HIVE-25299
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>
> *Hive 1.2.1*
> {noformat}
> Connected to: Apache Hive (version 1.2.1000.2.6.5.3033-1)
> Driver: Hive JDBC (version 1.2.1000.2.6.5.3033-1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.2.1000.2.6.5.3033-1 by Apache Hive
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as int);
> +-+--+
> | _c0 |
> +-+--+
> | 1615658400  |
> +-+--+
> 1 row selected (0.387 seconds)
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as bigint);
> +-+--+
> | _c0 |
> +-+--+
> | 1615658400  |
> +-+--+
> 1 row selected (0.369 seconds)
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as double);
> +--+--+
> | _c0  |
> +--+--+
> | 1.6156584E9  |
> +--+--+
> {noformat}
> *Hive 3.1, 4.0*
> {noformat}
> Connected to: Apache Hive (version 3.1.0.3.1.6.1-6)
> Driver: Hive JDBC (version 3.1.4.4.1.4.8)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.4.4.1.4.8 by Apache Hive
> 0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as int);
> +-+
> | _c0 |
> +-+
> | 1615683600  |
> +-+
> 1 row selected (0.666 seconds)
> 0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as bigint);
> +-+
> | _c0 |
> +-+
> | 1615683600  |
> +-+
> 1 row selected (0.536 seconds)
> 0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as double);
> +--+
> | _c0  |
> +--+
> | 1.6156836E9  |
> +--+
> 1 row selected (0.696 seconds)
> {noformat}
>  
> The issue occurs for non-UTC timezone.
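
The two results above differ by 25200 seconds, which is exactly seven hours. That matches the UTC+07:00 offset of 'Asia/Bangkok', the VM timezone named in the updated description of this issue. The sketch below reproduces both numbers with java.time by interpreting the same wall-clock value once in Asia/Bangkok and once in UTC. It illustrates the arithmetic only and makes no claim about Hive's internal cast implementation; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

class TimestampCastDemo {
    // Epoch seconds of a wall-clock timestamp when interpreted in the given zone.
    static long epochSeconds(String isoLocalTimestamp, String zone) {
        return LocalDateTime.parse(isoLocalTimestamp)
            .atZone(ZoneId.of(zone))
            .toEpochSecond();
    }

    public static void main(String[] args) {
        String ts = "2021-03-14T01:00:00";
        long local = epochSeconds(ts, "Asia/Bangkok"); // value reported by Hive 1.2.1
        long utc = epochSeconds(ts, "UTC");            // value reported by Hive 3.1, 4.0
        System.out.println("Asia/Bangkok: " + local);
        System.out.println("UTC:          " + utc);
        System.out.println("difference:   " + (utc - local) + " s");
    }
}
```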





[jira] [Assigned] (HIVE-25299) Casting timestamp to numeric data types is incorrect

2021-06-30 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao reassigned HIVE-25299:
--


> Casting timestamp to numeric data types is incorrect
> 
>
> Key: HIVE-25299
> URL: https://issues.apache.org/jira/browse/HIVE-25299
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>
> *Hive 1.2.1*
> {noformat}
> Connected to: Apache Hive (version 1.2.1000.2.6.5.3033-1)
> Driver: Hive JDBC (version 1.2.1000.2.6.5.3033-1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.2.1000.2.6.5.3033-1 by Apache Hive
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as int);
> +-+--+
> | _c0 |
> +-+--+
> | 1615658400  |
> +-+--+
> 1 row selected (0.387 seconds)
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as bigint);
> +-+--+
> | _c0 |
> +-+--+
> | 1615658400  |
> +-+--+
> 1 row selected (0.369 seconds)
> 0: jdbc:hive2://zk0-nikhil.ae4yqb3genuuvaozdf> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as double);
> +--+--+
> | _c0  |
> +--+--+
> | 1.6156584E9  |
> +--+--+
> {noformat}
> *Hive 3.1, 4.0*
> {noformat}
> Connected to: Apache Hive (version 3.1.0.3.1.6.1-6)
> Driver: Hive JDBC (version 3.1.4.4.1.4.8)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.4.4.1.4.8 by Apache Hive
> 0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as int);
> +-+
> | _c0 |
> +-+
> | 1615683600  |
> +-+
> 1 row selected (0.666 seconds)
> 0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as bigint);
> +-+
> | _c0 |
> +-+
> | 1615683600  |
> +-+
> 1 row selected (0.536 seconds)
> 0: jdbc:hive2://zk0-nikhil.usmltwlt0ncuxmbost> select cast ( cast 
> ("2021-03-14 01:00:00" as timestamp) as double);
> +--+
> | _c0  |
> +--+
> | 1.6156836E9  |
> +--+
> 1 row selected (0.696 seconds)
> {noformat}
>  
> The issue occurs for non-UTC timezone.


