[jira] [Assigned] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2022-02-28 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-23556:
---

Assignee: iBenny  (was: Toshihiko Uchida)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: iBenny
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
> limit the number of partitions that can be requested.
> Currently, it takes effect for the following MetaStore APIs
> * get_partitions,
> * get_partitions_with_auth,
> * get_partitions_by_filter,
> * get_partitions_spec_by_filter,
> * get_partitions_by_expr,
> but not for
> * get_partitions_ps,
> * get_partitions_ps_with_auth.
> This issue proposes to apply the configuration also to get_partitions_ps and 
> get_partitions_ps_with_auth.
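
The proposal in the description can be sketched roughly as follows. This is a minimal illustration only, not the code from any attached patch: the class, the exception type, and both method names are hypothetical, and treating a negative limit as "no limit" is an assumption of the sketch.

```java
import java.util.List;

public class PartitionLimitSketch {

    /** Hypothetical stand-in for the metastore's own exception type. */
    static class PartitionLimitExceededException extends RuntimeException {
        PartitionLimitExceededException(String message) {
            super(message);
        }
    }

    /**
     * Rejects a request that would return more partitions than the configured limit.
     * A negative limit is treated as "no limit" (an assumption of this sketch).
     */
    static void checkPartitionLimit(String tableName, int requestedPartitions, int configuredLimit) {
        if (configuredLimit < 0) {
            return;
        }
        if (requestedPartitions > configuredLimit) {
            throw new PartitionLimitExceededException(
                "Request for " + requestedPartitions + " partitions of table " + tableName
                    + " exceeds the limit of " + configuredLimit);
        }
    }

    /**
     * Hypothetical lookup by partial partition spec. The point of the sketch is that
     * the limit check runs here as well, before any partitions are returned.
     */
    static List<String> getPartitionsByPartialSpec(
            String tableName, List<String> matching, int maxParts, int configuredLimit) {
        // A negative maxParts means "all matching partitions" in this sketch.
        int requested = (maxParts < 0) ? matching.size() : Math.min(maxParts, matching.size());
        checkPartitionLimit(tableName, requested, configuredLimit);
        return matching.subList(0, requested);
    }
}
```

With a limit of 2 and three matching partitions, an unbounded request (maxParts = -1) is rejected, while a request capped at two partitions succeeds.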



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2021-06-21 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366918#comment-17366918
 ] 

Toshihiko Uchida commented on HIVE-23556:
-

[~kgyrtkirk] Thanks for taking a look at the issue! Got it.

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-07-21 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162005#comment-17162005
 ] 

Toshihiko Uchida commented on HIVE-23556:
-

[~kgyrtkirk] Could you kindly review the patch, or assign someone who is 
familiar with MetaStore?


> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch


[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-10 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130777#comment-17130777
 ] 

Toshihiko Uchida commented on HIVE-23556:
-

[~kgyrtkirk] Thanks for your comment!
The Deprecated annotation for get_partitions_ps_with_auth was added very 
recently in HIVE-22017, and the method is still called from the corresponding new API.
{code}
@Override
public GetPartitionsPsWithAuthResponse get_partitions_ps_with_auth_req(GetPartitionsPsWithAuthRequest req)
    throws MetaException, NoSuchObjectException, TException {
  String dbName = MetaStoreUtils.prependCatalogToDbName(req.getCatName(), req.getDbName(), conf);
  List<Partition> partitions = get_partitions_ps_with_auth(dbName, req.getTblName(),
      req.getPartVals(), req.getMaxParts(), req.getUserName(), req.getGroupNames());
  GetPartitionsPsWithAuthResponse res = new GetPartitionsPsWithAuthResponse();
  res.setPartitions(partitions);
  return res;
}
{code}
The same change has been applied to other APIs such as get_partitions as well.
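
One consequence of this delegation can be illustrated with a small sketch. It assumes the limit check lives in the older method, which this thread does not confirm, and every name in it (the class, the request type, the methods, the constant) is hypothetical rather than taken from Hive.

```java
import java.util.Arrays;
import java.util.List;

public class DelegationSketch {

    /** Hypothetical configured limit; a negative value would mean "no limit". */
    static final int PARTITION_LIMIT = 2;

    /** Hypothetical request object standing in for a Thrift request struct. */
    static class PartitionsRequest {
        final String tableName;
        final int maxParts;

        PartitionsRequest(String tableName, int maxParts) {
            this.tableName = tableName;
            this.maxParts = maxParts;
        }
    }

    /** Older entry point: deprecated, but still the place where the work happens. */
    @Deprecated
    static List<String> getPartitionsLegacy(String tableName, int maxParts) {
        List<String> all = Arrays.asList("p=1", "p=2", "p=3"); // made-up partition names
        int requested = (maxParts < 0) ? all.size() : Math.min(maxParts, all.size());
        if (PARTITION_LIMIT >= 0 && requested > PARTITION_LIMIT) {
            throw new IllegalStateException("Request for " + requested + " partitions of "
                + tableName + " exceeds the limit of " + PARTITION_LIMIT);
        }
        return all.subList(0, requested);
    }

    /** Newer request-based entry point: unpacks the request and delegates. */
    static List<String> getPartitionsReq(PartitionsRequest req) {
        return getPartitionsLegacy(req.tableName, req.maxParts);
    }
}
```

Because the newer method only unpacks the request and delegates, a check placed in the older method applies to callers of either entry point.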

As for usage, I'm not sure what you mean by "indirectly", but Spark currently 
calls it through Hive#getPartitions when no partition filter is provided.
https://github.com/apache/spark/blob/3a48ea1/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736
In fact, this is how I noticed the issue.

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch


[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-08 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128207#comment-17128207
 ] 

Toshihiko Uchida commented on HIVE-23556:
-

The findbugs and asflicense errors do not appear to be related to HIVE-23556.4.patch.
- findbugs
{code}
[ERROR] Failed to execute goal 
org.codehaus.mojo:findbugs-maven-plugin:3.0.0:findbugs (default-cli) on project 
hive-standalone-metastore-common: Unable to parse configuration of mojo 
org.codehaus.mojo:findbugs-maven-plugin:3.0.0:findbugs for parameter 
pluginArtifacts: Cannot assign configuration entry 'pluginArtifacts' with value 
'${plugin.artifacts}' of type 
java.util.Collections.UnmodifiableRandomAccessList to property of type 
java.util.ArrayList -> [Help 1]
{code}
- asflicense
{code}
Lines that start with ? in the ASF License  report indicate files that do 
not have an Apache license header:
 !? .github/workflows/stale.yml
{code}

Submitted the patch to Review Board.

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-07 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Attachment: HIVE-23556.4.patch
Status: Patch Available  (was: In Progress)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-07 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Status: In Progress  (was: Patch Available)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-05 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Attachment: HIVE-23556.3.patch
Status: Patch Available  (was: In Progress)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, HIVE-23556.patch


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-05 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Status: In Progress  (was: Patch Available)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.patch


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-04 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Attachment: HIVE-23556.2.patch
Status: Patch Available  (was: Open)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.patch


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-06-04 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Status: Open  (was: Patch Available)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.patch


[jira] [Assigned] (HIVE-23502) 【hive on spark】 return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

2020-05-28 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-23502:
---

Assignee: (was: Toshihiko Uchida)

> 【hive on spark】 return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> -
>
> Key: HIVE-23502
> URL: https://issues.apache.org/jira/browse/HIVE-23502
> Project: Hive
>  Issue Type: Bug
> Environment: hadoop 2.7.2   hive 1.2.1  scala 2.9.x   spark 1.3.1
>Reporter: tom
>Priority: Blocker
>
> Spark UI Log:
>  
> 20/05/19 17:07:11 INFO exec.Utilities: No plan file found: 
> hdfs://mycluster/tmp/hive/root/a3b20597-61d1-47a9-86b1-dde289fded78/hive_2020-05-19_17-06-53_394_4024151029162597012-1/-mr-10003/c586ae6a-eefb-49fd-92b6-7593e57f0a93/map.xml
> 20/05/19 17:07:11 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 
> (TID 0)
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>  at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>  at org.apache.spark.scheduler.Task.run(Task.scala:64)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 20/05/19 17:07:11 INFO executor.CoarseGrainedExecutorBackend: Got assigned 
> task 1
> 20/05/19 17:07:11 INFO executor.Executor: Running task 0.1 in stage 0.0 (TID 
> 1)
> 20/05/19 17:07:11 INFO rdd.HadoopRDD: Input split: 
> Paths:/user/hive/warehouse/orginfobig_fq/nd=2014/frcode=410503/fqdate=2014-01-01/part-m-0:0+100InputFormatClass:
>  org.apache.hadoop.mapred.TextInputFormat
> 20/05/19 17:07:11 INFO exec.Utilities: No plan file found: 
> hdfs://mycluster/tmp/hive/root/a3b20597-61d1-47a9-86b1-dde289fded78/hive_2020-05-19_17-06-53_394_4024151029162597012-1/-mr-10003/c586ae6a-eefb-49fd-92b6-7593e57f0a93/map.xml
> 20/05/19 17:07:11 ERROR executor.Executor: Exception in task 0.1 in stage 0.0 
> (TID 1)
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:437)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:430)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
>  at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:236)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:212)
>  at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>  at org.apache.spark.scheduler.Task.run(Task.scala:64)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 20/05/19 17:19:19 INFO storage.BlockManager: Removing broadcast 1
> 20/05/19 17:19:19 INFO storage.BlockManager: Removing block broadcast_1
> 20/05/19 17:19:19 INFO storage.MemoryStore: Block broadcast_1 of size 189144 
> dropped from 

[jira] [Assigned] (HIVE-23502) 【hive on spark】 return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

2020-05-28 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-23502:
---

Assignee: Toshihiko Uchida

> 【hive on spark】 return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> -
>
> Key: HIVE-23502
> URL: https://issues.apache.org/jira/browse/HIVE-23502
> Project: Hive
>  Issue Type: Bug
> Environment: hadoop 2.7.2   hive 1.2.1  scala 2.9.x   spark 1.3.1
>Reporter: tom
>Assignee: Toshihiko Uchida
>Priority: Blocker

[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2020-05-27 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-23556:

Attachment: HIVE-23556.patch
  Assignee: Toshihiko Uchida
Status: Patch Available  (was: Open)

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.patch


[jira] [Commented] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-05-10 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103849#comment-17103849
 ] 

Toshihiko Uchida commented on HIVE-22967:
-

[~jdere] and [~ashutoshc] Thanks for your review!

> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22967.1.patch, HIVE-22967.2.patch
>
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:317)
> ... 17 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1492)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:111)
> at 
> 

[jira] [Commented] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-03-08 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054654#comment-17054654
 ] 

Toshihiko Uchida commented on HIVE-22967:
-

The second patch just fixes the checkstyle warning.

> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22967.1.patch, HIVE-22967.2.patch
>
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:317)
> ... 17 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1492)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:111)
> at 
> 

[jira] [Updated] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-03-08 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22967:

Attachment: HIVE-22967.2.patch

> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22967.1.patch, HIVE-22967.2.patch
>
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:317)
> ... 17 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1492)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:111)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:58)
> at 
> 

[jira] [Commented] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-03-03 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050692#comment-17050692
 ] 

Toshihiko Uchida commented on HIVE-22967:
-

The first patch localizes reloadable jars just like HIVE-14037 and HIVE-14142.
Let me fix the checkstyle warning.
{code}
./ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:1098:String allFiles = HiveStringUtils.joinIgnoringEmpty(new String[]{auxJars, reloadableAuxJars, addedJars, addedFiles}, ',');: warning: Line is longer than 120 characters (found 126).
./ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java:1119:String allFiles = HiveStringUtils.joinIgnoringEmpty(new String[]{auxJars, reloadableAuxJars, addedJars, addedFiles}, ',');: warning: Line is longer than 120 characters (found 126).
{code}
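
For illustration only, here is a minimal sketch of one way the flagged statement could be split to stay under the 120-character limit. It is not taken from the attached patch. The class name, the local `joinIgnoringEmpty` helper (a stand-in for `HiveStringUtils.joinIgnoringEmpty`, whose exact handling of blank strings is not verified here), and the jar paths are all assumptions.

```java
import java.util.StringJoiner;

public class JoinIgnoringEmptySketch {
    // Hypothetical stand-in for HiveStringUtils.joinIgnoringEmpty: joins the
    // non-null, non-empty entries with the given separator.
    public static String joinIgnoringEmpty(String[] parts, char separator) {
        StringJoiner joiner = new StringJoiner(String.valueOf(separator));
        for (String part : parts) {
            if (part != null && !part.isEmpty()) {
                joiner.add(part);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; the real ones would come from the Hive configuration.
        String auxJars = "file:///opt/hive/aux/a.jar";
        String reloadableAuxJars = "file:///opt/hive/reloadable/udf.jar";
        String addedJars = "";
        String addedFiles = null;

        // Building the array on its own line keeps every line well under 120 characters.
        String[] resources = {auxJars, reloadableAuxJars, addedJars, addedFiles};
        String allFiles = joinIgnoringEmpty(resources, ',');

        // prints "file:///opt/hive/aux/a.jar,file:///opt/hive/reloadable/udf.jar"
        System.out.println(allFiles);
    }
}
```

Pulling the array into a local variable is only one option; wrapping the call arguments across two lines would satisfy the same check.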

> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22967.1.patch
>
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> 

[jira] [Updated] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-03-03 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22967:

Attachment: HIVE-22967.1.patch
Status: Patch Available  (was: In Progress)

> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22967.1.patch
>
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:317)
> ... 17 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1492)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:111)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:58)
> at 
> 

[jira] [Assigned] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-03-03 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-22967:
---


> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:317)
> ... 17 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1492)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:111)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:58)
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:63)
> ... 24 more
> Caused by: 

[jira] [Work started] (HIVE-22967) Support hive.reloadable.aux.jars.path for Hive on Tez

2020-03-03 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22967 started by Toshihiko Uchida.
---
> Support hive.reloadable.aux.jars.path for Hive on Tez
> -
>
> Key: HIVE-22967
> URL: https://issues.apache.org/jira/browse/HIVE-22967
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
>
> The jars in hive.reloadable.aux.jars.path are not localized in Tez containers.
> As a result, any query utilizing those reloadable jars fails for Hive on Tez 
> due to ClassNotFoundException.
> {code}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Map 1, vertexId=vertex_1578856704640_0087_1_00, diagnostics=[Task 
> failed, taskId=task_1578856704640_0087_1_00_01, diagnostics=[TaskAttempt 
> 0 failed, info=[Error: Error while running task ( failure) : 
> attempt_1578856704640_0087_1_00_01_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.initializeOp(VectorFilterOperator.java:83)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:573)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:525)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:386)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.initializeMapOperator(VectorMapOperator.java:591)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:317)
> ... 17 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.example.hive.udf.Lower
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.getUdfClass(GenericUDFBridge.java:134)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.isStateful(FunctionRegistry.java:1492)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.<init>(ExprNodeGenericFuncEvaluator.java:111)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:58)
> at 
> org.apache.hadoop.hive.ql.exec.FilterOperator.initializeOp(FilterOperator.java:63)
> ... 24 more
> Caused by: 

[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-03-01 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048618#comment-17048618
 ] 

Toshihiko Uchida commented on HIVE-22453:
-

[~vgarg]
Thanks for your review and support, too.

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, 
> HIVE-22453.3.patch, HIVE-22453.4.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED or FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> even though it never displays partition statistics.
> The command should not fetch partitions, since doing so can take a long time 
> for a table with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-27 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046593#comment-17046593
 ] 

Toshihiko Uchida commented on HIVE-22453:
-

[~vgarg]
Rebased and uploaded the patch.

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, 
> HIVE-22453.3.patch, HIVE-22453.4.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED or FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> even though it never displays partition statistics.
> The command should not fetch partitions, since doing so can take a long time 
> for a table with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Updated] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-27 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22453:

Attachment: HIVE-22453.4.patch

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, 
> HIVE-22453.3.patch, HIVE-22453.4.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED or FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> even though it never displays partition statistics.
> The command should not fetch partitions, since doing so can take a long time 
> for a table with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-16 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037839#comment-17037839
 ] 

Toshihiko Uchida commented on HIVE-22453:
-

[~vgarg]
Thanks for resubmitting my patch!
Finally, all tests passed.

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, 
> HIVE-22453.3.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Comment Edited] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-09 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033162#comment-17033162
 ] 

Toshihiko Uchida edited comment on HIVE-22453 at 2/9/20 10:08 AM:
--

[~vgarg]
Thanks for taking a look at this issue.

> Can you rebase and reupload the patch?
Sure.

Resubmitted the second patch to retest it.


was (Author: touchida):
[~vgarg]
Thanks for taking a look at this issue.

> Can you rebase and reupload the patch?
Sure.

Resubmitted the second patch (HIVE-22453.2.patch) to retest it.

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-09 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033162#comment-17033162
 ] 

Toshihiko Uchida commented on HIVE-22453:
-

[~vgarg]
Thanks for taking a look at this issue.

> Can you rebase and reupload the patch?
Sure.

Resubmitted the second patch (HIVE-22453.2.patch) to retest it.

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Updated] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-09 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22453:

Attachment: HIVE-22453.2.patch
Status: Patch Available  (was: Open)

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.2.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Updated] (HIVE-22453) Describe table unnecessarily fetches partitions

2020-02-09 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22453:

Status: Open  (was: Patch Available)

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Assigned] (HIVE-13034) Add jdeb plugin to build debian

2020-02-08 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-13034:
---

Assignee: Arshad Matin  (was: Toshihiko Uchida)

> Add jdeb plugin to build debian
> ---
>
> Key: HIVE-13034
> URL: https://issues.apache.org/jira/browse/HIVE-13034
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 2.1.0
>Reporter: Arshad Matin
>Assignee: Arshad Matin
>Priority: Major
> Fix For: 2.1.0
>
> Attachments: HIVE-13034.1.patch, HIVE-13034.patch
>
>
> It would be nice to also generate a Debian package as part of the build. This can be 
> done by adding the jdeb plugin to the dist profile.
> NO PRECOMMIT TESTS





[jira] [Assigned] (HIVE-13034) Add jdeb plugin to build debian

2020-02-08 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-13034:
---

Assignee: Toshihiko Uchida  (was: Arshad Matin)

> Add jdeb plugin to build debian
> ---
>
> Key: HIVE-13034
> URL: https://issues.apache.org/jira/browse/HIVE-13034
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 2.1.0
>Reporter: Arshad Matin
>Assignee: Toshihiko Uchida
>Priority: Major
> Fix For: 2.1.0
>
> Attachments: HIVE-13034.1.patch, HIVE-13034.patch
>
>
> It would be nice to also generate a Debian package as part of the build. This can be 
> done by adding the jdeb plugin to the dist profile.
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions

2019-11-13 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973545#comment-16973545
 ] 

Toshihiko Uchida commented on HIVE-22453:
-

The second patch only adds a comment.
Neither test failure appears to be related to the patches, since the failing tests do not 
call DESCRIBE and pass in my local environment.

[~ashutoshc]
Could you kindly take a look at the patch?


> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Updated] (HIVE-22453) Describe table unnecessarily fetches partitions

2019-11-12 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22453:

Attachment: HIVE-22453.2.patch

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.2.patch, HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Updated] (HIVE-22453) Describe table unnecessarily fetches partitions

2019-11-04 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22453:

Attachment: HIVE-22453.patch
  Assignee: Toshihiko Uchida
Status: Patch Available  (was: Open)

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.6, 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Commented] (HIVE-22453) Describe table unnecessarily fetches partitions

2019-11-04 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966935#comment-16966935
 ] 

Toshihiko Uchida commented on HIVE-22453:
-

HIVE-21485 also reports a performance issue on the describe table command, and 
tries to resolve it by introducing a runtime parameter that determines whether 
partition statistics are displayed or not.
In the case of the describe table command without EXTENDED and FORMATTED, 
however, partitions should not be fetched regardless of the parameter.
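
The distinction made in this comment can be sketched as a small decision function. This is only an illustration under stated assumptions: DescribeFetchSketch, DescribeMode, shouldFetchPartitions and the showPartitionStats flag (a stand-in for the runtime parameter proposed in HIVE-21485) are hypothetical names and do not come from the Hive code base or from either patch. The sketch covers only the case where no partition is specified.

```java
// Illustrative sketch only; none of these names exist in Hive.
class DescribeFetchSketch {
    /** The three forms of the describe table command discussed in the comment. */
    enum DescribeMode { SIMPLE, EXTENDED, FORMATTED }

    /**
     * Decides whether partitions need to be fetched when no partition is specified.
     * showPartitionStats is a hypothetical stand-in for the HIVE-21485 runtime parameter.
     */
    static boolean shouldFetchPartitions(DescribeMode mode, boolean showPartitionStats) {
        if (mode == DescribeMode.SIMPLE) {
            // Plain DESCRIBE displays no partition statistics, so the parameter is irrelevant.
            return false;
        }
        // Assumption: EXTENDED and FORMATTED need partitions only when statistics are shown.
        return showPartitionStats;
    }

    public static void main(String[] args) {
        for (DescribeMode mode : DescribeMode.values()) {
            for (boolean flag : new boolean[] {true, false}) {
                System.out.println(mode + ", showPartitionStats=" + flag
                        + " -> fetch=" + shouldFetchPartitions(mode, flag));
            }
        }
    }
}
```

The point of the sketch is that the SIMPLE branch returns before the flag is consulted, which is the behavior this issue asks for.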

> Describe table unnecessarily fetches partitions
> ---
>
> Key: HIVE-22453
> URL: https://issues.apache.org/jira/browse/HIVE-22453
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.6
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-22453.patch
>
>
> The simple describe table command without EXTENDED and FORMATTED (i.e., 
> DESCRIBE table_name) fetches all partitions when no partition is specified, 
> although, by nature, it does not display partition statistics.
> The command should not fetch partitions, since doing so can take a long time for 
> tables with a large number of partitions.
> For instance, in our environment, the command takes around 8 seconds for a 
> table with 8760 (24 * 365) partitions.





[jira] [Updated] (HIVE-22373) File Merge tasks fail when containers are reused

2019-10-20 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida updated HIVE-22373:

Attachment: HIVE-22373.patch
Status: Patch Available  (was: In Progress)

> File Merge tasks fail when containers are reused
> 
>
> Key: HIVE-22373
> URL: https://issues.apache.org/jira/browse/HIVE-22373
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
> Attachments: HIVE-22373.patch
>
>
> h1. Problems
> Setting tez.am.container.reuse.enabled=true allows containers to be 
> reused across multiple tasks.
> When two File Merge tasks run on the same container, the later task fails to 
> rename its output path.
> Below is the error log of task 01_0 on container 
> container_e87_1570604853053_11564_01_03, where task 04_0 ran 
> before task 01_0.
> It shows that the output file name of task 01_0 is mistakenly taken from the 
> previous task ID, 04_0.
> {code}
> 2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
>   at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
>   at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
>   ... 17 more
> Caused by: java.io.IOException: Unable to rename viewfs:///user//.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-1/_tmp.04_0 to viewfs:///user//.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-1/04_0
>   at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
>   ... 20 more
> {code}
> h1. Causes
> When AbstractFileMergeOperator is initialized, taskId is set only the 
> first time.
> - AbstractFileMergeOperator.java
> {code}
> private void updatePaths(Path tp, Path ttp) {
>   if (taskId == null) {
>     taskId = Utilities.getTaskId(jc);
>   }
> {code}
> This leads to the output file name conflict shown above.
> h1. Solutions
> Remove the null-checking conditional, which was introduced in HIVE-14640, and 
> update taskId from JobConf whenever the operator is initialized.
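
The cause and the proposed fix described above can be reproduced in isolation with a minimal sketch. It assumes that the same operator instance is initialized once per task on a reused container. FakeJobConf, CachingOperator and RefreshingOperator are hypothetical stand-ins written for this illustration, not Hive classes, and the "_tmp." prefix is only borrowed from the paths in the log above.

```java
// Illustrative sketch only; FakeJobConf and both operator classes are hypothetical stand-ins.
import java.util.ArrayList;
import java.util.List;

class TaskIdReuseSketch {
    /** Hypothetical stand-in for JobConf, carrying only the current task ID. */
    static final class FakeJobConf {
        final String taskId;
        FakeJobConf(String taskId) { this.taskId = taskId; }
    }

    /** Mirrors the quoted snippet: taskId is read only while it is still null. */
    static final class CachingOperator {
        private String taskId;
        String updatePaths(FakeJobConf jc) {
            if (taskId == null) {
                taskId = jc.taskId;
            }
            return "_tmp." + taskId;
        }
    }

    /** Mirrors the proposed fix: taskId is refreshed on every initialization. */
    static final class RefreshingOperator {
        private String taskId;
        String updatePaths(FakeJobConf jc) {
            taskId = jc.taskId;
            return "_tmp." + taskId;
        }
    }

    /** Initializes one caching operator for each task in turn, as on a reused container. */
    static List<String> runCaching(String... taskIds) {
        CachingOperator op = new CachingOperator();
        List<String> names = new ArrayList<>();
        for (String id : taskIds) {
            names.add(op.updatePaths(new FakeJobConf(id)));
        }
        return names;
    }

    /** Same scenario with the operator that re-reads the task ID each time. */
    static List<String> runRefreshing(String... taskIds) {
        RefreshingOperator op = new RefreshingOperator();
        List<String> names = new ArrayList<>();
        for (String id : taskIds) {
            names.add(op.updatePaths(new FakeJobConf(id)));
        }
        return names;
    }

    public static void main(String[] args) {
        // Task 04_0 runs first, then task 01_0, matching the order in the error log.
        System.out.println("cached:    " + runCaching("04_0", "01_0"));
        System.out.println("refreshed: " + runRefreshing("04_0", "01_0"));
    }
}
```

With the cached variant both tasks end up with the name derived from 04_0, which corresponds to the rename conflict in the log. With the refreshed variant each task gets a name derived from its own ID.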





[jira] [Assigned] (HIVE-22373) File Merge tasks fail when containers are reused

2019-10-20 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiko Uchida reassigned HIVE-22373:
---

Assignee: Toshihiko Uchida

> File Merge tasks fail when containers are reused
> 
>
> Key: HIVE-22373
> URL: https://issues.apache.org/jira/browse/HIVE-22373
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>
> h1. Problems
> Setting tez.am.container.reuse.enabled=true allows containers to be 
> reused across multiple tasks.
> When two File Merge tasks run on the same container, the later task fails to 
> rename its output path.
> Below is the error log of task 01_0 on container 
> container_e87_1570604853053_11564_01_03, where task 04_0 ran 
> before task 01_0.
> It shows that the output file name of task 01_0 is mistakenly taken from the 
> previous task ID, 04_0.
> {code}
> 2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
>   at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
>   at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
>   ... 17 more
> Caused by: java.io.IOException: Unable to rename viewfs:///user//.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-1/_tmp.04_0 to viewfs:///user//.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-1/04_0
>   at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
>   ... 20 more
> {code}
> h1. Causes
> When AbstractFileMergeOperator is initialized, taskId is set only the 
> first time.
> - AbstractFileMergeOperator.java
> {code}
> private void updatePaths(Path tp, Path ttp) {
>   if (taskId == null) {
>     taskId = Utilities.getTaskId(jc);
>   }
> {code}
> This leads to the output file name conflict shown above.
> h1. Solutions
> Remove the null-checking conditional, which was introduced in HIVE-14640, and 
> update taskId from JobConf whenever the operator is initialized.





[jira] [Work started] (HIVE-22373) File Merge tasks fail when containers are reused

2019-10-20 Thread Toshihiko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22373 started by Toshihiko Uchida.
---
> File Merge tasks fail when containers are reused
> 
>
> Key: HIVE-22373
> URL: https://issues.apache.org/jira/browse/HIVE-22373
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>
> h1. Problems
> Setting tez.am.container.reuse.enabled=true allows containers to be 
> reused across multiple tasks.
> When two File Merge tasks run on the same container, the later task fails to 
> rename its output path.
> Below is the error log of task 01_0 on container 
> container_e87_1570604853053_11564_01_03, where task 04_0 ran 
> before task 01_0.
> It shows that the output file name of task 01_0 is mistakenly taken from the 
> previous task ID, 04_0.
> {code}
> 2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>   at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
>   at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
>   at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
>   at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
>   ... 17 more
> Caused by: java.io.IOException: Unable to rename viewfs:///user//.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-1/_tmp.04_0 to viewfs:///user//.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-1/04_0
>   at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
>   ... 20 more
> {code}
> h1. Causes
> When AbstractFileMergeOperator is initialized, taskId is set only the 
> first time.
> - AbstractFileMergeOperator.java
> {code}
> private void updatePaths(Path tp, Path ttp) {
>   if (taskId == null) {
>     taskId = Utilities.getTaskId(jc);
>   }
> {code}
> This leads to the output file name conflict shown above.
> h1. Solutions
> Remove the null-checking conditional, which was introduced in HIVE-14640, and 
> update taskId from JobConf whenever the operator is initialized.


