[jira] [Created] (HIVE-26291) Ranger client file descriptor leak

2022-06-04 Thread Adrian Wang (Jira)
Adrian Wang created HIVE-26291:
--

 Summary: Ranger client file descriptor leak
 Key: HIVE-26291
 URL: https://issues.apache.org/jira/browse/HIVE-26291
 Project: Hive
  Issue Type: Improvement
Reporter: Adrian Wang
Assignee: Adrian Wang


The Ranger client has a file descriptor (fd) leak.
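
For context, a minimal sketch (illustrative only, not the actual Ranger client code) of the usual fd-leak pattern and its try-with-resources fix:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class PolicyDownload {
    // Leaky variant: if readAllBytes() throws, or the caller forgets close(),
    // the underlying socket/file descriptor is never released.
    static byte[] fetchLeaky(URL policyUrl) throws IOException {
        InputStream in = policyUrl.openStream();
        return in.readAllBytes(); // stream never closed
    }

    // Fixed variant: try-with-resources releases the descriptor on all paths.
    static byte[] fetchFixed(URL policyUrl) throws IOException {
        try (InputStream in = policyUrl.openStream()) {
            return in.readAllBytes();
        }
    }
}
{code}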



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26174) ALTER TABLE RENAME TO should check new db location

2022-04-25 Thread Adrian Wang (Jira)
Adrian Wang created HIVE-26174:
--

 Summary: ALTER TABLE RENAME TO should check new db location
 Key: HIVE-26174
 URL: https://issues.apache.org/jira/browse/HIVE-26174
 Project: Hive
  Issue Type: Improvement
Reporter: Adrian Wang
Assignee: Adrian Wang


Currently, if we run

ALTER TABLE db1.table1 RENAME TO db2.table2;

with `db1` and `db2` on different file systems, for example `db1` at 
`"hdfs:/user/hive/warehouse/db1.db"` and `db2` at 
`"s3://bucket/s3warehouse/db2.db"`, the new `db2.table2` ends up at the location 
`hdfs:/s3warehouse/db2.db/table2`, which is clearly wrong.

The idea is to ban this kind of operation, as we already appear to intend to; 
however, the existing check ran after the file system scheme had been changed, 
so it always passed.
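
For illustration, a sketch of the intended guard, run before any path rewriting; the helper name and call site are hypothetical, not actual Hive code:
{code:java}
import java.net.URI;
import java.util.Objects;

public class RenameGuard {
    /**
     * Reject a cross-filesystem rename up front, before the destination
     * path is derived from the source table's scheme.
     */
    static void checkSameFileSystem(URI oldDbLocation, URI newDbLocation) {
        boolean sameScheme = Objects.equals(oldDbLocation.getScheme(), newDbLocation.getScheme());
        boolean sameAuthority = Objects.equals(oldDbLocation.getAuthority(), newDbLocation.getAuthority());
        if (!sameScheme || !sameAuthority) {
            throw new IllegalArgumentException(
                "ALTER TABLE ... RENAME TO across file systems is not supported: "
                + oldDbLocation + " -> " + newDbLocation);
        }
    }

    public static void main(String[] args) {
        checkSameFileSystem(
                URI.create("hdfs:/user/hive/warehouse/db1.db"),
                URI.create("s3://bucket/s3warehouse/db2.db")); // throws: different schemes
    }
}
{code}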



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26032) Upgrade cron-utils to 9.1.6

2022-03-13 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-26032:
--

 Summary: Upgrade cron-utils to 9.1.6
 Key: HIVE-26032
 URL: https://issues.apache.org/jira/browse/HIVE-26032
 Project: Hive
  Issue Type: Task
  Components: Hive
Affects Versions: 4.0.0
Reporter: Yuming Wang


To fix the [CVE-2021-41269|https://nvd.nist.gov/vuln/detail/CVE-2021-41269] issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-26030) Backport HIVE-21498 to branch-2.3

2022-03-11 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-26030:
--

 Summary: Backport HIVE-21498 to branch-2.3
 Key: HIVE-26030
 URL: https://issues.apache.org/jira/browse/HIVE-26030
 Project: Hive
  Issue Type: Task
  Components: Thrift API
Affects Versions: 2.3.9
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25996) Backport HIVE-21498 and HIVE-25098 to fix CVE-2020-13949

2022-03-01 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-25996:
--

 Summary: Backport HIVE-21498 and HIVE-25098 to fix CVE-2020-13949
 Key: HIVE-25996
 URL: https://issues.apache.org/jira/browse/HIVE-25996
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.3.9
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25869) Add GitHub Action job to publish snapshot

2022-01-17 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-25869:
--

 Summary: Add GitHub Action job to publish snapshot
 Key: HIVE-25869
 URL: https://issues.apache.org/jira/browse/HIVE-25869
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25635) Upgrade Thrift to 0.15.0

2021-10-21 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-25635:
--

 Summary: Upgrade Thrift to 0.15.0
 Key: HIVE-25635
 URL: https://issues.apache.org/jira/browse/HIVE-25635
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25295) "File already exists" exception during mapper/reducer retry with old Hive (0.13)

2021-06-29 Thread yuquan wang (Jira)
yuquan wang created HIVE-25295:
--

 Summary: "File already exist exception" during mapper/reducer 
retry with old hive(0.13)
 Key: HIVE-25295
 URL: https://issues.apache.org/jira/browse/HIVE-25295
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 0.13.0
Reporter: yuquan wang


We are still using a very old Hive version (0.13) for historical reasons, and we 
often hit the following issue:
{code:java}
Caused by: java.io.IOException: File already 
exists:s3://smart-dmp/warehouse/uploaded/ad_dmp_pixel/dt=2021-06-21/key=259f3XXX
{code}
We have investigated this issue for quite a long time without finding a good 
fix, so I would like to ask the Hive community whether there are any known 
solutions.
 
The error occurs during the map/reduce stage: once a task instance fails for 
some unexpected reason (for example, an unstable spot instance getting killed), 
the later retry throws the above exception instead of overwriting the output.
 
We have several guesses (a cleanup sketch for guess 3 follows the list):
1. Is it caused by the ORC file format? I found a similar issue, 
https://issues.apache.org/jira/browse/HIVE-6341, but it has no comments there, 
and our table is stored as ORC.
2. Is the problem fixed in a newer Hive version? We also run Hive 2.3.6 and 
have not hit this issue there, so would an upgrade solve it?
3. Is there a configuration that always cleans up existing output folders when 
a mapper/reducer retries? I have searched all the MapReduce configs but could 
not find one.
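
For guess 3, a minimal sketch of the kind of cleanup we have in mind (our own workaround idea, not a built-in Hive or MapReduce option), using the standard Hadoop FileSystem API to delete a failed attempt's leftover output before the retry writes:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryCleanup {
    /**
     * Delete leftover output from a failed task attempt so the retry can
     * recreate it instead of failing with "File already exists".
     */
    static void cleanTaskOutput(Configuration conf, Path taskOutput) throws java.io.IOException {
        FileSystem fs = taskOutput.getFileSystem(conf);
        if (fs.exists(taskOutput)) {
            fs.delete(taskOutput, true); // recursive delete of the stale attempt output
        }
    }
}
{code}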



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24893) Download data from Thriftserver through JDBC

2021-03-17 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-24893:
--

 Summary: Download data from Thriftserver through JDBC
 Key: HIVE-24893
 URL: https://issues.apache.org/jira/browse/HIVE-24893
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2, JDBC
Affects Versions: 4.0.0
Reporter: Yuming Wang




Snowflake supports downloading data files directly from an internal stage to a 
stream:
https://docs.snowflake.com/en/user-guide/jdbc-using.html#label-jdbc-download-from-stage-to-stream
https://github.com/snowflakedb/snowflake-jdbc/blob/95a7d8a03316093430dc3960df6635643208b6fd/src/main/java/net/snowflake/client/jdbc/SnowflakeConnectionV1.java#L886
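
For reference, a short sketch of the Snowflake JDBC pattern described in the linked docs; the account URL, credentials, stage and file names here are placeholders:
{code:java}
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import net.snowflake.client.jdbc.SnowflakeConnection;

public class DownloadStreamExample {
    public static void main(String[] args) throws Exception {
        // Placeholder account URL and credentials.
        Connection connection = DriverManager.getConnection(
                "jdbc:snowflake://myaccount.snowflakecomputing.com", "user", "password");
        // Unwrap the vendor connection and stream a staged file directly,
        // without first downloading it to a local file.
        try (InputStream in = connection.unwrap(SnowflakeConnection.class)
                .downloadStream("~", "data.csv.gz", true)) { // stage, file, decompress
            in.transferTo(System.out);
        }
        connection.close();
    }
}
{code}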




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24797) Disable default value validation when parsing Avro schemas

2021-02-19 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-24797:
--

 Summary: Disable default value validation when parsing Avro schemas
 Key: HIVE-24797
 URL: https://issues.apache.org/jira/browse/HIVE-24797
 Project: Hive
  Issue Type: Bug
Reporter: Yuming Wang


It throws exceptions after upgrading Avro to 1.10.1 when parsing this schema:
{code:json}
{
  "type": "record",
  "name": "EventData",
  "doc": "event data",
  "fields": [
    {"name": "ARRAY_WITH_DEFAULT", "type": {"type": "array", "items": "string"}, "default": null}
  ]
}
{code}

{noformat}
org.apache.avro.AvroTypeException: Invalid default for field USERACTIONS: null 
not a {"type":"array","items":"string"}
at org.apache.avro.Schema.validateDefault(Schema.java:1571)
at org.apache.avro.Schema.access$500(Schema.java:87)
at org.apache.avro.Schema$Field.<init>(Schema.java:544)
at org.apache.avro.Schema.parse(Schema.java:1678)
at org.apache.avro.Schema$Parser.parse(Schema.java:1425)
at org.apache.avro.Schema$Parser.parse(Schema.java:1396)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFor(AvroSerdeUtils.java:287)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:170)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:139)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:187)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:107)
at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:493)
at 
org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:225)
{noformat}
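
A minimal sketch of the proposed change, assuming Avro's {{Schema.Parser#setValidateDefaults}} API; the inline schema string is the one above:
{code:java}
import org.apache.avro.Schema;

public class LenientSchemaParse {
    public static void main(String[] args) {
        String json = "{\"type\":\"record\",\"name\":\"EventData\","
                + "\"fields\":[{\"name\":\"ARRAY_WITH_DEFAULT\","
                + "\"type\":{\"type\":\"array\",\"items\":\"string\"},\"default\":null}]}";
        Schema.Parser parser = new Schema.Parser();
        // Skip default-value validation so legacy schemas with invalid
        // defaults (null for an array field) still parse after the upgrade.
        parser.setValidateDefaults(false);
        Schema schema = parser.parse(json);
        System.out.println(schema.getName());
    }
}
{code}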





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24760) Backport HIVE-19228 to branch-3.0, branch-2 and branch-2.3

2021-02-09 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-24760:
--

 Summary: Backport HIVE-19228 to branch-3.0, branch-2 and branch-2.3
 Key: HIVE-24760
 URL: https://issues.apache.org/jira/browse/HIVE-24760
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24568) Fix guice compatibility issues

2020-12-23 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-24568:
--

 Summary: Fix guice compatibility issues
 Key: HIVE-24568
 URL: https://issues.apache.org/jira/browse/HIVE-24568
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang


{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
  at com.google.inject.multibindings.Multibinder.collectionOfProvidersOf(Multibinder.java:202)
  at com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:283)
  at com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:258)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24436) Fix Avro NULL_DEFAULT_VALUE compatibility issue

2020-11-26 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-24436:
--

 Summary: Fix Avro NULL_DEFAULT_VALUE compatibility issue
 Key: HIVE-24436
 URL: https://issues.apache.org/jira/browse/HIVE-24436
 Project: Hive
  Issue Type: Improvement
  Components: Avro
Affects Versions: 2.3.8
Reporter: Yuming Wang


Exception1:
{noformat}
- create hive serde table with Catalog
*** RUN ABORTED ***
  java.lang.NoSuchMethodError: 'void org.apache.avro.Schema$Field.<init>(java.lang.String, org.apache.avro.Schema, java.lang.String, org.codehaus.jackson.JsonNode)'
  at 
org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.createAvroField(TypeInfoToSchema.java:76)
  at 
org.apache.hadoop.hive.serde2.avro.TypeInfoToSchema.convert(TypeInfoToSchema.java:61)
  at 
org.apache.hadoop.hive.serde2.avro.AvroSerDe.getSchemaFromCols(AvroSerDe.java:170)
  at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:114)
  at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
  at 
org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:533)
  at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:450)
  at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:437)
  at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
  at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
{noformat}


Exception2:
{noformat}
- alter hive serde table add columns -- partitioned - AVRO *** FAILED ***
  org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.avro.AvroRuntimeException: Unknown datum class: class 
org.codehaus.jackson.node.NullNode;
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
  at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:346)
  at 
org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:166)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3680)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24153) distinct is not quite effective in table expression

2020-09-11 Thread Xinyu Wang (Jira)
Xinyu Wang created HIVE-24153:
-

 Summary: distinct is not quite effective in table expression
 Key: HIVE-24153
 URL: https://issues.apache.org/jira/browse/HIVE-24153
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 3.1.1
Reporter: Xinyu Wang


Below is an example, for a table t(id int, name string, comment string):

with cte as (
    select distinct id, name, comment
    from t
)
select count(*) from cte

The result of the above query is larger than that of select count(distinct id, 
name, comment) from t. In the EXPLAIN output of the CTE query, 
PARTITION_ONLY_SHUFFLE is used, but for select count(distinct id, name, 
comment), SHUFFLE is used instead.

Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24053) Pluggable HttpRequestInterceptor for Hive JDBC

2020-08-20 Thread Ying Wang (Jira)
Ying Wang created HIVE-24053:


 Summary: Pluggable HttpRequestInterceptor for Hive JDBC
 Key: HIVE-24053
 URL: https://issues.apache.org/jira/browse/HIVE-24053
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Affects Versions: 3.1.2
Reporter: Ying Wang
Assignee: Ying Wang


Allow the client to pass in the name of a custom HttpRequestInterceptor; Hive 
instantiates the class and adds it to the HttpClient.

Example usage: we would like to pass in an HttpRequestInterceptor for OAuth 2.0 
authentication. The interceptor will acquire and/or refresh the access token 
and add it as an authentication header each time HiveConnection sends an 
HttpRequest.
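
A minimal sketch of such an interceptor against the Apache HttpClient 4.x API that HiveConnection uses; the token-fetching helper is a hypothetical placeholder:
{code:java}
import java.io.IOException;
import org.apache.http.HttpException;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.protocol.HttpContext;

public class OAuthTokenInterceptor implements HttpRequestInterceptor {
    @Override
    public void process(HttpRequest request, HttpContext context)
            throws HttpException, IOException {
        // fetchOrRefreshToken() stands in for the client's own
        // OAuth 2.0 token acquisition/refresh logic.
        String accessToken = fetchOrRefreshToken();
        request.addHeader("Authorization", "Bearer " + accessToken);
    }

    private String fetchOrRefreshToken() {
        return "example-token"; // hypothetical; replace with a real token source
    }
}
{code}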



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24049) Forbid binary type as partition column

2020-08-19 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-24049:
--

 Summary: Forbid binary type as partition column
 Key: HIVE-24049
 URL: https://issues.apache.org/jira/browse/HIVE-24049
 Project: Hive
  Issue Type: Bug
Reporter: Yuming Wang


Using the binary type as a partition column may cause data issues.
{noformat}
hive> create table t1(id int) partitioned by (part binary);
OK
Time taken: 3.307 seconds
hive> insert into t1 PARTITION(part) select 1 as id, cast('a' as binary) as 
part;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = yumwang_20200819144033_5eb6d723-edeb-4e17-8509-c658ad89c2a3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2020-08-19 14:40:36,083 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local247252310_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory 
file:/Users/yumwang/Downloads/apache-hive-2.3.7-bin/tmp/t1/.hive-staging_hive_2020-08-19_14-40-33_789_7653530788805518878-1/-ext-1
Loading data to table default.t1 partition (part=null)

Loaded : 1/1 partitions.
 Time taken to load dynamic partitions: 4.029 seconds
 Time taken for adding to write entity : 0.001 seconds
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 6.591 seconds
hive> insert into t1 PARTITION(part) select 1 as id, cast('b' as binary) as 
part;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.
Query ID = yumwang_20200819144045_1f112d6d-effa-4d81-87e8-9326015289f1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2020-08-19 14:40:47,537 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local698238180_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory 
file:/Users/yumwang/Downloads/apache-hive-2.3.7-bin/tmp/t1/.hive-staging_hive_2020-08-19_14-40-45_908_8062651574733580526-1/-ext-1
Loading data to table default.t1 partition (part=null)

Loaded : 1/1 partitions.
 Time taken to load dynamic partitions: 0.15 seconds
 Time taken for adding to write entity : 0.0 seconds
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.988 seconds
hive> select * from t1;
OK
1   61
1   62
Time taken: 0.471 seconds, Fetched: 2 row(s)
hive> select * from t1 where part= cast('b' as binary);;
OK
Time taken: 0.381 seconds
hive> select * from t1 where part= cast('b' as binary);
OK
Time taken: 0.141 seconds
hive> select * from t1 where part= cast('a' as binary);
OK
Time taken: 0.198 seconds
hive> select * from t1 where part= 61;
FAILED: RuntimeException Cannot convert to Binary from: int
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23771) After LOAD DATA into Hive, LIMIT displays Chinese user names correctly, but WHERE shows them garbled and comparisons on them fail

2020-06-29 Thread wang (Jira)
wang created HIVE-23771:
---

 Summary: After LOAD DATA into Hive, LIMIT displays Chinese user names correctly, but WHERE shows them garbled and comparisons on them fail
 Key: HIVE-23771
 URL: https://issues.apache.org/jira/browse/HIVE-23771
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 2.1.1
Reporter: wang
 Fix For: 2.1.1
 Attachments: image-2020-06-29-15-04-23-999.png, 
image-2020-06-29-15-08-25-923.png, image-2020-06-29-15-10-10-310.png

Create table statement:
create table smg_t_usr_inf_23(
Usr_ID string,
RlgnSvcPltfrmUsr_TpCd string,
Rlgn_InsID string,
Usr_Nm string
) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES ("field.delim"="|@|") stored as textfile

Load data: LOAD DATA LOCAL INPATH '/home/ap/USR_INF 20200622_0001.dat' INTO TABLE 
usr_inf

select * from usr_inf limit 10; displays the data: !image-2020-06-29-15-04-23-999.png!

 

select * from usr_inf where usr_nm = '胡学玲'; displays no data: 
!image-2020-06-29-15-08-25-923.png!

 

Other queries such as select * from usr_inf where usr_id='***'; do display data: 
!image-2020-06-29-15-10-10-310.png!

Could someone explain why the loaded data displays as correct Chinese but WHERE 
comparisons on the user name fail? After insert into table aa select * from 
usr_inf; the usr_nm column of the new table behaves the same way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23359) "show tables like" support for SQL wildcard characters (% and _)

2020-05-03 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-23359:
--

 Summary: "show tables like" support for SQL wildcard characters (% 
and _)
 Key: HIVE-23359
 URL: https://issues.apache.org/jira/browse/HIVE-23359
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.7
Reporter: Yuming Wang


https://docs.snowflake.com/en/sql-reference/sql/show-tables.html
https://clickhouse.tech/docs/en/sql-reference/statements/show/
https://www.mysqltutorial.org/mysql-show-tables/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23129) Cast invalid string to date returns incorrect result

2020-04-01 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-23129:
--

 Summary: Cast invalid string to date returns incorrect result
 Key: HIVE-23129
 URL: https://issues.apache.org/jira/browse/HIVE-23129
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Yuming Wang


{noformat}
hive> select cast('2020-20-20' as date);
OK
2021-08-20
Time taken: 4.436 seconds, Fetched: 1 row(s)
{noformat}
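
For comparison, a small sketch showing how strict parsing with java.time rejects the out-of-range month instead of rolling it over into 2021-08-20 (an illustration of the expected behavior, not Hive's code path):
{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public class StrictDateParse {
    public static void main(String[] args) {
        DateTimeFormatter strict = DateTimeFormatter.ofPattern("uuuu-MM-dd")
                .withResolverStyle(ResolverStyle.STRICT);
        try {
            LocalDate.parse("2020-20-20", strict);
        } catch (DateTimeParseException e) {
            // Month 20 is invalid, so a strict parser throws instead of
            // silently producing 2021-08-20.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
{code}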




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22838) like any is incorrect if the pattern list contains null

2020-02-06 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-22838:
--

 Summary: like any is incorrect if the pattern list contains null
 Key: HIVE-22838
 URL: https://issues.apache.org/jira/browse/HIVE-22838
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 3.1.2
Reporter: Yuming Wang


How to reproduce:

{code:sql}
CREATE TABLE like_any_table
STORED AS TEXTFILE
AS
SELECT "google" as company,"%oo%" as pat
UNION ALL
SELECT "facebook" as company,"%oo%" as pat
UNION ALL
SELECT "linkedin" as company,"%in" as pat
;
{code}


{noformat}
hive> select company from like_any_table where company like any ('%oo%',null);
OK
Time taken: 0.064 seconds
hive> select company from like_any_table where company like '%oo%'  or company 
like null;
OK
google
facebook
{noformat}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22642) Fix the TCLIService.thrift warning

2019-12-12 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-22642:
--

 Summary: Fix the TCLIService.thrift warning
 Key: HIVE-22642
 URL: https://issues.apache.org/jira/browse/HIVE-22642
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang



{noformat}
TCLIService.thrift:361] Consider using the more efficient "binary" type instead 
of "list<byte>"
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22229) Backport HIVE-8472 to branch-2.3

2019-09-20 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-22229:
--

 Summary: Backport HIVE-8472 to branch-2.3
 Key: HIVE-22229
 URL: https://issues.apache.org/jira/browse/HIVE-22229
 Project: Hive
  Issue Type: Improvement
  Components: Database/Schema
Affects Versions: 2.3.6
Reporter: Yuming Wang
Assignee: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22206) Failed to "add jar" with hiveserver2 on JDK 11

2019-09-14 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-22206:
--

 Summary: Failed to "add jar" with hiveserver2 on JDK 11
 Key: HIVE-22206
 URL: https://issues.apache.org/jira/browse/HIVE-22206
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0
Reporter: Yuming Wang


How to reproduce:
{code:sh}
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=${JAVA_HOME}/bin:${PATH}
rm -rf lib/hive-hcatalog-core-4.0.0-SNAPSHOT.jar
bin/hiveserver2
{code}

{code:sql}
bin/beeline -u jdbc:hive2://localhost:10000
add jar /root/opensource/apache-hive/hive-hcatalog-core-4.0.0-SNAPSHOT.jar;
CREATE TABLE addJar(key string) ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe';
{code}


{noformat}
0: jdbc:hive2://localhost:10000> add jar 
/root/opensource/apache-hive/packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/hive-hcatalog-core-4.0.0-SNAPSHOT.jar;
INFO  : Added 
[/root/opensource/apache-hive/packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/hive-hcatalog-core-4.0.0-SNAPSHOT.jar]
 to class path
INFO  : Added resources: 
[/root/opensource/apache-hive/packaging/target/apache-hive-4.0.0-SNAPSHOT-bin/apache-hive-4.0.0-SNAPSHOT-bin/hive-hcatalog-core-4.0.0-SNAPSHOT.jar]
No rows affected (0.018 seconds)
0: jdbc:hive2://localhost:10000> CREATE TABLE addJar(key string) ROW FORMAT 
SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
INFO  : Compiling 
command(queryId=root_20190914215356_211fe827-960f-4556-ad5b-feb4ad474a8c): 
CREATE TABLE addJar(key string) ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe'
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling 
command(queryId=root_20190914215356_211fe827-960f-4556-ad5b-feb4ad474a8c); Time 
taken: 0.006 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=root_20190914215356_211fe827-960f-4556-ad5b-feb4ad474a8c): 
CREATE TABLE addJar(key string) ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe'
INFO  : Starting task [Stage-0:DDL] in serial mode
ERROR : Failed
org.apache.hadoop.hive.ql.metadata.HiveException: Cannot validate serde: 
org.apache.hive.hcatalog.data.JsonSerDe
at 
org.apache.hadoop.hive.ql.ddl.DDLUtils.validateSerDe(DDLUtils.java:118) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.table.creation.CreateTableDesc.toTable(CreateTableDesc.java:772)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.table.creation.CreateTableOperation.execute(CreateTableOperation.java:57)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:90) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2188) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1840) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1508) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1268) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1262) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:160) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:233)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:88)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:332)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at javax.security.auth.Subject.doAs(Subject.java:423) ~[?:?]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 ~[hadoop-common-3.2.0.jar:?]
at 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:350)
 ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at 

[jira] [Created] (HIVE-22139) Will not pad Decimal numbers with trailing zeros if select from value

2019-08-22 Thread Yuming Wang (Jira)
Yuming Wang created HIVE-22139:
--

 Summary: Will not pad Decimal numbers with trailing zeros if 
select from value
 Key: HIVE-22139
 URL: https://issues.apache.org/jira/browse/HIVE-22139
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Yuming Wang


How to reproduce:
{code:sql}
// code placeholder
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (HIVE-22097) Incompatible java.util.ArrayList for java 11

2019-08-11 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-22097:
--

 Summary: Incompatible java.util.ArrayList for java 11
 Key: HIVE-22097
 URL: https://issues.apache.org/jira/browse/HIVE-22097
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Yuming Wang



{noformat}
export JAVA_HOME=/usr/lib/jdk-11.0.3
export PATH=${JAVA_HOME}/bin:${PATH}

hive> create table t(id int);
Time taken: 0.035 seconds
hive> insert into t values(1);
Query ID = root_20190811155400_7c0e0494-eecb-4c54-a9fd-942ab52a0794
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:390)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
at 
org.apache.hive.com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.borrowKryo(SerializationUtilities.java:280)
at 
org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:595)
at 
org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:587)
at 
org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:579)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:357)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:159)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:103)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2317)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1969)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1636)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1396)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1390)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:162)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:223)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:838)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:777)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:696)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.NoSuchFieldException: parentOffset
at java.base/java.lang.Class.getDeclaredField(Class.java:2412)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:384)
... 29 more
Job Submission failed with exception 
'java.lang.RuntimeException(java.lang.NoSuchFieldException: parentOffset)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.lang.NoSuchFieldException: 
parentOffset
{noformat}

The reason is that newer JDKs removed the {{parentOffset}} field from 
{{java.util.ArrayList$SubList}}, which Hive's Kryo serializer looks up via 
reflection.
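
A sketch (not the actual Hive fix) of a version-tolerant reflection lookup; it assumes, per the JDK source, that the JDK 9+ {{ArrayList$SubList}} stores its start index in a field named {{offset}}:
{code:java}
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class SubListFieldProbe {
    public static void main(String[] args) throws Exception {
        List<Integer> sub = new ArrayList<Integer>().subList(0, 0);
        Class<?> subListClass = sub.getClass(); // java.util.ArrayList$SubList
        Field offsetField;
        try {
            // Field name on JDK 8 and earlier.
            offsetField = subListClass.getDeclaredField("parentOffset");
        } catch (NoSuchFieldException e) {
            // JDK 9+ refactored SubList; fall back to the new field name.
            offsetField = subListClass.getDeclaredField("offset");
        }
        System.out.println("using field: " + offsetField.getName());
    }
}
{code}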




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22096) Backport HIVE-21584 to branch-2.3

2019-08-11 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-22096:
--

 Summary: Backport HIVE-21584 to branch-2.3
 Key: HIVE-22096
 URL: https://issues.apache.org/jira/browse/HIVE-22096
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Yuming Wang
Assignee: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22012) Support timestamp type + string type

2019-07-19 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-22012:
--

 Summary: Support timestamp type + string type
 Key: HIVE-22012
 URL: https://issues.apache.org/jira/browse/HIVE-22012
 Project: Hive
  Issue Type: Improvement
  Components: Parser
Affects Versions: 4.0.0
Reporter: Yuming Wang


{code:sql}
hive> select current_timestamp() + '100 days';
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments ''100 days'': 
No matching method for class 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPDTIPlus with (timestamp, 
string)
hive>
{code}


{code:sql}
postgres=# explain verbose select now() + '100 days', '100 days' + now();
QUERY PLAN
--
 Result  (cost=0.00..0.02 rows=1 width=16)
   Output: (now() + '100 days'::interval), (now() + '100 days'::interval)
(2 rows)
{code}





--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-21680) Backport HIVE-17644 to branch-2 and branch-2.3

2019-05-02 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21680:
--

 Summary: Backport HIVE-17644 to branch-2 and branch-2.3
 Key: HIVE-21680
 URL: https://issues.apache.org/jira/browse/HIVE-21680
 Project: Hive
  Issue Type: Bug
Reporter: Yuming Wang
Assignee: Yuming Wang



{code:scala}
  test("get statistics when not analyzed in Hive or Spark") {
val tabName = "tab1"
withTable(tabName) {
  createNonPartitionedTable(tabName, analyzedByHive = false, 
analyzedBySpark = false)
  checkTableStats(tabName, hasSizeInBytes = true, expectedRowCounts = None)

  // ALTER TABLE SET TBLPROPERTIES invalidates some contents of Hive 
specific statistics
  // This is triggered by the Hive alterTable API
  val describeResult = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $tabName")

  val rawDataSize = extractStatsPropValues(describeResult, "rawDataSize")
  val numRows = extractStatsPropValues(describeResult, "numRows")
  val totalSize = extractStatsPropValues(describeResult, "totalSize")
  assert(rawDataSize.isEmpty, "rawDataSize should not be shown without 
table analysis")
  assert(numRows.isEmpty, "numRows should not be shown without table 
analysis")
  assert(totalSize.isDefined && totalSize.get > 0, "totalSize is lost")
}
  }
// 
https://github.com/apache/spark/blob/43dcb91a4cb25aa7e1cc5967194f098029a0361e/sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala#L789-L806
{code}

{noformat}
06:23:46.103 WARN org.apache.hadoop.hive.metastore.MetaStoreDirectSql: Failed 
to execute [SELECT "DBS"."NAME", "TBLS"."TBL_NAME", 
"COLUMNS_V2"."COLUMN_NAME","KEY_CONSTRAINTS"."POSITION", 
"KEY_CONSTRAINTS"."CONSTRAINT_NAME", "KEY_CONSTRAINTS"."ENABLE_VALIDATE_RELY"  
FROM  "TBLS"  INNER  JOIN "KEY_CONSTRAINTS" ON "TBLS"."TBL_ID" = 
"KEY_CONSTRAINTS"."PARENT_TBL_ID"  INNER JOIN "DBS" ON "TBLS"."DB_ID" = 
"DBS"."DB_ID"  INNER JOIN "COLUMNS_V2" ON "COLUMNS_V2"."CD_ID" = 
"KEY_CONSTRAINTS"."PARENT_CD_ID" AND  "COLUMNS_V2"."INTEGER_IDX" = 
"KEY_CONSTRAINTS"."PARENT_INTEGER_IDX"  WHERE 
"KEY_CONSTRAINTS"."CONSTRAINT_TYPE" = 0 AND "DBS"."NAME" = ? AND 
"TBLS"."TBL_NAME" = ?] with parameters [default, tab1]
javax.jdo.JDODataStoreException: Error executing SQL query "SELECT 
"DBS"."NAME", "TBLS"."TBL_NAME", 
"COLUMNS_V2"."COLUMN_NAME","KEY_CONSTRAINTS"."POSITION", 
"KEY_CONSTRAINTS"."CONSTRAINT_NAME", "KEY_CONSTRAINTS"."ENABLE_VALIDATE_RELY"  
FROM  "TBLS"  INNER  JOIN "KEY_CONSTRAINTS" ON "TBLS"."TBL_ID" = 
"KEY_CONSTRAINTS"."PARENT_TBL_ID"  INNER JOIN "DBS" ON "TBLS"."DB_ID" = 
"DBS"."DB_ID"  INNER JOIN "COLUMNS_V2" ON "COLUMNS_V2"."CD_ID" = 
"KEY_CONSTRAINTS"."PARENT_CD_ID" AND  "COLUMNS_V2"."INTEGER_IDX" = 
"KEY_CONSTRAINTS"."PARENT_INTEGER_IDX"  WHERE 
"KEY_CONSTRAINTS"."CONSTRAINT_TYPE" = 0 AND "DBS"."NAME" = ? AND 
"TBLS"."TBL_NAME" = ?".
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391)
at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:1750)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPrimaryKeys(MetaStoreDirectSql.java:1939)
at 
org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:8213)
at 
org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:8209)
at 
org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2719)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPrimaryKeysInternal(ObjectStore.java:8221)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPrimaryKeys(ObjectStore.java:8199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
at com.sun.proxy.$Proxy24.getPrimaryKeys(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_primary_keys(HiveMetaStore.java:6830)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at 

[jira] [Created] (HIVE-21639) Spark test failed since HIVE-10632

2019-04-21 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21639:
--

 Summary: Spark test failed since HIVE-10632
 Key: HIVE-21639
 URL: https://issues.apache.org/jira/browse/HIVE-21639
 Project: Hive
  Issue Type: Bug
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21589) Remove org.eclipse.jetty.orbit:javax.servlet from hive-common

2019-04-06 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21589:
--

 Summary: Remove org.eclipse.jetty.orbit:javax.servlet from 
hive-common
 Key: HIVE-21589
 URL: https://issues.apache.org/jira/browse/HIVE-21589
 Project: Hive
  Issue Type: Task
  Components: Spark
Affects Versions: 2.3.4
Reporter: Yuming Wang
Assignee: Yuming Wang


HIVE-12783 added org.eclipse.jetty.orbit:javax.servlet to fix the Hive on 
Spark test failure.
Since Spark 2.0 we no longer need it; see SPARK-14897.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21588) Remove HBase dependency from hive-metastore

2019-04-06 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21588:
--

 Summary: Remove HBase dependency from hive-metastore
 Key: HIVE-21588
 URL: https://issues.apache.org/jira/browse/HIVE-21588
 Project: Hive
  Issue Type: Task
  Components: HBase Metastore
Affects Versions: 4.0.0
Reporter: Yuming Wang
Assignee: Yuming Wang


HIVE-17234 removed the HBase metastore from master, but the Maven dependencies 
have not been removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21563) Improve Table#getEmptyTable performance by disabling registerAllFunctionsOnce

2019-04-01 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21563:
--

 Summary: Improve Table#getEmptyTable performance by disabling registerAllFunctionsOnce
 Key: HIVE-21563
 URL: https://issues.apache.org/jira/browse/HIVE-21563
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang
Assignee: Yuming Wang


We do not need registerAllFunctionsOnce when calling {{Table#getEmptyTable}}. 
The stack trace:
{noformat}
  at 
org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:177)
  at 
org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDF(Registry.java:170)
  at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:209)
  at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:247)
  at 
org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
  at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
  at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
  at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
  at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
  at 
org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:913)
  at 
org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:877)
  at 
org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1479)
  at 
org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:1150)
  at org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:180)
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21552) Remove tomcat:jasper-* from hive-service-rpc

2019-03-30 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21552:
--

 Summary: Remove tomcat:jasper-* from hive-service-rpc
 Key: HIVE-21552
 URL: https://issues.apache.org/jira/browse/HIVE-21552
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang
Assignee: Yuming Wang


{{hive-service}} added these dependencies; {{hive-service-rpc}} does not need 
them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21551) Remove tomcat:jasper-* from hive-service-rpc

2019-03-30 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21551:
--

 Summary: Remove tomcat:jasper-* from hive-service-rpc
 Key: HIVE-21551
 URL: https://issues.apache.org/jira/browse/HIVE-21551
 Project: Hive
  Issue Type: Improvement
Reporter: Yuming Wang
Assignee: Yuming Wang


{{hive-service}} added these dependencies; {{hive-service-rpc}} does not need 
them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21536) Backport HIVE-17764 to branch-2.3

2019-03-28 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21536:
--

 Summary: Backport HIVE-17764 to branch-2.3
 Key: HIVE-21536
 URL: https://issues.apache.org/jira/browse/HIVE-21536
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 2.3.4
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21521) Upgrade Hive to use ORC 1.5.5

2019-03-27 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-21521:
--

 Summary: Upgrade Hive to use ORC 1.5.5
 Key: HIVE-21521
 URL: https://issues.apache.org/jira/browse/HIVE-21521
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.3.4
Reporter: Yuming Wang
Assignee: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20650) trunc string type throw FAILED: ArrayIndexOutOfBoundsException 1

2018-09-27 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-20650:
--

 Summary: trunc string type throw FAILED: 
ArrayIndexOutOfBoundsException 1
 Key: HIVE-20650
 URL: https://issues.apache.org/jira/browse/HIVE-20650
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.3
Reporter: Yuming Wang


{code:sql}
hive> select trunc('2.5');
FAILED: ArrayIndexOutOfBoundsException 1
hive> SELECT trunc('2009-02-12');
FAILED: ArrayIndexOutOfBoundsException 1
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20216) Support range partition

2018-07-19 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-20216:
--

 Summary: Support range partition
 Key: HIVE-20216
 URL: https://issues.apache.org/jira/browse/HIVE-20216
 Project: Hive
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Yuming Wang


Support RANGE PARTITION to improve performance:
{code:sql}
CREATE TABLE employees  (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
fname VARCHAR(25) NOT NULL,
lname VARCHAR(25) NOT NULL,
store_id INT NOT NULL,
department_id INT NOT NULL
) 
PARTITION BY RANGE(id)  (
PARTITION p0 VALUES LESS THAN (5),
PARTITION p1 VALUES LESS THAN (10),
PARTITION p2 VALUES LESS THAN (15),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
{code}

https://dev.mysql.com/doc/refman/5.6/en/partitioning-selection.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19740) Hiveserver2 can't connect to metastore when using Hive 3.0

2018-05-30 Thread heyang wang (JIRA)
heyang wang created HIVE-19740:
--

 Summary: Hiveserver2 can't connect to metastore when using Hive 3.0
 Key: HIVE-19740
 URL: https://issues.apache.org/jira/browse/HIVE-19740
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: heyang wang
 Attachments: hive-site.xml

I am using Docker to deploy Hadoop 2.7, Hive 3.0 and Spark 2.3.

After starting all the Docker images, HiveServer2 fails to start and outputs 
the following error log:

2018-05-30T14:13:53,832 WARN [main]: server.HiveServer2 
(HiveServer2.java:startHiveServer2(1041)) - Error starting HiveServer2 on 
attempt 1, will retry in 60000ms
java.lang.RuntimeException: Error initializing notification event poll
 at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:269) 
~[hive-service-3.0.0.jar:3.0.0]
 at 
org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1013)
 [hive-service-3.0.0.jar:3.0.0]
 at 
org.apache.hive.service.server.HiveServer2.access$1800(HiveServer2.java:134) 
[hive-service-3.0.0.jar:3.0.0]
 at 
org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1282)
 [hive-service-3.0.0.jar:3.0.0]
 at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1126) 
[hive-service-3.0.0.jar:3.0.0]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_131]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_131]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
[hadoop-common-2.7.4.jar:?]
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
[hadoop-common-2.7.4.jar:?]
Caused by: java.io.IOException: org.apache.thrift.TApplicationException: 
Internal error processing get_current_notificationEventId
 at 
org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:75)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.<init>(NotificationEventPoll.java:103)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:59)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:267) 
~[hive-service-3.0.0.jar:3.0.0]
 ... 10 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
get_current_notificationEventId
 at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:111) 
~[hive-exec-3.0.0.jar:3.0.0]
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) 
~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_current_notificationEventId(ThriftHiveMetastore.java:5541)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_current_notificationEventId(ThriftHiveMetastore.java:5529)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getCurrentNotificationEventId(HiveMetaStoreClient.java:2713)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_131]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_131]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
 at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at com.sun.proxy.$Proxy34.getCurrentNotificationEventId(Unknown Source) ~[?:?]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_131]
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_131]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
 at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2763)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at com.sun.proxy.$Proxy34.getCurrentNotificationEventId(Unknown Source) ~[?:?]
 at 
org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:73)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.<init>(NotificationEventPoll.java:103)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 
org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:59)
 ~[hive-exec-3.0.0.jar:3.0.0]
 at 

[jira] [Created] (HIVE-18856) param note error

2018-03-04 Thread Yu Wang (JIRA)
Yu Wang created HIVE-18856:
--

 Summary: param note error
 Key: HIVE-18856
 URL: https://issues.apache.org/jira/browse/HIVE-18856
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Yu Wang
Assignee: Yu Wang
 Fix For: 1.1.0


The comment on the PerfLogBegin method in the PerfLogger file is incorrect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-18767) Some alterPartitions throw NumberFormatException: null

2018-02-21 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-18767:
--

 Summary: Some alterPartitions throw NumberFormatException: null
 Key: HIVE-18767
 URL: https://issues.apache.org/jira/browse/HIVE-18767
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.2
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-17240) function acos(2) should be null

2017-08-03 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-17240:
--

 Summary: function acos(2) should be null
 Key: HIVE-17240
 URL: https://issues.apache.org/jira/browse/HIVE-17240
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 2.2.0, 1.2.2, 1.1.1
Reporter: Yuming Wang


{{acos(2)}} should return NULL, as it does in MySQL:
{code:sql}
hive> desc function extended acos;
OK
acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise
Example:
  > SELECT acos(1) FROM src LIMIT 1;
  0
  > SELECT acos(2) FROM src LIMIT 1;
  NULL
Time taken: 0.009 seconds, Fetched: 6 row(s)
hive> select acos(2);
OK
NaN
Time taken: 0.437 seconds, Fetched: 1 row(s)
{code}

{code:sql}
mysql>  select acos(2);
+---------+
| acos(2) |
+---------+
|    NULL |
+---------+
1 row in set (0.00 sec)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16641) Use DistCpOptions.Builder in Hadoop shims

2017-05-10 Thread Andrew Wang (JIRA)
Andrew Wang created HIVE-16641:
--

 Summary: Use DistCpOptions.Builder in Hadoop shims
 Key: HIVE-16641
 URL: https://issues.apache.org/jira/browse/HIVE-16641
 Project: Hive
  Issue Type: Bug
  Components: Shims
Reporter: Andrew Wang


Doing some testing against Hadoop trunk. HADOOP-14267 changed how DistCp is 
invoked. Options are now specified via a builder.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16490) Hive should not use private HDFS APIs for encryption

2017-04-20 Thread Andrew Wang (JIRA)
Andrew Wang created HIVE-16490:
--

 Summary: Hive should not use private HDFS APIs for encryption
 Key: HIVE-16490
 URL: https://issues.apache.org/jira/browse/HIVE-16490
 Project: Hive
  Issue Type: Improvement
  Components: Encryption
Affects Versions: 2.2.0
Reporter: Andrew Wang
Priority: Critical


When compiling against bleeding edge versions of Hive and Hadoop, we discovered 
that HIVE-16047 references a private HDFS API, DFSClient, to get at various 
encryption related information. The private API was recently changed by 
HADOOP-14104, which broke Hive compilation.

It'd be better to instead use publicly supported APIs. HDFS-11687 has been 
filed to add whatever encryption APIs are needed by Hive. This JIRA is to move 
Hive over to these new APIs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15794) Support get hdfsEncryptionShim if FileSystem is ViewFileSystem

2017-02-02 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-15794:
--

 Summary: Support get hdfsEncryptionShim if FileSystem is 
ViewFileSystem
 Key: HIVE-15794
 URL: https://issues.apache.org/jira/browse/HIVE-15794
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 1.1.0, 1.2.0, 2.2.0
Reporter: Yuming Wang
Assignee: Yuming Wang


*SQL*:
{code:sql}
hive> create table table2 as select * from table1;

hive> show create table table2;
OK
CREATE TABLE `table2`(
  `id` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://cluster4/user/hive/warehouse/table2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1486050317')

{code}

*LOG*:
{noformat}
2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
to hdfs filesystem.
2017-02-02T20:12:49,738  INFO [99374b82-e9ca-4654-b803-93b194b9331b main] 
session.SessionState: Could not get hdfsEncryptionShim, it is only applicable 
to hdfs filesystem.
{noformat}

We can't get an hdfsEncryptionShim if the {{FileSystem}} is a 
[ViewFileSystem|http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/ViewFs.html];
 we should support it.
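
A rough sketch of one possible approach, using the standard Hadoop {{FileSystem#resolvePath}} API to follow the viewfs mount table to the backing file system before asking for an encryption shim; the method here is illustrative, not the actual fix:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsEncryptionProbe {
    /**
     * Resolve a viewfs:// path to its backing file system; if that turns
     * out to be HDFS, an encryption shim can be obtained for it.
     */
    static boolean backedByHdfs(Path path, Configuration conf) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        // resolvePath() follows the ViewFileSystem mount table to the target FS.
        Path resolved = fs.resolvePath(path);
        return "hdfs".equalsIgnoreCase(resolved.toUri().getScheme());
    }
}
{code}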



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15379) Values of hive.metastore.* keys should be consistent with the Hive Metastore Server

2016-12-07 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-15379:
--

 Summary: Values of hive.metastore.* keys should be consistent with the Hive Metastore Server
 Key: HIVE-15379
 URL: https://issues.apache.org/jira/browse/HIVE-15379
 Project: Hive
  Issue Type: Bug
  Components: Beeline, CLI
Affects Versions: 1.1.0
Reporter: Yuming Wang
Priority: Minor


When using Cloudera Manager, the Hive Metastore Server has 
{{hive.metastore.try.direct.sql=false}}, but the CLI or Beeline reads the 
client-side configuration and reports true, which is misleading:
{code}
hive> set hive.metastore.try.direct.sql;
hive.metastore.try.direct.sql=true
hive> 
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14918) Function concat_ws returns a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)
Xiaowei Wang created HIVE-14918:
---

 Summary: Function concat_ws returns a wrong value
 Key: HIVE-14918
 URL: https://issues.apache.org/jira/browse/HIVE-14918
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 2.0.1, 2.1.0, 2.0.0, 1.1.1
Reporter: Xiaowei Wang
Assignee: Xiaowei Wang
Priority: Critical
 Fix For: 2.1.0


FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309' WHERE 
src.key = 86;
SELECT concat_ws('.', NULL) FROM dest1;

The result is an empty string "", but I think it should return NULL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14112) Joining an HBase mapped big table shouldn't convert to MapJoin

2016-06-27 Thread Yuming Wang (JIRA)
Yuming Wang created HIVE-14112:
--

 Summary: Joining an HBase mapped big table shouldn't convert to MapJoin
 Key: HIVE-14112
 URL: https://issues.apache.org/jira/browse/HIVE-14112
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Affects Versions: 1.1.0, 1.2.0
Reporter: Yuming Wang
Assignee: Yuming Wang
Priority: Minor


Two tables; _hbasetable_risk_control_defense_idx_uid_ is an HBase mapped table:
{noformat}
[root@dev01 ~]# hadoop fs -du -s -h 
/hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
3.0 G  9.0 G  /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
[root@dev01 ~]# hadoop fs -du -s -h /user/hive/warehouse/openapi_invoke_base
6.6 G  19.7 G  /user/hive/warehouse/openapi_invoke_base
{noformat}
The smaller table is 3.0 GB, which is greater than both 
_hive.mapjoin.smalltable.filesize_ and 
_hive.auto.convert.join.noconditionaltask.size_. Yet when these tables are 
joined, Hive auto-converts the join to a map join:
{noformat}
hive> select count(*) from hbasetable_risk_control_defense_idx_uid t1 join 
openapi_invoke_base t2 on (t1.key=t2.merchantid);
Query ID = root_2016062809_9f9d3f25-857b-412c-8a75-3d9228bd5ee5
Total jobs = 1
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; 
support was removed in 8.0
Execution log at: 
/tmp/root/root_2016062809_9f9d3f25-857b-412c-8a75-3d9228bd5ee5.log
2016-06-28 09:22:10 Starting to launch local task to process map join;  
maximum memory = 1908932608
{noformat} 
The root cause is that Hive uses 
_/user/hive/warehouse/hbasetable_risk_control_defense_idx_uid_ as the table's 
location, but that directory is empty, so Hive auto-converts the join to a map 
join. My suggestion is to set the correct location when mapping an HBase table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13293) Query performance degrades after enabling parallel order by for Hive on Spark

2016-03-16 Thread Lifeng Wang (JIRA)
Lifeng Wang created HIVE-13293:
--

 Summary: Query performance degrades after enabling 
parallel order by for Hive on Spark
 Key: HIVE-13293
 URL: https://issues.apache.org/jira/browse/HIVE-13293
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.0.0
Reporter: Lifeng Wang
Assignee: Xuefu Zhang


I used TPCx-BB to do some performance testing on the Hive on Spark engine and found 
that query 10 degrades when parallel order by is enabled.
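
For reference, a sketch of the relevant toggles (assuming the sampling-based parallel order by is what was enabled):
{code}
set hive.execution.engine=spark;
set hive.optimize.sampling.orderby=true;  -- enables the parallel (sampled) order by
{code}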



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12874) dynamic partition insert projects the wrong column

2016-01-14 Thread bin wang (JIRA)
bin wang created HIVE-12874:
---

 Summary: dynamic partition insert projects the wrong column
 Key: HIVE-12874
 URL: https://issues.apache.org/jira/browse/HIVE-12874
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.14.1
 Environment: hive 1.1.0-cdh5.4.8
Reporter: bin wang
Assignee: Alan Gates


We have two tables as below:
create table test (
id bigint comment 'id'
)
PARTITIONED BY (etl_dt string)
STORED AS ORC;

create table test1 (
id bigint,
start_time int
)
PARTITIONED BY (etl_dt string)
STORED AS ORC;

We use sql like the below to import rows from test1 into test:

insert overwrite table test PARTITION(etl_dt)
select id
,from_unixtime(start_time,'yyyy-MM-dd') as etl_dt
from test1
where test1.etl_dt='2016-01-12';

But it behaves wrongly: it uses test1.etl_dt as test's partition value, not the 
'etl_dt' computed in the select.
We think it's a bug; can anyone fix it?
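
For completeness, a dynamic-partition insert like the one above also needs these settings (a sketch, assuming defaults on the affected release):
{code}
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- the last select expression should feed the partition column, so rows land in
-- partitions derived from from_unixtime(start_time, 'yyyy-MM-dd'), not test1.etl_dt
{code}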



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12652) SymbolicTextInputFormat should support paths with a regex, especially when using CombineHiveInputFormat. Add test sql.

2015-12-10 Thread Xiaowei Wang (JIRA)
Xiaowei Wang created HIVE-12652:
---

 Summary: SymbolicTextInputFormat should support paths with a 
regex, especially when using CombineHiveInputFormat. Add test sql.
 Key: HIVE-12652
 URL: https://issues.apache.org/jira/browse/HIVE-12652
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Xiaowei Wang
Assignee: Xiaowei Wang
 Fix For: 1.2.1


1. In fact, SymbolicTextInputFormat supports paths with a regex; I have added some 
test sql.
2. But when using CombineHiveInputFormat to merge small files, it cannot resolve 
a path with a regex, so it gets a wrong result.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12541) Using CombineHiveInputFormat with the original inputformat SymbolicTextInputFormat, it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)
Xiaowei Wang created HIVE-12541:
---

 Summary: Using CombineHiveInputFormat with the original inputformat 
SymbolicTextInputFormat, it will get a wrong result
 Key: HIVE-12541
 URL: https://issues.apache.org/jira/browse/HIVE-12541
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1, 1.2.0, 0.14.0
Reporter: Xiaowei Wang
Assignee: Xiaowei Wang


Table desc :
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the dir 
'/user/hive/warehouse/symlink_text_input_format'; the content of the link 
file is
{noformat}
 "viewfs://nsx/tmp/symlink* " 
{noformat}
It contains one path, and the path contains a regex!


Execute the sql : 
{noformat}
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
It returns a wrong result: 0.

I have also added a test case in the patch.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12482) When execution.engine=tez, set mapreduce.job.name does not work.

2015-11-20 Thread Xiaowei Wang (JIRA)
Xiaowei Wang created HIVE-12482:
---

 Summary: When execution.engine=tez, set mapreduce.job.name does not 
work.
 Key: HIVE-12482
 URL: https://issues.apache.org/jira/browse/HIVE-12482
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 1.2.1, 1.0.1, 1.0.0, 0.14.0
Reporter: Xiaowei Wang
 Fix For: 0.14.1


When execution.engine=tez, set mapreduce.job.name does not work.

In Tez mode, the default job name is "Hive_" + sessionId, for example 
HIVE-ce5784d0-320c-4fb9-8b0b-2d92539dfd9e. It is difficult to distinguish jobs 
when there are too many of them.
A better way is to set the mapreduce.job.name variable, but set 
mapreduce.job.name does not work!
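
A minimal reproduction sketch (the table name is a placeholder):
{code}
set hive.execution.engine=tez;
set mapreduce.job.name=my_etl_job;  -- expected to rename the job; has no effect on Tez
select count(*) from some_table;    -- some_table is hypothetical
{code}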





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12303) HCatRecordSerDe throws an IndexOutOfBoundsException

2015-10-30 Thread Xiaowei Wang (JIRA)
Xiaowei Wang created HIVE-12303:
---

 Summary: HCatRecordSerDe throws an IndexOutOfBoundsException
 Key: HIVE-12303
 URL: https://issues.apache.org/jira/browse/HIVE-12303
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.1, 0.14.0
Reporter: Xiaowei Wang
Assignee: Sushanth Sowmyan
 Fix For: 1.2.1


When accessing a Hive table using HCatalog in Pig, it sometimes throws an exception!

Exception

{noformat}
2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
Exception running child : org.apache.pig.backend.executionengine.ExecException: 
ERROR 6018: Error converting read value to tuple
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
at 
org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
at 
org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
at 
org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
at 
org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
at 
org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
at 
org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
... 13 more

{noformat}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-21 Thread Lifeng Wang (JIRA)
Lifeng Wang created HIVE-12229:
--

 Summary: Custom script in query cannot be executed in yarn-cluster 
mode [Spark Branch].
 Key: HIVE-12229
 URL: https://issues.apache.org/jira/browse/HIVE-12229
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 1.1.0
Reporter: Lifeng Wang


I added one Python script to the query, and the script cannot be found 
during execution in yarn-cluster mode.
{noformat}
15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
q2-sessionize.py, 3600]
15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
memory = 324896224
15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
reporter.progress()
/usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file or 
directory
15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
memory = 325619920
15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: Stream 
closed
15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
input data. This is considered as an error.
15/10/21 21:10:55 INFO exec.ScriptOperator: set 
hive.exec.script.allow.partial.consumption=true; to ignore it.
15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
(tag=0) 
{"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
(tag=0) 
{"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An 
error occurred while reading or writing to your custom script. It may have 
crashed with an error.
at 
org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
... 14 more
{noformat}
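
For context, the failing pattern looks roughly like this (a sketch reconstructed from the log; table and column names are hypothetical):
{code}
ADD FILE q2-sessionize.py;
SELECT TRANSFORM (uid, ts)
  USING 'python q2-sessionize.py 3600'
  AS (uid, session_id)
FROM web_clicks;  -- hypothetical source table
{code}
In yarn-cluster mode the added file apparently is not shipped to the executors' working directory, so the relative script path cannot be resolved.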




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11126) multiple insert fails when selecting with a group by clause

2015-06-26 Thread Guodong Wang (JIRA)
Guodong Wang created HIVE-11126:
---

 Summary: multiple insert fails when selecting with a group by clause
 Key: HIVE-11126
 URL: https://issues.apache.org/jira/browse/HIVE-11126
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 0.12.0
Reporter: Guodong Wang


When the select statement contains a group by clause, a multiple insert fails.

Here is the sample sql.
{code}
from test_src_table 
insert overwrite table test_target_table partition(p)
select src_id as id, 'lala' as p
group by src_id

insert overwrite table test_target_table partition(p)
select id, p from
(select src_id as id, 'papa' as p
group by src_id) t
{code}

The exception is like this
{code}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{_col0:1107625...@qq.com,_col1:lala}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {_col0:1107625...@qq.com,_col1:lala}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(Ex

FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11095) SerDeUtils: another bug when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)
xiaowei wang created HIVE-11095:
---

 Summary: SerDeUtils: another bug when Text is reused
 Key: HIVE-11095
 URL: https://issues.apache.org/jira/browse/HIVE-11095
 Project: Hive
  Issue Type: Bug
  Components: API, CLI
Affects Versions: 1.2.0, 1.0.0, 0.14.0
 Environment: Hadoop 2.3.0-cdh5.0.0
Hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang
Priority: Critical
 Fix For: 1.2.0


The method transformTextFromUTF8 has a bug.
When I query data from an lzo table, I find in the results that the length of the 
current row is always larger than that of the previous row, and sometimes the current 
row contains the contents of the previous row. For example, when I execute the sql 
select * from web_searchhub where logdate=2015061003, the result is shown 
below. Notice that the second row's content contains the first row's content.
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 
session=901,thread=223ession=3151,thread=254 2015061003
The content of the original lzo file is below, just 2 rows.
INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb 
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
I think this error is caused by Text reuse, and I have found the solution.
Additionally, the table create sql is:
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10983) Lazysimpleserde bug when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)
xiaowei wang created HIVE-10983:
---

 Summary: Lazysimpleserde bug  when Text is reused 
 Key: HIVE-10983
 URL: https://issues.apache.org/jira/browse/HIVE-10983
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.14.0
 Environment: Hadoop 2.3.0-cdh5.0.0
Hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang
Priority: Critical


When I query data from an lzo table, I find in the results that the length of the 
current row is always larger than that of the previous row, and sometimes the current 
row contains the contents of the previous row. For example, when I execute the sql 
select * from web_searchhub where logdate=2015061003, the result is shown 
below. Notice that the second row's content contains the first row's content.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] 18941e66-9962-44ad-81bc-3519f47ba274 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original lzo file is below, just 2 rows.

INFO [03:00:05.635] b88e0473-7530-494c-82d8-e2d2ebd2666c_forweb 
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found the solution.

Additionally, the table create sql is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10790) orc file sql execute fail

2015-05-21 Thread xiaowei wang (JIRA)
xiaowei wang created HIVE-10790:
---

 Summary: orc file sql execute fail 
 Key: HIVE-10790
 URL: https://issues.apache.org/jira/browse/HIVE-10790
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.14.0, 0.13.0
 Environment: Hadoop 2.5.0-cdh5.3.2 
hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang


Inserting from a text table into an ORC table, like:
insert overwrite table custom.rank_less_orc_none 
partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text where 
logdate='2015051500';

throws an error: java.lang.RuntimeException: Hive Runtime Error 
while closing operators
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
getDefaultReplication on empty path is invalid
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
... 8 more




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10237) create external table, location path contains a space, like '/user/hive/warehouse/custom.db/uigs_kmap '

2015-04-07 Thread xiaowei wang (JIRA)
xiaowei wang created HIVE-10237:
---

 Summary: create external table, location path contains a space, 
like '/user/hive/warehouse/custom.db/uigs_kmap '
 Key: HIVE-10237
 URL: https://issues.apache.org/jira/browse/HIVE-10237
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.1
 Environment: Hadoop 2.3.0-cdh5.0.0 
hive 0.13.1
Reporter: xiaowei wang


When I want to create an external table and give the table a location, I write a 
wrong location path, /user/hive/warehouse/custom.db/uigs_kmap , which 
contains a space at the end of the path. I think Hive should trim the space from 
the location, but it does not.
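
A minimal sketch of the report (note the trailing space inside the quoted location; the column is hypothetical):
{code}
CREATE EXTERNAL TABLE uigs_kmap (line string)
LOCATION '/user/hive/warehouse/custom.db/uigs_kmap ';  -- trailing space is kept verbatim
{code}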



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values

2015-03-31 Thread Wenbo Wang (JIRA)
Wenbo Wang created HIVE-10176:
-

 Summary: skip.header.line.count causes values to be skipped when 
performing insert values
 Key: HIVE-10176
 URL: https://issues.apache.org/jira/browse/HIVE-10176
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Wenbo Wang


When inserting values into tables with TBLPROPERTIES 
('skip.header.line.count'='1'), the first value listed is also skipped.

create table test (row int, name string) TBLPROPERTIES 
('skip.header.line.count'='1');
load data local inpath '/root/data' into table test;
insert into table test values (1, 'a'), (2, 'b'), (3, 'c');

(1, 'a') isn't inserted into the table.
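
One workaround sketch, assuming the header-skip property is only needed while loading files:
{code}
ALTER TABLE test UNSET TBLPROPERTIES ('skip.header.line.count');
insert into table test values (1, 'a'), (2, 'b'), (3, 'c');  -- all three rows now land
{code}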



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9940) The standard output of Python reduce script can not be interpreted correctly by Hive

2015-03-12 Thread Eric Wang (JIRA)
Eric Wang created HIVE-9940:
---

 Summary: The standard output of Python reduce script can not be 
interpreted correctly by Hive
 Key: HIVE-9940
 URL: https://issues.apache.org/jira/browse/HIVE-9940
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Eric Wang


Using an HQL statement like:
FROM (
  select_statement
  ) map_output
INSERT OVERWRITE TABLE table
  REDUCE map_output.a, map_output.b
  USING 'py_script'
  AS col1, col2;

(1) original type
stdout of Python has records where the 2nd column = 'Meerjungfrau'
527500  Meerjungfrau  25  AO DE   20140704
...

Hive interprets these as:
527500  Meer  null  AO DE   20140704
...

stderr_log interprets these as:
527500  Meerjungfrau  25  AO DE   20140704

(2) change all 'Meerjungfrau' to 'bug' in the Python script
stdout of Python has records where the 2nd column = 'bug'
527500  bug  25  AO DE   20140704
...

Hive interprets these as:
527500  b  null  AO DE   20140704
...

stderr_log interprets these as:
527500  bug  25  AO DE   20140704

(3) move the 2nd column to the last column
stdout of Python has records where the last column = 'Meerjungfrau'
527500  25  AO DE   20140704  Meerjungfrau
...

Hive interprets these as:
527500  25  null  20140704  Meerjungfrau
...

stderr_log interprets these as:
527500  25  AO DE   20140704  Meerjungfrau



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8119) Implement Date in ParquetSerde

2015-01-27 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294832#comment-14294832
 ] 

Adrian Wang commented on HIVE-8119:
---

Is there any update?

 Implement Date in ParquetSerde
 --

 Key: HIVE-8119
 URL: https://issues.apache.org/jira/browse/HIVE-8119
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Mohit Sabharwal

 Date type in Parquet is discussed here: 
 http://mail-archives.apache.org/mod_mbox/incubator-parquet-dev/201406.mbox/%3CCAKa9qDkp7xn+H8fNZC7ms3ckd=xr8gdpe7gqgj5o+pybdem...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8439) query processor fails to handle multiple insert clauses for the same table

2014-10-28 Thread Gordon Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gordon Wang updated HIVE-8439:
--
Summary: query processor fails to handle multiple insert clauses for the 
same table  (was: multiple insert into the same table)

 query processor fails to handle multiple insert clauses for the same table
 --

 Key: HIVE-8439
 URL: https://issues.apache.org/jira/browse/HIVE-8439
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.0
Reporter: Gordon Wang

 When putting multiple inserts for the same table in one SQL, the hive query plan 
 analyzer fails to synthesize the right plan.
 Here is the reproduce steps.
 {noformat}
 create table T1(i int, j int);
 create table T2(m int) partitioned by (n int);
 explain from T1
 insert into table T2 partition (n = 1)
   select T1.i where T1.j = 1
 insert overwrite table T2 partition (n = 2)
   select T1.i where T1.j = 2
   ;
 {noformat}
 When there is an insert into clause in the multiple insert part, the insert 
 overwrite is treated as an insert into.
 I dug into the source code; it looks like Hive does not support mixing insert 
 into and insert overwrite for the same table in multiple insert clauses.
 Here is my finding.
 1. in semantic analyzer, when processing TOK_INSERT_INTO, the analyzer will 
 put the table name into a set which contains all the insert into table names.
 2. when generating file sink plan, the analyzer will check if the table name 
 is in the set, if in the set, the replace flag is set to false. Here is the 
 code snippet.
 {noformat}
   // Create the work for moving the table
   // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
   if (!isNonNativeTable) {
 ltd = new LoadTableDesc(queryTmpdir, 
 ctx.getExternalTmpFileURI(dest_path.toUri()),
 table_desc, dpCtx);
 
 ltd.setReplace(!qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(),
 dest_tab.getTableName()));
 ltd.setLbCtx(lbCtx);
 if (holdDDLTime) {
   LOG.info("this query will not update transient_lastDdlTime!");
   ltd.setHoldDDLTime(true);
 }
 loadTableWork.add(ltd);
   }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8439) multiple insert into the same table

2014-10-27 Thread Gordon Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gordon Wang updated HIVE-8439:
--
Description: 
When putting multiple inserts for the same table in one SQL, the hive query plan 
analyzer fails to synthesize the right plan.

Here is the reproduce steps.
{noformat}
create table T1(i int, j int);
create table T2(m int) partitioned by (n int);
explain from T1
insert into table T2 partition (n = 1)
  select T1.i where T1.j = 1
insert overwrite table T2 partition (n = 2)
  select T1.i where T1.j = 2
  ;
{noformat}
When there is an insert into clause in the multiple insert part, the insert 
overwrite is treated as an insert into.

I dug into the source code; it looks like Hive does not support mixing insert 
into and insert overwrite for the same table in multiple insert clauses.

Here is my finding.
1. in semantic analyzer, when processing TOK_INSERT_INTO, the analyzer will put 
the table name into a set which contains all the insert into table names.
2. when generating file sink plan, the analyzer will check if the table name is 
in the set, if in the set, the replace flag is set to false. Here is the code 
snippet.
{noformat}
  // Create the work for moving the table
  // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
  if (!isNonNativeTable) {
ltd = new LoadTableDesc(queryTmpdir, 
ctx.getExternalTmpFileURI(dest_path.toUri()),
table_desc, dpCtx);

ltd.setReplace(!qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(),
dest_tab.getTableName()));
ltd.setLbCtx(lbCtx);

if (holdDDLTime) {
  LOG.info("this query will not update transient_lastDdlTime!");
  ltd.setHoldDDLTime(true);
}
loadTableWork.add(ltd);
  }
{noformat}

  was:
When putting multiple inserts for the same table in one SQL, the hive query plan 
analyzer fails to synthesize the right plan.

Here is the reproduce steps.
{noformat}
create table T1(i int, j int);
create table T2(m int) partitioned by (n int);
explain from T1
insert into table T2 partition (n = 1)
  select T1.i where T1.j = 1
insert overwrite table T2 partition (n = 2)
  select T1.i where T1.j = 2
  ;
{noformat}
When there is an insert into clause in the multiple insert part, the insert 
overwrite is treated as an insert into.

I dug into the source code; it looks like Hive does not support mixing insert 
into and insert overwrite for the same table in multiple insert clauses.

Here is my finding.
1. in semantic analyzer, when processing TOK_INSERT_INTO, the analyzer will put 
the table name into a set which contains all the insert into table names.
2. when generate file sink plan, the analyzer will check if the table name is 
in the set, if in the set, the replace flag is set to false. Here is the code 
snippet.
{noformat}
  // Create the work for moving the table
  // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
  if (!isNonNativeTable) {
ltd = new LoadTableDesc(queryTmpdir, 
ctx.getExternalTmpFileURI(dest_path.toUri()),
table_desc, dpCtx);

ltd.setReplace(!qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(),
dest_tab.getTableName()));
ltd.setLbCtx(lbCtx);

if (holdDDLTime) {
  LOG.info("this query will not update transient_lastDdlTime!");
  ltd.setHoldDDLTime(true);
}
loadTableWork.add(ltd);
  }
{noformat}


 multiple insert into the same table
 ---

 Key: HIVE-8439
 URL: https://issues.apache.org/jira/browse/HIVE-8439
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.0
Reporter: Gordon Wang

 When putting multiple inserts for the same table in one SQL, the hive query plan 
 analyzer fails to synthesize the right plan.
 Here is the reproduce steps.
 {noformat}
 create table T1(i int, j int);
 create table T2(m int) partitioned by (n int);
 explain from T1
 insert into table T2 partition (n = 1)
   select T1.i where T1.j = 1
 insert overwrite table T2 partition (n = 2)
   select T1.i where T1.j = 2
   ;
 {noformat}
 When there is an insert into clause in the multiple insert part, the insert 
 overwrite is treated as an insert into.
 I dug into the source code; it looks like Hive does not support mixing insert 
 into and insert overwrite for the same table in multiple insert clauses.
 Here is my finding.
 1. in semantic analyzer, when processing TOK_INSERT_INTO, the analyzer will 
 put the table name into a set which contains all the insert into table names.
 2. when generating file sink plan, the analyzer will check if the table name 
 is in the set, if in the set, the replace flag is set to false. Here is the 
 code snippet.
 {noformat}
   // Create the work for moving the table
   // NOTE: specify Dynamic 

[jira] [Commented] (HIVE-8439) multiple insert into the same table

2014-10-27 Thread Gordon Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14184938#comment-14184938
 ] 

Gordon Wang commented on HIVE-8439:
---

Currently, the hive semantic analyzer cannot handle multiple insert clauses 
correctly. When mixing INSERT INTO and INSERT OVERWRITE with the same 
table, the semantic analyzer cannot tell which clause is the OVERWRITE.

Some more information about the overwrite clause should be recorded in the QueryBlock.

 multiple insert into the same table
 ---

 Key: HIVE-8439
 URL: https://issues.apache.org/jira/browse/HIVE-8439
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.0
Reporter: Gordon Wang

 When putting multiple inserts for the same table in one SQL, the hive query plan 
 analyzer fails to synthesize the right plan.
 Here is the reproduce steps.
 {noformat}
 create table T1(i int, j int);
 create table T2(m int) partitioned by (n int);
 explain from T1
 insert into table T2 partition (n = 1)
   select T1.i where T1.j = 1
 insert overwrite table T2 partition (n = 2)
   select T1.i where T1.j = 2
   ;
 {noformat}
 When there is an insert into clause in the multiple insert part, the insert 
 overwrite is treated as an insert into.
 I dug into the source code; it looks like Hive does not support mixing insert 
 into and insert overwrite for the same table in multiple insert clauses.
 Here is my finding.
 1. in semantic analyzer, when processing TOK_INSERT_INTO, the analyzer will 
 put the table name into a set which contains all the insert into table names.
 2. when generating file sink plan, the analyzer will check if the table name 
 is in the set, if in the set, the replace flag is set to false. Here is the 
 code snippet.
 {noformat}
   // Create the work for moving the table
   // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
   if (!isNonNativeTable) {
 ltd = new LoadTableDesc(queryTmpdir, 
 ctx.getExternalTmpFileURI(dest_path.toUri()),
 table_desc, dpCtx);
 
 ltd.setReplace(!qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(),
 dest_tab.getTableName()));
 ltd.setLbCtx(lbCtx);
 if (holdDDLTime) {
   LOG.info("this query will not update transient_lastDdlTime!");
   ltd.setHoldDDLTime(true);
 }
 loadTableWork.add(ltd);
   }
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8532) return code of source xxx clause is missing

2014-10-23 Thread Gordon Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182375#comment-14182375
 ] 

Gordon Wang commented on HIVE-8532:
---

Looks like the UT failure is not caused by this patch; the failing UT is not in 
the changed code path.

 return code of source xxx clause is missing
 -

 Key: HIVE-8532
 URL: https://issues.apache.org/jira/browse/HIVE-8532
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.12.0, 0.13.1
Reporter: Gordon Wang
 Attachments: HIVE-8532.patch


 When executing a source <hql-file> clause, the hive client driver does not catch 
 the return code of this command.
 This behaviour causes an issue when running a hive query in an Oozie workflow.
 When the source clause is put into an Oozie workflow, Oozie cannot get the 
 return code of this command. Thus, Oozie considers the source clause as 
 successful all the time. 
 So, when the source clause fails, the hive query does not abort and the 
 oozie workflow does not abort either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8532) return code of source xxx clause is missing

2014-10-20 Thread Gordon Wang (JIRA)
Gordon Wang created HIVE-8532:
-

 Summary: return code of source xxx clause is missing
 Key: HIVE-8532
 URL: https://issues.apache.org/jira/browse/HIVE-8532
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.13.1, 0.12.0
Reporter: Gordon Wang


When executing a source <hql-file> clause, the hive client driver does not catch 
the return code of this command.

This behaviour causes an issue when running a hive query in an Oozie workflow.
When the source clause is put into an Oozie workflow, Oozie cannot get the 
return code of this command. Thus, Oozie considers the source clause as 
successful all the time. 

So, when the source clause fails, the hive query does not abort and the oozie 
workflow does not abort either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8532) return code of source xxx clause is missing

2014-10-20 Thread Gordon Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177832#comment-14177832
 ] 

Gordon Wang commented on HIVE-8532:
---

The fix is easy; I think a patch will come soon.

 return code of source xxx clause is missing
 -

 Key: HIVE-8532
 URL: https://issues.apache.org/jira/browse/HIVE-8532
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.12.0, 0.13.1
Reporter: Gordon Wang

 When executing a source <hql-file> clause, the hive client driver does not catch 
 the return code of this command.
 This behaviour causes an issue when running a hive query in an Oozie workflow.
 When the source clause is put into an Oozie workflow, Oozie cannot get the 
 return code of this command. Thus, Oozie considers the source clause as 
 successful all the time. 
 So, when the source clause fails, the hive query does not abort and the 
 oozie workflow does not abort either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8439) multiple insert into the same table

2014-10-12 Thread Gordon Wang (JIRA)
Gordon Wang created HIVE-8439:
-

 Summary: multiple insert into the same table
 Key: HIVE-8439
 URL: https://issues.apache.org/jira/browse/HIVE-8439
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0, 0.12.0
Reporter: Gordon Wang


When putting multiple inserts for the same table in one SQL, the hive query plan 
analyzer fails to synthesize the right plan.

Here is the reproduce steps.
{noformat}
create table T1(i int, j int);
create table T2(m int) partitioned by (n int);
explain from T1
insert into table T2 partition (n = 1)
  select T1.i where T1.j = 1
insert overwrite table T2 partition (n = 2)
  select T1.i where T1.j = 2
  ;
{noformat}
When there is an insert into clause in the multiple insert part, the insert 
overwrite is treated as an insert into.

I dug into the source code; it looks like Hive does not support mixing insert 
into and insert overwrite for the same table in multiple insert clauses.

Here is my finding.
1. in semantic analyzer, when processing TOK_INSERT_INTO, the analyzer will put 
the table name into a set which contains all the insert into table names.
2. when generate file sink plan, the analyzer will check if the table name is 
in the set, if in the set, the replace flag is set to false. Here is the code 
snippet.
{noformat}
  // Create the work for moving the table
  // NOTE: specify Dynamic partitions in dest_tab for WriteEntity
  if (!isNonNativeTable) {
ltd = new LoadTableDesc(queryTmpdir, 
ctx.getExternalTmpFileURI(dest_path.toUri()),
table_desc, dpCtx);

ltd.setReplace(!qb.getParseInfo().isInsertIntoTable(dest_tab.getDbName(),
dest_tab.getTableName()));
ltd.setLbCtx(lbCtx);

if (holdDDLTime) {
  LOG.info("this query will not update transient_lastDdlTime!");
  ltd.setHoldDDLTime(true);
}
loadTableWork.add(ltd);
  }
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-09-29 Thread Xinyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated HIVE-2777:
-
Attachment: (was: hive-2777.patch)

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch


 Hive should have the ability to atomically add and drop partitions. This way 
 admins can change partitions atomically without breaking running jobs. It 
 allows an admin to merge several partitions into one.
 Essentially, we would like to have an api: add_drop_partitions(String db, 
 String tbl_name, List<Partition> addParts, List<List<String>> dropParts, 
 boolean deleteData);
 This jira covers changes required for metastore and thrift.
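 For reference, today a merge has to be done as two separate, non-atomic statements (a sketch; the table and partition specs are hypothetical):
 {code}
 ALTER TABLE page_view ADD PARTITION (dt='2014-09');
 ALTER TABLE page_view DROP PARTITION (dt='2014-09-01'), PARTITION (dt='2014-09-02');
 {code}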



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-09-29 Thread Xinyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated HIVE-2777:
-
Attachment: hive-2777-updated.patch

Updated the hive-2777 patch, which fixes all the testPartition() tests. The 
other failed tests are failing in branch-0.13 as well, so please help 
rerun the tests. Thank you!

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, 
 hive-2777-updated.patch


 Hive should have the ability to atomically add and drop partitions. This way 
 admins can change partitions atomically without breaking running jobs. It 
 allows an admin to merge several partitions into one.
 Essentially, we would like to have an api: add_drop_partitions(String db, 
 String tbl_name, List<Partition> addParts, List<List<String>> dropParts, 
 boolean deleteData);
 This jira covers changes required for metastore and thrift.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6179) OOM occurs when query spans to a large number of partitions

2014-08-31 Thread perry wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

perry wang updated HIVE-6179:
-
Description: 
When executing a query against a large number of partitions, such as select 
count(*) from table, an OOM error may occur because Hive fetches the metadata for 
all partitions involved and tries to store it in memory.
{code}
2014-01-09 13:14:17,090 ERROR metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(141)) - java.lang.OutOfMemoryError: Java heap 
space
at java.util.Arrays.copyOf(Arrays.java:2367)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuffer.append(StringBuffer.java:237)
at 
org.apache.derby.impl.sql.conn.GenericStatementContext.appendErrorInfo(Unknown 
Source)
at 
org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedResultSet.next(Unknown Source) 
at 
org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:191)
at 
org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:379)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:641)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:410)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:205)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1433)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1420)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122)
at com.sun.proxy.$Proxy7.getPartitions(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
{code}
The above error happened when executing select count(*) on a table with 40K 
partitions.
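
One mitigation sketch until partition fetching is bounded (assuming table statistics are current; the table name is a placeholder):
{code}
set hive.compute.query.using.stats=true;  -- answer count(*) from stored stats
select count(*) from partitioned_table;   -- avoids fetching 40K partition objects
{code}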

  was:
When executing a query against a large number of partitions, such as select 
count(*) from table, an OOM error may occur because Hive fetches the metadata for 
all partitions involved and tries to store it in memory.
{code}
2014-01-09 13:14:17,090 ERROR metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(141)) - java.lang.OutOfMemoryError: Java heap 
space
at java.util.Arrays.copyOf(Arrays.java:2367)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuffer.append(StringBuffer.java:237)
at 
org.apache.derby.impl.sql.conn.GenericStatementContext.appendErrorInfo(Unknown 
Source)
at 
org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown 
Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown 
Source)
at 
org.apache.derby.impl.jdbc.EmbedResultSet.closeOnTransactionError(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedResultSet.movePosition(Unknown 
Source)
at 

[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]

2014-08-21 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105301#comment-14105301
 ] 

Lianhui Wang commented on HIVE-7384:


I think current Spark already supports hashing by join_col and sorting by {join_col, tag}: 
in Spark, the map side's shuffle writer hashes by Key.hashCode and sorts by Key, and 
Hive's HiveKey class already defines the hashcode. So we can hash by 
HiveKey.hashCode and sort by HiveKey's bytes.

 Research into reduce-side join [Spark Branch]
 -

 Key: HIVE-7384
 URL: https://issues.apache.org/jira/browse/HIVE-7384
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Szehon Ho
 Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, 
 sales_products.txt, sales_stores.txt


 Hive's join operator is very sophisticated, especially for reduce-side join. 
 While we expect that other types of join, such as map-side join and SMB 
 map-side join, will work out of the box with our design, there may be some 
 complication in reduce-side join, which extensively utilizes key tag and 
 shuffle behavior. Our design principle prefers to making Hive implementation 
 work out of box also, which might requires new functionality from Spark. The 
 tasks is to research into this area, identifying requirements for Spark 
 community and the work to be done on Hive to make reduce-side join work.
 A design doc might be needed for this. For more information, please refer to 
 the overall design doc on wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]

2014-08-21 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106343#comment-14106343
 ] 

Lianhui Wang commented on HIVE-7384:


@Szehon Ho yes, I read the OrderedRDDFunctions code and discovered that sortByKey 
actually does a range partition. We need to replace the range partition with a hash 
partition, so Spark should perhaps create a new interface, for example 
partitionSortByKey.
@Brock Noland  The code in 1) means that when sampling data with more than one reducer, 
Hive does a total-order sort. A join does not sample data, so it does not need a 
total-order sort.
2) I think we really need auto-parallelism. Before I talk about it with Reynold Xin, 
Spark needs to support re-partitioning the map output's data the same as Tez does.

 Research into reduce-side join [Spark Branch]
 -

 Key: HIVE-7384
 URL: https://issues.apache.org/jira/browse/HIVE-7384
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Szehon Ho
 Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, 
 sales_products.txt, sales_stores.txt


 Hive's join operator is very sophisticated, especially for reduce-side join. 
 While we expect that other types of join, such as map-side join and SMB 
 map-side join, will work out of the box with our design, there may be some 
 complication in reduce-side join, which extensively utilizes key tag and 
 shuffle behavior. Our design principle prefers to making Hive implementation 
 work out of box also, which might requires new functionality from Spark. The 
 tasks is to research into this area, identifying requirements for Spark 
 community and the work to be done on Hive to make reduce-side join work.
 A design doc might be needed for this. For more information, please refer to 
 the overall design doc on wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]

2014-08-21 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106407#comment-14106407
 ] 

Lianhui Wang commented on HIVE-7384:


I think these thoughts are the same as the ideas you mentioned before, like HIVE-7158, 
which will auto-calculate the number of reducers based on some input from Hive 
(upper/lower bound).

 Research into reduce-side join [Spark Branch]
 -

 Key: HIVE-7384
 URL: https://issues.apache.org/jira/browse/HIVE-7384
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Szehon Ho
 Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, 
 sales_products.txt, sales_stores.txt


 Hive's join operator is very sophisticated, especially for reduce-side join. 
 While we expect that other types of join, such as map-side join and SMB 
 map-side join, will work out of the box with our design, there may be some 
 complication in reduce-side join, which extensively utilizes key tag and 
 shuffle behavior. Our design principle prefers to making Hive implementation 
 work out of box also, which might requires new functionality from Spark. The 
 tasks is to research into this area, identifying requirements for Spark 
 community and the work to be done on Hive to make reduce-side join work.
 A design doc might be needed for this. For more information, please refer to 
 the overall design doc on wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake

2014-08-08 Thread Xiaoyu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090728#comment-14090728
 ] 

Xiaoyu Wang commented on HIVE-7645:
---

This error should not be caused by this patch!

 Hive CompactorMR job set NUM_BUCKETS mistake
 

 Key: HIVE-7645
 URL: https://issues.apache.org/jira/browse/HIVE-7645
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang
 Attachments: HIVE-7645.patch


 code:
 job.setInt(NUM_BUCKETS, sd.getBucketColsSize());
 should change to:
 job.setInt(NUM_BUCKETS, sd.getNumBuckets());



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake

2014-08-07 Thread Xiaoyu Wang (JIRA)
Xiaoyu Wang created HIVE-7645:
-

 Summary: Hive CompactorMR job set NUM_BUCKETS mistake
 Key: HIVE-7645
 URL: https://issues.apache.org/jira/browse/HIVE-7645
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang


code:
job.setInt(NUM_BUCKETS, sd.getBucketColsSize());
should change to:
job.setInt(NUM_BUCKETS, sd.getNumBuckets());



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake

2014-08-07 Thread Xiaoyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Wang updated HIVE-7645:
--

Attachment: HIVE-7645.patch

 Hive CompactorMR job set NUM_BUCKETS mistake
 

 Key: HIVE-7645
 URL: https://issues.apache.org/jira/browse/HIVE-7645
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang
 Attachments: HIVE-7645.patch


 code:
 job.setInt(NUM_BUCKETS, sd.getBucketColsSize());
 should change to:
 job.setInt(NUM_BUCKETS, sd.getNumBuckets());



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7645) Hive CompactorMR job set NUM_BUCKETS mistake

2014-08-07 Thread Xiaoyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Wang updated HIVE-7645:
--

Status: Patch Available  (was: Open)

 Hive CompactorMR job set NUM_BUCKETS mistake
 

 Key: HIVE-7645
 URL: https://issues.apache.org/jira/browse/HIVE-7645
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang
 Attachments: HIVE-7645.patch


 code:
 job.setInt(NUM_BUCKETS, sd.getBucketColsSize());
 should change to:
 job.setInt(NUM_BUCKETS, sd.getNumBuckets());



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7483) hive insert overwrite table select from self dead lock

2014-07-23 Thread Xiaoyu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072654#comment-14072654
 ] 

Xiaoyu Wang commented on HIVE-7483:
---

But it still deadlocks.

 hive insert overwrite table select from self dead lock
 --

 Key: HIVE-7483
 URL: https://issues.apache.org/jira/browse/HIVE-7483
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang

 CREATE TABLE test(
   id int, 
   msg string)
 PARTITIONED BY ( 
   continent string, 
   country string)
 CLUSTERED BY (id) 
 INTO 10 BUCKETS
 STORED AS ORC;
 alter table test add partition(continent='Asia',country='India');
 in hive-site.xml:
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 hive.support.concurrency=true;
 in hive shell:
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 First insert some records into the test table.
 then execute sql:
 insert overwrite table test partition(continent='Asia',country='India') 
 select id,msg from test;
 the log stops at:
 INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
 from=org.apache.hadoop.hive.ql.Driver
 I think it deadlocks when an insert overwrite selects from the table itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7483) hive insert overwrite table select from self dead lock

2014-07-22 Thread Xiaoyu Wang (JIRA)
Xiaoyu Wang created HIVE-7483:
-

 Summary: hive insert overwrite table select from self dead lock
 Key: HIVE-7483
 URL: https://issues.apache.org/jira/browse/HIVE-7483
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang


CREATE TABLE test(
  id int, 
  msg string)
PARTITIONED BY ( 
  continent string, 
  country string)
CLUSTERED BY (id) 
INTO 10 BUCKETS
STORED AS ORC;

alter table test add partition(continent='Asia',country='India');

in hive-site.xml:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive.support.concurrency=true;
hive.zookeeper.quorum=zk1,zk2,zk3;

in hive shell:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

First insert some records into the test table.

then execute sql:
insert overwrite table test partition(continent='Asia',country='India') select 
id,msg from test;

the log stop at :
INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
from=org.apache.hadoop.hive.ql.Driver

i think it has dead lock when insert overwrite table from it self.
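
Until the lock manager handles this, a hedged workaround sketch via Hive JDBC,
assuming a HiveServer2 endpoint; the staging table name test_stage is
hypothetical. Staging the rows first means the read and the overwrite never
reference the same table in one statement:
{code}
// Hedged workaround sketch (Hive 0.13-era HiveServer2 JDBC driver); the
// connection URL and the staging table name are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SelfOverwriteWorkaround {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement st = conn.createStatement()) {
      // 1. Copy the rows out, so the SELECT only needs a read lock on "test".
      st.execute("CREATE TABLE test_stage AS SELECT id, msg FROM test");
      // 2. Overwrite the partition from the staging copy; no self-reference now.
      st.execute("INSERT OVERWRITE TABLE test PARTITION(continent='Asia',country='India') "
          + "SELECT id, msg FROM test_stage");
      st.execute("DROP TABLE test_stage");
    }
  }
}
{code}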




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7483) hive insert overwrite table select from self dead lock

2014-07-22 Thread Xiaoyu Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071356#comment-14071356
 ] 

Xiaoyu Wang commented on HIVE-7483:
---

Yes, you are right!

 hive insert overwrite table select from self dead lock
 --

 Key: HIVE-7483
 URL: https://issues.apache.org/jira/browse/HIVE-7483
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang

 CREATE TABLE test(
   id int, 
   msg string)
 PARTITIONED BY ( 
   continent string, 
   country string)
 CLUSTERED BY (id) 
 INTO 10 BUCKETS
 STORED AS ORC;
 alter table test add partition(continent='Asia',country='India');
 in hive-site.xml:
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 hive.support.concurrency=true;
 hive.zookeeper.quorum=zk1,zk2,zk3;
 in hive shell:
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 insert some records into the test table first.
 then execute the sql:
 insert overwrite table test partition(continent='Asia',country='India') 
 select id,msg from test;
 the log stops at:
 INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
 from=org.apache.hadoop.hive.ql.Driver
 I think a deadlock occurs when an INSERT OVERWRITE selects from the table itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7483) hive insert overwrite table select from self dead lock

2014-07-22 Thread Xiaoyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Wang updated HIVE-7483:
--

Description: 
CREATE TABLE test(
  id int, 
  msg string)
PARTITIONED BY ( 
  continent string, 
  country string)
CLUSTERED BY (id) 
INTO 10 BUCKETS
STORED AS ORC;

alter table test add partition(continent='Asia',country='India');

in hive-site.xml:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive.support.concurrency=true;

in hive shell:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

insert some records into the test table first.

then execute the sql:
insert overwrite table test partition(continent='Asia',country='India') select 
id,msg from test;

the log stops at:
INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
from=org.apache.hadoop.hive.ql.Driver

I think a deadlock occurs when an INSERT OVERWRITE selects from the table itself.


  was:
CREATE TABLE test(
  id int, 
  msg string)
PARTITIONED BY ( 
  continent string, 
  country string)
CLUSTERED BY (id) 
INTO 10 BUCKETS
STORED AS ORC;

alter table test add partition(continent='Asia',country='India');

in hive-site.xml:
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive.support.concurrency=true;
hive.zookeeper.quorum=zk1,zk2,zk3;

in hive shell:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

insert some records into the test table first.

then execute the sql:
insert overwrite table test partition(continent='Asia',country='India') select 
id,msg from test;

the log stops at:
INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
from=org.apache.hadoop.hive.ql.Driver

I think a deadlock occurs when an INSERT OVERWRITE selects from the table itself.



 hive insert overwrite table select from self dead lock
 --

 Key: HIVE-7483
 URL: https://issues.apache.org/jira/browse/HIVE-7483
 Project: Hive
  Issue Type: Bug
  Components: Locking
Affects Versions: 0.13.1
Reporter: Xiaoyu Wang

 CREATE TABLE test(
   id int, 
   msg string)
 PARTITIONED BY ( 
   continent string, 
   country string)
 CLUSTERED BY (id) 
 INTO 10 BUCKETS
 STORED AS ORC;
 alter table test add partition(continent='Asia',country='India');
 in hive-site.xml:
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 hive.support.concurrency=true;
 in hive shell:
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 insert some records into the test table first.
 then execute the sql:
 insert overwrite table test partition(continent='Asia',country='India') 
 select id,msg from test;
 the log stops at:
 INFO log.PerfLogger: PERFLOG method=acquireReadWriteLocks 
 from=org.apache.hadoop.hive.ql.Driver
 I think a deadlock occurs when an INSERT OVERWRITE selects from the table itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-707) add group_concat

2014-07-04 Thread Jian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052301#comment-14052301
 ] 

Jian Wang commented on HIVE-707:


[~ph4t]
I used the concat_ws(' ', map_keys(UNION_MAP(MAP(your_column, 'dummy')))) 
method instead of group_concat, but I got an error like this:
{code}
FAILED: SemanticException [Error 10011]: Line 172:30 Invalid function 
'UNION_MAP'
{code}
Should I add some jars?
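
For what it's worth, UNION_MAP is not a Hive built-in (it ships with external
UDF libraries such as Brickhouse, so the jar would have to be added and the
function registered). A hedged sketch of a built-ins-only alternative, using
collect_set with concat_ws over hypothetical table and column names:
{code}
// Hedged sketch: emulate group_concat with built-in collect_set + concat_ws,
// so no extra jar is needed. Note collect_set de-duplicates values; Hive 0.13
// adds collect_list if duplicates must be kept. Table/column names are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GroupConcatExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT your_key, concat_ws(',', collect_set(your_column)) "
                 + "FROM your_table GROUP BY your_key")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + " -> " + rs.getString(2));
      }
    }
  }
}
{code}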


 add group_concat
 

 Key: HIVE-707
 URL: https://issues.apache.org/jira/browse/HIVE-707
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Min Zhou

 Moving the discussion to a new jira:
 I've implemented group_cat() in a rush, and found some things difficult to 
 solve:
 1. function group_cat() has an internal order by clause; currently, we can't 
 implement such an aggregation in hive.
 2. when the strings to be group-concatenated are too large (in other words, 
 when data skew appears), there is often not enough memory to store such a big 
 result.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-05-06 Thread Xinyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated HIVE-2777:
-

Attachment: (was: hive-2777.patch)

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch


 Hive should have the ability to atomically add and drop partitions. That way, 
 admins can change partitions atomically without breaking running jobs; for 
 example, it allows an admin to merge several partitions into one.
 Essentially, we would like to have an API: add_drop_partitions(String db, 
 String tbl_name, List&lt;Partition&gt; addParts, List&lt;List&lt;String&gt;&gt; dropParts, 
 boolean deleteData);
 This jira covers the changes required for the metastore and thrift.
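
 A hedged caller sketch of what the proposed call could look like; the signature
 is the one proposed above, and ProposedMetastore is a local stand-in, since no
 shipped client exposes this yet:
{code}
// Hedged sketch of the proposed atomic add+drop; ProposedMetastore mirrors the
// signature above and is an assumption, not a real Hive interface.
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.Partition;

public class AtomicSwapExample {
  interface ProposedMetastore {
    void add_drop_partitions(String db, String tblName, List<Partition> addParts,
        List<List<String>> dropParts, boolean deleteData) throws Exception;
  }

  // Replace two daily partitions with one merged partition in a single metastore
  // transaction, so running jobs never observe a half-swapped state.
  static void mergeDays(ProposedMetastore client, Partition merged) throws Exception {
    client.add_drop_partitions("db1", "events", Arrays.asList(merged),
        Arrays.asList(Arrays.asList("2014-05-01"), Arrays.asList("2014-05-02")),
        true /* deleteData */);
  }
}
{code}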



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-05-06 Thread Xinyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated HIVE-2777:
-

Attachment: hive-2777.patch

Sorry for the previous patch; I rebased it, and it seems fine now. Can someone 
please review?

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, 
 hive-2777.patch


 Hive should have the ability to atomically add and drop partitions. That way, 
 admins can change partitions atomically without breaking running jobs; for 
 example, it allows an admin to merge several partitions into one.
 Essentially, we would like to have an API: add_drop_partitions(String db, 
 String tbl_name, List&lt;Partition&gt; addParts, List&lt;List&lt;String&gt;&gt; dropParts, 
 boolean deleteData);
 This jira covers the changes required for the metastore and thrift.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-05-04 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989287#comment-13989287
 ] 

Adrian Wang commented on HIVE-6765:
---

[~cdrome] Good catch. I knew serialization in Hive had been notorious for a 
long time, but I didn't know about the progress they made there. Actually, I was 
really curious when I saw my case was OK with Tez on hive-0.13, while I had never 
tried Apache's hive-0.13 since there was no official release.

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Fix For: 0.13.0

 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-05-03 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988648#comment-13988648
 ] 

Adrian Wang commented on HIVE-6765:
---

[~selinazh] Thanks for your comment! I'm glad that someone else noticed this 
too. Actually, I found that the problem only came up when there was something 
like an aggregation function in the view. The problem results from cloning the 
plan: when joining with a view as described, the plan contains an ASTNodeOrigin 
node, which does not have a default constructor, so an exception is thrown when 
the plan is duplicated.
Could you please try applying my patch here to see whether your problem is 
resolved? Thanks again.

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Fix For: 0.13.0

 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-05-02 Thread Xinyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated HIVE-2777:
-

Affects Version/s: 0.13.0
   Status: Patch Available  (was: Open)

This is a rebased patch on top of hive branch-0.13. Please review.

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch


 Hive should have the ability to atomically add and drop partitions. That way, 
 admins can change partitions atomically without breaking running jobs; for 
 example, it allows an admin to merge several partitions into one.
 Essentially, we would like to have an API: add_drop_partitions(String db, 
 String tbl_name, List&lt;Partition&gt; addParts, List&lt;List&lt;String&gt;&gt; dropParts, 
 boolean deleteData);
 This jira covers the changes required for the metastore and thrift.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-05-02 Thread Xinyu Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyu Wang updated HIVE-2777:
-

Attachment: hive-2777.patch

 ability to add and drop partitions atomically
 -

 Key: HIVE-2777
 URL: https://issues.apache.org/jira/browse/HIVE-2777
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Affects Versions: 0.13.0
Reporter: Aniket Mokashi
Assignee: Aniket Mokashi
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, 
 hive-2777.patch


 Hive should have the ability to atomically add and drop partitions. That way, 
 admins can change partitions atomically without breaking running jobs; for 
 example, it allows an admin to merge several partitions into one.
 Essentially, we would like to have an API: add_drop_partitions(String db, 
 String tbl_name, List&lt;Partition&gt; addParts, List&lt;List&lt;String&gt;&gt; dropParts, 
 boolean deleteData);
 This jira covers the changes required for the metastore and thrift.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-04-08 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang updated HIVE-6765:
--

Status: Patch Available  (was: Open)

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Fix For: 0.13.0

 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-04-01 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang updated HIVE-6765:
--

Fix Version/s: 0.13.0

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Fix For: 0.13.0

 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-28 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang updated HIVE-6765:
--

Component/s: (was: Query Processor)

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-27 Thread Adrian Wang (JIRA)
Adrian Wang created HIVE-6765:
-

 Summary: ASTNodeOrigin unserializable leads to fail when join with 
view
 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Adrian Wang


When a view contains a UDF and the view takes part in a JOIN operation, Hive will 
encounter a bug with a stack trace like:
Caused by: java.lang.InstantiationException: 
org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
at java.lang.Class.newInstance0(Class.java:359)
at java.lang.Class.newInstance(Class.java:327)
at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-27 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949052#comment-13949052
 ] 

Adrian Wang commented on HIVE-6765:
---

I added a PersistenceDelegate in serializeObject() in the Utilities class and 
resolved the problem. I'll attach the patch later.
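
A minimal sketch of that approach, assuming ASTNodeOrigin's bean property names 
match its constructor arguments (the property-name list below is an assumption, 
not the attached patch):
{code}
// Hedged sketch: java.beans.XMLEncoder cannot instantiate classes without a
// no-arg constructor, so a DefaultPersistenceDelegate tells it which bean
// properties supply the constructor arguments. Property names are assumptions.
import java.beans.DefaultPersistenceDelegate;
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import org.apache.hadoop.hive.ql.parse.ASTNodeOrigin;

public class AstNodeOriginDelegate {
  static XMLEncoder newPlanEncoder(ByteArrayOutputStream out) {
    XMLEncoder encoder = new XMLEncoder(out);
    encoder.setPersistenceDelegate(ASTNodeOrigin.class,
        new DefaultPersistenceDelegate(new String[] {
            "objectType", "objectName", "objectDefinition", "usageAlias", "usageNode"}));
    return encoder;
  }
}
{code}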

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Adrian Wang

 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-27 Thread Adrian Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrian Wang updated HIVE-6765:
--

Attachment: HIVE-6765.patch.1

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-27 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949075#comment-13949075
 ] 

Adrian Wang commented on HIVE-6765:
---

Here's an example that triggers the exception:
CREATE TABLE t1 (a1 INT, b1 INT);
CREATE VIEW v1 (x1) AS SELECT MAX(a1) FROM t1;
SELECT s1.x1 FROM v1 s1 JOIN (SELECT MAX(a1) AS ma FROM t1) s2 ON s1.x1 = s2.ma;

This is a bug on both Apache Hive and Tez, outputting return code 1 ...

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-27 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949076#comment-13949076
 ] 

Adrian Wang commented on HIVE-6765:
---

And I think this is just another drawback of using XMLEncoder to clone the plan.

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6765) ASTNodeOrigin unserializable leads to fail when join with view

2014-03-27 Thread Adrian Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949113#comment-13949113
 ] 

Adrian Wang commented on HIVE-6765:
---

Sorry, the previous example works on Tez with hive-0.13, 
but it fails when I run the query on Hive-0.12 in Eclipse.

 ASTNodeOrigin unserializable leads to fail when join with view
 --

 Key: HIVE-6765
 URL: https://issues.apache.org/jira/browse/HIVE-6765
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Adrian Wang
 Attachments: HIVE-6765.patch.1


 When a view contains a UDF and the view takes part in a JOIN operation, Hive 
 will encounter a bug with a stack trace like:
 Caused by: java.lang.InstantiationException: 
 org.apache.hadoop.hive.ql.parse.ASTNodeOrigin
   at java.lang.Class.newInstance0(Class.java:359)
   at java.lang.Class.newInstance(Class.java:327)
   at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


  1   2   >