[jira] [Created] (HIVE-17945) Support column projection for index access when using Parquet Vectorization

2017-10-30 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-17945:
---

 Summary: Support column projection for index access when using 
Parquet Vectorization
 Key: HIVE-17945
 URL: https://issues.apache.org/jira/browse/HIVE-17945
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17920) Vectorized reader does not push down projection columns for index access schema

2017-10-27 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-17920:
---

 Summary: Vectorized reader does not push down projection columns for 
index access schema
 Key: HIVE-17920
 URL: https://issues.apache.org/jira/browse/HIVE-17920
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu








[jira] [Created] (HIVE-17783) Hybrid Grace Join has performance degradation for N-way join using Hive on Tez

2017-10-11 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-17783:
---

 Summary: Hybrid Grace Join has performance degradation for N-way 
join using Hive on Tez
 Key: HIVE-17783
 URL: https://issues.apache.org/jira/browse/HIVE-17783
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: 8*Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
1 master + 7 workers
TPC-DS at 3TB data scales
Hive version : 2.2.0
Reporter: Ferdinand Xu


Most configurations use default values. The benchmark compares enabling against 
disabling hybrid grace hash join using TPC-DS queries at the 3TB data scale. 
Many queries involving N-way joins show performance degradation across three 
test runs. Detailed results are attached.






[jira] [Created] (HIVE-16795) Measure Performance for Parquet Vectorization Reader

2017-05-30 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-16795:
---

 Summary: Measure Performance for Parquet Vectorization Reader
 Key: HIVE-16795
 URL: https://issues.apache.org/jira/browse/HIVE-16795
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


We need to measure the performance of the Parquet vectorized reader feature 
using TPCx-BB or TPC-DS to see how much performance gain we can achieve.





[jira] [Created] (HIVE-15156) Support Nested Column Field Pruning for Parquet Vectorized Reader

2016-11-08 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-15156:
---

 Summary: Support Nested Column Field Pruning for Parquet 
Vectorized Reader
 Key: HIVE-15156
 URL: https://issues.apache.org/jira/browse/HIVE-15156
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


As in HIVE-15055, we need to support nested column field pruning for the 
vectorized reader as well.





[jira] [Created] (HIVE-15112) Implement Parquet vectorization reader for Complex types

2016-11-02 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-15112:
---

 Summary: Implement Parquet vectorization reader for Complex types 
 Key: HIVE-15112
 URL: https://issues.apache.org/jira/browse/HIVE-15112
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


Like HIVE-14815, we need to support the Parquet vectorized reader for complex 
types such as map, struct, and union as well.





[jira] [Created] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2016-10-09 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14919:
---

 Summary: Improve the performance of Hive on Spark 2.0.0
 Key: HIVE-14919
 URL: https://issues.apache.org/jira/browse/HIVE-14919
 Project: Hive
  Issue Type: Improvement
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: benchmark.xlsx

In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel 
BigBench[1] to run benchmarks over a 10 GB data set, comparing against Spark 
1.6. We see considerable performance degradation for all of the BigBench 
queries. For detailed information, please see the attached file. This JIRA is 
the umbrella ticket for addressing those performance issues.

[1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench





[jira] [Created] (HIVE-14916) Reduce the memory requirements for Spark tests

2016-10-07 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14916:
---

 Summary: Reduce the memory requirements for Spark tests
 Key: HIVE-14916
 URL: https://issues.apache.org/jira/browse/HIVE-14916
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


As with HIVE-14887, we need to reduce the memory requirements for the Spark tests.





[jira] [Created] (HIVE-14836) Implement predicate pushdown in Vectorized Page reader

2016-09-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14836:
---

 Summary: Implement predicate pushdown in Vectorized Page reader
 Key: HIVE-14836
 URL: https://issues.apache.org/jira/browse/HIVE-14836
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


Currently we filter row groups (blocks) using predicate pushdown. We should 
support it in the page reader as well to improve its efficiency.
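The two-level filtering described above can be sketched conceptually. This is a simplified, hypothetical model (the `Page`/`RowGroup` structures and `read_with_pushdown` function are illustrative, not Hive or Parquet classes): a predicate's [lo, hi] range is checked against min/max statistics first per row group (block), then per page, so whole chunks of data are skipped before any row is decoded.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Page:
    min_val: int
    max_val: int
    rows: List[int]

@dataclass
class RowGroup:
    pages: List[Page]

def read_with_pushdown(groups, lo, hi):
    """Return rows with lo <= x <= hi, skipping whole row groups and
    then individual pages whose [min, max] range cannot match."""
    out = []
    for rg in groups:
        # Block-level filtering: skip the row group if no page can match.
        if not any(p.max_val >= lo and p.min_val <= hi for p in rg.pages):
            continue
        for page in rg.pages:
            # Page-level filtering: skip pages outside the predicate range.
            if page.max_val < lo or page.min_val > hi:
                continue
            out.extend(x for x in page.rows if lo <= x <= hi)
    return out

groups = [
    RowGroup([Page(0, 9, list(range(10))), Page(10, 19, list(range(10, 20)))]),
    RowGroup([Page(20, 29, list(range(20, 30)))]),
]
print(read_with_pushdown(groups, 12, 14))  # [12, 13, 14]
```

With page-level checks, only one of the three pages above is actually decoded for the [12, 14] predicate.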





[jira] [Created] (HIVE-14827) Micro benchmark for Parquet vectorized reader

2016-09-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14827:
---

 Summary: Micro benchmark for Parquet vectorized reader
 Key: HIVE-14827
 URL: https://issues.apache.org/jira/browse/HIVE-14827
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


We need a microbenchmark to evaluate the throughput and execution time of the 
Parquet vectorized reader.





[jira] [Created] (HIVE-14826) Support vectorization for Parquet

2016-09-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14826:
---

 Summary: Support vectorization for Parquet
 Key: HIVE-14826
 URL: https://issues.apache.org/jira/browse/HIVE-14826
 Project: Hive
  Issue Type: New Feature
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The Parquet vectorized reader can improve throughput and also leverage the 
existing Hive vectorized execution engine. This is an umbrella ticket to 
track the feature.





[jira] [Created] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0

2016-09-22 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14825:
---

 Summary: Figure out the minimum set of required jars for Hive on 
Spark after bumping up to Spark 2.0.0
 Key: HIVE-14825
 URL: https://issues.apache.org/jira/browse/HIVE-14825
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu


Since Spark no longer ships an assembly jar as of 2.0.0, we should figure out 
the minimum set of jars required for HoS to work after bumping up to Spark 
2.0.0. This way, users can decide whether to add just the required jars, or all 
the jars under Spark's directory for convenience.





[jira] [Created] (HIVE-14815) Support vectorization for Parquet

2016-09-22 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14815:
---

 Summary: Support vectorization for Parquet
 Key: HIVE-14815
 URL: https://issues.apache.org/jira/browse/HIVE-14815
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-14693) Some partitions will be left out when the partition number is a multiple of the option hive.msck.repair.batch.size

2016-09-01 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14693:
---

 Summary: Some partitions will be left out when the partition number is 
a multiple of the option hive.msck.repair.batch.size
 Key: HIVE-14693
 URL: https://issues.apache.org/jira/browse/HIVE-14693
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


For example, with batch_size = 5 and 9 partitions, the last 4 partitions are 
skipped and never added.
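A minimal sketch of how such a batching bug can arise. The functions below are hypothetical illustrations, not Hive's MSCK repair code: the buggy variant flushes a batch only when the counter reaches a multiple of batch_size, so a trailing partial batch is silently dropped.

```python
def flush_in_batches_buggy(partitions, batch_size):
    """Buggy sketch: a batch is flushed only when the counter hits a
    multiple of batch_size, so a trailing partial batch is lost."""
    added, counter, batch = [], 0, []
    for p in partitions:
        batch.append(p)
        counter += 1
        if counter % batch_size == 0:
            added.extend(batch)
            batch = []
    # BUG: any leftover items in `batch` are never flushed here.
    return added

def flush_in_batches_fixed(partitions, batch_size):
    """Fixed sketch: flush full batches, then the trailing remainder."""
    added, batch = [], []
    for p in partitions:
        batch.append(p)
        if len(batch) == batch_size:
            added.extend(batch)
            batch = []
    if batch:  # flush the trailing partial batch
        added.extend(batch)
    return added

parts = ["p=%d" % i for i in range(9)]
print(len(flush_in_batches_buggy(parts, 5)))  # 5: the last 4 partitions are lost
print(len(flush_in_batches_fixed(parts, 5)))  # 9
```

Run with batch_size = 5 and 9 partitions, the buggy version adds only the first full batch of 5, matching the behavior reported above.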





[jira] [Created] (HIVE-14677) Beeline should support executing an initial SQL script

2016-08-31 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14677:
---

 Summary: Beeline should support executing an initial SQL script
 Key: HIVE-14677
 URL: https://issues.apache.org/jira/browse/HIVE-14677
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu








[jira] [Created] (HIVE-14676) JDBC driver should support executing an initial SQL script

2016-08-31 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14676:
---

 Summary: JDBC driver should support executing an initial SQL script
 Key: HIVE-14676
 URL: https://issues.apache.org/jira/browse/HIVE-14676
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu








[jira] [Created] (HIVE-14029) Update Spark version to 2.0.0

2016-06-15 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-14029:
---

 Summary: Update Spark version to 2.0.0
 Key: HIVE-14029
 URL: https://issues.apache.org/jira/browse/HIVE-14029
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu


There are quite a few new optimizations in Spark 2.0.0. We need to bump Spark 
up to 2.0.0 to benefit from those performance improvements.





[jira] [Created] (HIVE-11943) Set old CLI as the default Client when using hive script

2015-09-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11943:
---

 Summary: Set old CLI as the default Client when using hive script
 Key: HIVE-11943
 URL: https://issues.apache.org/jira/browse/HIVE-11943
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: beeline-cli-branch
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


Since we have some concerns about deprecating the current CLI, we will keep the 
old CLI as the default. Once we resolve those problems, we will make the new 
CLI the default.





[jira] [Created] (HIVE-11944) Address the review items on HIVE-11778

2015-09-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11944:
---

 Summary: Address the review items on HIVE-11778
 Key: HIVE-11944
 URL: https://issues.apache.org/jira/browse/HIVE-11944
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Fix For: beeline-cli-branch


This jira will address review items from https://reviews.apache.org/r/38247/





[jira] [Created] (HIVE-11958) Merge master to beeline-cli branch 09/25/2015

2015-09-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11958:
---

 Summary: Merge master to beeline-cli branch 09/25/2015
 Key: HIVE-11958
 URL: https://issues.apache.org/jira/browse/HIVE-11958
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: beeline-cli-branch
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Fix For: beeline-cli-branch








[jira] [Created] (HIVE-11796) CLI option is not updated when executing the initial files[beeline-cli]

2015-09-10 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11796:
---

 Summary: CLI option is not updated when executing the initial 
files[beeline-cli]
 Key: HIVE-11796
 URL: https://issues.apache.org/jira/browse/HIVE-11796
 Project: Hive
  Issue Type: Sub-task
Affects Versions: beeline-cli-branch
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Fix For: beeline-cli-branch


"Method not supported" is thrown when executing the initial files. This is 
caused by the CLI options not being updated.





[jira] [Created] (HIVE-11769) Merge master to beeline-cli branch 09/09/2015

2015-09-09 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11769:
---

 Summary: Merge master to beeline-cli branch 09/09/2015
 Key: HIVE-11769
 URL: https://issues.apache.org/jira/browse/HIVE-11769
 Project: Hive
  Issue Type: Sub-task
Affects Versions: beeline-cli-branch
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-11770) Use the static variable from beeline instead of Utils from JDBC

2015-09-09 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11770:
---

 Summary: Use the static variable from beeline instead of Utils 
from JDBC
 Key: HIVE-11770
 URL: https://issues.apache.org/jira/browse/HIVE-11770
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor


For beeline, we should use the constant BEELINE_DEFAULT_JDBC_URL in beeline 
instead of URL_PREFIX in the JDBC Utils.





[jira] [Created] (HIVE-11778) Merge beeline-cli branch to trunk

2015-09-09 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11778:
---

 Summary: Merge beeline-cli branch to trunk
 Key: HIVE-11778
 URL: https://issues.apache.org/jira/browse/HIVE-11778
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The team working on the beeline-cli branch would like to merge their work to 
trunk. This jira will track that effort.





[jira] [Created] (HIVE-11747) Unnecessary error log is shown when executing an "INSERT OVERWRITE LOCAL DIRECTORY" cmd in embedded mode

2015-09-06 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11747:
---

 Summary: Unnecessary error log is shown when executing an "INSERT 
OVERWRITE LOCAL DIRECTORY" cmd in embedded mode
 Key: HIVE-11747
 URL: https://issues.apache.org/jira/browse/HIVE-11747
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The "INSERT OVERWRITE LOCAL DIRECTORY" task runs successfully, but some error 
logs are emitted:
{noformat}
Connected to: Apache Hive (version 2.0.0-SNAPSHOT)
Driver: Hive JDBC (version 2.0.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.0.0-SNAPSHOT by Apache Hive
hive> INSERT OVERWRITE LOCAL DIRECTORY '/nullformat' ROW FORMAT DELIMITED NULL 
DEFINED AS 'fooNull' SELECT a,b FROM base_tab;
18:35:51.288 [HiveServer2-Background-Pool: Thread-25] ERROR 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver - yarn
No rows affected (14.372 seconds)
{noformat}





[jira] [Created] (HIVE-11746) Connect command should not be allowed from the user [beeline-cli branch]

2015-09-05 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11746:
---

 Summary: Connect command should not be allowed from the 
user [beeline-cli branch]
 Key: HIVE-11746
 URL: https://issues.apache.org/jira/browse/HIVE-11746
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


For the new CLI, the user should not be allowed to connect to a server or database.





[jira] [Created] (HIVE-11717) nohup mode is not supported for beeline

2015-09-01 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11717:
---

 Summary: nohup mode is not supported for beeline
 Key: HIVE-11717
 URL: https://issues.apache.org/jira/browse/HIVE-11717
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline
Reporter: Ferdinand Xu


We are able to use the hive command below to run a query file in batch mode.
{noformat}
nohup hive -S -f /home/wj19670/pad.sql >pad.csv &
{noformat}
However, under beeline, we aren't able to use nohup anymore.





[jira] [Created] (HIVE-11640) Shell command doesn't work for new CLI[beeline-cli]

2015-08-25 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11640:
---

 Summary: Shell command doesn't work for new CLI[beeline-cli]
 Key: HIVE-11640
 URL: https://issues.apache.org/jira/browse/HIVE-11640
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The shell command doesn't work for the new CLI, and "Error: Method not 
supported (state=,code=0)" is thrown during execution for the -f and -e options.





[jira] [Created] (HIVE-11637) Support hive.cli.print.current.db in new CLI[beeline-cli branch]

2015-08-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11637:
---

 Summary: Support hive.cli.print.current.db in new CLI[beeline-cli 
branch]
 Key: HIVE-11637
 URL: https://issues.apache.org/jira/browse/HIVE-11637
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-11579) Invoking the set command closes the standard error output [beeline-cli]

2015-08-17 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11579:
---

 Summary: Invoking the set command closes the standard error 
output [beeline-cli]
 Key: HIVE-11579
 URL: https://issues.apache.org/jira/browse/HIVE-11579
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


We can easily reproduce the bug with the following steps:
{code}
hive> set system:xx=yy;
hive> lss;
hive> 
{code}
The error output disappears because the error output stream is closed when the 
Hive statement is closed.





[jira] [Created] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-09 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11504:
---

 Summary: Predicate pushing down doesn't work for float type for 
Parquet
 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The predicate builder should use the PrimitiveTypeName type on the Parquet 
side to construct the predicate leaf, instead of the type provided by PredicateLeaf.





[jira] [Created] (HIVE-11352) Avoid the double connections with 'e' option[beeline-cli branch]

2015-07-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11352:
---

 Summary: Avoid the double connections with 'e' option[beeline-cli 
branch]
 Key: HIVE-11352
 URL: https://issues.apache.org/jira/browse/HIVE-11352
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline, CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-11336) Support initial file option for new CLI [beeline-cli branch]

2015-07-22 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11336:
---

 Summary: Support initial file option for new CLI [beeline-cli 
branch]
 Key: HIVE-11336
 URL: https://issues.apache.org/jira/browse/HIVE-11336
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline
Affects Versions: beeline-cli-branch
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


Option 'i' needs to be enabled in the new CLI, which should support multiple 
initial files.





[jira] [Created] (HIVE-11277) Merge master to parquet 06/16/2015 [Parquet branch]

2015-07-16 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11277:
---

 Summary: Merge master to parquet 06/16/2015 [Parquet branch]
 Key: HIVE-11277
 URL: https://issues.apache.org/jira/browse/HIVE-11277
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-11280) Support executing script file from hdfs in new CLI [Beeline-CLI branch]

2015-07-16 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11280:
---

 Summary: Support executing script file from hdfs in new CLI 
[Beeline-CLI branch]
 Key: HIVE-11280
 URL: https://issues.apache.org/jira/browse/HIVE-11280
 Project: Hive
  Issue Type: Sub-task
  Components: Beeline, CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


In HIVE-7136, the old CLI became able to read hive scripts from any of the 
supported file systems in the Hadoop ecosystem. We need to support this in the 
new CLI as well.





[jira] [Created] (HIVE-11236) BeeLine-Cli: use the same output format as old CLI in the new CLI

2015-07-13 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11236:
---

 Summary: BeeLine-Cli: use the same output format as old CLI in the 
new CLI
 Key: HIVE-11236
 URL: https://issues.apache.org/jira/browse/HIVE-11236
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


In the old CLI, the output format is as follows:
{noformat}
hive> show tables;
OK
tbl1_name
tbl2_name
Time taken: 0.808 seconds, Fetched: 2 row(s)
{noformat}
This requires the new CLI's default outputformat to be csv2, with showHeader 
disabled.





[jira] [Created] (HIVE-11226) BeeLine-Cli: support hive.cli.prompt in new CLI

2015-07-10 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11226:
---

 Summary: BeeLine-Cli: support hive.cli.prompt in new CLI
 Key: HIVE-11226
 URL: https://issues.apache.org/jira/browse/HIVE-11226
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


Beeline uses a different prompt format from the old CLI, and the old CLI's 
prompt is configurable. We need to change the new CLI to use the old prompt style.





[jira] [Created] (HIVE-11203) Beeline force option doesn't force execution when errors occurred in a script.

2015-07-08 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11203:
---

 Summary: Beeline force option doesn't force execution when errors 
occurred in a script.
 Key: HIVE-11203
 URL: https://issues.apache.org/jira/browse/HIVE-11203
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The force option doesn't function as described in the wiki: 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients





[jira] [Created] (HIVE-11191) Beeline-cli: support hive.cli.errors.ignore in new CLI

2015-07-07 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-11191:
---

 Summary: Beeline-cli: support hive.cli.errors.ignore in new CLI
 Key: HIVE-11191
 URL: https://issues.apache.org/jira/browse/HIVE-11191
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The old CLI uses hive.cli.errors.ignore from the hive configuration to force 
execution of a script when errors occur. Beeline has a similar option called 
force. We need to support the old configuration using the beeline 
functionality. More details about the force option are available at 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
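The intended semantics can be sketched as follows. This is a conceptual illustration (the `run_script` function and its parameters are hypothetical, not Beeline's implementation): without the flag, the script aborts at the first failing statement; with it, the error is recorded and execution continues.

```python
def run_script(statements, execute, force=False):
    """Run statements in order. Without force, stop at the first
    failing statement; with force, record the error and continue."""
    executed, errors = [], []
    for stmt in statements:
        try:
            execute(stmt)
            executed.append(stmt)
        except Exception as e:
            errors.append((stmt, e))
            if not force:
                break  # default behavior: abort the script on the first error
    return executed, errors

def execute(stmt):
    # Toy executor standing in for a real statement runner.
    if stmt == "BAD":
        raise RuntimeError("syntax error")

ok, errs = run_script(["A", "BAD", "C"], execute, force=True)
print(ok)  # ['A', 'C']
```

With force=False the same script stops after "A", which is the behavior hive.cli.errors.ignore was meant to override.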





[jira] [Created] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT

2015-06-10 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10975:
---

 Summary: Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
 Key: HIVE-10975
 URL: https://issues.apache.org/jira/browse/HIVE-10975
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Priority: Minor


There have been lots of changes since Parquet's graduation.





[jira] [Created] (HIVE-10979) Fix failed tests in TestSchemaTool after the version number change in HIVE-10921

2015-06-10 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10979:
---

 Summary: Fix failed tests in TestSchemaTool after the version 
number change in HIVE-10921
 Key: HIVE-10979
 URL: https://issues.apache.org/jira/browse/HIVE-10979
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


Some version variables in the SQL scripts were not updated in HIVE-10921, 
which caused unit test failures.





[jira] [Created] (HIVE-10943) Beeline-cli: Enable precommit for the beeline-cli branch

2015-06-04 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10943:
---

 Summary: Beeline-cli: Enable precommit for the beeline-cli branch
 Key: HIVE-10943
 URL: https://issues.apache.org/jira/browse/HIVE-10943
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor


NO PRECOMMIT TESTS






[jira] [Created] (HIVE-10821) Beeline-CLI: Implement all CLI commands using Beeline functionality

2015-05-25 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10821:
---

 Summary: Beeline-CLI: Implement all CLI commands using Beeline 
functionality
 Key: HIVE-10821
 URL: https://issues.apache.org/jira/browse/HIVE-10821
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10749) Implement Insert statement for parquet

2015-05-19 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10749:
---

 Summary: Implement Insert statement for parquet
 Key: HIVE-10749
 URL: https://issues.apache.org/jira/browse/HIVE-10749
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


We need to implement the insert statement for the parquet format, as was done for ORC.





[jira] [Created] (HIVE-10747) enable the cleanup side effect for Encryption related qfile test

2015-05-18 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10747:
---

 Summary: enable the cleanup side effect for Encryption related 
qfile test
 Key: HIVE-10747
 URL: https://issues.apache.org/jira/browse/HIVE-10747
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The hive conf is not reset in the clearTestSideEffects method, which was 
introduced in HIVE-8900. This pollutes the settings of other qfiles run by 
TestEncryptedHDFSCliDriver.





[jira] [Created] (HIVE-10717) Fix failed qtest encryption_insert_partition_static test in Jenkins

2015-05-14 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10717:
---

 Summary: Fix failed qtest encryption_insert_partition_static test 
in Jenkins
 Key: HIVE-10717
 URL: https://issues.apache.org/jira/browse/HIVE-10717
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu


It can be reproduced in Jenkins. See 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3898/testReport/
 





[jira] [Created] (HIVE-10718) Update committer list - Add Ferdinand Xu

2015-05-14 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10718:
---

 Summary: Update committer list - Add Ferdinand Xu
 Key: HIVE-10718
 URL: https://issues.apache.org/jira/browse/HIVE-10718
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor


NO PRECOMMIT TESTS
Add myself to the committer list.





[jira] [Created] (HIVE-10705) Update tests for HIVE-9302 after removing binaries

2015-05-13 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10705:
---

 Summary: Update tests for HIVE-9302 after removing binaries
 Key: HIVE-10705
 URL: https://issues.apache.org/jira/browse/HIVE-10705
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10684) Fix the UT failures for HIVE-7553 after HIVE-10674 removed the binary jar files

2015-05-12 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10684:
---

 Summary: Fix the UT failures for HIVE-7553 after HIVE-10674 removed 
the binary jar files
 Key: HIVE-10684
 URL: https://issues.apache.org/jira/browse/HIVE-10684
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10624) Update the initial script to make the beeline-backed cli the default and allow the user to choose the old hive cli via an environment variable

2015-05-06 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10624:
---

 Summary: Update the initial script to make the beeline-backed cli the 
default and allow the user to choose the old hive cli via an environment variable
 Key: HIVE-10624
 URL: https://issues.apache.org/jira/browse/HIVE-10624
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


As discussed on the dev list, we should update the script to make the new 
beeline-backed cli the default and allow the user to switch to the old cli via 
an environment variable.





[jira] [Created] (HIVE-10623) Implement hive cli options using beeline functionality

2015-05-06 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10623:
---

 Summary: Implement hive cli options using beeline functionality
 Key: HIVE-10623
 URL: https://issues.apache.org/jira/browse/HIVE-10623
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


We need to support the original hive cli options for backwards compatibility.





[jira] [Created] (HIVE-10460) change the key of Parquet Record to NullWritable instead of void

2015-04-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10460:
---

 Summary: change the key of Parquet Record to NullWritable instead 
of void
 Key: HIVE-10460
 URL: https://issues.apache.org/jira/browse/HIVE-10460
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


AcidInputFormat accepts key types that implement the Writable interface, so 
the void type is not valid if we want to make ACID work for Parquet.





[jira] [Created] (HIVE-10461) Implement Record Updater and Raw Merger for Parquet as well

2015-04-23 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10461:
---

 Summary: Implement Record Updater and Raw Merger for Parquet as 
well
 Key: HIVE-10461
 URL: https://issues.apache.org/jira/browse/HIVE-10461
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


The record updater will create the data with ACID information, and the raw 
record merger can provide the user-view data. In this jira, we should 
implement these two classes and make the basic ACID read/write case work. For 
the upper layers like FileSinkOperator, CompactorMR and TxnManager, we can 
file new jiras to fix them.
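A conceptual sketch of what "providing the user-view data" means. The function and data shapes below are hypothetical simplifications, not Hive's RecordUpdater or merger APIs: delta events (inserts, updates, deletes) keyed by row id are overlaid on the base rows, in transaction order, to produce the user-visible view.

```python
def merge_user_view(base, deltas):
    """base: {row_id: value}; deltas: list of (op, row_id, value)
    applied in transaction order. Returns the user-visible rows."""
    view = dict(base)
    for op, row_id, value in deltas:
        if op in ("insert", "update"):
            view[row_id] = value       # newer version wins
        elif op == "delete":
            view.pop(row_id, None)     # deleted rows vanish from the view
    return view

base = {1: "a", 2: "b"}
deltas = [("update", 2, "b2"), ("insert", 3, "c"), ("delete", 1, None)]
print(merge_user_view(base, deltas))  # {2: 'b2', 3: 'c'}
```

The base file and delta files stay immutable on disk; only the merged read path presents the table as if rows had been changed in place.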





[jira] [Created] (HIVE-10372) Bump parquet version to 1.6.0

2015-04-16 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10372:
---

 Summary: Bump parquet version to 1.6.0
 Key: HIVE-10372
 URL: https://issues.apache.org/jira/browse/HIVE-10372
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization

2015-04-01 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10189:
---

 Summary: Create a micro benchmark tool for vectorization to 
evaluate the performance gain after SIMD optimization
 Key: HIVE-10189
 URL: https://issues.apache.org/jira/browse/HIVE-10189
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu








[jira] [Created] (HIVE-10135) Add qtest to access struct after parquet column index access enabled

2015-03-29 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10135:
---

 Summary: Add qtest to access struct after parquet column index 
access enabled
 Key: HIVE-10135
 URL: https://issues.apache.org/jira/browse/HIVE-10135
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10077) Use new ParquetInputSplit constructor API

2015-03-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10077:
---

 Summary: Use new ParquetInputSplit constructor API
 Key: HIVE-10077
 URL: https://issues.apache.org/jira/browse/HIVE-10077
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10079) Enable parquet column index in HIVE

2015-03-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10079:
---

 Summary: Enable parquet column index in HIVE
 Key: HIVE-10079
 URL: https://issues.apache.org/jira/browse/HIVE-10079
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10076) Update parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6

2015-03-24 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10076:
---

 Summary: Update parquet-hadoop-bundle and parquet-column to the 
version of 1.6.0rc6
 Key: HIVE-10076
 URL: https://issues.apache.org/jira/browse/HIVE-10076
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10053) Override new init API from ReadSupport instead of the deprecated one

2015-03-22 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10053:
---

 Summary: Override new init API from ReadSupport instead of the 
deprecated one
 Key: HIVE-10053
 URL: https://issues.apache.org/jira/browse/HIVE-10053
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10054) Clean up ETypeConverter since Parquet supports timestamp type already

2015-03-22 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10054:
---

 Summary: Clean up ETypeConverter since Parquet supports timestamp 
type already
 Key: HIVE-10054
 URL: https://issues.apache.org/jira/browse/HIVE-10054
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Created] (HIVE-10032) Remove broken java file from source code

2015-03-20 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-10032:
---

 Summary: Remove broken java file from source code
 Key: HIVE-10032
 URL: https://issues.apache.org/jira/browse/HIVE-10032
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor


Remove all broken HCatalog Java files from the source code.





[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-02-11 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Attachment: HIVE-9252.patch

The initial patch is attached!

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.patch


 In HIVE-6047 the option was added to hook a jar file to the definition of a 
 function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions].)
 I propose adding something similar that can be used when defining an external 
 table that relies on a custom SerDe (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share!) a Hive table on top of a custom 
 file format without needing the IT operations people to deploy a custom 
 SerDe jar file on all nodes.
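
A concrete use of the proposed syntax might look like the following. This is a hypothetical illustration only, since the feature is merely proposed here; the handler class, SerDe property, and jar path are made up:
{code}
CREATE EXTERNAL TABLE weblogs (line STRING)
STORED BY 'com.example.CustomLogStorageHandler'
WITH SERDEPROPERTIES ('log.format' = 'apache')
USING JAR 'hdfs:///libs/custom-log-serde-1.0.jar';
{code}
The jar would then travel with the table definition instead of having to be pre-installed on every node.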





[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-02-11 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Attachment: HIVE-9252.1.patch

Rebased the patch.

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch, HIVE-9252.patch


 In HIVE-6047 the option was added to hook a jar file to the definition of a 
 function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions].)
 I propose adding something similar that can be used when defining an external 
 table that relies on a custom SerDe (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share!) a Hive table on top of a custom 
 file format without needing the IT operations people to deploy a custom 
 SerDe jar file on all nodes.





[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-02-11 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Status: Patch Available  (was: Open)

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch, HIVE-9252.patch


 In HIVE-6047 the option was added to hook a jar file to the definition of a 
 function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions].)
 I propose adding something similar that can be used when defining an external 
 table that relies on a custom SerDe (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share!) a Hive table on top of a custom 
 file format without needing the IT operations people to deploy a custom 
 SerDe jar file on all nodes.





[jira] [Created] (HIVE-9661) Refine debug log with schema information for the method of creating session directories

2015-02-11 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-9661:
--

 Summary: Refine debug log with schema information for the method 
of creating session directories
 Key: HIVE-9661
 URL: https://issues.apache.org/jira/browse/HIVE-9661
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor


For a session, the scratch directory can be either a local path or an HDFS 
path. The method name createRootHDFSDir is quite confusing, so add the 
schema information to the debug log to aid troubleshooting.





[jira] [Updated] (HIVE-9661) Refine debug log with schema information for the method of creating session directories

2015-02-11 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9661:
---
Status: Patch Available  (was: Open)

 Refine debug log with schema information for the method of creating session 
 directories
 ---

 Key: HIVE-9661
 URL: https://issues.apache.org/jira/browse/HIVE-9661
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-9661.patch


 For a session, the scratch directory can be either a local path or an HDFS 
 path. The method name createRootHDFSDir is quite confusing, so add the 
 schema information to the debug log to aid troubleshooting.





[jira] [Updated] (HIVE-9661) Refine debug log with schema information for the method of creating session directories

2015-02-11 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9661:
---
Attachment: HIVE-9661.patch

 Refine debug log with schema information for the method of creating session 
 directories
 ---

 Key: HIVE-9661
 URL: https://issues.apache.org/jira/browse/HIVE-9661
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-9661.patch


 For a session, the scratch directory can be either a local path or an HDFS 
 path. The method name createRootHDFSDir is quite confusing, so add the 
 schema information to the debug log to aid troubleshooting.





[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-02-11 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Attachment: (was: HIVE-9252.patch)

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch


 In HIVE-6047 the option was added to hook a jar file to the definition of a 
 function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions].)
 I propose adding something similar that can be used when defining an external 
 table that relies on a custom SerDe (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share!) a Hive table on top of a custom 
 file format without needing the IT operations people to deploy a custom 
 SerDe jar file on all nodes.





[jira] [Commented] (HIVE-8136) Reduce table locking

2015-02-04 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306528#comment-14306528
 ] 

Ferdinand Xu commented on HIVE-8136:


Hi [~brocknoland], I agree with you that an exclusive lock is a must for 
altering the table structure. I think ADDCLUSTERSORTCOLUMN can use a shared 
lock instead. Please see my previous comments for details.

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock because they are atomic, such as setting a table's location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.





[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars

2015-02-04 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306534#comment-14306534
 ] 

Ferdinand Xu commented on HIVE-9302:


Thanks, Sergio, for your review.
@[~brocknoland], do you have any further comments on my patch?

 Beeline add commands to register local jdbc driver names and jars
 -

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.patch, 
 mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jdbc 
 driver jars and register custom jdbc driver names.





[jira] [Updated] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars

2015-02-03 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.3.patch

Hi Sergio, I have updated my patch according to your comments. Please help 
review it when you have some time. Thank you!

 Beeline add commands to register local jdbc driver names and jars
 -

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.patch, 
 mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jdbc 
 driver jars and register custom jdbc driver names.





[jira] [Commented] (HIVE-8136) Reduce table locking

2015-01-29 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296607#comment-14296607
 ] 

Ferdinand Xu commented on HIVE-8136:


Currently the following ALTER TABLE write types acquire an exclusive lock 
(DDL_EXCLUSIVE):
RENAMECOLUMN
ADDCLUSTERSORTCOLUMN
ADDFILEFORMAT
DROPPROPS
REPLACECOLS
ARCHIVE
UNARCHIVE
ALTERPROTECTMODE
ALTERPARTITIONPROTECTMODE
ALTERLOCATION
DROPPARTITION
RENAMEPARTITION
ADDSKEWEDBY
ALTERSKEWEDLOCATION
ALTERBUCKETNUM
ALTERPARTITION
ADDCOLS
RENAME
TRUNCATE
MERGEFILES

The following use a shared lock:
  ADDSERDE
  ADDPARTITION
  ADDSERDEPROPS
  ADDPROPS

These take no lock:
  COMPACT
  TOUCH

For changing the table structure, an exclusive lock is a must, and most of the 
cases above use the exclusive lock because they change the table or partition 
structure. For adding a cluster column or sort column, however, we can use a 
shared lock for the following reason:
{quote}
The CLUSTERED BY and SORTED BY creation commands do not affect how data is 
inserted into a table – only how it is read. This means that users must be 
careful to insert data correctly by specifying the number of reducers to be 
equal to the number of buckets, and using CLUSTER BY and SORT BY commands in 
their query.
{quote}
For changing properties, I think we can use no lock if the change doesn't 
affect the structure of the table. We can do that in a follow-up JIRA. Any 
thoughts, [~brocknoland]?
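
The classification above could be sketched as a small lookup. This is a hypothetical illustration only, not Hive's actual lock-manager code; the class, method, and set names are made up:

```java
import java.util.Set;

// Hypothetical classification of ALTER TABLE write types, mirroring the
// comment above: structure-changing operations keep the exclusive lock,
// CLUSTERED BY / SORTED BY changes (which only affect how data is read)
// drop to a shared lock, and COMPACT/TOUCH take no lock.
public class AlterTableLockSketch {
    enum LockType { EXCLUSIVE, SHARED, NONE }

    private static final Set<String> SHARED_OPS = Set.of(
        "ADDCLUSTERSORTCOLUMN", "ADDSERDE", "ADDPARTITION",
        "ADDSERDEPROPS", "ADDPROPS");
    private static final Set<String> NO_LOCK_OPS = Set.of("COMPACT", "TOUCH");

    static LockType lockFor(String writeType) {
        if (NO_LOCK_OPS.contains(writeType)) return LockType.NONE;
        if (SHARED_OPS.contains(writeType)) return LockType.SHARED;
        // Everything else changes table/partition structure: stay exclusive.
        return LockType.EXCLUSIVE;
    }

    public static void main(String[] args) {
        System.out.println(lockFor("ADDCLUSTERSORTCOLUMN")); // SHARED
        System.out.println(lockFor("ALTERLOCATION"));        // EXCLUSIVE
    }
}
```

Centralizing the mapping like this would also make the proposed relaxations (shared lock for sort/cluster columns, no lock for pure property changes) a one-line change per operation.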

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu

 When using ZK for concurrency control, some statements require an exclusive 
 table lock because they are atomic, such as setting a table's location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.





[jira] [Updated] (HIVE-8136) Reduce table locking

2015-01-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8136:
---
Status: Patch Available  (was: In Progress)

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock because they are atomic, such as setting a table's location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.





[jira] [Updated] (HIVE-8136) Reduce table locking

2015-01-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8136:
---
Attachment: HIVE-8136.patch

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock because they are atomic, such as setting a table's location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.





[jira] [Assigned] (HIVE-9252) Linking custom SerDe jar to table definition.

2015-01-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-9252:
--

Assignee: Ferdinand Xu

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu

 In HIVE-6047 the option was added to hook a jar file to the definition of a 
 function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions].)
 I propose adding something similar that can be used when defining an external 
 table that relies on a custom SerDe (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share!) a Hive table on top of a custom 
 file format without needing the IT operations people to deploy a custom 
 SerDe jar file on all nodes.





[jira] [Created] (HIVE-9522) Improve select count(*) statement for a parquet table with big input(~1Gb)

2015-01-29 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-9522:
--

 Summary: Improve select count(*) statement for a parquet table 
with big input(~1Gb)
 Key: HIVE-9522
 URL: https://issues.apache.org/jira/browse/HIVE-9522
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu








[jira] [Commented] (HIVE-8136) Reduce table locking

2015-01-29 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297970#comment-14297970
 ] 

Ferdinand Xu commented on HIVE-8136:


The failed test cases look unrelated.

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock when they are atomic. Such as setting a tables location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.





[jira] [Updated] (HIVE-9522) Improve the speed of select count(*) statement for a parquet table with big input(~1GB)

2015-01-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9522:
---
Summary: Improve the speed of select count(*) statement for a parquet table 
with big input(~1GB)  (was: Improve the speed of select count(*) statement for 
a parquet table with big input(~1Gb))

 Improve the speed of select count(*) statement for a parquet table with big 
 input(~1GB)
 ---

 Key: HIVE-9522
 URL: https://issues.apache.org/jira/browse/HIVE-9522
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu







[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-29 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298082#comment-14298082
 ] 

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch.
LGTM +1

 Move parquet serialize implementation to DataWritableWriter to improve write 
 speeds
 ---

 Key: HIVE-9333
 URL: https://issues.apache.org/jira/browse/HIVE-9333
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch


 The serialize process on ParquetHiveSerDe parses a Hive object
 into a Writable object by looping through all the Hive object children,
 and creating new Writable objects per child. These final writable
 objects are passed in to the Parquet writing function, and parsed again
 on the DataWritableWriter class by looping through the ArrayWritable
 object. These two loops (ParquetHiveSerDe.serialize() and 
 DataWritableWriter.write()) may be reduced to just one loop in the 
 DataWritableWriter.write() method in order to increase the writing 
 speed for Hive Parquet.
 In order to achieve this, we can wrap the Hive object and object inspector
 on the ParquetHiveSerDe.serialize() method into an object that implements the 
 Writable interface, and thus avoid the loop that serialize() does, leaving 
 the loop parsing to the DataWritableWriter.write() method. We can see how ORC 
 does this with the OrcSerde.OrcSerdeRow class.
 Writable objects are organized differently across storage formats, so 
 I don't think it is necessary to create and keep the writable objects in the 
 serialize() method, as they won't be used until the writing process starts 
 (DataWritableWriter.write()).
 This performance issue was found using microbenchmark tests from HIVE-8121.
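
The wrapping idea described above can be sketched as follows. This is a hypothetical, self-contained illustration, not Hive's actual code: the `WritableLike` and `RowInspector` interfaces are simplified stand-ins for Hadoop's `Writable` and Hive's `ObjectInspector`, and all class names are made up:

```java
import java.util.List;

// Stand-ins for org.apache.hadoop.io.Writable and Hive's ObjectInspector,
// simplified so the sketch compiles without Hadoop jars.
interface WritableLike { void write(StringBuilder out); }
interface RowInspector { List<Object> getFields(Object row); }

// Wraps the raw Hive row plus its inspector. serialize() would just build
// this wrapper -- no per-field Writable objects are created -- and the
// single traversal loop runs later, on the writer side, which is the
// one-loop idea the issue describes (compare OrcSerde.OrcSerdeRow).
final class ParquetRecordSketch implements WritableLike {
    private final Object row;
    private final RowInspector inspector;

    ParquetRecordSketch(Object row, RowInspector inspector) {
        this.row = row;
        this.inspector = inspector;
    }

    @Override
    public void write(StringBuilder out) {
        // The only loop over the row's children, deferred until write time.
        for (Object field : inspector.getFields(row)) {
            out.append(field).append('|');
        }
    }
}

public class SerializeSketch {
    public static void main(String[] args) {
        RowInspector oi = row -> List.of((Object[]) row);
        StringBuilder out = new StringBuilder();
        new ParquetRecordSketch(new Object[]{1, "a"}, oi).write(out);
        System.out.println(out); // prints 1|a|
    }
}
```

The point of the design is that field traversal happens exactly once, at write time, instead of once in serialize() and again in the writer.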





[jira] [Updated] (HIVE-9522) Improve the speed of select count(*) statement for a parquet table with big input(~1Gb)

2015-01-29 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9522:
---
Summary: Improve the speed of select count(*) statement for a parquet table 
with big input(~1Gb)  (was: Improve select count(*) statement for a parquet 
table with big input(~1Gb))

 Improve the speed of select count(*) statement for a parquet table with big 
 input(~1Gb)
 ---

 Key: HIVE-9522
 URL: https://issues.apache.org/jira/browse/HIVE-9522
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu







[jira] [Updated] (HIVE-9302) Beeline add jar local to client

2015-01-28 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.2.patch

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, 
 postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Commented] (HIVE-9302) Beeline add jar local to client

2015-01-28 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294853#comment-14294853
 ] 

Ferdinand Xu commented on HIVE-9302:


Sorry, I meant to. 
There are two kinds of use cases. One is to add an existing, known driver such 
as the MySQL or PostgreSQL driver; these two are the currently supported 
drivers.
{noformat}
# beeline
beeline> !addlocaldriverjar /path/to/mysql-connector-java-5.1.27-bin.jar
beeline> !connect mysql://host:3306/testdb
{noformat}
The other is to add a customized driver.
{noformat}
# beeline
beeline> !addlocaldriverjar /path/to/DummyDriver-1.0-SNAPSHOT.jar
beeline> !addlocaldrivername org.apache.dummy.DummyDriver
beeline> !connect mysql://host:3306/testdb
{noformat}

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests

2015-01-28 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296201#comment-14296201
 ] 

Ferdinand Xu commented on HIVE-9470:


Thank you for your update.  +1

 Use a generic writable object to run ColumnaStorageBench write/read tests 
 --

 Key: HIVE-9470
 URL: https://issues.apache.org/jira/browse/HIVE-9470
 Project: Hive
  Issue Type: Improvement
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9470.1.patch, HIVE-9470.2.patch


 The ColumnarStorageBench benchmark class is using a Parquet writable object 
 to run all write/read/serialize/deserialize tests. It would be better to use 
 a more generic writable object (like text writables) to get better benchmark 
 results across storage formats.
 Using Parquet writables may give Parquet an advantage when writing.





[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars

2015-01-28 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296117#comment-14296117
 ] 

Ferdinand Xu commented on HIVE-9302:


Thanks [~thejas] for your update!

 Beeline add commands to register local jdbc driver names and jars
 -

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, 
 postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jdbc 
 driver jars and register custom jdbc driver names.





[jira] [Commented] (HIVE-9302) Beeline add jar local to client

2015-01-27 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293700#comment-14293700
 ] 

Ferdinand Xu commented on HIVE-9302:


The failed cases are caused by the lack of the driver jar files attached to this JIRA.

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292844#comment-14292844
 ] 

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch. I have left some general questions on the review 
board.

 Move parquet serialize implementation to DataWritableWriter to improve write 
 speeds
 ---

 Key: HIVE-9333
 URL: https://issues.apache.org/jira/browse/HIVE-9333
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9333.2.patch


 The serialize process on ParquetHiveSerDe parses a Hive object
 into a Writable object by looping through all the Hive object's children,
 creating new Writable objects per child. These final Writable
 objects are passed in to the Parquet writing function, and parsed again
 in the DataWritableWriter class by looping through the ArrayWritable
 object. These two loops (ParquetHiveSerDe.serialize() and
 DataWritableWriter.write()) may be reduced to a single loop in the
 DataWritableWriter.write() method in order to speed up the writing process
 for Hive Parquet.
 In order to achieve this, we can wrap the Hive object and object inspector
 in the ParquetHiveSerDe.serialize() method into an object that implements the
 Writable interface, thus avoiding the loop that serialize() does and leaving
 the traversal to the DataWritableWriter.write() method. We can see how ORC
 does this with the OrcSerde.OrcSerdeRow class.
 Writable objects are organized differently across storage formats, so
 I don't think it is necessary to create and keep the Writable objects in the
 serialize() method, as they won't be used until the writing process starts
 (DataWritableWriter.write()).
 This performance issue was found using microbenchmark tests from HIVE-8121.
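The single-traversal idea can be sketched as a wrapper analogous to OrcSerde.OrcSerdeRow: serialize() only constructs the wrapper, and the one loop over the children runs at write() time. The interfaces below are minimal stand-ins for Hadoop's Writable and Hive's object inspector so the sketch is self-contained; the names and the String-based write are illustrative, not Hive's actual API.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.List;

// Minimal stand-ins for Hadoop's Writable and Hive's object inspector,
// so the sketch compiles on its own (illustrative only).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface ObjectInspector {
    List<Object> getStructFieldsData(Object row);
}

// Analogous to OrcSerde.OrcSerdeRow: serialize() only wraps the row, and
// the single traversal of the children happens at write() time.
class ParquetHiveRecord implements Writable {
    private final Object row;
    private final ObjectInspector inspector;

    ParquetHiveRecord(Object row, ObjectInspector inspector) {
        this.row = row;
        this.inspector = inspector;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // The one remaining loop: done by the writer, not by serialize().
        for (Object field : inspector.getStructFieldsData(row)) {
            out.writeUTF(String.valueOf(field));
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        throw new UnsupportedOperationException("write-only record");
    }
}
```

In the real code path, DataWritableWriter would accept this record and hand each field to Parquet's column writers; writeUTF here just stands in for that step.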





[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnarStorageBench write/read tests

2015-01-26 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292772#comment-14292772
 ] 

Ferdinand Xu commented on HIVE-9470:


LGTM with some minor suggestions.

{noformat}
131   public ColumnarStorageBench()  {
{noformat}
Please remove extra space.

{noformat}
233   private ObjectInspector getParquetObjectInspector(final String 
columnTypes) {
{noformat}
Can you rename it to getArrayWritableObjectInspector, since it will be used by 
both Parquet and ORC?

{noformat}
242 Writable parquetWritable = 
createRecord(TypeInfoUtils.getTypeInfosFromTypeString(columnTypes));
{noformat}
Can you rename it to recordWritable, for the same reason as above?


 Use a generic writable object to run ColumnarStorageBench write/read tests 
 --

 Key: HIVE-9470
 URL: https://issues.apache.org/jira/browse/HIVE-9470
 Project: Hive
  Issue Type: Improvement
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9470.1.patch


 The ColumnarStorageBench benchmark class is using a Parquet writable object 
 to run all write/read/serialize/deserialize tests. It would be better to use 
 a more generic writable object (like Text writables) to get fairer benchmark 
 comparisons between storage formats.
 Using Parquet writables may give Parquet an advantage when writing.





[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-25 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.1.patch, HIVE-9371.patch, HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}





[jira] [Commented] (HIVE-9450) [Parquet] Check all data types work for Parquet in Group By operator

2015-01-25 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291326#comment-14291326
 ] 

Ferdinand Xu commented on HIVE-9450:


Hi [~brocknoland] and [~dongc], do we really need to change 
WritableHiveCharObjectInspector.java? See 
https://issues.apache.org/jira/browse/HIVE-9371

 [Parquet] Check all data types work for Parquet in Group By operator
 

 Key: HIVE-9450
 URL: https://issues.apache.org/jira/browse/HIVE-9450
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-9450.patch, HIVE-9450.patch


 Check all data types work for Parquet in Group By operator.
 1. Add test cases for data types.
 2. Fix the ClassCastException bug for CHAR/VARCHAR used in group by for 
 Parquet.





[jira] [Updated] (HIVE-9302) Beeline add jar local to client

2015-01-25 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.1.patch

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-9302.1.patch, HIVE-9302.patch, 
 mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Updated] (HIVE-9302) Beeline add jar local to client

2015-01-25 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: DummyDriver-1.0-SNAPSHOT.jar

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Updated] (HIVE-9302) Beeline add jar local to client

2015-01-25 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: (was: HIVE-9302.1.patch)

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Updated] (HIVE-9302) Beeline add jar local to client

2015-01-25 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.1.patch

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.





[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-21 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Attachment: HIVE-9371.patch

Reupload my patch to kick off the precommit.

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.patch, HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}





[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-21 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Attachment: HIVE-9371.1.patch

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.1.patch, HIVE-9371.patch, HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}





[jira] [Commented] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-20 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283837#comment-14283837
 ] 

Ferdinand Xu commented on HIVE-9371:


It failed when executing the command:
explain select value, sum(cast(key as int)), count(*) numrows
from char_2
group by value
order by value asc
limit 5;

The GroupByOperator got a WritableHiveCharObjectInspector to copy a Text 
object, where a WritableStringObjectInspector should have been used.
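The mismatch can be reproduced in miniature with illustrative stand-ins (not Hive's real classes): the char inspector's copyObject() blindly casts to HiveCharWritable, so handing it the plain Text that the Parquet reader produced throws exactly the ClassCastException in the stack trace above.

```java
// Minimal stand-ins for the classes in the stack trace (illustrative only).
class Text {
    final String value;
    Text(String value) { this.value = value; }
}

class HiveCharWritable {
    final String value;
    HiveCharWritable(String value, int maxLength) {
        // char(n) semantics: truncate to the declared length.
        this.value = value.length() > maxLength ? value.substring(0, maxLength) : value;
    }
}

class WritableHiveCharObjectInspector {
    final int maxLength;
    WritableHiveCharObjectInspector(int maxLength) { this.maxLength = maxLength; }

    // Mirrors the failing line: a blind cast, valid only when the reader
    // actually produced HiveCharWritable values for the char column.
    Object copyObject(Object o) {
        HiveCharWritable c = (HiveCharWritable) o; // ClassCastException for Text
        return new HiveCharWritable(c.value, maxLength);
    }
}
```

The fix direction is to make the inspector that the Parquet reader hands to GroupBy agree with the values it emits — either emit HiveCharWritable from the converter, or resolve char/varchar columns with a string inspector.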

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Priority: Critical

 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}





[jira] [Assigned] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-20 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-9371:
--

Assignee: Ferdinand Xu

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical

 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}





[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog

2015-01-20 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-8838:
--

Assignee: Ferdinand Xu

 Support Parquet through HCatalog
 

 Key: HIVE-8838
 URL: https://issues.apache.org/jira/browse/HIVE-8838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Ferdinand Xu

 Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-20 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Status: Patch Available  (was: Open)

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}




