from:"Ferdinand Xu \(JIRA\)"


 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Attachment: HIVE-9252.patch

The initial patch is attached!

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.patch


 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom Serde (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share !!!) a Hive table on top of a custom 
 fileformat without the need to let the IT operations people deploy a custom 
 SerDe jar file on all nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.


 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Attachment: HIVE-9252.1.patch

rebase patch

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch, HIVE-9252.patch


 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom Serde (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share !!!) a Hive table on top of a custom 
 fileformat without the need to let the IT operations people deploy a custom 
 SerDe jar file on all nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.


 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Status: Patch Available  (was: Open)

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch, HIVE-9252.patch


 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom Serde (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share !!!) a Hive table on top of a custom 
 fileformat without the need to let the IT operations people deploy a custom 
 SerDe jar file on all nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9661) Refine debug log with schema information for the method of creating session directories

Ferdinand Xu created HIVE-9661:
--

 Summary: Refine debug log with schema information for the method 
of creating session directories
 Key: HIVE-9661
 URL: https://issues.apache.org/jira/browse/HIVE-9661
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor


For a session, the scratch directory can be either a local path or a hdfs 
scratch path. The method name createRootHDFSDir is quite confusing. So add the 
schema information to the debug log for the troubleshooting need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9661) Refine debug log with schema information for the method of creating session directories


 [ 
https://issues.apache.org/jira/browse/HIVE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9661:
---
Status: Patch Available  (was: Open)

 Refine debug log with schema information for the method of creating session 
 directories
 ---

 Key: HIVE-9661
 URL: https://issues.apache.org/jira/browse/HIVE-9661
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-9661.patch


 For a session, the scratch directory can be either a local path or a hdfs 
 scratch path. The method name createRootHDFSDir is quite confusing. So add 
 the schema information to the debug log for the troubleshooting need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9661) Refine debug log with schema information for the method of creating session directories


 [ 
https://issues.apache.org/jira/browse/HIVE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9661:
---
Attachment: HIVE-9661.patch

 Refine debug log with schema information for the method of creating session 
 directories
 ---

 Key: HIVE-9661
 URL: https://issues.apache.org/jira/browse/HIVE-9661
 Project: Hive
  Issue Type: Bug
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
Priority: Minor
 Attachments: HIVE-9661.patch


 For a session, the scratch directory can be either a local path or a hdfs 
 scratch path. The method name createRootHDFSDir is quite confusing. So add 
 the schema information to the debug log for the troubleshooting need.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.


 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9252:
---
Attachment: (was: HIVE-9252.patch)

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu
 Attachments: HIVE-9252.1.patch


 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom Serde (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share !!!) a Hive table on top of a custom 
 fileformat without the need to let the IT operations people deploy a custom 
 SerDe jar file on all nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8136) Reduce table locking

2015-02-04 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306528#comment-14306528
 ] 

Ferdinand Xu commented on HIVE-8136:


Hi [~brocknoland], I agree with you that an exclusive lock is a must for 
altering table structure. I think ADDCLUSTERSORTCOLUMN can use shared lock 
instead. Please see my previous comments for details.

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock when they are atomic. Such as setting a tables location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars

2015-02-04 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306534#comment-14306534
 ] 

Ferdinand Xu commented on HIVE-9302:


Thank Sergio for your review.
@[~brocknoland], do you have any further comments for my patch?

 Beeline add commands to register local jdbc driver names and jars
 -

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.patch, 
 mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jdbc 
 driver jars and register custom jdbc driver names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars

2015-02-03 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.3.patch

Hi Sergio, I have update my patch according to your comments. Please help me 
review it if you have some time. Thank you!

 Beeline add commands to register local jdbc driver names and jars
 -

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.patch, 
 mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jdbc 
 driver jars and register custom jdbc driver names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8136) Reduce table locking

[
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296607#comment-14296607
]

Ferdinand Xu commented on HIVE-8136:

Currently the following alter table write type is trying to acquire an
exclusive lock.
DDL_EXCLUSIVE;
RENAMECOLUMN
ADDCLUSTERSORTCOLUMN:
ADDFILEFORMAT:
DROPPROPS:
REPLACECOLS:
ARCHIVE:
UNARCHIVE:
ALTERPROTECTMODE:
ALTERPARTITIONPROTECTMODE:
ALTERLOCATION:
DROPPARTITION:
RENAMEPARTITION:
ADDSKEWEDBY:
ALTERSKEWEDLOCATION:
ALTERBUCKETNUM:
ALTERPARTITION:
ADDCOLS:
RENAME:
TRUNCATE:
MERGEFILES:

Other following is using shared lock:
ADDSERDE
ADDPARTITION
ADDSERDEPROPS
ADDPROPS

Others has no lock:
COMPACT
TOUCH

For changing table structure, an exclusive lock is a must. Most of the cases
use the exclusive lock since it changes the table or partition structure
currently. For adding cluster column and sort column, we can use shared lock
for the following reason.
{quote}
The CLUSTERED BY and SORTED BY creation commands do not affect how data is
inserted into a table – only how it is read. This means that users must be
careful to insert data correctly by specifying the number of reducers to be
equal to the number of buckets, and using CLUSTER BY and SORT BY commands in
their query.
{quote}
For changing the properties, I think we can use no lock if it doesn't change
the structure of the table. We can do a follow-up jira. Any thought about it,
[~brocknoland]?

Reduce table locking

Key: HIVE-8136
URL: https://issues.apache.org/jira/browse/HIVE-8136
Project: Hive
Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu

When using ZK for concurrency control, some statements require an exclusive
table lock when they are atomic. Such as setting a tables location.
This JIRA is to analyze the scope of statements like ALTER TABLE and see if
we can reduce the locking required.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-8136) Reduce table locking


 [ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8136:
---
Status: Patch Available  (was: In Progress)

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock when they are atomic. Such as setting a tables location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-8136) Reduce table locking


 [ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8136:
---
Attachment: HIVE-8136.patch

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock when they are atomic. Such as setting a tables location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (HIVE-9252) Linking custom SerDe jar to table definition.


 [ 
https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-9252:
--

Assignee: Ferdinand Xu

 Linking custom SerDe jar to table definition.
 -

 Key: HIVE-9252
 URL: https://issues.apache.org/jira/browse/HIVE-9252
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Niels Basjes
Assignee: Ferdinand Xu

 In HIVE-6047 the option was created that a jar file can be hooked to the 
 definition of a function. (See: [Language Manual DDL: Permanent 
 Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions]
  )
 I propose to add something similar that can be used when defining an external 
 table that relies on a custom Serde (I expect to usually only have the 
 Deserializer).
 Something like this:
 {code}
 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 ...
 STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] 
 [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
 {code}
 Using this you can define (and share !!!) a Hive table on top of a custom 
 fileformat without the need to let the IT operations people deploy a custom 
 SerDe jar file on all nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-9522) Improve select count(*) statement for a parquet table with big input(~1Gb)

Ferdinand Xu created HIVE-9522:
--

 Summary: Improve select count(*) statement for a parquet table 
with big input(~1Gb)
 Key: HIVE-9522
 URL: https://issues.apache.org/jira/browse/HIVE-9522
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8136) Reduce table locking


[ 
https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14297970#comment-14297970
 ] 

Ferdinand Xu commented on HIVE-8136:


Sounds unrelated failed cases.

 Reduce table locking
 

 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-8136.patch


 When using ZK for concurrency control, some statements require an exclusive 
 table lock when they are atomic. Such as setting a tables location.
 This JIRA is to analyze the scope of statements like ALTER TABLE and see if 
 we can reduce the locking required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9522) Improve the speed of select count(*) statement for a parquet table with big input(~1GB)


 [ 
https://issues.apache.org/jira/browse/HIVE-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9522:
---
Summary: Improve the speed of select count(*) statement for a parquet table 
with big input(~1GB)  (was: Improve the speed of select count(*) statement for 
a parquet table with big input(~1Gb))

 Improve the speed of select count(*) statement for a parquet table with big 
 input(~1GB)
 ---

 Key: HIVE-9522
 URL: https://issues.apache.org/jira/browse/HIVE-9522
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds


[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298082#comment-14298082
 ] 

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch.
LGTM +1

 Move parquet serialize implementation to DataWritableWriter to improve write 
 speeds
 ---

 Key: HIVE-9333
 URL: https://issues.apache.org/jira/browse/HIVE-9333
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch


 The serialize process on ParquetHiveSerDe parses a Hive object
 to a Writable object by looping through all the Hive object children,
 and creating new Writables objects per child. These final writables
 objects are passed in to the Parquet writing function, and parsed again
 on the DataWritableWriter class by looping through the ArrayWritable
 object. These two loops (ParquetHiveSerDe.serialize() and 
 DataWritableWriter.write()  may be reduced to use just one loop into the 
 DataWritableWriter.write() method in order to increment the writing process 
 speed for Hive parquet.
 In order to achieve this, we can wrap the Hive object and object inspector
 on ParquetHiveSerDe.serialize() method into an object that implements the 
 Writable object and thus avoid the loop that serialize() does, and leave the 
 loop parser to the DataWritableWriter.write() method. We can see how ORC does 
 this with the OrcSerde.OrcSerdeRow class.
 Writable objects are organized differently on any kind of storage formats, so 
 I don't think it is necessary to create and keep the writable objects in the 
 serialize() method as they won't be used until the writing process starts 
 (DataWritableWriter.write()).
 This performance issue was found using microbenchmark tests from HIVE-8121.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9522) Improve the speed of select count(*) statement for a parquet table with big input(~1Gb)


 [ 
https://issues.apache.org/jira/browse/HIVE-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9522:
---
Summary: Improve the speed of select count(*) statement for a parquet table 
with big input(~1Gb)  (was: Improve select count(*) statement for a parquet 
table with big input(~1Gb))

 Improve the speed of select count(*) statement for a parquet table with big 
 input(~1Gb)
 ---

 Key: HIVE-9522
 URL: https://issues.apache.org/jira/browse/HIVE-9522
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9302) Beeline add jar local to client


 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.2.patch

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, 
 postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9302) Beeline add jar local to client


[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14294853#comment-14294853
 ] 

Ferdinand Xu commented on HIVE-9302:


Sorry, I meant to. 
There are two kinds of use cases. One is to add an existing known driver like 
mysql driver or postgres driver. Current supported driver are postgres and 
mysql.
{noformat}
# beeline
beeline !addlocaldriverjar /path/to/mysql-connector-java-5.1.27-bin.jar
beeline !connect mysql://host:3306/testdb
{noformat}
And another is to add a customized driver.
{noformat}
# beeline
beeline!addlocaldriverjar /path/to/DummyDriver-1.0-SNAPSHOT.jar
beeline!!addlocaldrivername org.apache.dummy.DummyDrive
beeline !connect mysql://host:3306/testdb
{noformat}

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests


[ 
https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296201#comment-14296201
 ] 

Ferdinand Xu commented on HIVE-9470:


Thank you for your update.  +1

 Use a generic writable object to run ColumnaStorageBench write/read tests 
 --

 Key: HIVE-9470
 URL: https://issues.apache.org/jira/browse/HIVE-9470
 Project: Hive
  Issue Type: Improvement
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9470.1.patch, HIVE-9470.2.patch


 The ColumnarStorageBench benchmark class is using a Parquet writable object 
 to run all write/read/serialize/deserialize tests. It would be better to use 
 a more generic writable object (like text writables) to get better benchmark 
 results between format storages.
 Using parquet writables may add advantage when writing parquet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars


[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296117#comment-14296117
 ] 

Ferdinand Xu commented on HIVE-9302:


Thanks [~thejas] for your update!

 Beeline add commands to register local jdbc driver names and jars
 -

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.2.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, 
 postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jdbc 
 driver jars and register custom jdbc driver names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9302) Beeline add jar local to client

2015-01-27 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293700#comment-14293700
 ] 

Ferdinand Xu commented on HIVE-9302:


Failed cases are caused by lack of Driver jar files attached in this jira.

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds

2015-01-26 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292844#comment-14292844
 ] 

Ferdinand Xu commented on HIVE-9333:


Thanks Sergio for your patch. I have left some general questions in the review 
board.

 Move parquet serialize implementation to DataWritableWriter to improve write 
 speeds
 ---

 Key: HIVE-9333
 URL: https://issues.apache.org/jira/browse/HIVE-9333
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9333.2.patch


 The serialize process on ParquetHiveSerDe parses a Hive object
 to a Writable object by looping through all the Hive object children,
 and creating new Writables objects per child. These final writables
 objects are passed in to the Parquet writing function, and parsed again
 on the DataWritableWriter class by looping through the ArrayWritable
 object. These two loops (ParquetHiveSerDe.serialize() and 
 DataWritableWriter.write()  may be reduced to use just one loop into the 
 DataWritableWriter.write() method in order to increment the writing process 
 speed for Hive parquet.
 In order to achieve this, we can wrap the Hive object and object inspector
 on ParquetHiveSerDe.serialize() method into an object that implements the 
 Writable object and thus avoid the loop that serialize() does, and leave the 
 loop parser to the DataWritableWriter.write() method. We can see how ORC does 
 this with the OrcSerde.OrcSerdeRow class.
 Writable objects are organized differently on any kind of storage formats, so 
 I don't think it is necessary to create and keep the writable objects in the 
 serialize() method as they won't be used until the writing process starts 
 (DataWritableWriter.write()).
 This performance issue was found using microbenchmark tests from HIVE-8121.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests

2015-01-26 Thread Ferdinand Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292772#comment-14292772
 ] 

Ferdinand Xu commented on HIVE-9470:


LGTM with some minor suggestions.

{noformat}
131   public ColumnarStorageBench()  {
{noformat}
Please remove extra space.

{noformat}
233   private ObjectInspector getParquetObjectInspector(final String 
columnTypes) {
{noformat}
Can you rename it with getArrayWritableObjectInspector since it will be used by 
both parquet and orc?

{noformat}
242 Writable parquetWritable = 
createRecord(TypeInfoUtils.getTypeInfosFromTypeString(columnTypes));
{noformat}
Can you rename it with recordWritable  for the same reason as above?


 Use a generic writable object to run ColumnaStorageBench write/read tests 
 --

 Key: HIVE-9470
 URL: https://issues.apache.org/jira/browse/HIVE-9470
 Project: Hive
  Issue Type: Improvement
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9470.1.patch


 The ColumnarStorageBench benchmark class is using a Parquet writable object 
 to run all write/read/serialize/deserialize tests. It would be better to use 
 a more generic writable object (like text writables) to get better benchmark 
 results between format storages.
 Using parquet writables may add advantage when writing parquet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type


 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.1.patch, HIVE-9371.patch, HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9450) [Parquet] Check all data types work for Parquet in Group By operator


[ 
https://issues.apache.org/jira/browse/HIVE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14291326#comment-14291326
 ] 

Ferdinand Xu commented on HIVE-9450:


Hi [~brocknoland] and [~dongc], do we really need to change the 
WritableHiveCharObjectInspector.java ? See 
https://issues.apache.org/jira/browse/HIVE-9371 

 [Parquet] Check all data types work for Parquet in Group By operator
 

 Key: HIVE-9450
 URL: https://issues.apache.org/jira/browse/HIVE-9450
 Project: Hive
  Issue Type: Sub-task
Reporter: Dong Chen
Assignee: Dong Chen
 Attachments: HIVE-9450.patch, HIVE-9450.patch


 Check all data types work for Parquet in Group By operator.
 1. Add test cases for data types.
 2. Fix the ClassCastException bug for CHARVARCHAR used in group by for 
 Parquet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9302) Beeline add jar local to client


 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.1.patch

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: HIVE-9302.1.patch, HIVE-9302.patch, 
 mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9302) Beeline add jar local to client


 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: DummyDriver-1.0-SNAPSHOT.jar

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9302) Beeline add jar local to client


 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: (was: HIVE-9302.1.patch)

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9302) Beeline add jar local to client


 [ 
https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9302:
---
Attachment: HIVE-9302.1.patch

 Beeline add jar local to client
 ---

 Key: HIVE-9302
 URL: https://issues.apache.org/jira/browse/HIVE-9302
 Project: Hive
  Issue Type: New Feature
Reporter: Brock Noland
Assignee: Ferdinand Xu
 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, 
 HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar


 At present if a beeline user uses {{add jar}} the path they give is actually 
 on the HS2 server. It'd be great to allow beeline users to add local jars as 
 well.
 It might be useful to do this in the jdbc driver itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-21 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Attachment: HIVE-9371.patch

Reupload my patch to kick off the precommit.

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.patch, HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-21 Thread Ferdinand Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9371:
---
Attachment: HIVE-9371.1.patch

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical
 Attachments: HIVE-9371.1.patch, HIVE-9371.patch, HIVE-9371.patch


 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type


[ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14283837#comment-14283837
 ] 

Ferdinand Xu commented on HIVE-9371:


It failed when executing the command:
explain select value, sum(cast(key as int)), count(*) numrows
from char_2
group by value
order by value asc
limit 5;

The GroupByOperator got the writableHiveCharObjectInspector to parse a Text 
object which should be WritableStringObjectInspector. 

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Priority: Critical

 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type


 [ 
https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-9371:
--

Assignee: Ferdinand Xu

 Execution error for Parquet table and GROUP BY involving CHAR data type
 ---

 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Assignee: Ferdinand Xu
Priority: Critical

 Query fails involving PARQUET table format, CHAR data type, and GROUP BY.
 Probably also fails for VARCHAR, too.
 {noformat}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
 org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 10 more
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
 cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
   at 
 org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
   ... 16 more
 {noformat}
 Here is a q file:
 {noformat}
 SET hive.vectorized.execution.enabled=false;
 drop table char_2;
 create table char_2 (
   key char(10),
   value char(20)
 ) stored as parquet;
 insert overwrite table char_2 select * from src;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value asc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value asc
 limit 5;
 select value, sum(cast(key as int)), count(*) numrows
 from src
 group by value
 order by value desc
 limit 5;
 explain select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 -- should match the query from src
 select value, sum(cast(key as int)), count(*) numrows
 from char_2
 group by value
 order by value desc
 limit 5;
 drop table char_2;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog


 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-8838:
--

Assignee: Ferdinand Xu

 Support Parquet through HCatalog
 

 Key: HIVE-8838
 URL: https://issues.apache.org/jira/browse/HIVE-8838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Ferdinand Xu

 Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type