date:20130916


 [ 
https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5276:
--

Attachment: D12879.2.patch

navis updated the revision HIVE-5276 [jira] Skip useless string encoding stage 
for hiveserver2.

  Fixed test failures and addressed comments

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12879

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12879?vs=39897&id=40023#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchFormatter.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java

To: JIRA, navis
Cc: cwsteinbach


 Skip useless string encoding stage for hiveserver2
 --

 Key: HIVE-5276
 URL: https://issues.apache.org/jira/browse/HIVE-5276
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D12879.2.patch, HIVE-5276.D12879.1.patch


 Currently, HiveServer2 acquires rows in the string format used for CLI 
 output, converts them back into rows, and then converts them again into the 
 final format. This is inefficient and memory-consuming. 
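
 The overhead described above can be sketched roughly as follows. This is an
 illustration only, not Hive's actual code: encodeForCli, decodeFromCli, and
 the tab delimiter are hypothetical stand-ins. It contrasts a round trip
 through a delimited string with handing the row objects over directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowEncodingSketch {
    // Hypothetical CLI-style encoding: join the column values with tabs.
    public static String encodeForCli(List<Object> row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(String.valueOf(row.get(i)));
        }
        return sb.toString();
    }

    // Hypothetical decoding step: split the string back into columns.
    // Every value comes back as a String, so it must be converted yet again.
    public static List<Object> decodeFromCli(String line) {
        return new ArrayList<Object>(Arrays.asList(line.split("\t", -1)));
    }

    // Round trip: row -> string -> row (extra copies of every value).
    public static List<Object> viaString(List<Object> row) {
        return decodeFromCli(encodeForCli(row));
    }

    // Direct path: pass the row through without the intermediate string.
    public static List<Object> direct(List<Object> row) {
        return row;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList(1, "a", 2.5);
        System.out.println(viaString(row).get(0).getClass().getSimpleName());
        System.out.println(direct(row).get(0).getClass().getSimpleName());
    }
}
```

 In the round-trip path each column is copied into a string and parsed back
 out before the final conversion; the direct path skips both steps.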

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc


[ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768123#comment-13768123
 ] 

Phabricator commented on HIVE-5279:
---

ashutoshc has accepted the revision HIVE-5279 [jira] Kryo cannot instantiate 
GenericUDAFEvaluator in GroupByDesc.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D12963

BRANCH
  HIVE-5279

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, navis


 Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
 ---

 Key: HIVE-5279
 URL: https://issues.apache.org/jira/browse/HIVE-5279
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Priority: Critical
 Attachments: 5279.patch, D12963.1.patch


 We never forced GenericUDAFEvaluator to be Serializable. I don't know how the 
 previous serialization mechanism handled this, but Kryo complains that it's 
 not Serializable and fails the query.
 The log below is an example:
 {noformat}
 java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
 Serialization trace:
 inputOI (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
 genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
 aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
 conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
 aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
   at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
   at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
   at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
   at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
   at org.apache.h
 {noformat}
 If this cannot be fixed somehow, some UDAFs will have to be modified to run 
 on hive-0.13.0.
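
 The "missing no-arg constructor" message in the trace can be illustrated
 without Kryo itself. The sketch below is only an assumption-laden stand-in:
 it uses plain reflection to check whether a class declares a no-arg
 constructor, which is the property the error message says is missing. The
 class name NoArgConstructorCheck is hypothetical and is not part of Hive or
 Kryo.

```java
import java.lang.reflect.Constructor;

public class NoArgConstructorCheck {
    // Returns true if the given class declares a constructor with no parameters.
    public static boolean hasNoArgConstructor(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // No such constructor: a framework that instantiates objects
            // reflectively through it would not be able to create this class.
            return false;
        }
    }

    public static void main(String[] args) {
        // String declares a no-arg constructor; Integer does not.
        System.out.println(hasNoArgConstructor(String.class));
        System.out.println(hasNoArgConstructor(Integer.class));
    }
}
```

 A check like this could be run over the classes named in the serialization
 trace to find which fields of an evaluator would trip the same error.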


[jira] [Updated] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions


 [ 
https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5122:
--

Attachment: D12411.3.patch

navis updated the revision HIVE-5122 [jira] Add partition for multiple 
partition ignores locations for non-first partitions.

  Rebased to trunk and addressed comment (Path to Location, which is filtered 
by QTestUtil)

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12411

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12411?vs=38499&id=40029#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
  ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java
  ql/src/test/queries/clientpositive/add_part_exist.q
  ql/src/test/results/clientpositive/add_part_exist.q.out
  ql/src/test/results/clientpositive/create_view_partitioned.q.out

To: JIRA, navis


 Add partition for multiple partition ignores locations for non-first 
 partitions
 ---

 Key: HIVE-5122
 URL: https://issues.apache.org/jira/browse/HIVE-5122
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D12411.3.patch, HIVE-5122.D12411.1.patch, 
 HIVE-5122.D12411.2.patch


 http://www.mail-archive.com/user@hive.apache.org/msg09151.html
 When multiple partitions are added in a single ALTER TABLE statement, the 
 location of the first partition is used as the location of all partitions.


[jira] [Commented] (HIVE-5276) Skip useless string encoding stage for hiveserver2


[ 
https://issues.apache.org/jira/browse/HIVE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768142#comment-13768142
 ] 

Phabricator commented on HIVE-5276:
---

cwsteinbach has accepted the revision HIVE-5276 [jira] Skip useless string 
encoding stage for hiveserver2.

  +1. Please go ahead and commit this if the tests pass.

REVISION DETAIL
  https://reviews.facebook.net/D12879

BRANCH
  HIVE-5276

ARCANIST PROJECT
  hive

To: JIRA, cwsteinbach, navis
Cc: cwsteinbach


 Skip useless string encoding stage for hiveserver2
 --

 Key: HIVE-5276
 URL: https://issues.apache.org/jira/browse/HIVE-5276
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D12879.2.patch, HIVE-5276.D12879.1.patch



[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc


 [ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5279:
---

Assignee: Navis

 Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
 ---

 Key: HIVE-5279
 URL: https://issues.apache.org/jira/browse/HIVE-5279
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Critical
 Attachments: 5279.patch, D12963.1.patch



[jira] [Updated] (HIVE-5292) Join on decimal columns fails to return rows

2013-09-16 Thread Navis (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5292:


Status: Patch Available  (was: Open)

 Join on decimal columns fails to return rows
 

 Key: HIVE-5292
 URL: https://issues.apache.org/jira/browse/HIVE-5292
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 
 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Sergio Lob
Assignee: Navis
 Attachments: D12969.1.patch


 Join on matching decimal columns returns 0 rows
 To reproduce (I used beeline):
 1. create 2 simple identical tables with 2 identical rows: 
 CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS
 TERMINATED BY '|';
 CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS
 TERMINATED BY '|';
 2. populate tables with identical data:
 LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ;
 LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ;
 3. data file decdata contains:
 10|.98
 20|1234567890.1234
 4. Perform join (returns 0 rows instead of 2):
 SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON
 T1.D = T2.D ;
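
 The report does not state the root cause. As an illustration of one general
 pitfall with decimal join keys (an assumption, not a confirmed diagnosis of
 this issue), java.math.BigDecimal treats values with different scales as
 unequal under equals() and hashCode(), even when compareTo() says they are
 numerically the same. DecimalKeySketch is a hypothetical name used only for
 this sketch.

```java
import java.math.BigDecimal;

public class DecimalKeySketch {
    // Scale-sensitive comparison: 0.98 and 0.980 are NOT equal here.
    public static boolean sameByEquals(String a, String b) {
        return new BigDecimal(a).equals(new BigDecimal(b));
    }

    // Numeric comparison: 0.98 and 0.980 ARE equal here.
    public static boolean sameByCompareTo(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    // Normalizing the scale first makes hash-based matching agree
    // with the numeric comparison.
    public static boolean sameHashAfterNormalizing(String a, String b) {
        int ha = new BigDecimal(a).stripTrailingZeros().hashCode();
        int hb = new BigDecimal(b).stripTrailingZeros().hashCode();
        return ha == hb;
    }

    public static void main(String[] args) {
        System.out.println(sameByEquals(".98", "0.980"));             // false
        System.out.println(sameByCompareTo(".98", "0.980"));          // true
        System.out.println(sameHashAfterNormalizing(".98", "0.980")); // true
    }
}
```

 If join keys are matched by hash or by equals(), two representations of the
 same number can fail to match unless they are normalized first.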


[jira] [Updated] (HIVE-5292) Join on decimal columns fails to return rows


 [ 
https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5292:
--

Attachment: D12969.1.patch

navis requested code review of HIVE-5292 [jira] Join on decimal columns fails 
to return rows.

Reviewers: JIRA

HIVE-5292 Join on decimal columns fails to return rows

Join on matching decimal columns returns 0 rows

To reproduce (I used beeline):
1. create 2 simple identical tables with 2 identical rows:

CREATE TABLE SERGDEC(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '|';
CREATE TABLE SERGDEC2(I INT, D DECIMAL) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '|';

2. populate tables with identical data:

LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC ;
LOAD DATA LOCAL INPATH './decdata' OVERWRITE INTO TABLE SERGDEC2 ;

3. data file decdata contains:

10|.98
20|1234567890.1234

4. Perform join (returns 0 rows instead of 2):

SELECT T1.I, T1.D, T2.D FROM SERGDEC T1 JOIN SERGDEC2 T2 ON
T1.D = T2.D ;

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12969

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java
  ql/src/test/queries/clientpositive/decimal_join.q
  ql/src/test/results/clientpositive/decimal_join.q.out


To: JIRA, navis


 Join on decimal columns fails to return rows
 

 Key: HIVE-5292
 URL: https://issues.apache.org/jira/browse/HIVE-5292
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 
 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Sergio Lob
Assignee: Navis
 Attachments: D12969.1.patch



[jira] [Created] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-16 Thread Douglas (JIRA)

Douglas created HIVE-5296:
-

 Summary: Memory leak: OOM Error after multiple open/closed JDBC 
connections. 
 Key: HIVE-5296
 URL: https://issues.apache.org/jira/browse/HIVE-5296
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
Reporter: Douglas
 Fix For: 0.12.0


This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481

However, after inspecting the related patch and my own build of Hive (with the 
patch carried forward to 0.12.0), I am still seeing the described behaviour.

Multiple connections to HiveServer2, all of which are closed and disposed of 
properly, cause the Java heap size to grow extremely quickly. 

This issue can be recreated using the following code:

{code}

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hive.service.cli.HiveSQLException;
import org.apache.log4j.Logger;

/*
 * Class which encapsulates the lifecycle of a query or statement.
 * Provides functionality which allows you to create a connection,
 * run update statements against it, and close it.
 */
public class HiveClient {

    private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";

    private final Logger logger = Logger.getLogger(HiveClient.class);
    private final String db;
    private Connection con;

    public HiveClient(String db) {
        this.db = db;

        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            logger.info("Can't find Hive driver");
        }

        // GlimmerServer is the reporter's own application class (not shown here).
        String hiveHost = GlimmerServer.config.getString("hive/host");
        String hivePort = GlimmerServer.config.getString("hive/port");
        String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort + "/default";
        logger.info(String.format("Attempting to connect to %s", connectionString));
        try {
            con = DriverManager.getConnection(connectionString, "", "");
        } catch (Exception e) {
            logger.error("Problem instantiating the connection " + e.getMessage());
        }
    }

    public int update(String query) {
        int res = 0;
        Statement stmt = null;
        try {
            stmt = con.createStatement();
            String switchdb = "USE " + db;
            logger.info(switchdb);
            stmt.executeUpdate(switchdb);
            logger.info(query);
            res = stmt.executeUpdate(query);
            logger.info("Query passed to server");
        } catch (HiveSQLException e) {
            logger.info(String.format("HiveSQLException thrown, this can be valid, "
                    + "but check the error: %s from the query %s", e.toString(), query));
        } catch (SQLException e) {
            logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
        } catch (Exception e) {
            logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
        } finally {
            // Close the statement exactly once, whether or not the query succeeded.
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException e) {
                    logger.error("Cannot close the statement, potential memory leak " + e);
                }
            }
        }
        return res;
    }

    public void close() {
        if (con != null) {
            try {
                con.close();
            } catch (SQLException e) {
                logger.info("Problem closing connection " + e);
            }
        }
    }
}
{code}

Creating and closing many HiveClient objects reproduces the problem: the heap 
space used by the hiveserver2 RunJar process increases extremely quickly and 
is never released.
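
The create-and-close cycle described above could be driven by a small loop
like the following. This is a minimal sketch under stated assumptions:
OpenCloseLoop is a hypothetical helper, and the factory stands in for
whatever opens a client (for example, a wrapper that constructs a HiveClient
and calls its close() method). The server's heap still has to be observed
separately on the hiveserver2 process.

```java
import java.util.concurrent.Callable;

public class OpenCloseLoop {
    // Opens one resource per iteration, closes it immediately, and returns
    // the number of open/close cycles that completed.
    public static int run(Callable<AutoCloseable> factory, int iterations) {
        int cycles = 0;
        for (int i = 0; i < iterations; i++) {
            try (AutoCloseable resource = factory.call()) {
                // A real reproduction would run a statement here.
            } catch (Exception e) {
                throw new IllegalStateException("open/close cycle " + i + " failed", e);
            }
            cycles++;
        }
        return cycles;
    }

    public static void main(String[] args) {
        // Stand-in resource that does nothing on close.
        Callable<AutoCloseable> factory = () -> () -> { };
        System.out.println(run(factory, 1000));
    }
}
```

Using try-with-resources guarantees that every opened resource is closed, so
any growth seen on the server cannot be blamed on a missed close() in the
client loop.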


Deleting the old Wiki

2013-09-16 Thread Dean Wampler

Google searches still put the old wiki at the top of many results, which is
a *very bad thing* for beginners. Can someone delete this wiki or at least
redirect to the Confluence wiki?

dean

-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com

Re: Deleting the old Wiki

2013-09-16 Thread Brock Noland

Hi,

Years ago I helped with the wiki conversion. The page I can find is:

http://wiki.apache.org/hadoop/Hive

which isn't harmful.  What pages are you finding?

Brock

On Mon, Sep 16, 2013 at 8:51 AM, Dean Wampler deanwamp...@gmail.com wrote:
> Google searches still put the old wiki at the top of many results, which is
> a *very bad thing* for beginners. Can someone delete this wiki or at least
> redirect to the Confluence wiki?
>
> dean
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

[jira] [Commented] (HIVE-5292) Join on decimal columns fails to return rows


[ 
https://issues.apache.org/jira/browse/HIVE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768398#comment-13768398
 ] 

Phabricator commented on HIVE-5292:
---

ashutoshc has accepted the revision HIVE-5292 [jira] Join on decimal columns 
fails to return rows.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D12969

BRANCH
  HIVE-5292

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, navis


 Join on decimal columns fails to return rows
 

 Key: HIVE-5292
 URL: https://issues.apache.org/jira/browse/HIVE-5292
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
 Environment: Linux lnxx64r5 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 
 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Sergio Lob
Assignee: Navis
 Attachments: D12969.1.patch



[jira] [Commented] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


[ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768487#comment-13768487
 ] 

Eugene Koifman commented on HIVE-4443:
--

Are there any tests that cover new functionality?

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, 
 HIVE-4443-4.patch


 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects containing the jobid but no 
 detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID
 NO PRECOMMIT TESTS 
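
 The equivalences stated in the list above (queue/jobID and jobs/jobID) can
 be sketched as a simple path rewrite. This is a hypothetical helper for
 illustration only; QueueToJobsPath is not part of WebHCat, and the sketch
 covers only the per-job paths that the proposal explicitly calls equivalent.

```java
public class QueueToJobsPath {
    private static final String OLD_PREFIX = "queue/";
    private static final String NEW_PREFIX = "jobs/";

    // Rewrites a deprecated "queue/<jobID>" path to "jobs/<jobID>".
    // Any other path is returned unchanged.
    public static String toJobsPath(String path) {
        if (path.startsWith(OLD_PREFIX) && path.length() > OLD_PREFIX.length()) {
            return NEW_PREFIX + path.substring(OLD_PREFIX.length());
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(toJobsPath("queue/job_1")); // jobs/job_1
        System.out.println(toJobsPath("jobs/job_1"));  // jobs/job_1
    }
}
```

 A client could apply such a rewrite to keep working while the queue
 endpoints remain available but deprecated.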


[jira] [Commented] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig


[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768497#comment-13768497
 ] 

Eugene Koifman commented on HIVE-:
--

Could the comments be made more detailed?
For example, Server#hive() adds 2 params.  Could you add a few words about what 
they are for, or a URL to hive doc that explains where one can get more info?

Can the new tests be described in a bit more detail or have a pointer to some 
place that describes what the tests are testing?  This will help others in the 
future.

 [HCatalog] WebHCat Hive should support equivalent parameters as Pig 
 

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE--1.patch, HIVE--2.patch, HIVE--3.patch


 Currently there are no files and args parameters for Hive. We should add 
 them so that Hive is similar to Pig.
 NO PRECOMMIT TESTS 


[jira] [Updated] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc


 [ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5279:
---

Status: Open  (was: Patch Available)

Following tests failed:
* TestParse
* TestCliDriver_autogen_colalias.q
* TestCliDriver_create_udaf.q
* TestCliDriver_create_view.q
* TestCliDriver_limit_pushdown.q
* TestCliDriver_show_functions.q
* TestCliDriver_udaf_sum_list.q
* TestCliDriver_udf_percentile.q

 Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
 ---

 Key: HIVE-5279
 URL: https://issues.apache.org/jira/browse/HIVE-5279
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Critical
 Attachments: 5279.patch, D12963.1.patch



[jira] [Commented] (HIVE-5278) Move some string UDFs to GenericUDFs, for better varchar support

2013-09-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768416#comment-13768416
 ] 

Hudson commented on HIVE-5278:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2335 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2335/])
HIVE-5278 : Move some string UDFs to GenericUDFs, for better varchar support 
(Jason Dere via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1523518)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFConcat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLower.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUpper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLower.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUpper.java
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/udf6.q.xml


 Move some string UDFs to GenericUDFs, for better varchar support
 

 Key: HIVE-5278
 URL: https://issues.apache.org/jira/browse/HIVE-5278
 Project: Hive
  Issue Type: Improvement
  Components: Types, UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.13.0

 Attachments: D12909.1.patch, HIVE-5278.1.patch, HIVE-5278.2.patch


 To better support varchar/char types in string UDFs, select UDFs should be 
 converted to GenericUDFs. This allows the UDF to return the resulting 
 char/varchar length in the type metadata.
 This work is being split off as a separate task from HIVE-4844. The initial 
 UDFs as part of this work are concat/lower/upper.


[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: (was: HIVE-4443-4.patch)

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, 
 HIVE-4443-4.patch



[jira] [Updated] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call


 [ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4443:
-

Attachment: HIVE-4443-4.patch

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, 
 HIVE-4443-4.patch



[jira] [Commented] (HIVE-4443) [HCatalog] Have an option for GET queue to return all job information in single call

2013-09-16 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768557#comment-13768557
 ] 

Daniel Dai commented on HIVE-4443:
--

Test is in HIVE-5078.

 [HCatalog] Have an option for GET queue to return all job information in 
 single call 
 -

 Key: HIVE-4443
 URL: https://issues.apache.org/jira/browse/HIVE-4443
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4443-1.patch, HIVE-4443-2.patch, HIVE-4443-3.patch, 
 HIVE-4443-4.patch


 Currently, to display a summary of all jobs, one has to call GET queue to 
 retrieve all the jobids and then call GET queue/:jobid for each job. It would 
 be nice to do this in a single call.
 I would suggest:
 * GET queue - mark deprecated
 * GET queue/jobID - mark deprecated
 * DELETE queue/jobID - mark deprecated
 * GET jobs - return the list of JSON objects containing the jobid but no detailed info
 * GET jobs/fields=* - return the list of JSON objects containing detailed Job 
 info
 * GET jobs/jobID - return the single JSON object containing the detailed 
 Job info for the job with the given ID (equivalent to GET queue/jobID)
 * DELETE jobs/jobID - equivalent to DELETE queue/jobID
 NO PRECOMMIT TESTS 
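
The difference between the two styles can be sketched as follows. This is only an illustration in Python against an in-memory stand-in: FakeJobServer, summary_old and summary_new are hypothetical names, not part of WebHCat, and the shape of the returned JSON is assumed.

```python
class FakeJobServer:
    """Hypothetical in-memory stand-in for the job endpoints."""

    def __init__(self, jobs):
        self.jobs = jobs   # job id -> detail dict
        self.calls = 0     # number of simulated HTTP round trips

    def get_queue(self):
        # Old style: GET queue returns only the job ids.
        self.calls += 1
        return sorted(self.jobs)

    def get_queue_job(self, job_id):
        # Old style: GET queue/:jobid returns the details of one job.
        self.calls += 1
        return self.jobs[job_id]

    def get_jobs(self, detailed=False):
        # Proposed style: GET jobs, optionally with detailed info.
        self.calls += 1
        if detailed:
            return [dict(id=j, **self.jobs[j]) for j in sorted(self.jobs)]
        return [{"id": j} for j in sorted(self.jobs)]


def summary_old(server):
    # One call for the ids, then one call per job: N + 1 round trips.
    return [dict(id=j, **server.get_queue_job(j)) for j in server.get_queue()]


def summary_new(server):
    # A single call returns the details of every job.
    return server.get_jobs(detailed=True)
```

With N jobs the old path costs N + 1 round trips and the proposed path costs one, which is the whole motivation for the change.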


[jira] [Commented] (HIVE-5253) Create component to compile and jar dynamic code

2013-09-16 Thread Edward Capriolo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768530#comment-13768530
 ] 

Edward Capriolo commented on HIVE-5253:
---

https://reviews.facebook.net/differential/diff/40041/

 Create component to compile and jar dynamic code
 

 Key: HIVE-5253
 URL: https://issues.apache.org/jira/browse/HIVE-5253
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5253.patch.txt





[jira] [Updated] (HIVE-4998) support jdbc documented table types in default configuration

2013-09-16 Thread Harish Butani (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-4998:


   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

 support jdbc documented table types in default configuration
 

 Key: HIVE-4998
 URL: https://issues.apache.org/jira/browse/HIVE-4998
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.12.0

 Attachments: HIVE-4998.1.patch


 The jdbc table types supported by hive server2 are not the documented typical 
 types [1] in jdbc; they are hive specific types (MANAGED_TABLE, 
 EXTERNAL_TABLE, VIRTUAL_VIEW). 
 HIVE-4573 added support for the jdbc documented typical types, but the HS2 
 default configuration is to return the hive types.
 The default configuration should result in the expected jdbc typical behavior.
 [1] 
 http://docs.oracle.com/javase/6/docs/api/java/sql/DatabaseMetaData.html?is-external=true#getTables(java.lang.String,
  java.lang.String, java.lang.String, java.lang.String[])
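
As a rough illustration of the two behaviors, here is a Python sketch. The mapping table and the function name are assumptions made for the example (TABLE and VIEW are among the typical types listed in the JDBC documentation); it is not the HiveServer2 implementation.

```python
# Assumed mapping from the Hive-specific table types to generic JDBC names.
HIVE_TO_JDBC_TABLE_TYPE = {
    "MANAGED_TABLE": "TABLE",
    "EXTERNAL_TABLE": "TABLE",
    "VIRTUAL_VIEW": "VIEW",
}


def to_client_table_type(hive_type, jdbc_style=True):
    """Return the table type a client would see.

    jdbc_style=True models returning the JDBC-style names;
    jdbc_style=False models returning the Hive names unchanged.
    """
    if not jdbc_style:
        return hive_type
    # Unknown types are passed through rather than guessed at.
    return HIVE_TO_JDBC_TABLE_TYPE.get(hive_type, hive_type)
```

The issue amounts to asking that the JDBC-style branch be what a client gets without extra configuration.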


[jira] [Commented] (HIVE-5161) Additional SerDe support for varchar type

2013-09-16 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768545#comment-13768545
 ] 

Hudson commented on HIVE-5161:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #433 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/433/])
HIVE-5161 : Additional SerDe support for varchar type (Jason Dere via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1523532)
* 
/hive/trunk/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
* /hive/trunk/ql/src/test/queries/clientpositive/varchar_serde.q
* /hive/trunk/ql/src/test/results/clientpositive/varchar_serde.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java


 Additional SerDe support for varchar type
 -

 Key: HIVE-5161
 URL: https://issues.apache.org/jira/browse/HIVE-5161
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.13.0

 Attachments: D12897.1.patch, HIVE-5161.1.patch, HIVE-5161.2.patch, 
 HIVE-5161.3.patch


 Breaking out support for varchar for the various SerDes as an additional task.


[jira] [Updated] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.


 [ 
https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5285:


Status: Patch Available  (was: Open)

RB link updated with new diff file : https://reviews.apache.org/r/14144/

 Custom SerDes throw cast exception when there are complex nested structures 
 containing NonSettableObjectInspectors.
 ---

 Key: HIVE-5285
 URL: https://issues.apache.org/jira/browse/HIVE-5285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt


 The approach for HIVE-5199 fix is correct. However, the fix for HIVE-5199 is 
 incomplete. Consider a complex nested structure containing the following 
 object inspector hierarchy:
 SettableStructObjectInspector
 {
   ListObjectInspector<NonSettableStructObjectInspector>
 }
 In the above case, the cast exception can happen via 
 MapOperator/FetchOperator as below:
 java.io.IOException: java.lang.ClassCastException: 
 com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 Caused by: java.lang.ClassCastException: 
 com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
 cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:294)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529)
 ... 13 more
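
The point of the hierarchy above is that looking only at the top-level inspector is not enough. A minimal Python sketch of a recursive check, using a hypothetical Inspector class rather than Hive's ObjectInspector API, and assuming the intermediate list inspector is itself settable:

```python
class Inspector:
    """Hypothetical stand-in for one node of an object inspector tree."""

    def __init__(self, name, settable, children=()):
        self.name = name
        self.settable = settable
        self.children = list(children)


def first_non_settable(inspector, path=""):
    """Return the path of the first non-settable inspector, or None."""
    here = f"{path}/{inspector.name}" if path else inspector.name
    if not inspector.settable:
        return here
    # A settable parent says nothing about its children, so keep descending.
    for child in inspector.children:
        found = first_non_settable(child, here)
        if found is not None:
            return found
    return None
```

For the structure in the description the top-level struct is settable, yet the walk still reports the nested struct two levels down.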


[jira] [Updated] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS

2013-09-16 Thread Prasad Mujumdar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-5246:
--

Attachment: (was: HIVE-5246.1.patch)

  Local task for map join submitted via oozie job fails on a secure HDFS
 ---

 Key: HIVE-5246
 URL: https://issues.apache.org/jira/browse/HIVE-5246
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-5246-test.tar


 For a Hive query started by Oozie Hive action, the local task submitted for 
 Mapjoin fails. The HDFS delegation token is not shared properly with the 
 child JVM created for the local task.
 Oozie creates a delegation token for the Hive action and sets env variable 
 HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config 
 property. However this doesn't get passed down to the child JVM which causes 
 the problem.
 This is similar to the issue addressed by HIVE-4343, which addresses the 
 problem for HiveServer2.
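
One way to picture the missing step is a helper that builds the child process environment and forwards the token file location when the parent has it. This is a Python sketch of the idea only; child_env is a hypothetical helper and is not taken from the attached patch.

```python
TOKEN_ENV = "HADOOP_TOKEN_FILE_LOCATION"


def child_env(parent_env, base_child_env=None):
    """Build the environment for a child process.

    The token file location is copied from the parent environment when it
    is set, so the child can find the same delegation token.
    """
    env = dict(base_child_env or {})
    token_file = parent_env.get(TOKEN_ENV)
    if token_file:
        env[TOKEN_ENV] = token_file
    return env
```

A caller would pass the result as the env argument when spawning the child, for example subprocess.Popen(cmd, env=child_env(os.environ)).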


[jira] [Updated] (HIVE-4512) The vectorized plan is not picking right expression class for string concatenation.

2013-09-16 Thread Eric Hanson (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4512:
--

Attachment: HIVE-4512.3-vectorization.patch

Based the patch off the latest vectorization branch.

 The vectorized plan is not picking right expression class for string 
 concatenation.
 ---

 Key: HIVE-4512
 URL: https://issues.apache.org/jira/browse/HIVE-4512
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Eric Hanson
 Attachments: HIVE-4512.1-vectorization.patch, 
 HIVE-4512.2-vectorization.patch, HIVE-4512.3-vectorization.patch


 The vectorized plan is not picking right expression class for string 
 concatenation.


[jira] [Updated] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS

2013-09-16 Thread Prasad Mujumdar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-5246:
--

Attachment: HIVE-5246.1.patch

Reattached the patch file

  Local task for map join submitted via oozie job fails on a secure HDFS
 ---

 Key: HIVE-5246
 URL: https://issues.apache.org/jira/browse/HIVE-5246
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar


 For a Hive query started by Oozie Hive action, the local task submitted for 
 Mapjoin fails. The HDFS delegation token is not shared properly with the 
 child JVM created for the local task.
 Oozie creates a delegation token for the Hive action and sets env variable 
 HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config 
 property. However this doesn't get passed down to the child JVM which causes 
 the problem.
 This is similar to the issue addressed by HIVE-4343, which addresses the 
 problem for HiveServer2.


[jira] [Commented] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig


[ 
https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768626#comment-13768626
 ] 

Thejas M Nair commented on HIVE-4444:
-

Daniel, I got some javadoc errors with the new patch. Can you please check?

{code}
  [javadoc] 
/home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677:
 warning - @param argument file is not a parameter name.
  [javadoc] 
/home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677:
 warning - @param argument arg is not a parameter name.
  [javadoc] 
/home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677:
 warning - @param argument files is not a parameter name.
  [javadoc] 
/home/hortonth/hive_thejas/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Server.java:677:
 warning - @param argument define is not a parameter name.

{code}

 [HCatalog] WebHCat Hive should support equivalent parameters as Pig 
 

 Key: HIVE-4444
 URL: https://issues.apache.org/jira/browse/HIVE-4444
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4444-1.patch, HIVE-4444-2.patch, HIVE-4444-3.patch, 
 HIVE-4444-4.patch


 Currently there are no files and args parameters in Hive. We should add them 
 to make it similar to Pig.
 NO PRECOMMIT TESTS 


[jira] [Updated] (HIVE-5288) Perflogger should log under single class


 [ 
https://issues.apache.org/jira/browse/HIVE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5288:
---

Attachment: HIVE-5288.01.patch

rebase patch

 Perflogger should log under single class
 

 Key: HIVE-5288
 URL: https://issues.apache.org/jira/browse/HIVE-5288
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-5288.01.patch, HIVE-5288.patch


 Perflogger should log under single class, so that it could be turned on 
 without mass logging spew. Right now the log is passed to it externally; this 
 could be preserved by passing in a string to be logged as part of the message.
 Anyway, most of the time it's called from Driver and Utilities, which are 
 pretty useless class names.


[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs


[ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768648#comment-13768648
 ] 

Eugene Koifman commented on HIVE-4531:
--

Review Comments:
Should this include e2e tests in addition to (or instead of) unit tests?  If (when 
:) Hadoop changes the log file format this will break, but unit tests won't 
catch this since the data that the tests parse is static.

Here is a bunch of little things/nits:

o Server.java has “if (enablelog == true && 
!TempletonUtils.isset(statusdir))  throw new BadParam("enablelog is only 
applicable when statusdir is set");”  in 4 different places.  Can this be a 
method?
o What is the purpose of Server#misc()?
o TempletonControllerJob: import org.apache.hive.hcatalog.templeton.Main; - 
unused import
oo Line 173 - indentation is off
oo Line 295 - writer.close() - This writer is connected to System.err.  What 
are the implications of closing this?  What if something tries to write to it 
later?
o TempletonUtils has unused imports - checkstyle needs to be run on the whole 
patch.
o TestJobIDParser mixes JUnit3 and JUnit4.  It should either not extend 
TestCase (I vote for this) or not use @Test annotations
o Can JobIDParser (and all subclasses) be made package scoped since they are 
not used outside templeton package?  Similarly, can methods be made as private 
as possible?
o JobIDParser#parseJobID() has “fname” param which is not used.  What is the 
intent?  Should it be used in openStatusFile() call?  If not, better to remove 
it.
o JobIDParser#openStatusFile() creates a Reader.  Where/when is it being closed?
o Could the 2 member variables in JobIDParser be made private (even final)?
o Why is TestJobIDParser using findJobID() directly?  Could it not use 
parseJobID()?
o Can JobIDParser have 1 line of class level javadoc about the purpose of this 
class?
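
For the first nit (the enablelog check repeated in four places), the refactoring could look roughly like this. It is a Python rendering of the idea only: is_set is a simplified stand-in for TempletonUtils.isset(), and check_enable_log_param is a hypothetical name.

```python
class BadParam(Exception):
    """Stand-in for the BadParam exception named in the review."""


def is_set(value):
    # Simplified stand-in for TempletonUtils.isset(): a non-empty string.
    return value is not None and len(value) > 0


def check_enable_log_param(enablelog, statusdir):
    """Single home for the check that was repeated at four call sites."""
    if enablelog and not is_set(statusdir):
        raise BadParam("enablelog is only applicable when statusdir is set")
```

Each of the four call sites would then shrink to one call, and the error message lives in one place.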


 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, 
 HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after a job finishes. This is 
 similar to what Amazon EMR does.


[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc


[ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768692#comment-13768692
 ] 

Ashutosh Chauhan commented on HIVE-5279:


Additionally, the following tests failed too:
* TestContribCliDriver_udaf_example_avg.q
* TestContribCliDriver_udaf_example_group_concat.q
* TestContribCliDriver__udaf_example_max.q
* TestContribCliDriver_udaf_example_max_n.q
* TestContribCliDriver__udaf_example_min.q
* TestContribCliDriver__udaf_example_max_n.q

 Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
 ---

 Key: HIVE-5279
 URL: https://issues.apache.org/jira/browse/HIVE-5279
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Critical
 Attachments: 5279.patch, D12963.1.patch


 We didn't force GenericUDAFEvaluator to be Serializable. I don't know how the 
 previous serialization mechanism solved this, but Kryo complains that it's 
 not Serializable and fails the query.
 The log below is an example:
 {noformat}
 java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class 
 cannot be created (missing no-arg constructor): 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
 Serialization trace:
 inputOI 
 (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
 genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
 aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
 conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
 aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
   at org.apache.h
 {noformat}
 If this cannot be fixed somehow, some UDAFs will have to be modified to run 
 on hive-0.13.0
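
The "missing no-arg constructor" failure in the log can be pictured with a small Python analogy. This is not how Kryo works internally; it only mimics a deserializer that first creates an empty instance and then fills in the fields. All class and function names are hypothetical.

```python
class WithDefault:
    # Can be created with no arguments, like a class with a no-arg constructor.
    def __init__(self, element_type=None):
        self.element_type = element_type


class WithoutDefault:
    # Requires an argument, so it cannot be created "empty".
    def __init__(self, element_type):
        self.element_type = element_type


def instantiate_without_args(cls):
    """Mimic a deserializer that starts from an empty instance of a class."""
    try:
        return cls()
    except TypeError as exc:
        raise RuntimeError(
            f"Class cannot be created (missing no-arg constructor): {cls.__name__}"
        ) from exc
```

Any class reachable from the serialized plan, including an inspector held in a field of a UDAF evaluator, has to pass this kind of step.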


[jira] [Updated] (HIVE-5288) Perflogger should log under single class


 [ 
https://issues.apache.org/jira/browse/HIVE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5288:
---

Attachment: HIVE-5288.01.patch

missed one place...

 Perflogger should log under single class
 

 Key: HIVE-5288
 URL: https://issues.apache.org/jira/browse/HIVE-5288
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-5288.01.patch, HIVE-5288.01.patch, HIVE-5288.patch


 Perflogger should log under single class, so that it could be turned on 
 without mass logging spew. Right now the log is passed to it externally; this 
 could be preserved by passing in a string to be logged as part of the message.
 Anyway, most of the time it's called from Driver and Utilities, which are 
 pretty useless class names.


[jira] [Commented] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions


[ 
https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768695#comment-13768695
 ] 

Thejas M Nair commented on HIVE-5122:
-

Navis, as I mentioned in my previous comment, with masking the test no longer 
checks whether the locations are being picked up correctly. I.e., we won't know 
if someone introduces a bug that causes the same problem again and associates 
the first location with each of the partitions. The test case needs to be 
changed.  Maybe add partitions with data and use a select query selecting data 
from the partitions, to verify that the correct locations are associated with 
the partitions?



 Add partition for multiple partition ignores locations for non-first 
 partitions
 ---

 Key: HIVE-5122
 URL: https://issues.apache.org/jira/browse/HIVE-5122
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D12411.3.patch, HIVE-5122.D12411.1.patch, 
 HIVE-5122.D12411.2.patch


 http://www.mail-archive.com/user@hive.apache.org/msg09151.html
 When multiple partitions are being added in a single alter table statement, 
 the location of the first partition is used as the location of all partitions.
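
The symptom, and the kind of per-partition check a test would need, can be sketched in Python. Both functions are hypothetical models of the behavior described above, not Hive code.

```python
def add_partitions_buggy(specs):
    """Models the reported symptom: every partition gets the first location."""
    first_location = specs[0][1]
    return {part: first_location for part, _ in specs}


def add_partitions_fixed(specs):
    """Each partition keeps the location given alongside it."""
    return {part: location for part, location in specs}
```

A test that compares the location of every partition, not just the first, tells the two apart.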


[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode


[ 
https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768679#comment-13768679
 ] 

Hive QA commented on HIVE-4961:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603380/HIVE-4961.3-vectorization.patch

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 3954 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hcatalog.listener.TestNotificationListener.testAMQListener
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hive.hcatalog.pig.TestHCatStorer.testPartColsInData
org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreInPartiitonedTbl
org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreMultiTables
org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreWithNoSchema
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/763/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/763/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

 Create bridge for custom UDFs to operate in vectorized mode
 ---

 Key: HIVE-4961
 URL: https://issues.apache.org/jira/browse/HIVE-4961
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: vectorization-branch

 Attachments: HIVE-4961.1-vectorization.patch, 
 HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, 
 vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch


 Suppose you have a custom UDF myUDF() that you've created to extend hive. The 
 goal of this JIRA is to create a facility where if you run a query that uses 
 myUDF() in an expression, the query will run in vectorized mode.
 This would be a general-purpose bridge for custom UDFs that users add to 
 Hive. It would work with existing UDFs.
 I'm considering a separate JIRA for a new kind of custom UDF implementation 
 that is vectorized from the beginning, to optimize performance. That is not 
 covered by this JIRA.
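
The bridge idea can be sketched in a few lines of Python, assuming a batch is simply a set of equal-length columns. vectorized_adaptor and my_udf are hypothetical names, and the actual Hive design may differ.

```python
def vectorized_adaptor(row_udf):
    """Wrap a row-at-a-time function so that it accepts whole columns."""
    def apply_to_batch(*columns):
        # zip walks the columns in lockstep and yields one row of arguments
        # at a time; the wrapped UDF is still called once per row.
        return [row_udf(*row) for row in zip(*columns)]
    return apply_to_batch


def my_udf(a, b):
    # Hypothetical custom UDF written for single rows.
    return a + b
```

The existing UDF is left untouched; only the wrapper knows about batches. That is also why a UDF written for vectors from the start could be faster, as the last paragraph of the description notes.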


[jira] [Resolved] (HIVE-5284) Some ObjectInspectors do not have a default constructor


 [ 
https://issues.apache.org/jira/browse/HIVE-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-5284.


Resolution: Won't Fix

Agreed that is the correct solution long term. However, today I was not able to 
find any OI's without a default constructor so I must have been dreaming or 
looking at an old version of the code.

 Some ObjectInspectors do not have a default constructor
 ---

 Key: HIVE-5284
 URL: https://issues.apache.org/jira/browse/HIVE-5284
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Priority: Minor

 In HIVE-5263 we started using Kryo to clone the query plan. I thought I added 
 default constructors to all object inspectors but it appears I missed a few.


[jira] [Updated] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-16 Thread Hari Sankar Sivarama Subramaniyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4487:
---

Status: Patch Available  (was: Open)

 Hive does not set explicit permissions on hive.exec.scratchdir
 --

 Key: HIVE-4487
 URL: https://issues.apache.org/jira/browse/HIVE-4487
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Joey Echeverria
Assignee: Chaoyu Tang
 Attachments: HIVE-4487.patch


 The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
 creates this directory it doesn't set any explicit permission on it. This 
 means if you have the default HDFS umask setting of 022, then these 
 directories end up being world readable. These permissions also get applied 
 to the staging directories and their files, thus leaving inter-stage data 
 world readable.
 This can cause a potential leak of data especially when operating on a 
 Kerberos enabled cluster. Hive should probably default these directories to 
 only be readable by the owner.
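
The permission arithmetic behind this can be checked with a short Python sketch. It assumes a requested directory mode of 777 before the umask is applied; the helper names are made up for the example.

```python
def effective_mode(requested, umask):
    """Permission bits left after the umask clears its bits."""
    return requested & ~umask


def world_readable(mode):
    """True if the 'other' read bit is set."""
    return bool(mode & 0o004)
```

With a umask of 022 a 777 request becomes 755, which is world readable; an explicit 700 would restrict the directory to its owner.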


[jira] [Updated] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.


 [ 
https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5285:


Attachment: HIVE-5285.2.patch.txt

The previous upload had an optimization function, hasAllFieldsSettable(), which 
breaks a couple of tests because its checks are not precise. I am keeping 
things simple for now by avoiding any optimizations when creating a new 
SettableObjectInspector type. This ensures correctness in all cases; a future 
improvement would be to avoid creating a new SettableObjectInspector object 
every time we invoke getConvertedOI(). I have tested the changes with the 
partition_fileformat*.q tests and they pass on my local machine.

 Custom SerDes throw cast exception when there are complex nested structures 
 containing NonSettableObjectInspectors.
 ---

 Key: HIVE-5285
 URL: https://issues.apache.org/jira/browse/HIVE-5285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt


 The approach for HIVE-5199 fix is correct. However, the fix for HIVE-5199 is 
 incomplete. Consider a complex nested structure containing the following 
 object inspector hierarchy:
 SettableStructObjectInspector
 {
   ListObjectInspector<NonSettableStructObjectInspector>
 }
 In the above case, the cast exception can happen via 
 MapOperator/FetchOperator as below:
 java.io.IOException: java.lang.ClassCastException: 
 com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 Caused by: java.lang.ClassCastException: 
 com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
 cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.init(ObjectInspectorConverters.java:294)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529)
 ... 13 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns


 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Attachment: HIVE-5297.1.patch

 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-5297.1.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead.


[jira] [Updated] (HIVE-5288) Perflogger should log under single class


 [ 
https://issues.apache.org/jira/browse/HIVE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5288:
---

Status: Patch Available  (was: Open)

 Perflogger should log under single class
 

 Key: HIVE-5288
 URL: https://issues.apache.org/jira/browse/HIVE-5288
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-5288.01.patch, HIVE-5288.01.patch, HIVE-5288.patch


 Perflogger should log under single class, so that it could be turned on 
 without mass logging spew. Right now the log is passed to it externally; this 
 could be preserved by passing in a string to be logged as part of the message.
 Anyway most of the time it's called from Driver and Utilities, which is a 
 pretty useless class name.
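
 A minimal Python sketch of the idea above, not Hive's PerfLogger: every 
 message goes to one fixed logger name, and the caller is passed in as a 
 string that becomes part of the message. The class shape and the message 
 format are assumptions made for illustration.

```python
import logging
import time

# One fixed logger name, so perf logging can be enabled on its own.
PERF_LOG = logging.getLogger("PerfLogger")


class PerfLogger:
    """Hypothetical perf logger that records the caller inside the message."""

    def __init__(self):
        self._starts = {}

    def begin(self, caller, method):
        self._starts[method] = time.monotonic()
        PERF_LOG.info("<PERFLOG method=%s from=%s>", method, caller)

    def end(self, caller, method):
        elapsed_ms = int((time.monotonic() - self._starts.pop(method)) * 1000)
        PERF_LOG.info("</PERFLOG method=%s duration=%d from=%s>",
                      method, elapsed_ms, caller)
        return elapsed_ms
```

 With this shape, raising the level of the single "PerfLogger" logger is 
 enough to see timings without enabling logging for the calling classes.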


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns


 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Status: Patch Available  (was: Open)

 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-5297.1.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead.


Review Request 14155: HIVE-5297 Hive does not honor type for partition columns

2013-09-16 Thread Vikram Dixit Kumaraswamy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14155/
---

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-5297
https://issues.apache.org/jira/browse/HIVE-5297


Repository: hive-git


Description
---

Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');

Hive accepts this query. However if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead.


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 
  ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION 
  ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION 
  ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14155/diff/


Testing
---

Ran all tests.


Thanks,

Vikram Dixit Kumaraswamy

[jira] [Commented] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed


[ 
https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768730#comment-13768730
 ] 

Thejas M Nair commented on HIVE-5295:
-

s/Vaibhar/Vaibhav/


 HiveConnection#configureConnection tries to execute statement even after it 
 is closed
 -

 Key: HIVE-5295
 URL: https://issues.apache.org/jira/browse/HIVE-5295
 Project: Hive
  Issue Type: Bug
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.13.0

 Attachments: D12957.1.patch


 HiveConnection#configureConnection tries to execute statement even after it 
 is closed. For remote JDBC client, it tries to set the conf var using 'set 
 foo=bar' by calling HiveStatement.execute for each conf var pair, but closes 
 the statement after the 1st iteration through the conf var pairs.
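
 The failure mode described above can be sketched as follows. This is a 
 Python illustration, not the JDBC driver code: FakeStatement and both 
 configure functions are hypothetical, and the only assumption is that 
 executing on a closed statement raises an error.

```python
class FakeStatement:
    """Hypothetical stand-in for a JDBC statement."""

    def __init__(self):
        self.closed = False
        self.executed = []

    def execute(self, sql):
        if self.closed:
            raise RuntimeError("statement is closed")
        self.executed.append(sql)

    def close(self):
        self.closed = True


def configure_buggy(stmt, conf):
    for key, value in conf.items():
        stmt.execute(f"set {key}={value}")
        stmt.close()  # closed inside the loop, so the second pair fails


def configure_fixed(stmt, conf):
    try:
        for key, value in conf.items():
            stmt.execute(f"set {key}={value}")
    finally:
        stmt.close()  # closed once, after all pairs have been sent
```

 With a single conf var pair both versions behave the same, which may be 
 why a bug of this kind can go unnoticed without a test that sets two or 
 more.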


Re: Interesting claims that seem untrue

2013-09-16 Thread Carter Shanklin

Ed,

If nothing else I'm glad it was interesting enough to generate some
discussion. These sorts of stats are always subjects of a lot of
controversy. I have seen a lot of these sorts of charts float around in
confidential slide decks and I think it's good to have them out in the open
where anyone can critique and correct them.

In this case Ed, you've pointed out a legitimate flaw in my analysis. Doing
the analysis again I found that previously, due to a bug in my scripts,
JIRAs that didn't have Hudson comments in them were not counted (this was
one way it was identifying SVN commit IDs which I have since removed due to
flakiness). Brock's patch was the single largest victim of this bug but not
the only one; there were some from Cloudera, NexR, Hortonworks, Facebook,
even 2 from you, Ed. The interested can see a full list of exclusions here:
https://docs.google.com/spreadsheet/ccc?key=0ArmXd5zzNQm5dDJTMkFtaUk2d0dyU3hnWGJCcUczbXc#gid=0.
I apologize to those under-represented, there wasn't any intent on my part
to minimize anyone's work. The impact in final totals is Cloudera +5.4%,
NexR +0.8%, Facebook -2.7%, Hortonworks -3.3%. I will be updating the blog
later today with relevant corrections.

There is going to be continued interest in seeing charts like these, for
example when Hive 12 is officially done. Sanjay suggested that LoC counts
may not be the best way to represent true contribution. I agree that not
all lines of code are created equal, for example a few monster patches
recently went in re-arranging HCatalog namespaces and I think also
indentation style. This (hopefully) mechanical work is not on the same
footing as adding new query language features. Still it is work and
wouldn't be fair to pretend it didn't happen. If anyone has ideas on better
ways to fairly capture contribution I'm open to suggestions.

On Thu, Sep 12, 2013 at 7:19 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

I was reading the horton-works blog and found an interesting article.

http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/#comment-160753

There is a very interesting graphic which attempts to demonstrate lines of
code in the 12 release.
http://hortonworks.com/wp-content/uploads/2013/09/hive4.png

Although I do not know how they are calculated, they are probably counting
code generated by test output, but besides that they are wrong.

One claim is that Cloudera contributed 4,244 lines of code.

So to debunk that claim:

In https://issues.apache.org/jira/browse/HIVE-4675 Brock Noland from
Cloudera created the ptest2 testing framework. He did all the work for
ptest2 in Hive 12, and it is clearly more than 4,244 lines.

This consists of 84 java files
[edward@desksandra ptest2]$ find . -name *.java | wc -l
84
and by itself is 8001 lines of code.
[edward@desksandra ptest2]$ find . -name *.java | xargs cat | wc -l
8001

[edward@desksandra hive-trunk]$ wc -l HIVE-4675.patch
7902 HIVE-4675.patch

This is not the only feature from cloudera in hive 12.

There is also a section of the article that talks of a ROAD MAP for hive
features. I did not know we (hive) had a road map. I have advocated
switching to feature based release and having a road map before, but it was
suggested that might limit people from itch-scratching.

--
Carter Shanklin
Director, Product Management
Hortonworks
(M): +1.650.644.8795 (T): @cshanklin http://twitter.com/cshanklin

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to
which it is addressed and may contain information that is confidential,
privileged and exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby notified that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender immediately
and delete it from your system. Thank You.

[jira] [Created] (HIVE-5297) Hive does not honor type for partition columns

Vikram Dixit K created HIVE-5297:


 Summary: Hive does not honor type for partition columns
 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K


Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

{noformat}
create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');
{noformat}

Hive accepts this query. However if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead.


[jira] [Updated] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig


 [ 
https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-4444:
-

Attachment: HIVE--5.patch

 [HCatalog] WebHCat Hive should support equivalent parameters as Pig 
 

 Key: HIVE-4444
 URL: https://issues.apache.org/jira/browse/HIVE-4444
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE--1.patch, HIVE--2.patch, HIVE--3.patch, 
 HIVE--4.patch, HIVE--5.patch


 Currently there are no files and args parameters in Hive. We shall add them 
 to match Pig.
 NO PRECOMMIT TESTS 


[jira] [Commented] (HIVE-4444) [HCatalog] WebHCat Hive should support equivalent parameters as Pig


[ 
https://issues.apache.org/jira/browse/HIVE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768657#comment-13768657
 ] 

Daniel Dai commented on HIVE-4444:
--

Fixed. Sorry about that.

 [HCatalog] WebHCat Hive should support equivalent parameters as Pig 
 

 Key: HIVE-4444
 URL: https://issues.apache.org/jira/browse/HIVE-4444
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE--1.patch, HIVE--2.patch, HIVE--3.patch, 
 HIVE--4.patch, HIVE--5.patch


 Currently there are no files and args parameters in Hive. We shall add them 
 to match Pig.
 NO PRECOMMIT TESTS 


Re: Hive 0.12 release

2013-09-16 Thread Thejas Nair

I have received requests from multiple people to include some
non-blocker jiras in 0.12, and some of those that I received around Friday
are in the review phase. So I am planning to extend the time for
inclusion of any non-blocker jiras by another 2-3 days. I am hoping
to get the patches for any blockers in by the middle of next week.


On Thu, Aug 29, 2013 at 9:18 PM, Thejas Nair the...@hortonworks.com wrote:
 It has been more than 3 months since 0.11 was released and we already have
 294 jiras in resolved-fixed state for 0.12. This includes several new
 features such as date data type, optimizer improvements, ORC format
 improvements and many bug fixes. There are also many features that look ready to
 get committed soon such as the varchar type.
 I think it is time to start preparing for a 0.12 release by creating a
 branch later next week and start stabilizing it. What do people think about
 it ?

 As we get closer to the branching, we can start discussing any additional
 features/bug fixes that we should add to the release and start monitoring
 their progress.

 Thanks,
 Thejas



[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: HIVE-4340.4.patch.txt
HIVE-4340-java-only.4.patch.txt

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Commented] (HIVE-4340) ORC should provide raw data size

[
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768739#comment-13768739
]

Prasanth J commented on HIVE-4340:
--

I tried enhancing this patch to support SerDeStats in ORC in a slightly more
efficient and less intrusive way. The current implementation gathers stats
for each row in the processOp() method of FileSinkOperator. For each row, a
new SerDeStats object is created and the stats are accumulated in a hashmap.
This is good for cases where statistics gathering is not done by the underlying
storage format. But ORC already gathers many statistics
while writing the data, which can be leveraged to provide SerDeStats. The
statistics gathered by ORC can be retrieved in the closeOp() method of
FileSinkOperator, making it more efficient than row-by-row processing of serde
statistics.

The uploaded patch implements the above approach.
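
To make the contrast concrete, here is a small Python sketch of a writer that
accumulates a raw data size per stripe while writing, so the sink only has to
ask for the totals once at close time. It is an illustration under stated
assumptions, not the ORC WriterImpl: the class, the rows_per_stripe parameter
and the use of string length as "raw size" are all hypothetical.

```python
class StripeSizeWriter:
    """Hypothetical writer that tracks raw data size per stripe."""

    def __init__(self, rows_per_stripe):
        self.rows_per_stripe = rows_per_stripe
        self.stripe_sizes = []   # one raw-size total per finished stripe
        self.row_count = 0
        self._stripe_bytes = 0
        self._stripe_rows = 0

    def add_row(self, row):
        # Crude stand-in for a raw size: length of each value as text.
        self._stripe_bytes += sum(len(str(v)) for v in row)
        self._stripe_rows += 1
        self.row_count += 1
        if self._stripe_rows == self.rows_per_stripe:
            self._flush_stripe()

    def _flush_stripe(self):
        if self._stripe_rows:
            self.stripe_sizes.append(self._stripe_bytes)
            self._stripe_bytes = 0
            self._stripe_rows = 0

    def close(self):
        # Stats are handed over once here, not once per row.
        self._flush_stripe()
        return {"raw_data_size": sum(self.stripe_sizes),
                "row_count": self.row_count}


def write_all(rows, rows_per_stripe=2):
    """Plays the role of the sink: write every row, read stats at close."""
    writer = StripeSizeWriter(rows_per_stripe)
    for row in rows:
        writer.add_row(row)
    return writer.close()
```

The sink never allocates a per-row stats object in this sketch; it reads the
totals the writer already had to compute.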

ORC should provide raw data size

Key: HIVE-4340
URL: https://issues.apache.org/jira/browse/HIVE-4340
Project: Hive
Issue Type: Improvement
Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt,
HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt

ORC's SerDe currently does nothing, and hence does not calculate a raw data
size. WriterImpl, however, has enough information to provide one.
WriterImpl should compute a raw data size for each row, aggregate them per
stripe and record it in the stripe information, as RC currently does in its
key header, and allow the FileSinkOperator access to the size per row.
FileSinkOperator should be able to get the raw data size from either the
SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns


 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Description: 
Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

{noformat}
create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');
{noformat}

Hive accepts this query. However if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead. We 
should throw an exception on such user error at the time the load happens.

  was:
Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

{noformat}
create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');
{noformat}

Hive accepts this query. However if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead.


 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-5297.1.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead. We 
 should throw an exception on such user error at the time the load happens.
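
 A minimal Python sketch of the kind of check proposed above, assuming 
 partition values arrive as strings and only the int and string types from 
 the example need handling. The function name and error text are 
 hypothetical and are not taken from the patch.

```python
def check_partition_spec(partition_cols, spec):
    """Raise ValueError if a partition value cannot be read as its declared type.

    Only "int" and "string" are handled here; other types are out of scope
    for this sketch.
    """
    casts = {"int": int, "string": str}
    for name, value in spec.items():
        col_type = partition_cols[name]
        try:
            casts[col_type](value)
        except ValueError:
            raise ValueError(
                f"partition column {name!r} expects {col_type}, got {value!r}"
            ) from None
```

 For the table in the example, a spec of month='June', day='second' would be 
 rejected when the partition is added, because 'second' cannot be read as an 
 int.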


Review Request 14162: HIVE-4340: ORC should provide raw data size

2013-09-16 Thread j . prasanth . j


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14162/
---

Review request for hive, Ashutosh Chauhan and Owen O'Malley.


Bugs: HIVE-4340
https://issues.apache.org/jira/browse/HIVE-4340


Repository: hive-git


Description
---

ORC's SerDe currently does nothing, and hence does not calculate a raw data 
size.  WriterImpl, however, has enough information to provide one.

WriterImpl should compute a raw data size for each row, aggregate them per 
stripe and record it in the stripe information, as RC currently does in its key 
header, and allow the FileSinkOperator access to the size per row.

FileSinkOperator should be able to get the raw data size from either the SerDe 
or the RecordWriter when the RecordWriter can provide it.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 
6268617 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 
72e779a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce 
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java 
b93db84 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java 
PRE-CREATION 
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
  ql/src/test/resources/orc-file-dump.out fac5326 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3 

Diff: https://reviews.apache.org/r/14162/diff/


Testing
---

All unit tests and q file tests related to ORC are passing.


Thanks,

Prasanth_J

[jira] [Commented] (HIVE-5295) HiveConnection#configureConnection tries to execute statement even after it is closed


[ 
https://issues.apache.org/jira/browse/HIVE-5295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768729#comment-13768729
 ] 

Thejas M Nair commented on HIVE-5295:
-

Vaibhar, can you also please add tests for the fix ?


 HiveConnection#configureConnection tries to execute statement even after it 
 is closed
 -

 Key: HIVE-5295
 URL: https://issues.apache.org/jira/browse/HIVE-5295
 Project: Hive
  Issue Type: Bug
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.13.0

 Attachments: D12957.1.patch


 HiveConnection#configureConnection tries to execute statement even after it 
 is closed. For remote JDBC client, it tries to set the conf var using 'set 
 foo=bar' by calling HiveStatement.execute for each conf var pair, but closes 
 the statement after the 1st iteration through the conf var pairs.


[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns


[ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768798#comment-13768798
 ] 

Ashutosh Chauhan commented on HIVE-5297:


Can you create a phabricator entry for this?

 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-5297.1.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead. We 
 should throw an exception on such user error at the time the load happens.


[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode

2013-09-16 Thread Eric Hanson (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768839#comment-13768839
]

Eric Hanson commented on HIVE-4961:
---

As far as I can tell, the 11 test failures reported in the last test run are not
related to this patch.

Create bridge for custom UDFs to operate in vectorized mode
---

Key: HIVE-4961
URL: https://issues.apache.org/jira/browse/HIVE-4961
Project: Hive
Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Eric Hanson
Fix For: vectorization-branch

Attachments: HIVE-4961.1-vectorization.patch,
HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch,
vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch

Suppose you have a custom UDF myUDF() that you've created to extend Hive. The
goal of this JIRA is to create a facility where if you run a query that uses
myUDF() in an expression, the query will run in vectorized mode.
This would be a general-purpose bridge for custom UDFs that users add to
Hive. It would work with existing UDFs.
I'm considering a separate JIRA for a new kind of custom UDF implementation
that is vectorized from the beginning, to optimize performance. That is not
covered by this JIRA.
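
The bridge idea can be sketched in a few lines of Python. This is a
conceptual illustration only, not the class from the patch: RowUdfBridge,
its parameters and the dict-of-lists batch layout are hypothetical, and
my_udf stands in for any existing row-at-a-time UDF.

```python
def my_udf(s):
    """Stand-in for an existing row-mode UDF; passes NULL (None) through."""
    return None if s is None else s.upper()


class RowUdfBridge:
    """Hypothetical adaptor that runs a row-mode UDF over a column batch."""

    def __init__(self, udf, arg_columns, output_column):
        self.udf = udf
        self.arg_columns = arg_columns
        self.output_column = output_column

    def evaluate(self, batch, size):
        # batch maps column name -> list of values (one slot per row).
        out = batch[self.output_column]
        for i in range(size):
            args = [batch[c][i] for c in self.arg_columns]
            out[i] = self.udf(*args)


batch = {"name": ["a", None, "c"], "out": [None, None, None]}
RowUdfBridge(my_udf, ["name"], "out").evaluate(batch, 3)
```

The UDF is still called once per row, so a bridge like this lets the rest of
the query stay in vectorized mode without making the UDF itself any faster,
which fits the note above about a separate, natively vectorized UDF.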

[jira] [Commented] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.


[ 
https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768844#comment-13768844
 ] 

Hive QA commented on HIVE-5285:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603387/HIVE-5285.2.patch.txt

{color:green}SUCCESS:{color} +1 3125 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/765/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/765/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Custom SerDes throw cast exception when there are complex nested structures 
 containing NonSettableObjectInspectors.
 ---

 Key: HIVE-5285
 URL: https://issues.apache.org/jira/browse/HIVE-5285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
Priority: Critical
 Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt


 The approach for the HIVE-5199 fix is correct. However, the fix for HIVE-5199 is 
 incomplete. Consider a complex nested structure containing the following 
 object inspector hierarchy:
 SettableStructObjectInspector
 {
   ListObjectInspector<NonSettableStructObjectInspector>
 }
 In the above case, the cast exception can happen via 
 MapOperator/FetchOperator as below:
 java.io.IOException: java.lang.ClassCastException: 
 com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
 Caused by: java.lang.ClassCastException: 
 com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
 cast to 
 org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:294)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316)
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529)
 ... 13 more
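
 The trace suggests that a settable top-level inspector says nothing about 
 the inspectors nested inside it. A minimal sketch of a recursive check 
 follows. InspectorNode and isFullySettable are hypothetical stand-ins for 
 illustration, not Hive's ObjectInspector classes, and the actual patch may 
 take a different approach.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for one node of an object inspector tree (hypothetical). */
final class InspectorNode {
  final String name;
  final boolean settable;
  final List<InspectorNode> children;

  InspectorNode(String name, boolean settable, InspectorNode... children) {
    this.name = name;
    this.settable = settable;
    this.children = Collections.unmodifiableList(Arrays.asList(children));
  }

  /** True only if this node and every nested node is settable. */
  boolean isFullySettable() {
    if (!settable) {
      return false;
    }
    for (InspectorNode child : children) {
      if (!child.isFullySettable()) {
        return false;
      }
    }
    return true;
  }
}
```

 With the hierarchy from the description (settable struct, list, non-settable 
 struct), the top-level flag is true while isFullySettable() returns false, 
 which is the case where a blind cast to a settable inspector would fail.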


[jira] [Created] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833

2013-09-16 Thread Xuefu Zhang (JIRA)

Xuefu Zhang created HIVE-5298:
-

 Summary: AvroSerde performance problem caused by HIVE-3833
 Key: HIVE-5298
 URL: https://issues.apache.org/jira/browse/HIVE-5298
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.13.0


HIVE-3833 fixed the targeted problem and made Hive use partition-level 
metadata to initialize the object inspector. In doing so, however, it goes 
through every file under the table to access the partition metadata, which is 
very inefficient, especially when there are multiple files per partition. This 
causes more problems for AvroSerde because AvroSerde initialization accesses 
the schema, which is located on the file system. As a result, before Hive can 
process any data, it needs to access every file for a table, which can take 
long enough to cause job failure because of lack of job progress.

An improvement can be made so that partition metadata is accessed only once 
per partition.
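
One way to picture the proposed improvement is a cache keyed by partition 
directory, so that the expensive lookup runs once per partition rather than 
once per file. This is only a sketch: PartitionMetadataCache, forFile and 
loadCount are hypothetical names, and the partition is assumed to be the 
parent directory of each file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: loads metadata at most once per partition directory. */
final class PartitionMetadataCache<M> {
  private final Map<String, M> byPartition = new HashMap<>();
  private final Function<String, M> loader;
  private int loads = 0;

  PartitionMetadataCache(Function<String, M> loader) {
    this.loader = loader;
  }

  /** Returns the metadata of the partition that contains the given file. */
  M forFile(String filePath) {
    String partitionDir = parentOf(filePath);
    M cached = byPartition.get(partitionDir);
    if (cached == null) {
      // The expensive step, e.g. reading a schema from the file system.
      cached = loader.apply(partitionDir);
      loads++;
      byPartition.put(partitionDir, cached);
    }
    return cached;
  }

  /** Number of times the loader actually ran. */
  int loadCount() {
    return loads;
  }

  // Assumption: the partition is the file's parent directory.
  private static String parentOf(String path) {
    int slash = path.lastIndexOf('/');
    return slash < 0 ? "" : path.substring(0, slash);
  }
}
```

With two files under one partition and one file under another, the loader 
runs twice instead of three times; the saving grows with the number of files 
per partition.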


[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: (was: HIVE-4340-java-only.4.patch.txt)

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


Re: Review Request 14162: HIVE-4340: ORC should provide raw data size

2013-09-16 Thread j . prasanth . j


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14162/
---

(Updated Sept. 16, 2013, 10:10 p.m.)


Review request for hive, Ashutosh Chauhan and Owen O'Malley.


Changes
---

added UNION case to ORC writer raw data size computation.


Bugs: HIVE-4340
https://issues.apache.org/jira/browse/HIVE-4340


Repository: hive-git


Description
---

ORC's SerDe currently does nothing, and hence does not calculate a raw data 
size.  WriterImpl, however, has enough information to provide one.

WriterImpl should compute a raw data size for each row, aggregate them per 
stripe and record it in the stripe information, as RC currently does in its key 
header, and allow the FileSinkOperator access to the size per row.

FileSinkOperator should be able to get the raw data size from either the SerDe 
or the RecordWriter when the RecordWriter can provide it.
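
The per-row and per-stripe bookkeeping described above can be sketched as 
follows. RawDataSizeTracker is a hypothetical class, not the actual 
WriterImpl, and closing a stripe after a fixed number of rows is an 
assumption made only to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical accumulator for raw data sizes; not the actual ORC writer. */
final class RawDataSizeTracker {
  private final long rowsPerStripe;  // assumption: a stripe closes after this many rows
  private final List<Long> stripeSizes = new ArrayList<>();
  private long currentStripeSize;
  private long rowsInCurrentStripe;
  private long lastRowSize;

  RawDataSizeTracker(long rowsPerStripe) {
    if (rowsPerStripe <= 0) {
      throw new IllegalArgumentException("rowsPerStripe must be positive");
    }
    this.rowsPerStripe = rowsPerStripe;
  }

  /** Records the raw size of one row and closes the stripe when it is full. */
  void addRow(long rawSizeInBytes) {
    lastRowSize = rawSizeInBytes;
    currentStripeSize += rawSizeInBytes;
    rowsInCurrentStripe++;
    if (rowsInCurrentStripe == rowsPerStripe) {
      flushStripe();
    }
  }

  /** Flushes a partially filled stripe, if any. */
  void close() {
    if (rowsInCurrentStripe > 0) {
      flushStripe();
    }
  }

  private void flushStripe() {
    stripeSizes.add(currentStripeSize);
    currentStripeSize = 0;
    rowsInCurrentStripe = 0;
  }

  /** Size of the most recent row, as a consumer such as a file sink might ask for. */
  long lastRowSize() {
    return lastRowSize;
  }

  /** Raw data size recorded for each closed stripe. */
  List<Long> stripeSizes() {
    return new ArrayList<>(stripeSizes);
  }

  /** Total raw data size over closed stripes plus the open one. */
  long totalSize() {
    long total = currentStripeSize;
    for (long size : stripeSizes) {
      total += size;
    }
    return total;
  }
}
```

Keeping the last row size separate from the stripe totals mirrors the two 
consumers in the description: the stripe information wants an aggregate, 
while FileSinkOperator wants a size per row.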


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 
6268617 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 
72e779a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce 
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java 
b93db84 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java 
PRE-CREATION 
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
  ql/src/test/resources/orc-file-dump.out fac5326 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3 

Diff: https://reviews.apache.org/r/14162/diff/


Testing
---

All unit tests and q file tests related to ORC are passing.


Thanks,

Prasanth_J

[jira] [Commented] (HIVE-4340) ORC should provide raw data size


[ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768848#comment-13768848
 ] 

Prasanth J commented on HIVE-4340:
--

added UNION case to ORC writer raw data size computation.

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: HIVE-4340.4.patch.txt
HIVE-4340-java-only.4.patch.txt

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: (was: HIVE-4340.4.patch.txt)

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-4531) [WebHCat] Collecting task logs to hdfs


 [ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-4531:
-

Status: Open  (was: Patch Available)

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, 
 HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after a job finishes. This is 
 similar to what Amazon EMR does.


[jira] [Commented] (HIVE-4531) [WebHCat] Collecting task logs to hdfs


[ 
https://issues.apache.org/jira/browse/HIVE-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768853#comment-13768853
 ] 

Eugene Koifman commented on HIVE-4531:
--

WebHCat e2e + HCat unit tests pass

 [WebHCat] Collecting task logs to hdfs
 --

 Key: HIVE-4531
 URL: https://issues.apache.org/jira/browse/HIVE-4531
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: HIVE-4531-1.patch, HIVE-4531-2.patch, HIVE-4531-3.patch, 
 HIVE-4531-4.patch, HIVE-4531-5.patch, HIVE-4531-6.patch, HIVE-4531-7.patch, 
 HIVE-4531-8.patch, samplestatusdirwithlist.tar.gz


 It would be nice if we collected task logs after a job finishes. This is 
 similar to what Amazon EMR does.


Re: Review Request 14162: HIVE-4340: ORC should provide raw data size

2013-09-16 Thread j . prasanth . j


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14162/
---

(Updated Sept. 16, 2013, 10:17 p.m.)


Review request for hive, Ashutosh Chauhan and Owen O'Malley.


Changes
---

The earlier patch didn't apply cleanly. Reuploading a new one.


Bugs: HIVE-4340
https://issues.apache.org/jira/browse/HIVE-4340


Repository: hive-git


Description
---

ORC's SerDe currently does nothing, and hence does not calculate a raw data 
size.  WriterImpl, however, has enough information to provide one.

WriterImpl should compute a raw data size for each row, aggregate them per 
stripe and record it in the stripe information, as RC currently does in its key 
header, and allow the FileSinkOperator access to the size per row.

FileSinkOperator should be able to get the raw data size from either the SerDe 
or the RecordWriter when the RecordWriter can provide it.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 
6268617 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java 90260fd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 
72e779a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java 8e74b91 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce 
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java 
b93db84 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java 
PRE-CREATION 
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
  ql/src/test/resources/orc-file-dump.out fac5326 
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java 1c09dc3 

Diff: https://reviews.apache.org/r/14162/diff/


Testing
---

All unit tests and q file tests related to ORC are passing.


Thanks,

Prasanth_J

[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: (was: HIVE-4340.4.patch.txt)

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: (was: HIVE-4340-java-only.4.patch.txt)

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-4340) ORC should provide raw data size


 [ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4340:
-

Attachment: HIVE-4340.4.patch.txt
HIVE-4340-java-only.4.patch.txt

The earlier patch upload didn't apply cleanly. Reuploading a new one.

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-5206) Support parameterized primitive types


 [ 
https://issues.apache.org/jira/browse/HIVE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5206:
-

Attachment: HIVE-5206.v12.1.patch

attaching HIVE-5206.v12.1.patch, for use in 0.12 branch

 Support parameterized primitive types
 -

 Key: HIVE-5206
 URL: https://issues.apache.org/jira/browse/HIVE-5206
 Project: Hive
  Issue Type: Improvement
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.13.0

 Attachments: HIVE-5206.1.patch, HIVE-5206.2.patch, HIVE-5206.3.patch, 
 HIVE-5206.4.patch, HIVE-5206.D12693.1.patch, HIVE-5206.v12.1.patch


 Support for parameterized types is needed for char/varchar/decimal support. 
 This adds a type parameters value to the 
 PrimitiveTypeEntry/PrimitiveTypeInfo/PrimitiveObjectInspector objects. 
 NO PRECOMMIT TESTS - dependent on HIVE-5203/HIVE-5204


[jira] [Updated] (HIVE-4844) Add varchar data type


 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: HIVE-4844.v12.1.patch

attaching HIVE-4844.v12.1.patch, for use in 0.12 branch

 Add varchar data type
 -

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.13.0

 Attachments: HIVE-4844.10.patch, HIVE-4844.11.patch, 
 HIVE-4844.12.patch, HIVE-4844.13.patch, HIVE-4844.14.patch, 
 HIVE-4844.15.patch, HIVE-4844.16.patch, HIVE-4844.17.patch, 
 HIVE-4844.18.patch, HIVE-4844.19.patch, HIVE-4844.1.patch.hack, 
 HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, 
 HIVE-4844.6.patch, HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, 
 HIVE-4844.D12699.1.patch, HIVE-4844.D12891.1.patch, HIVE-4844.v12.1.patch, 
 screenshot.png


 Add new varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.
 Char type will be added as another task.


[jira] [Updated] (HIVE-5278) Move some string UDFs to GenericUDFs, for better varchar support


 [ 
https://issues.apache.org/jira/browse/HIVE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5278:
-

Attachment: HIVE-5278.v12.1.patch

attaching HIVE-5278.v12.1.patch, for use in 0.12 branch

 Move some string UDFs to GenericUDFs, for better varchar support
 

 Key: HIVE-5278
 URL: https://issues.apache.org/jira/browse/HIVE-5278
 Project: Hive
  Issue Type: Improvement
  Components: Types, UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.13.0

 Attachments: D12909.1.patch, HIVE-5278.1.patch, HIVE-5278.2.patch, 
 HIVE-5278.v12.1.patch


 To better support varchar/char types in string UDFs, select UDFs should be 
 converted to GenericUDFs. This allows the UDF to return the resulting 
 char/varchar length in the type metadata.
 This work is being split off as a separate task from HIVE-4844. The initial 
 UDFs as part of this work are concat/lower/upper.
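
 To illustrate what returning the resulting length in the type metadata 
 could look like for concat, here is a small sketch. VarcharLengthInference 
 and its methods are hypothetical, and both the summing rule and the 65535 
 cap are assumptions made for illustration, not taken from the patch.

```java
/** Hypothetical helper showing result-length inference for concat on varchar inputs. */
final class VarcharLengthInference {
  // Assumed upper bound on a varchar length; treat it as a placeholder.
  static final int MAX_VARCHAR_LENGTH = 65535;

  /** Sums the argument lengths and caps the result at the assumed maximum. */
  static int concatResultLength(int... argLengths) {
    long total = 0;
    for (int length : argLengths) {
      if (length < 0) {
        throw new IllegalArgumentException("negative length: " + length);
      }
      total += length;
    }
    return (int) Math.min(total, MAX_VARCHAR_LENGTH);
  }

  /** Renders the inferred result type in the usual varchar(n) notation. */
  static String concatResultTypeName(int... argLengths) {
    return "varchar(" + concatResultLength(argLengths) + ")";
  }
}
```

 A UDF with a fixed return type cannot express this, since its declared type 
 does not depend on its arguments; a GenericUDF decides its return type when 
 it is initialized with the argument types.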


[jira] [Updated] (HIVE-5161) Additional SerDe support for varchar type


 [ 
https://issues.apache.org/jira/browse/HIVE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5161:
-

Attachment: HIVE-5161.v12.1.patch

attaching HIVE-5161.v12.1.patch, for use in 0.12 branch. Code generated using 
protobuf-2.4

 Additional SerDe support for varchar type
 -

 Key: HIVE-5161
 URL: https://issues.apache.org/jira/browse/HIVE-5161
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers, Types
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.13.0

 Attachments: D12897.1.patch, HIVE-5161.1.patch, HIVE-5161.2.patch, 
 HIVE-5161.3.patch, HIVE-5161.v12.1.patch


 Breaking out support for varchar for the various SerDes as an additional task.


[jira] [Commented] (HIVE-4340) ORC should provide raw data size


[ 
https://issues.apache.org/jira/browse/HIVE-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768786#comment-13768786
 ] 

Prasanth J commented on HIVE-4340:
--

Review board entry https://reviews.apache.org/r/14162

 ORC should provide raw data size
 

 Key: HIVE-4340
 URL: https://issues.apache.org/jira/browse/HIVE-4340
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4340.1.patch.txt, HIVE-4340.2.patch.txt, 
 HIVE-4340.3.patch.txt, HIVE-4340.4.patch.txt, HIVE-4340-java-only.4.patch.txt


 ORC's SerDe currently does nothing, and hence does not calculate a raw data 
 size.  WriterImpl, however, has enough information to provide one.
 WriterImpl should compute a raw data size for each row, aggregate them per 
 stripe and record it in the stripe information, as RC currently does in its 
 key header, and allow the FileSinkOperator access to the size per row.
 FileSinkOperator should be able to get the raw data size from either the 
 SerDe or the RecordWriter when the RecordWriter can provide it.


[jira] [Updated] (HIVE-5070) Need to implement listLocatedStatus() in ProxyFileSystem

2013-09-16 Thread shanyu zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HIVE-5070:
--

Attachment: HIVE-5070-v2.patch

V2 of the patch uploaded. To minimize code duplication, I created a 
ProxyFileSystemBase class that the 0.20, 0.20S and 0.23 shims all reuse; the 
0.23 shim overrides the listLocatedStatus() method.
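
The shape of that refactoring can be sketched with plain strings standing in 
for Hadoop paths. Everything here is a hypothetical stand-in 
(ProxyFileSystemBaseSketch, ProxyFileSystem23Sketch, listUnderlying and the 
fake listing), and because the sketch has no Hadoop FileSystem superclass, 
the 0.23 variant adds listLocatedStatus() rather than overriding it.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the shared base: rewrites proxy-scheme paths to the real scheme and back. */
class ProxyFileSystemBaseSketch {
  private final String proxyPrefix;  // e.g. "pfile:"
  private final String realPrefix;   // e.g. "file:"

  ProxyFileSystemBaseSketch(String proxyScheme, String realScheme) {
    this.proxyPrefix = proxyScheme + ":";
    this.realPrefix = realScheme + ":";
  }

  protected String toReal(String path) {
    return path.startsWith(proxyPrefix)
        ? realPrefix + path.substring(proxyPrefix.length())
        : path;
  }

  protected String toProxy(String path) {
    return path.startsWith(realPrefix)
        ? proxyPrefix + path.substring(realPrefix.length())
        : path;
  }

  /** Fake underlying listing; rejects paths that were not converted first. */
  protected List<String> listUnderlying(String realPath) {
    if (!realPath.startsWith(realPrefix)) {
      throw new IllegalArgumentException("Wrong FS: " + realPath);
    }
    List<String> children = new ArrayList<>();
    children.add(realPath + "/part-0");
    return children;
  }

  /** Shared by all shim versions: convert in, delegate, convert out. */
  List<String> listStatus(String proxyPath) {
    List<String> result = new ArrayList<>();
    for (String child : listUnderlying(toReal(proxyPath))) {
      result.add(toProxy(child));
    }
    return result;
  }
}

/** Stand-in for the 0.23 shim: only the newer API lives here. */
final class ProxyFileSystem23Sketch extends ProxyFileSystemBaseSketch {
  ProxyFileSystem23Sketch(String proxyScheme, String realScheme) {
    super(proxyScheme, realScheme);
  }

  /** Applies the same path conversion as listStatus() to the newer call. */
  List<String> listLocatedStatus(String proxyPath) {
    return listStatus(proxyPath);
  }
}
```

Passing a pfile: path straight to listUnderlying() fails with a "Wrong FS" 
message, which mirrors the exception quoted in the issue description.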

 Need to implement listLocatedStatus() in ProxyFileSystem
 

 Key: HIVE-5070
 URL: https://issues.apache.org/jira/browse/HIVE-5070
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: shanyu zhao
 Fix For: 0.11.1

 Attachments: HIVE-5070.patch.txt, HIVE-5070-v2.patch


 MAPREDUCE-1981 introduced a new API for FileSystem - listLocatedStatus. It is 
 used in Hadoop's FileInputFormat.getSplits(). Hive's ProxyFileSystem class 
 needs to implement this API for the Hive unit tests to work.
 Otherwise, you'll see these exceptions when running the TestCliDriver test 
 case, e.g. results of running allcolref_in_udf.q:
 [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
 [junit] Begin query: allcolref_in_udf.q
 [junit] java.lang.IllegalArgumentException: Wrong FS: 
 pfile:/GitHub/Monarch/project/hive-monarch/build/ql/test/data/warehouse/src, 
 expected: file:///
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
 [junit]   at 
 org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
 [junit]   at 
 org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1798)
 [junit]   at 
 org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1797)
 [junit]   at 
 org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:579)
 [junit]   at 
 org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
 [junit]   at 
 org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
 [junit]   at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
 [junit]   at 
 org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
 [junit]   at 
 org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
 [junit]   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385)
 [junit]   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351)
 [junit]   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389)
 [junit]   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
 [junit]   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
 [junit]   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
 [junit]   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
 [junit]   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
 [junit]   at java.security.AccessController.doPrivileged(Native Method)
 [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
 [junit]   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
 [junit]   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
 [junit]   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 [junit]   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:552)
 [junit]   at java.security.AccessController.doPrivileged(Native Method)
 [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
 [junit]   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
 [junit]   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:552)
 [junit]   at 
 org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:543)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
 [junit]   at 
 org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:688)
 [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 [junit]   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 [junit]   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [junit]

[jira] [Commented] (HIVE-4763) add support for thrift over http transport in HS2

2013-09-16 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768920#comment-13768920
 ] 

Vaibhav Gumashta commented on HIVE-4763:


[~cwsteinbach] [~thejas] I've uploaded another wip patch. Now fixing the test 
suite changes: OOM exception + reorganizing the test classes. 

 add support for thrift over http transport in HS2
 -

 Key: HIVE-4763
 URL: https://issues.apache.org/jira/browse/HIVE-4763
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Vaibhav Gumashta
 Fix For: 0.12.0

 Attachments: HIVE-4763.1.patch, HIVE-4763.2.patch, 
 HIVE-4763.D12855.1.patch


 Subtask for adding support for http transport mode for thrift api in hive 
 server2.
 Support for the different authentication modes will be part of another 
 subtask.


[jira] [Commented] (HIVE-5296) Memory leak: OOM Error after multiple open/closed JDBC connections.

2013-09-16 Thread Kousuke Saruta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768919#comment-13768919
 ] 

Kousuke Saruta commented on HIVE-5296:
--

I have some questions.

1. Does the memory leak occur on the server side (HiveServer2 process) or on 
the client side?
2. What query did you execute?
3. If you have already identified it, could you tell me which objects are 
increasing?

 Memory leak: OOM Error after multiple open/closed JDBC connections. 
 

 Key: HIVE-5296
 URL: https://issues.apache.org/jira/browse/HIVE-5296
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
 Environment: Hive 0.12.0, Hadoop 1.1.2, Debian.
Reporter: Douglas
  Labels: hiveserver
 Fix For: 0.12.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 This error seems to relate to https://issues.apache.org/jira/browse/HIVE-3481
 However, on inspection of the related patch and my built version of Hive 
 (patch carried forward to 0.12.0), I am still seeing the described behaviour.
 With multiple connections to Hiveserver2, all of which are closed and disposed 
 of properly, the Java heap size grows extremely quickly. 
 This issue can be recreated using the following code:
 {code}
 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
 import java.util.Properties;
 import org.apache.hive.service.cli.HiveSQLException;
 import org.apache.log4j.Logger;
 /*
  * Class which encapsulates the lifecycle of a query or statement.
  * Provides functionality which allows you to create a connection
  */
 public class HiveClient {
   
   Connection con;
   Logger logger;
   private static String driverName = "org.apache.hive.jdbc.HiveDriver";
   private String db;
   
   
   public HiveClient(String db)
   {   
   logger = Logger.getLogger(HiveClient.class);
   this.db=db;
   
   try{
Class.forName(driverName);
   }catch(ClassNotFoundException e){
   logger.info("Can't find Hive driver");
   }
   
   String hiveHost = GlimmerServer.config.getString("hive/host");
   String hivePort = GlimmerServer.config.getString("hive/port");
   String connectionString = "jdbc:hive2://" + hiveHost + ":" + hivePort
   + "/default";
   logger.info(String.format("Attempting to connect to %s", connectionString));
   try{
   con = DriverManager.getConnection(connectionString, "", "");
 
   }catch(Exception e){
   logger.error("Problem instantiating the connection" + e.getMessage());
   }   
   }
   
   public int update(String query) 
   {
   Integer res = 0;
   Statement stmt = null;
   try{
   stmt = con.createStatement();
   String switchdb = "USE " + db;
   logger.info(switchdb);
   stmt.executeUpdate(switchdb);
   logger.info(query);
   res = stmt.executeUpdate(query);
   logger.info("Query passed to server");
   stmt.close();
   }catch(HiveSQLException e){
   // Arguments follow the order of the placeholders: error first, then query.
   logger.info(String.format("HiveSQLException thrown, this can be valid, " +
   "but check the error: %s from the query %s", e.toString(), query));
   }catch(SQLException e){
   logger.error(String.format("Unable to execute query SQLException %s. Error: %s", query, e));
   }catch(Exception e){
   logger.error(String.format("Unable to execute query %s. Error: %s", query, e));
   }
   
   if(stmt!=null)
   try{
   stmt.close();
   }catch(SQLException e){
   logger.error("Cannot close the statement, potential memory leak " + e);
   }
   
   return res;
   }
   
   public void close()
   {
   if(con!=null){
   try {
   con.close();
   } catch (SQLException e) {  
   logger.info("Problem closing connection " + e);
   }
   }

[jira] [Updated] (HIVE-5267) Use array instead of Collections if possible in DemuxOperator

2013-09-16 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-5267:
---

Attachment: HIVE-5267.patch

Navis, I am uploading a new patch (HIVE-5267.patch) which includes the change I 
mentioned in phabricator.

 Use array instead of Collections if possible in DemuxOperator
 -

 Key: HIVE-5267
 URL: https://issues.apache.org/jira/browse/HIVE-5267
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5267.D12867.1.patch, HIVE-5267.patch


 DemuxOperator accesses Maps two or more times for each row; these lookups can 
 be replaced by array accesses.
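 The idea in this description can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class name `TagLookupSketch` and the notion of a "tag to child index" map are assumptions, and the sketch assumes tags are small non-negative integers so that an array indexed by tag is practical.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical illustration; not the actual DemuxOperator code. */
final class TagLookupSketch {
    /**
     * Converts a map keyed by small non-negative tags into an int[]
     * indexed by tag, so per-row lookups become plain array reads.
     */
    static int[] toArray(Map<Integer, Integer> tagToChild) {
        int maxTag = -1;
        for (int tag : tagToChild.keySet()) {
            maxTag = Math.max(maxTag, tag);
        }
        int[] table = new int[maxTag + 1];
        Arrays.fill(table, -1); // -1 marks "no mapping for this tag"
        for (Map.Entry<Integer, Integer> e : tagToChild.entrySet()) {
            table[e.getKey()] = e.getValue();
        }
        return table;
    }
}
```

 The conversion runs once at initialization; afterwards each row needs only `table[tag]` instead of a hash lookup with boxing.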


[jira] [Assigned] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833


 [ 
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K reassigned HIVE-5298:


Assignee: Vikram Dixit K  (was: Xuefu Zhang)

 AvroSerde performance problem caused by HIVE-3833
 -

 Key: HIVE-5298
 URL: https://issues.apache.org/jira/browse/HIVE-5298
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Vikram Dixit K
 Fix For: 0.13.0


 HIVE-3833 fixed the targeted problem and made Hive use partition-level 
 metadata to initialize the object inspector. In doing that, however, it goes 
 through every file under the table to access the partition metadata, which is 
 very inefficient, especially with multiple files per partition. This causes 
 more problems for AvroSerde because AvroSerde initialization accesses the 
 schema, which is located on the file system. As a result, before Hive can 
 process any data, it needs to access every file for a table, which can take 
 long enough to cause job failure because of lack of job progress.
 An improvement can be made so that partition metadata is only accessed once 
 per partition.
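 A minimal sketch of "access partition metadata once per partition" is shown here. It is not taken from HIVE-5298.patch: the class `PartitionMetadataCacheSketch`, the loader function, and the assumption that a file's parent directory identifies its partition are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of caching metadata per partition directory. */
final class PartitionMetadataCacheSketch<M> {
    private final Map<String, M> byPartition = new HashMap<>();
    private final Function<String, M> loader; // stands in for the expensive lookup
    private int loads = 0;

    PartitionMetadataCacheSketch(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the metadata of the partition directory that contains filePath. */
    M forFile(String filePath) {
        String partitionDir = parentOf(filePath);
        M cached = byPartition.get(partitionDir);
        if (cached == null) {
            cached = loader.apply(partitionDir); // runs once per partition
            loads++;
            byPartition.put(partitionDir, cached);
        }
        return cached;
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loads;
    }

    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```

 With many files per partition, the loader runs once per directory rather than once per file, which is the saving the description asks for.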


[jira] [Updated] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833


 [ 
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5298:
-

Assignee: Xuefu Zhang  (was: Vikram Dixit K)

 AvroSerde performance problem caused by HIVE-3833
 -

 Key: HIVE-5298
 URL: https://issues.apache.org/jira/browse/HIVE-5298
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.13.0


 HIVE-3833 fixed the targeted problem and made Hive use partition-level 
 metadata to initialize the object inspector. In doing that, however, it goes 
 through every file under the table to access the partition metadata, which is 
 very inefficient, especially with multiple files per partition. This causes 
 more problems for AvroSerde because AvroSerde initialization accesses the 
 schema, which is located on the file system. As a result, before Hive can 
 process any data, it needs to access every file for a table, which can take 
 long enough to cause job failure because of lack of job progress.
 An improvement can be made so that partition metadata is only accessed once 
 per partition.


[jira] [Updated] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833

2013-09-16 Thread Xuefu Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5298:
--

Attachment: HIVE-5298.patch

Initial patch. Running tests. Will submit patch if tests pass.

 AvroSerde performance problem caused by HIVE-3833
 -

 Key: HIVE-5298
 URL: https://issues.apache.org/jira/browse/HIVE-5298
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.13.0

 Attachments: HIVE-5298.patch


 HIVE-3833 fixed the targeted problem and made Hive use partition-level 
 metadata to initialize the object inspector. In doing that, however, it goes 
 through every file under the table to access the partition metadata, which is 
 very inefficient, especially with multiple files per partition. This causes 
 more problems for AvroSerde because AvroSerde initialization accesses the 
 schema, which is located on the file system. As a result, before Hive can 
 process any data, it needs to access every file for a table, which can take 
 long enough to cause job failure because of lack of job progress.
 An improvement can be made so that partition metadata is only accessed once 
 per partition.


[jira] [Commented] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833

[
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768934#comment-13768934
]

Vikram Dixit K commented on HIVE-5298:
--

Sorry about the assignment change. Some accidental typing/clicking. I do not
know what caused it. Assigned back to Xuefu.

AvroSerde performance problem caused by HIVE-3833
-

Key: HIVE-5298
URL: https://issues.apache.org/jira/browse/HIVE-5298
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Fix For: 0.13.0

Attachments: HIVE-5298.patch

HIVE-3833 fixed the targeted problem and made Hive use partition-level
metadata to initialize the object inspector. In doing that, however, it goes
through every file under the table to access the partition metadata, which is
very inefficient, especially with multiple files per partition. This causes
more problems for AvroSerde because AvroSerde initialization accesses the
schema, which is located on the file system. As a result, before Hive can
process any data, it needs to access every file for a table, which can take
long enough to cause job failure because of lack of job progress.
An improvement can be made so that partition metadata is only accessed once
per partition.

Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns

2013-09-16 Thread Sergey Shelukhin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14155/#review26159
---



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/14155/#comment51110

why false by default?



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/14155/#comment5

nit: the return seems pointless, it always returns the same map that caller 
already has



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/14155/#comment51112

wouldn't it re-put the entire map into itself as it stands now



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/14155/#comment51113

not really familiar with this flavor of trees; is val guaranteed to be the 
2nd argument? Can it be reversed?
Also nit above - checks for 0 children but not for 1.



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/14155/#comment51114

nit: use entrySet?
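
For readers unfamiliar with this nit, the general Java idiom is sketched below. The code under review is not shown in this message, so the class `EntrySetSketch` and its methods are purely illustrative and are not the `Utilities.java` code.

```java
import java.util.Map;

/** Illustrative only; contrasts two ways to iterate over a map. */
final class EntrySetSketch {
    /** keySet() plus get(): one extra lookup for every key. */
    static String joinWithKeySet(Map<String, String> m) {
        StringBuilder sb = new StringBuilder();
        for (String k : m.keySet()) {
            sb.append(k).append('=').append(m.get(k)).append(';');
        }
        return sb.toString();
    }

    /** entrySet(): key and value come from the same entry, no second lookup. */
    static String joinWithEntrySet(Map<String, String> m) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : m.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }
}
```

Both methods produce the same string; the second simply avoids the repeated `get` call.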



ql/src/test/queries/clientnegative/illegal_partition_type.q
https://reviews.apache.org/r/14155/#comment51115

local path, also below in other q files


- Sergey Shelukhin


On Sept. 16, 2013, 9:05 p.m., Vikram Dixit Kumaraswamy wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/14155/
 ---
 
 (Updated Sept. 16, 2013, 9:05 p.m.)
 
 
 Review request for hive and Ashutosh Chauhan.
 
 
 Bugs: HIVE-5297
 https://issues.apache.org/jira/browse/HIVE-5297
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 
 Hive accepts this query. However if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 
   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e 
   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
 a704462 
   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
 767f545 
   ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION 
   ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION 
   ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION 
   ql/src/test/results/clientnegative/illegal_partition_type.q.out 
 PRE-CREATION 
   ql/src/test/results/clientnegative/illegal_partition_type2.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/14155/diff/
 
 
 Testing
 ---
 
 Ran all tests.
 
 
 Thanks,
 
 Vikram Dixit Kumaraswamy

[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns


[ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768940#comment-13768940
 ] 

Sergey Shelukhin commented on HIVE-5297:


some comments on rb

 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-5297.1.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead. We 
 should throw an exception on such user error at the time the partition 
 addition/load happens.
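 The kind of check requested here could look roughly like the sketch below. It is not the logic in HIVE-5297.1.patch: the class `PartitionTypeCheckSketch`, its `check` method, and the restriction to the `int` and `string` types are assumptions made for illustration.

```java
/** Hypothetical illustration of validating a partition value against its declared type. */
final class PartitionTypeCheckSketch {
    /**
     * Throws IllegalArgumentException if value cannot be read as declaredType.
     * Only "int" and "string" are handled in this sketch.
     */
    static void check(String column, String declaredType, String value) {
        switch (declaredType) {
            case "int":
                try {
                    Integer.parseInt(value);
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException(
                        "Partition column '" + column + "' is declared int but got '" + value + "'");
                }
                break;
            case "string":
                break; // any value is acceptable for a string column
            default:
                throw new UnsupportedOperationException(
                    "Type not covered by this sketch: " + declaredType);
        }
    }
}
```

 With the table above, `check("day", "int", "second")` is rejected when the partition is added, instead of the mismatch surfacing later as nulls.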


[jira] [Commented] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS


[ 
https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768964#comment-13768964
 ] 

Hive QA commented on HIVE-5246:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603400/HIVE-5246.1.patch

{color:green}SUCCESS:{color} +1 3097 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/767/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/767/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

  Local task for map join submitted via oozie job fails on a secure HDFS
 ---

 Key: HIVE-5246
 URL: https://issues.apache.org/jira/browse/HIVE-5246
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar


 For a Hive query started by Oozie Hive action, the local task submitted for 
 Mapjoin fails. The HDFS delegation token is not shared properly with the 
 child JVM created for the local task.
 Oozie creates a delegation token for the Hive action and sets env variable 
 HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config 
 property. However this doesn't get passed down to the child JVM which causes 
 the problem.
 This is similar to the issue addressed by HIVE-4343, which addresses the 
 problem for HiveServer2.
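 One way to pass the token location down to a child process is sketched below. It is not the content of HIVE-5246.1.patch: the class `ChildEnvSketch` is hypothetical, and the sketch assumes the launcher builds the child's environment explicitly, so the variable is not inherited automatically.

```java
import java.util.Map;

/** Hypothetical illustration of forwarding the token file location to a child JVM. */
final class ChildEnvSketch {
    static final String TOKEN_ENV = "HADOOP_TOKEN_FILE_LOCATION";

    /**
     * Copies the token file location, if present in parentEnv, into the
     * environment that the child process will be started with.
     */
    static ProcessBuilder withToken(ProcessBuilder child, Map<String, String> parentEnv) {
        String tokenFile = parentEnv.get(TOKEN_ENV);
        if (tokenFile != null) {
            child.environment().put(TOKEN_ENV, tokenFile);
        }
        return child;
    }
}
```

 A caller would typically pass `System.getenv()` as `parentEnv` before calling `start()` on the returned builder. The `mapreduce.job.credentials.binary` property mentioned above would need to be forwarded separately and is not covered here.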


[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode

2013-09-16 Thread Eric Hanson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768969#comment-13768969
 ] 

Eric Hanson commented on HIVE-4961:
---

I ran the failing tests on my machine on a clean version of the vectorization 
branch without my patch. These tests failed:

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump

These tests would not run in a way that produced output in ant testreport, and 
my changes should not affect them.

org.apache.hcatalog.listener.TestNotificationListener.testAMQListener
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTable
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hive.hcatalog.pig.TestHCatStorer.testPartColsInData
org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreInPartiitonedTbl
org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreMultiTables
org.apache.hive.hcatalog.pig.TestHCatStorer.testStoreWithNoSchema


 Create bridge for custom UDFs to operate in vectorized mode
 ---

 Key: HIVE-4961
 URL: https://issues.apache.org/jira/browse/HIVE-4961
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Eric Hanson
 Fix For: vectorization-branch

 Attachments: HIVE-4961.1-vectorization.patch, 
 HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch, 
 vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch


 Suppose you have a custom UDF myUDF() that you've created to extend hive. The 
 goal of this JIRA is to create a facility where if you run a query that uses 
 myUDF() in an expression, the query will run in vectorized mode.
 This would be a general-purpose bridge for custom UDFs that users add to 
 Hive. It would work with existing UDFs.
 I'm considering a separate JIRA for a new kind of custom UDF implementation 
 that is vectorized from the beginning, to optimize performance. That is not 
 covered by this JIRA.
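 The bridging idea can be sketched in a heavily simplified form as follows. This is not the design in the attached patches: `RowUdfBridgeSketch` is hypothetical, a plain `long[]` stands in for a column vector, and nulls and selection vectors are ignored.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical illustration of running a row-at-a-time function over a batch. */
final class RowUdfBridgeSketch {
    /**
     * Applies rowUdf to the first size entries of input and writes the
     * results to the same positions in output.
     */
    static void evaluate(LongUnaryOperator rowUdf, long[] input, long[] output, int size) {
        for (int i = 0; i < size; i++) {
            output[i] = rowUdf.applyAsLong(input[i]);
        }
    }
}
```

 The existing UDF is still called once per row, so the per-call overhead remains; the benefit is that the rest of the query can stay in vectorized mode. A UDF written against whole batches, as mentioned in the last paragraph, would avoid that overhead.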


[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


[ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768977#comment-13768977
 ] 

Eugene Koifman commented on HIVE-5167:
--

[~thejas] Couldn't HIVE_HOME logic be simpler?  In pseudo code:

if(!isset(HIVE_HOME)) {
   set HIVE_HOME = DEFAULT_HIVE_HOME
} else {
   //do nothing; just use this assuming the user set this intentionally
}
// may optionally check that HIVE_HOME/bin/hive exists


 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.


[jira] [Commented] (HIVE-5246) Local task for map join submitted via oozie job fails on a secure HDFS


[ 
https://issues.apache.org/jira/browse/HIVE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768979#comment-13768979
 ] 

Brock Noland commented on HIVE-5246:


+1

  Local task for map join submitted via oozie job fails on a secure HDFS
 ---

 Key: HIVE-5246
 URL: https://issues.apache.org/jira/browse/HIVE-5246
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-5246.1.patch, HIVE-5246-test.tar


 For a Hive query started by Oozie Hive action, the local task submitted for 
 Mapjoin fails. The HDFS delegation token is not shared properly with the 
 child JVM created for the local task.
 Oozie creates a delegation token for the Hive action and sets env variable 
 HADOOP_TOKEN_FILE_LOCATION as well as mapreduce.job.credentials.binary config 
 property. However this doesn't get passed down to the child JVM which causes 
 the problem.
 This is similar to the issue addressed by HIVE-4343, which addresses the 
 problem for HiveServer2.


[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


[ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768982#comment-13768982
 ] 

Thejas M Nair commented on HIVE-5167:
-

Setting HIVE_HOME to DEFAULT_HIVE_HOME when the DEFAULT_HIVE_HOME location is 
not valid will break the hcat scripts. This is because they will assume the 
user knows best and try to use the already set HIVE_HOME location.


 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.


[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


[ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13768986#comment-13768986
 ] 

Eugene Koifman commented on HIVE-5167:
--

OK, so that is what 
// may optionally check that HIVE_HOME/bin/hive exists

would do

 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.


[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


[ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769010#comment-13769010
 ] 

Eugene Koifman commented on HIVE-5167:
--

hcat script also sets a default for HIVE_HOME (if not set already) so in 
webhcat_config.sh we should only set HIVE_HOME if it contains bin/hive

 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.


[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


[ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769027#comment-13769027
 ] 

Eugene Koifman commented on HIVE-5167:
--


The existing patch already does what I say in my last comment

+1

 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch, HIVE-5167.2.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.


[jira] [Commented] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode

[
https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769044#comment-13769044
]

Ashutosh Chauhan commented on HIVE-4961:

hcatalog tests are flaky and we can ignore them. But none of the hive tests fail
in trunk. It's not likely related to your patch though. I have seen {{input4.q}}
and {{plan_json.q}} fail consistently only on the vectorization branch, so they
need to be debugged on the branch. I am not sure about the orc tests, but if
they fail regardless of the patch, I think this patch is good to go.

Create bridge for custom UDFs to operate in vectorized mode
---

Attachments: HIVE-4961.1-vectorization.patch,
HIVE-4961.2-vectorization.patch, HIVE-4961.3-vectorization.patch,
vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch, vectorUDF.9.patch

[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-16 Thread Mohammad Kamrul Islam (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769049#comment-13769049
]

Mohammad Kamrul Islam commented on HIVE-4732:
-

[~appodictic]: I can see your point. Indeed a very informative link.
As the link mentions, ID collisions are very, very rare.
Pasted from Wikipedia:
To put these numbers into perspective, the annual risk of someone being hit by
a meteorite is estimated to be one chance in 17 billion, which means the
probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of
creating a few tens of trillions of UUIDs in a year and having one duplicate.
In other words, only after generating 1 billion UUIDs every second for the next
100 years, the probability of creating just one duplicate would be about 50%.
The probability of one duplicate would be about 50% if every person on earth
owns 600 million UUIDs.

With these probabilities, is it necessary to make things complex? Moreover,
there are often only a few of these IDs in one Hive session.

Reduce or eliminate the expensive Schema equals() check for AvroSerde
-

Key: HIVE-4732
URL: https://issues.apache.org/jira/browse/HIVE-4732
Project: Hive
Issue Type: Improvement
Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch,
HIVE-4732.v1.patch, HIVE-4732.v4.patch

The AvroSerde spends a significant amount of time checking schema equality.
Changing to compare hashcodes (which can be computed once then reused) will
improve performance.
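
A cautious variant of "compute the hash once and reuse it" is sketched below. It is not the approach in the attached patches: `CachedHashSketch` is hypothetical, and unlike the ID-based comparison discussed in the comment above, it keeps `equals()` as a confirmation step because two different values can share a hash code.

```java
/** Hypothetical illustration of a value wrapper with a precomputed hash. */
final class CachedHashSketch<T> {
    private final T value;
    private final int hash; // computed once at construction

    CachedHashSketch(T value) {
        this.value = value;
        this.hash = value.hashCode();
    }

    /** Cheap rejection by hash first; equals() only runs when the hashes match. */
    boolean sameAs(CachedHashSketch<T> other) {
        if (value == other.value) {
            return true; // same instance, nothing to compare
        }
        if (hash != other.hash) {
            return false; // different hashes imply different values
        }
        return value.equals(other.value); // hashes can collide, so confirm
    }
}
```

This version only saves time when the schemas differ or are the same instance. Trusting a matching hash or ID outright, as debated above, removes the `equals()` call entirely at the cost of a very small collision risk.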

[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-09-16 Thread Mohammad Kamrul Islam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Attachment: HIVE-4732.5.patch

Fixed the failed testcase.

 Reduce or eliminate the expensive Schema equals() check for AvroSerde
 -

 Key: HIVE-4732
 URL: https://issues.apache.org/jira/browse/HIVE-4732
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-4732.1.patch, HIVE-4732.4.patch, HIVE-4732.5.patch, 
 HIVE-4732.v1.patch, HIVE-4732.v4.patch


 The AvroSerde spends a significant amount of time checking schema equality. 
 Changing to compare hashcodes (which can be computed once then reused) will 
 improve performance.


[jira] [Commented] (HIVE-4512) The vectorized plan is not picking right expression class for string concatenation.


[ 
https://issues.apache.org/jira/browse/HIVE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769053#comment-13769053
 ] 

Hive QA commented on HIVE-4512:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603399/HIVE-4512.3-vectorization.patch

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 3951 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hcatalog.api.TestHCatClient.testBasicDDLCommands
org.apache.hcatalog.api.TestHCatClient.testPartitionsHCatClientImpl
org.apache.hive.hcatalog.api.TestHCatClient.testBasicDDLCommands
org.apache.hive.hcatalog.api.TestHCatClient.testDatabaseLocation
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSchema
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionsHCatClientImpl
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hive.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTable
org.apache.hive.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hive.hcatalog.mapreduce.TestHCatExternalPartitioned.testHCatPartitionedTable
org.apache.hive.hcatalog.pig.TestHCatLoader.testGetInputBytes
org.apache.hive.hcatalog.pig.TestHCatLoader.testProjectionsBasic
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadPartitionedBasic
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/768/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/768/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

 The vectorized plan is not picking right expression class for string 
 concatenation.
 ---

 Key: HIVE-4512
 URL: https://issues.apache.org/jira/browse/HIVE-4512
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Eric Hanson
 Attachments: HIVE-4512.1-vectorization.patch, 
 HIVE-4512.2-vectorization.patch, HIVE-4512.3-vectorization.patch


 The vectorized plan is not picking the right expression class for string 
 concatenation.


[jira] [Commented] (HIVE-5279) Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc

2013-09-16 Thread Navis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769056#comment-13769056
 ] 

Navis commented on HIVE-5279:
-

Oh, I'll check that.

 Kryo cannot instantiate GenericUDAFEvaluator in GroupByDesc
 ---

 Key: HIVE-5279
 URL: https://issues.apache.org/jira/browse/HIVE-5279
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Critical
 Attachments: 5279.patch, D12963.1.patch


 We did not force GenericUDAFEvaluator to be Serializable. I don't know how 
 the previous serialization mechanism handled this, but Kryo complains that 
 it is not Serializable and fails the query.
 The log below is an example:
 {noformat}
 java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException: Class 
 cannot be created (missing no-arg constructor): 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector
 Serialization trace:
 inputOI 
 (org.apache.hadoop.hive.ql.udf.generic.GenericUDAFGroupOn$VersionedFloatGroupOnEval)
 genericUDAFEvaluator (org.apache.hadoop.hive.ql.plan.AggregationDesc)
 aggregators (org.apache.hadoop.hive.ql.plan.GroupByDesc)
 conf (org.apache.hadoop.hive.ql.exec.GroupByOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
 aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:312)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:261)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:256)
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:383)
   at org.apache.h
 {noformat}
 If this cannot be fixed somehow, some UDAFs should be modified to run 
 on hive-0.13.0.
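
 The trace above reports a missing no-arg constructor. As a rough 
 illustration that uses plain Java reflection rather than Kryo itself, the 
 sketch below shows why a class that only declares constructors with 
 arguments cannot be created reflectively without them. All class names 
 here are hypothetical.

```java
import java.lang.reflect.Constructor;

final class NoArgConstructorCheck {
    // Hypothetical class that only has a constructor taking an argument,
    // like the object inspector named in the trace.
    static final class NeedsArgument {
        final String name;

        NeedsArgument(String name) {
            this.name = name;
        }
    }

    // Hypothetical class with an explicit no-arg constructor.
    static final class HasDefault {
        HasDefault() {
        }
    }

    // Returns true if the type can be instantiated reflectively with no arguments.
    static boolean canInstantiate(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here when no no-arg constructor exists.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(HasDefault.class));    // prints "true"
        System.out.println(canInstantiate(NeedsArgument.class)); // prints "false"
    }
}
```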


[jira] [Updated] (HIVE-5297) Hive does not honor type for partition columns


 [ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5297:
-

Attachment: HIVE-5297.2.patch

 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However, if you try to select from this table and 
 insert into another expecting schema match, it will insert nulls instead. We 
 should throw an exception on such user error at the time the partition 
 addition/load happens.
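
 A minimal sketch of the kind of check being proposed, not the code in the 
 attached patch: reject a partition value that cannot be parsed as the 
 declared column type. PartitionTypeCheck and its methods are hypothetical, 
 and only the "string" and "int" types from the example are handled.

```java
final class PartitionTypeCheck {
    // Returns true if the value is acceptable for the declared partition column type.
    // Only the two types used in the example above are supported in this sketch.
    static boolean isValid(String declaredType, String value) {
        switch (declaredType) {
            case "string":
                return true;
            case "int":
                try {
                    Integer.parseInt(value);
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            default:
                throw new IllegalArgumentException(
                        "Type not handled in this sketch: " + declaredType);
        }
    }

    // Fails fast when the partition is added, instead of yielding nulls at read time.
    static void check(String column, String declaredType, String value) {
        if (!isValid(declaredType, value)) {
            throw new IllegalArgumentException(
                    "Partition column '" + column + "' is declared as " + declaredType
                            + " but got value '" + value + "'");
        }
    }

    public static void main(String[] args) {
        check("month", "string", "June"); // accepted
        try {
            check("day", "int", "second"); // rejected: not an int
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```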


Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns

2013-09-16 Thread Vikram Dixit Kumaraswamy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14155/
---

(Updated Sept. 17, 2013, 1:14 a.m.)


Review request for hive and Ashutosh Chauhan.


Changes
---

Addressed Sergey's comments.


Bugs: HIVE-5297
https://issues.apache.org/jira/browse/HIVE-5297


Repository: hive-git


Description
---

Hive does not consider the type of the partition column while writing 
partitions. Consider for example the query:

create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
row format delimited fields terminated by ',';
alter table tab1 add partition (month='June', day='second');

Hive accepts this query. However, if you try to select from this table and 
insert into another expecting schema match, it will insert nulls instead.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java a704462 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java fb79823 
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 767f545 
  ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION 
  ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION 
  ql/src/test/results/clientnegative/illegal_partition_type.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/illegal_partition_type2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/14155/diff/


Testing
---

Ran all tests.


Thanks,

Vikram Dixit Kumaraswamy

[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns


[ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769059#comment-13769059
 ] 

Vikram Dixit K commented on HIVE-5297:
--

Second iteration.


[jira] [Created] (HIVE-5299) allow exposing metastore APIs from HiveServer2 with embedded metastore