[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918740#action_12918740
 ] 

Namit Jain commented on HIVE-537:
-

Otherwise it looks good to me

> Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
> map)
> ---
>
> Key: HIVE-537
> URL: https://issues.apache.org/jira/browse/HIVE-537
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Amareshwari Sriramadasu
> Fix For: 0.7.0
>
> Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537-2.txt, 
> patch-537-3.txt, patch-537-4.txt, patch-537.txt
>
>
> There are already some cases inside the code that we use heterogeneous data: 
> JoinOperator, and UnionOperator (in the sense that different parents can pass 
> in records with different ObjectInspectors).
> We currently use Operator's parentID to distinguish that. However that 
> approach does not extend to more complex plans that might be needed in the 
> future.
> We will support the union type like this:
> {code}
> TypeDefinition:
>   type: primitivetype | structtype | arraytype | maptype | uniontype
>   uniontype: "union" "<" tag ":" type ("," tag ":" type)* ">"
> Example:
>   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
> Example of serialized data format:
>   We will first store the tag byte before we serialize the object. On 
> deserialization, we will first read out the tag byte, then we know what is 
> the current type of the following object, so we can deserialize it 
> successfully.
> Interface for ObjectInspector:
> interface UnionObjectInspector {
>   /** Returns the array of ObjectInspectors, one for each tag. */
>   ObjectInspector[] getObjectInspectors();
>   /** Returns the tag of the object. */
>   byte getTag(Object o);
>   /** Returns the field based on the tag value associated with the object. */
>   Object getField(Object o);
> };
> An example serialization format (using a delimited format, with ' ' as the 
> first-level delimiter and '=' as the second-level delimiter):
> userid:int,log:union<0:struct<a:int,b:string>,1:string>
> 123 1=login
> 123 0=243=helloworld
> 123 1=logout
> {code}
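The tag-prefixed wire format described above can be sketched as a small standalone program. This is an illustration for a hypothetical union<0:int,1:string>, not Hive's actual serializer; the class and method names are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch: write the tag byte first, then the value; on read, the tag
// tells us which deserializer to apply to the following bytes.
public class UnionTagDemo {

    // Serialize one member of a hypothetical union<0:int,1:string>.
    static byte[] write(byte tag, Object value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(tag);                        // tag byte comes first
        if (tag == 0) {
            out.writeInt((Integer) value);         // tag 0 -> int
        } else {
            out.writeUTF((String) value);          // tag 1 -> string
        }
        return bos.toByteArray();
    }

    static Object read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte tag = in.readByte();                  // read the tag to learn the type
        return tag == 0 ? in.readInt() : in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        assert read(write((byte) 0, 243)).equals(243);
        assert read(write((byte) 1, "login")).equals("login");
    }
}
```

The key property is that the reader never needs out-of-band type information: the tag byte alone selects the ObjectInspector (here, the branch) used for the rest of the record.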

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918739#action_12918739
 ] 

Namit Jain commented on HIVE-537:
-

I will review again; a few initial comments:

1. Constants.java is a generated file ? Can you change serde/if/serde.thrift
2. desc extended for create_union is not detailed enough ?





[jira] Updated: (HIVE-1697) Migration scripts should increase size of PARAM_VALUE in PARTITION_PARAMS

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1697:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
   0.6.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed in trunk and 0.6 - Thanks Paul

> Migration scripts should increase size of PARAM_VALUE in PARTITION_PARAMS
> -
>
> Key: HIVE-1697
> URL: https://issues.apache.org/jira/browse/HIVE-1697
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Paul Yang
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1697.1.patch
>
>
> The migration scripts should increase the size of column PARAM_VALUE in the 
> table PARTITION_PARAMS to 4000 chars to follow the description in package.jdo.




[jira] Resolved: (HIVE-1427) Provide metastore schema migration scripts (0.5 -> 0.6)

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1427.
--

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]

Committed in both 0.6 and trunk. Thanks Carl

> Provide metastore schema migration scripts (0.5 -> 0.6)
> ---
>
> Key: HIVE-1427
> URL: https://issues.apache.org/jira/browse/HIVE-1427
> Project: Hadoop Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1427.1.patch.txt
>
>
> At a minimum this ticket covers packaging up example MySQL migration scripts 
> (cumulative across all schema changes from 0.5 to 0.6) and explaining what to 
> do with them in the release notes.
> This is also probably a good point at which to decide and clearly state which 
> Metastore DBs we officially support in production, e.g. do we need to provide 
> migration scripts for Derby?




[jira] Resolved: (HIVE-1693) Make the compile target depend on thrift.home

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1693.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to both 0.6 and trunk. Thanks Eli

> Make the compile target depend on thrift.home
> -
>
> Key: HIVE-1693
> URL: https://issues.apache.org/jira/browse/HIVE-1693
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Affects Versions: 0.5.0
>Reporter: Eli Collins
>Priority: Minor
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-1693-1.patch
>
>
> Per http://wiki.apache.org/hadoop/Hive/HiveODBC, the ant compile targets 
> require thrift.home to be set. Rather than failing to compile, the build should 
> fail with a message indicating that it should be set.




[jira] Commented: (HIVE-1693) Make the compile target depend on thrift.home

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918679#action_12918679
 ] 

Namit Jain commented on HIVE-1693:
--

+1





[jira] Updated: (HIVE-1364) Increase the maximum length of various metastore fields, and remove TYPE_NAME from COLUMNS primary key

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1364:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to both 0.6 and trunk. Thanks Carl

> Increase the maximum length of various metastore fields, and remove TYPE_NAME 
> from COLUMNS primary key
> --
>
> Key: HIVE-1364
> URL: https://issues.apache.org/jira/browse/HIVE-1364
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: HIVE-1364.2.patch.txt, 
> HIVE-1364.3.backport-060.patch.txt, HIVE-1364.3.patch.txt, 
> HIVE-1364.4.backport-060.patch.txt, HIVE-1364.4.patch.txt, HIVE-1364.patch
>
>
> The value component of a SERDEPROPERTIES key/value pair is currently limited
> to a maximum length of 767 characters. I believe that the motivation for 
> limiting the length to 
> 767 characters is that this value is the maximum allowed length of an index in
> a MySQL database running on the InnoDB engine: 
> http://bugs.mysql.com/bug.php?id=13315
> * The Metastore OR mapping currently limits many fields (including 
> SERDEPROPERTIES.PARAM_VALUE) to a maximum length of 767 characters despite 
> the fact that these fields are not indexed.
> * The maximum length of a VARCHAR value in MySQL 5.0.3 and later is 65,535.
> * We can expect many users to hit the 767 character limit on 
> SERDEPROPERTIES.PARAM_VALUE when using the hbase.columns.mapping 
> serdeproperty to map a table that has many columns.
> I propose increasing the maximum allowed length of 
> SERDEPROPERTIES.PARAM_VALUE to 8192.




[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918603#action_12918603
 ] 

Namit Jain commented on HIVE-1695:
--

This would be a very useful optimization

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> -
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map 
> only job followed by a Map-Reduce job. It can be combined into single 
> MapReduce Job.




[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1691:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Ning

> ANALYZE TABLE command should check columns in partition spec
> 
>
> Key: HIVE-1691
> URL: https://issues.apache.org/jira/browse/HIVE-1691
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1691.patch
>
>
> ANALYZE TABLE PARTITION (col1, col2, ...) should check whether col1, col2, 
> etc. are partition columns.




[jira] Commented: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918359#action_12918359
 ] 

Namit Jain commented on HIVE-1691:
--

I will take a look





[jira] Commented: (HIVE-1364) Increase the maximum length of various metastore fields, and remove TYPE_NAME from COLUMNS primary key

2010-10-05 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918356#action_12918356
 ] 

Namit Jain commented on HIVE-1364:
--

We tested this on our production metastore db.
The downtime should be acceptable - I will start the tests

+1





[jira] Updated: (HIVE-1691) ANALYZE TABLE command should check columns in partition spec

2010-10-05 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1691:
-

Summary: ANALYZE TABLE command should check columns in partition spec  
(was: ANALYZE TABLE command should check columns in partitioin spec)





[jira] Updated: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1678:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Amareshwari

> NPE in MapJoin 
> ---
>
> Key: HIVE-1678
> URL: https://issues.apache.org/jira/browse/HIVE-1678
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1678.txt
>
>
> The query with two map joins and a group by fails with following NPE:
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:177)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:457)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:697)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:464)




[jira] Commented: (HIVE-1678) NPE in MapJoin

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917684#action_12917684
 ] 

Namit Jain commented on HIVE-1678:
--

Nice catch - Thanks

+1

will commit if the tests pass





[jira] Commented: (HIVE-1546) Ability to plug custom Semantic Analyzers for Hive Grammar

2010-10-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917677#action_12917677
 ] 

Namit Jain commented on HIVE-1546:
--

I will take a look in more detail, but overall it looks good. I had the 
following comments:

1. Instead of TestSemanticAnalyzerHookLoading.java, add tests in 
test/queries/clientpositive and test/queries/clientnegative
2. Do you want to set the value of hive.semantic.analyzer.hook to a dummy value 
in data/conf/hive-site.xml for the unit tests ?
Can something meaningful be printed here, which can be used for comparing ?


> Ability to plug custom Semantic Analyzers for Hive Grammar
> --
>
> Key: HIVE-1546
> URL: https://issues.apache.org/jira/browse/HIVE-1546
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: hive-1546-3.patch, hive-1546-4.patch, hive-1546.patch, 
> hive-1546_2.patch, hooks.patch, Howl_Semantic_Analysis.txt
>
>
> It will be useful if Semantic Analysis phase is made pluggable such that 
> other projects can do custom analysis of hive queries before doing metastore 
> operations on them. 




[jira] Updated: (HIVE-1647) Incorrect initialization of thread local variable inside IOContext ( implementation is not threadsafe )

2010-10-04 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1647:
-

Status: Open  (was: Patch Available)

> Incorrect initialization of thread local variable inside IOContext ( 
> implementation is not threadsafe ) 
> 
>
> Key: HIVE-1647
> URL: https://issues.apache.org/jira/browse/HIVE-1647
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Server Infrastructure
>Affects Versions: 0.6.0, 0.7.0
>Reporter: Raman Grover
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: HIVE-1647.patch
>
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
>
> Bug in org.apache.hadoop.hive.ql.io.IOContext
> in relation to initialization of thread local variable.
>  
> public class IOContext {
>  
>   private static ThreadLocal<IOContext> threadLocal = new 
> ThreadLocal<IOContext>(){ };
>  
>   static {
> if (threadLocal.get() == null) {
>   threadLocal.set(new IOContext());
> }
>   }
>  
> In a multi-threaded environment, the thread that loads the class first for the 
> JVM (assuming threads share the classloader) gets to initialize its thread-local 
> correctly by executing the code in the static block. Once the class is loaded, 
> any subsequent thread has its respective thread-local variable as null. Since 
> IOContext is set during initialization of HiveRecordReader, in a scenario where 
> multiple threads acquire an instance of HiveRecordReader, this results in an 
> NPE for all but the first thread that loaded the class in the VM.
>  
> Is the above scenario of multiple threads initializing HiveRecordReader a 
> typical one? Or we could just provide the following fix:
>  
>   private static ThreadLocal<IOContext> threadLocal = new 
> ThreadLocal<IOContext>(){
> protected synchronized IOContext initialValue() {
>   return new IOContext();
> }  
>   };
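The proposed initialValue() fix can be sketched as a runnable standalone program (the class names here are illustrative, not Hive's actual org.apache.hadoop.hive.ql.io.IOContext): each thread lazily creates its own instance on first access, so no thread observes null.

```java
// Sketch of the thread-safe fix described above: overriding initialValue()
// runs once per thread on first get(), unlike a static block, which runs
// only in the single thread that loads the class.
public class IOContextDemo {
    static class IOContext {}

    private static final ThreadLocal<IOContext> threadLocal =
        new ThreadLocal<IOContext>() {
            @Override
            protected IOContext initialValue() {
                return new IOContext();  // created lazily, per thread
            }
        };

    public static IOContext get() {
        return threadLocal.get();
    }

    public static void main(String[] args) throws InterruptedException {
        IOContext mainCtx = get();
        final IOContext[] otherCtx = new IOContext[1];
        Thread t = new Thread(() -> otherCtx[0] = get());
        t.start();
        t.join();
        // Both threads see a non-null context, and each has its own instance.
        assert mainCtx != null && otherCtx[0] != null;
        assert mainCtx != otherCtx[0];
    }
}
```

Note that the synchronized keyword on initialValue() in the quoted patch is not strictly needed, since initialValue() is only ever invoked for the calling thread's own copy.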




[jira] Updated: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1673:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Yongqiang

> Create table bug causes the row format property lost when serde is specified.
> -
>
> Key: HIVE-1673
> URL: https://issues.apache.org/jira/browse/HIVE-1673
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1673.1.patch
>
>
> An example:
> create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
> DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
> will lose the row format information.




[jira] Created: (HIVE-1687) smb_mapjoin_8.q in TestMinimrCliDriver hangs/fails

2010-10-01 Thread Namit Jain (JIRA)
smb_mapjoin_8.q in TestMinimrCliDriver hangs/fails
--

 Key: HIVE-1687
 URL: https://issues.apache.org/jira/browse/HIVE-1687
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Joydeep Sen Sarma


The test never seems to succeed for me, although it is OK for many other people




[jira] Commented: (HIVE-1683) Column aliases cannot be used in a group by clause

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916953#action_12916953
 ] 

Namit Jain commented on HIVE-1683:
--

A workaround is to use the original expression:

select col1, count(col2) from test group by col1;

> Column aliases cannot be used in a group by clause
> --
>
> Key: HIVE-1683
> URL: https://issues.apache.org/jira/browse/HIVE-1683
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Shrikrishna Lawande
>
> Column aliases cannot be used in a group by clause
> Following query would fail :
> select col1 as t, count(col2) from test group by t;
> FAILED: Error in semantic analysis: line 1:49 Invalid Table Alias or Column 
> Reference t




[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916951#action_12916951
 ] 

Namit Jain commented on HIVE-1673:
--

scriptfile1.q and smb_mapjoin_8.q in TestMinimrCliDriver are also unrelated - 
filed independent JIRAs for them as well.






[jira] Created: (HIVE-1685) scriptfile1.q in minimr failing intermittently

2010-10-01 Thread Namit Jain (JIRA)
scriptfile1.q in minimr failing intermittently
-

 Key: HIVE-1685
 URL: https://issues.apache.org/jira/browse/HIVE-1685
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Joydeep Sen Sarma


 [junit] Begin query: scriptfile1.q
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/scriptfile1.q.out
 
/data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/scriptfile1.q.out
[junit] 1c1
[junit] < PREHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
STRING)
[junit] ---
[junit] > PREHOOK: query: CREATE TABLE dest1(key INT, value STRING)
[junit] 3c3
[junit] < POSTHOOK: query: CREATE TABLE scriptfile1_dest1(key INT, value 
STRING)
[junit] ---
[junit] > POSTHOOK: query: CREATE TABLE dest1(key INT, value STRING)
[junit] 5c5
[junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
[junit] ---
[junit] > POSTHOOK: Output: defa...@dest1
[junit] 12c12
[junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
tmap.tvalue
[junit] ---
[junit] junit.framework.AssertionFailedError: Client execution results 
failed with error code = 1
[junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
[junit] See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.
[junit] 15c15
[junit] at junit.framework.Assert.fail(Assert.java:47)
[junit] < PREHOOK: Output: defa...@scriptfile1_dest1
[junit] at 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_scriptfile1(TestMinimrCliDriver.java:522)
[junit] ---
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] > PREHOOK: Output: defa...@dest1
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] 22c22
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] < INSERT OVERWRITE TABLE scriptfile1_dest1 SELECT tmap.tkey, 
tmap.tvalue
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] ---
[junit] at junit.framework.TestCase.runTest(TestCase.java:154)
[junit] > INSERT OVERWRITE TABLE dest1 SELECT tmap.tkey, tmap.tvalue
[junit] at junit.framework.TestCase.runBare(TestCase.java:127)
[junit] 25,28c25,28
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] < POSTHOOK: Output: defa...@scriptfile1_dest1
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] < POSTHOOK: Lineage: scriptfile1_dest1.key SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:118)
[junit] < POSTHOOK: Lineage: scriptfile1_dest1.value SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:208)
[junit] < PREHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_dest1
[junit] at junit.framework.TestSuite.run(TestSuite.java:203)
[junit] ---
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] > POSTHOOK: Output: defa...@dest1
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] > POSTHOOK: Lineage: dest1.key SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] > POSTHOOK: Lineage: dest1.value SCRIPT 
[(src)src.FieldSchema(name:key, type:string, comment:default), 
(src)src.FieldSchema(name:value, type:string, comment:default), ]
[junit] > PREHOOK: query: SELECT dest1.* FROM dest1
[junit] 30,32c30,32
[junit] < PREHOOK: Input: defa...@scriptfile1_dest1
[junit] < PREHOOK: Output: 
hdfs://localhost.localdomain:59220/data/users/njain/hive_commit1/hive_commit1/build/ql/scratchdir/hive_2010-09-30_01-24-37_987_7722845044472176538/-mr-1
[junit] < POSTHOOK: query: SELECT scriptfile1_dest1.* FROM scriptfile1_des

[jira] Created: (HIVE-1684) intermittent failures in create_escape.q

2010-10-01 Thread Namit Jain (JIRA)
intermittent failures in create_escape.q


 Key: HIVE-1684
 URL: https://issues.apache.org/jira/browse/HIVE-1684
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang


[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit1/hive_commit1/build/ql/test/logs/clientpositive/create_escape.q.out
 
/data/users/njain/hive_commit1/hive_commit1/ql/src/test/results/clientpositive/create_escape.q.out
[junit] 48d47
[junit] <   serialization.format\t  
[junit] 49a49
[junit] >   serialization.format\t  


Sometimes, I see the above failure. 

This does not happen always, and needs to be investigated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-10-01 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916944#action_12916944
 ] 

Namit Jain commented on HIVE-1673:
--

create_escape.q in TestCliDriver is failing intermittently, but has nothing to 
do with the current patch

> Create table bug causes the row format property lost when serde is specified.
> -
>
> Key: HIVE-1673
> URL: https://issues.apache.org/jira/browse/HIVE-1673
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1673.1.patch
>
>
> An example:
> create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
> DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
> will lose the row format information.




[jira] Commented: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916734#action_12916734
 ] 

Namit Jain commented on HIVE-1667:
--

There may be a superuser, say root, who owns the parent dir (which will be the 
warehouse dir).

If I create a table T, the group should be my group, not root. The current 
BSD semantics force the group to be root.

> Store the group of the owner of the table in metastore
> --
>
> Key: HIVE-1667
> URL: https://issues.apache.org/jira/browse/HIVE-1667
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Namit Jain
> Attachments: hive-1667.patch
>
>
> Currently, the group of the owner of the table is not stored in the metastore.
> Secondly, if you create a table, the table's owner group is set to the group 
> for the parent. It is not read from the UGI passed in.




[jira] Commented: (HIVE-1673) Create table bug causes the row format property lost when serde is specified.

2010-09-30 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916470#action_12916470
 ] 

Namit Jain commented on HIVE-1673:
--

The following tests failed:

create_escape.q


and 


scriptfile1.q
smp_mapjoin_8.q

in TestMiniMrCliDriver

> Create table bug causes the row format property lost when serde is specified.
> -
>
> Key: HIVE-1673
> URL: https://issues.apache.org/jira/browse/HIVE-1673
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.7.0
>
> Attachments: hive-1673.1.patch
>
>
> An example:
> create table src_rc_serde_yongqiang(key string, value string) ROW FORMAT  
> DELIMITED FIELDS TERMINATED BY '\\0' stored as rcfile; 
> will lose the row format information.




[jira] Updated: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-30 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1670:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Ning

> MapJoin throws EOFExeption when the mapjoined table has 0 column selected
> -
>
> Key: HIVE-1670
> URL: https://issues.apache.org/jira/browse/HIVE-1670
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1670.patch
>
>
> select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
> throws EOFException




[jira] Commented: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916208#action_12916208
 ] 

Namit Jain commented on HIVE-1638:
--

great results - I will review the patch

> convert commonly used udfs to generic udfs
> --
>
> Key: HIVE-1638
> URL: https://issues.apache.org/jira/browse/HIVE-1638
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Attachments: HIVE-1638.1.patch
>
>
> Copying a mail from Joy:
> i did a little bit of profiling of a simple hive group by query today. i was 
> surprised to see that one of the most expensive functions were in converting 
> the equals udf (i had some simple string filters) to generic udfs. 
> (primitiveobjectinspectorconverter.textconverter)
> am i correct in thinking that the fix is to simply port some of the most 
> popular udfs (string equality/comparison etc.) to generic udsf?




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916207#action_12916207
 ] 

Namit Jain commented on HIVE-1658:
--

Thiruvel, any updates on this - we need it urgently in order to deploy HIVE-558


> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name<tab>type<tab>comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Commented: (HIVE-1670) MapJoin throws EOFExeption when the mapjoined table has 0 column selected

2010-09-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916205#action_12916205
 ] 

Namit Jain commented on HIVE-1670:
--

+1

> MapJoin throws EOFExeption when the mapjoined table has 0 column selected
> -
>
> Key: HIVE-1670
> URL: https://issues.apache.org/jira/browse/HIVE-1670
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1670.patch
>
>
> select /*+mapjoin(b) */ sum(a.key) from src a join src b on (a.key=b.key); 
> throws EOFException




[jira] Commented: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915856#action_12915856
 ] 

Namit Jain commented on HIVE-1642:
--

yes

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.
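The size-based decision the description asks for can be sketched in a few lines. This is purely illustrative (the names `MapJoinHeuristic`, `TableSize`, and `pickSmallTable` are hypothetical, not from the HIVE-1642 patch); the byte threshold plays the role of a config knob analogous to Hive's small-table size limit:

```java
import java.util.List;

public class MapJoinHeuristic {

    // Simple holder for the per-table facts the heuristic needs.
    public static class TableSize {
        final String name;
        final long totalBytes;
        final long numRows;
        public TableSize(String name, long totalBytes, long numRows) {
            this.name = name;
            this.totalBytes = totalBytes;
            this.numRows = numRows;
        }
    }

    /**
     * Returns the name of the table to load into memory for a map-join,
     * or null if even the smallest table exceeds the byte threshold.
     */
    public static String pickSmallTable(List<TableSize> tables, long maxBytes) {
        TableSize smallest = null;
        for (TableSize t : tables) {
            if (smallest == null || t.totalBytes < smallest.totalBytes) {
                smallest = t;
            }
        }
        return (smallest != null && smallest.totalBytes <= maxBytes)
                ? smallest.name : null;
    }
}
```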




[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915619#action_12915619
 ] 

Namit Jain commented on HIVE-1157:
--

The changes looked good, but I got the following error:

[junit] Begin query: alter1.q
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I [.][.][.] [0-9]* more 
/data/users/njain/hive_commit2/hive_commit2/build/ql/test/logs/clientpositive/alter1.q.out
 
/data/users/njain/hive_commit2/hive_commit2/ql/src/test/results/clientpositive/alter1.q.out
[junit] 778d777
[junit] < Resource ../data/files/TestSerDe.jar already added.


Philip, can you take care of that ?

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}
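The proposal above hinges on recognizing when an added resource is remote. A minimal sketch of that check (class and method names are illustrative, not the actual HIVE-1157 patch): anything with a scheme other than `file` must be copied to the local filesystem before the jar can be put on a URLClassLoader.

```java
import java.net.URI;

public class ResourceLocalizer {

    /** True when the URI points at a filesystem the JVM cannot load classes from directly. */
    public static boolean needsLocalCopy(String resource) {
        String scheme = URI.create(resource).getScheme();
        // No scheme or "file" means it is already a local path.
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }
}
```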




[jira] Resolved: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1671.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Bennie

> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> We have 2 threads running at 100%,
> with a stack trace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
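The stack trace above, spinning inside `HashMap.get`, is the classic symptom of unsynchronized concurrent access to a `java.util.HashMap`: a resize during a racing put can create a cycle in a bucket chain, after which lookups loop forever at 100% CPU. A sketch of the shape of the fix (the class and field names here are illustrative, not the actual HIVE-1671 patch) is to make the shared path-to-summary cache a `ConcurrentHashMap`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PathSummaryCache {
    // Thread-safe replacement for the unsynchronized HashMap the two
    // spinning threads shared (here mapping a path to its total size).
    private final ConcurrentMap<String, Long> pathToSize = new ConcurrentHashMap<>();

    public Long get(String path) {
        return pathToSize.get(path);
    }

    // putIfAbsent keeps the first computed summary if two threads race.
    public void put(String path, long size) {
        pathToSize.putIfAbsent(path, size);
    }
}
```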




[jira] Commented: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915516#action_12915516
 ] 

Namit Jain commented on HIVE-1667:
--

Is it true that the first group is always the primary group (as on Unix)?
If org.apache.hadoop.security.UserGroupInformation guarantees that, your 
approach seems correct.

One more thing to check: what happens when a partition is created via 
insert overwrite? The temp folder is moved to the final folder, so you may 
have to fix that path as well.

> Store the group of the owner of the table in metastore
> --
>
> Key: HIVE-1667
> URL: https://issues.apache.org/jira/browse/HIVE-1667
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Namit Jain
> Attachments: hive-1667.patch
>
>
> Currently, the group of the owner of the table is not stored in the metastore.
> Secondly, if you create a table, the table's owner group is set to the group 
> for the parent. It is not read from the UGI passed in.




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915508#action_12915508
 ] 

Namit Jain commented on HIVE-1671:
--

OK, I can now see the problem.

+1


> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> We have 2 threads running at 100%,
> with a stack trace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Commented: (HIVE-1157) UDFs can't be loaded via "add jar" when jar is on HDFS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915469#action_12915469
 ] 

Namit Jain commented on HIVE-1157:
--

This is good to have - I will take a look

> UDFs can't be loaded via "add jar" when jar is on HDFS
> --
>
> Key: HIVE-1157
> URL: https://issues.apache.org/jira/browse/HIVE-1157
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Philip Zeyliger
>Priority: Minor
> Attachments: hive-1157.patch.txt, HIVE-1157.patch.v3.txt, 
> HIVE-1157.patch.v4.txt, HIVE-1157.patch.v5.txt, HIVE-1157.v2.patch.txt, 
> output.txt
>
>
> As discussed on the mailing list, it would be nice if you could use UDFs that 
> are on jars on HDFS.  The proposed implementation would be for "add jar" to 
> recognize that the target file is on HDFS, copy it locally, and load it into 
> the classpath.
> {quote}
> Hi folks,
> I have a quick question about UDF support in Hive.  I'm on the 0.5 branch.  
> Can you use a UDF where the jar which contains the function is on HDFS, and 
> not on the local filesystem.  Specifically, the following does not seem to 
> work:
> # This is Hive 0.5, from svn
> $bin/hive  
> Hive history file=/tmp/philip/hive_job_log_philip_201002081541_370227273.txt
> hive> add jar hdfs://localhost/FooTest.jar;   
>
> Added hdfs://localhost/FooTest.jar to class path
> hive> create temporary function cube as 'com.cloudera.FooTestUDF';
> 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.FunctionTask
> Does this work for other people?  I could probably fix it by changing "add 
> jar" to download remote jars locally, when necessary (to load them into the 
> classpath), or update URLClassLoader (or whatever is underneath there) to 
> read directly from HDFS, which seems a bit more fragile.  But I wanted to 
> make sure that my interpretation of what's going on is right before I have at 
> it.
> Thanks,
> -- Philip
> {quote}
> {quote}
> Yes that's correct. I prefer to download the jars in "add jar".
> Zheng
> {quote}




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440
 ] 

Namit Jain commented on HIVE-1671:
--

Are you using HiveServer?

.bq we have 2 threads running at 100%

What do you mean by the above? Are you setting hive.exec.parallel to true? In 
that case, I can see the problem happening.

> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> We have 2 threads running at 100%,
> with a stack trace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Issue Comment Edited: (HIVE-1671) multithreading on Context.pathToCS

2010-09-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440
 ] 

Namit Jain edited comment on HIVE-1671 at 9/27/10 3:22 PM:
---

Are you using HiveServer?

>> we have 2 threads running at 100%

What do you mean by the above? Are you setting hive.exec.parallel to true? In 
that case, I can see the problem happening.

  was (Author: namit):
Are you using HiveServer?

.bq we have 2 threads running at 100%

What do you mean by the above? Are you setting hive.exec.parallel to true? In 
that case, I can see the problem happening.
  
> multithreading on Context.pathToCS
> --
>
> Key: HIVE-1671
> URL: https://issues.apache.org/jira/browse/HIVE-1671
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Bennie Schut
>Assignee: Bennie Schut
> Fix For: 0.7.0
>
> Attachments: HIVE-1671-1.patch
>
>
> We have 2 threads running at 100%,
> with a stack trace like this:
> "Thread-16725" prio=10 tid=0x7ff410662000 nid=0x497d runnable 
> [0x442eb000]
>java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
> at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)




[jira] Created: (HIVE-1667) Store the group of the owner of the table in metastore

2010-09-23 Thread Namit Jain (JIRA)
Store the group of the owner of the table in metastore
--

 Key: HIVE-1667
 URL: https://issues.apache.org/jira/browse/HIVE-1667
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain


Currently, the group of the owner of the table is not stored in the metastore.

Secondly, if you create a table, the table's owner group is set to the group 
for the parent. It is not read from the UGI passed in.




[jira] Created: (HIVE-1666) retry metadata operation in case of a failure

2010-09-23 Thread Namit Jain (JIRA)
retry metadata operation in case of a failure
--

 Key: HIVE-1666
 URL: https://issues.apache.org/jira/browse/HIVE-1666
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Paul Yang


If a user is trying to insert into a partition,

insert overwrite table T partition (p) select ..


it is possible that the directory gets created but the metadata creation of 
t...@p fails. Currently we just throw an error, even though the final 
directory has already been created.

It would be useful to at least retry the metadata operation. 




[jira] Commented: (HIVE-1665) drop operations may cause file leak

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913880#action_12913880
 ] 

Namit Jain commented on HIVE-1665:
--

By default, the scratch dir can be based on date etc. so that it can be easily 
cleaned up

> drop operations may cause file leak
> ---
>
> Key: HIVE-1665
> URL: https://issues.apache.org/jira/browse/HIVE-1665
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: He Yongqiang
>Assignee: He Yongqiang
>
> Right now when doing a drop, Hive first drops the metadata and then drops 
> the actual files. If the file system is down at that time, the files are 
> left undeleted. 
> Had an offline discussion about this:
> to fix this, add a new conf "scratch dir" into hive conf. 
> when doing a drop operation:
> 1) move data to scratch directory
> 2) drop metadata
> 3) if 2) failed, roll back 1) and report error 3.1
> if 2) succeeded, drop data from scratch directory 3.2
> 4) if 3.2 fails, we are ok because we assume the scratch dir will be emptied 
> manually.
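The four steps above amount to a move-then-drop protocol with rollback. A self-contained sketch under stated assumptions — plain local files stand in for HDFS, and a small interface stands in for the real metastore call; all names here are illustrative, not from the eventual patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeDrop {

    /** Models the metastore call; a real implementation would talk to the metastore. */
    public interface MetadataDropper {
        void dropMetadata() throws IOException;
    }

    /**
     * 1) move data to the scratch dir, 2) drop metadata,
     * 3) on failure roll the move back, on success delete the scratch copy.
     * Returns true when the drop fully succeeded.
     */
    public static boolean drop(Path data, Path scratch, MetadataDropper meta)
            throws IOException {
        Path parked = scratch.resolve(data.getFileName());
        Files.move(data, parked);                       // step 1
        try {
            meta.dropMetadata();                        // step 2
        } catch (IOException e) {
            Files.move(parked, data);                   // step 3: roll back
            return false;
        }
        Files.delete(parked);                           // step 3: commit
        return true;
    }
}
```

If the final scratch-side delete itself fails, the data simply stays in the scratch dir, which matches step 4: the scratch dir is assumed to be emptied manually.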




[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913783#action_12913783
 ] 

Namit Jain commented on HIVE-1361:
--

Ning, the latest patch contains the output of svn stat

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.2.patch, HIVE-1361.2_java_only.patch, 
> HIVE-1361.3.patch, HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> At the first step, we gather table-level stats for non-partitioned table and 
> partition-level stats for partitioned table. Future work could extend the 
> table level stats to partitioned table as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
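The table-level numbers in the proposal are aggregates over the partition-level ones. A hedged sketch of that roll-up (field and class names are illustrative; the real HIVE-1361 work stores these through the metastore API, and the min/max/average row and file sizes are omitted here for brevity):

```java
import java.util.List;

public class StatsRollup {

    public static class PartitionStats {
        final long numRows, totalBytes, numFiles;
        public PartitionStats(long numRows, long totalBytes, long numFiles) {
            this.numRows = numRows;
            this.totalBytes = totalBytes;
            this.numFiles = numFiles;
        }
    }

    public static class TableStats {
        public long numRows, totalBytes, numFiles, numPartitions;
    }

    /** Sum partition stats into table stats, plus the partition count itself. */
    public static TableStats aggregate(List<PartitionStats> parts) {
        TableStats t = new TableStats();
        t.numPartitions = parts.size();
        for (PartitionStats p : parts) {
            t.numRows += p.numRows;
            t.totalBytes += p.totalBytes;
            t.numFiles += p.numFiles;
        }
        return t;
    }
}
```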




[jira] Commented: (HIVE-1661) Default values for parameters

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913726#action_12913726
 ] 

Namit Jain commented on HIVE-1661:
--

Yongqiang, can you take a look at this ?

> Default values for parameters
> -
>
> Key: HIVE-1661
> URL: https://issues.apache.org/jira/browse/HIVE-1661
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1661.1.patch, HIVE-1661.2.patch
>
>
> It would be good to have a default value for some hive parameters:
> say RETENTION to be 30 days.




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913662#action_12913662
 ] 

Namit Jain commented on HIVE-1658:
--

@Thiruvel, can we keep the new output in the old format?
I mean, we just have to make sure that the output has 3 columns separated by a 
delimiter.

So, if your current output is 'x', you can replace it with:

x

An implicit null at the beginning and end.
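In other words, the three columns are joined by a tab delimiter, with empty strings standing in for the implicit nulls. A trivial illustrative sketch (the class and method names are hypothetical, not Hive's):

```java
public class DescribeFormatter {

    /** Render one column of a schema as name<TAB>type<TAB>comment. */
    public static String formatColumn(String name, String type, String comment) {
        return (name == null ? "" : name) + "\t"
             + (type == null ? "" : type) + "\t"
             + (comment == null ? "" : comment);
    }
}
```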




> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be 
> name<tab>type<tab>comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Updated: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1655:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Ning

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1655.patch
>
>
> In case of dynamic partition insert, FileSinkOperator generates a directory 
> for a new partition, and the files in the directory are named '_tmp*'. 
> When a task succeeds, the file is renamed to remove the "_tmp", which 
> essentially implements the "commit" semantics. Many kinds of exceptions 
> (process killed, machine dies, etc.) can leave _tmp files behind in the DP 
> directory. These _tmp files should be deleted ("rolled back") at a 
> successful jobClose(). After the deletion, we should also delete any empty 
> directories.
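The cleanup described above can be sketched as a deepest-first walk: delete leftover `_tmp*` files, then prune directories the deletions left empty. This sketch uses local files for illustration; the real code operates on the Hadoop FileSystem API, and the names here are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DynamicPartitionCleanup {

    /** Remove _tmp* files under root, then any directories left empty. */
    public static void cleanup(Path root) throws IOException {
        List<Path> deepestFirst;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parents.
            deepestFirst = walk.sorted(Comparator.reverseOrder())
                               .collect(Collectors.toList());
        }
        for (Path p : deepestFirst) {
            if (Files.isRegularFile(p)
                    && p.getFileName().toString().startsWith("_tmp")) {
                Files.delete(p);                 // roll back uncommitted file
            } else if (Files.isDirectory(p) && !p.equals(root) && isEmpty(p)) {
                Files.delete(p);                 // prune now-empty partition dir
            }
        }
    }

    private static boolean isEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
```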




[jira] Commented: (HIVE-1655) Adding consistency check at jobClose() when committing dynamic partitions

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913317#action_12913317
 ] 

Namit Jain commented on HIVE-1655:
--

+1

> Adding consistency check at jobClose() when committing dynamic partitions
> -
>
> Key: HIVE-1655
> URL: https://issues.apache.org/jira/browse/HIVE-1655
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1655.patch
>
>
> In case of dynamic partition insert, FileSinkOperator generates a directory 
> for a new partition, and the files in the directory are named '_tmp*'. 
> When a task succeeds, the file is renamed to remove the "_tmp", which 
> essentially implements the "commit" semantics. Many kinds of exceptions 
> (process killed, machine dies, etc.) can leave _tmp files behind in the DP 
> directory. These _tmp files should be deleted ("rolled back") at a 
> successful jobClose(). After the deletion, we should also delete any empty 
> directories.




[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1534:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Amareshwari

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
> patch-1534-4.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.
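The correct semantics can be illustrated with a toy evaluator (a hedged Python sketch, not Hive code): in `T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2 AND T1.c1 < 10)`, the filter is part of the join condition, so a left row that fails it must still appear padded with NULLs rather than being dropped:

```python
def left_outer_join(t1, t2):
    """Toy model of T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2 AND T1.c1 < 10).
    t1 rows are (c1,) tuples, t2 rows are (c2,) tuples; NULL is None."""
    out = []
    for (c1,) in t1:
        matched = False
        for (c2,) in t2:
            if c1 == c2 and c1 < 10:   # the full ON condition, filter included
                out.append((c1, c2))
                matched = True
        if not matched:                # left row is preserved, padded with NULL
            out.append((c1, None))
    return out
```

For example, with `t1 = [(5,), (20,)]` and `t2 = [(5,), (20,)]`, the row with c1 = 20 satisfies the equality but fails `c1 < 10`, so it comes out as `(20, None)` instead of being filtered away; dropping it would be WHERE-clause, not ON-clause, behavior.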




[jira] Created: (HIVE-1661) Default values for parameters

2010-09-21 Thread Namit Jain (JIRA)
Default values for parameters
-

 Key: HIVE-1661
 URL: https://issues.apache.org/jira/browse/HIVE-1661
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0


It would be good to have default values for some Hive parameters:

say, RETENTION defaulting to 30 days.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913105#action_12913105
 ] 

Namit Jain commented on HIVE-1534:
--

+1

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
> patch-1534-4.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913001#action_12913001
 ] 

Namit Jain commented on HIVE-1609:
--

I meant, exposing it via the Hive QL directly.
I don't think there is a way to do that currently.

> Support partition filtering in metastore
> 
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Ajay Kidave
>Assignee: Ajay Kidave
> Fix For: 0.7.0
>
> Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch
>
>
> The metastore needs to support returning a list of partitions based 
> on user-specified filter conditions. This will be useful for tools which need 
> to do partition pruning. Howl is one such use case. The way partition pruning 
> is done during Hive query execution need not be changed.
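Conceptually, such a metastore call evaluates a caller-supplied predicate over each partition's key/value spec so that pruning happens server-side. A minimal sketch (illustrative only; the real Thrift API takes a filter string, and the names here are assumptions):

```python
def list_partitions(partitions, predicate):
    """Return the partition specs (dicts of key -> value) that satisfy the
    caller-supplied predicate, so pruning happens in the metastore rather
    than in the client."""
    return [p for p in partitions if predicate(p)]
```

For instance, `list_partitions(parts, lambda p: p["ds"] >= "2010-09-21")` would return only the partitions at or after that date, without shipping the full partition list to the tool doing the pruning.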




[jira] Commented: (HIVE-1609) Support partition filtering in metastore

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912851#action_12912851
 ] 

Namit Jain commented on HIVE-1609:
--

Once this is in, it would be useful to add an API like get_partitions_ps to Hive 
- I mean, get all sub-partitions.

For example, if the table is partitioned on (ds, hr): 
something like

show partitions (ds='2010-09-20', hr) should return all sub-partitions.
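A `get_partitions_ps`-style lookup amounts to prefix matching on a partial partition spec: keys that are pinned must match, and unpinned keys (here, `hr`) match anything. A hypothetical sketch (the actual Thrift signature may differ):

```python
def get_partitions_ps(partitions, partial_spec):
    """Return all partitions (dicts of key -> value) whose values agree
    with every key pinned in partial_spec; keys absent from partial_spec
    (e.g. hr) match any value."""
    return [p for p in partitions
            if all(p.get(k) == v for k, v in partial_spec.items())]
```

So `get_partitions_ps(parts, {"ds": "2010-09-20"})` returns every hourly sub-partition of that day, mirroring the proposed `show partitions (ds='2010-09-20', hr)` behavior.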

> Support partition filtering in metastore
> 
>
> Key: HIVE-1609
> URL: https://issues.apache.org/jira/browse/HIVE-1609
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Ajay Kidave
>Assignee: Ajay Kidave
> Fix For: 0.7.0
>
> Attachments: hive_1609.patch, hive_1609_2.patch, hive_1609_3.patch
>
>
> The metastore needs to support returning a list of partitions based 
> on user-specified filter conditions. This will be useful for tools which need 
> to do partition pruning. Howl is one such use case. The way partition pruning 
> is done during Hive query execution need not be changed.




[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912849#action_12912849
 ] 

Namit Jain commented on HIVE-1620:
--

I will take a look and get back to you

> Patch to write directly to S3 from Hive
> ---
>
> Key: HIVE-1620
> URL: https://issues.apache.org/jira/browse/HIVE-1620
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly 
> to S3.
> This patch allows users to specify an S3 location as the table output location 
> and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.




[jira] Commented: (HIVE-1658) Fix describe [extended] column formatting

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912846#action_12912846
 ] 

Namit Jain commented on HIVE-1658:
--

Thiruvel, this is a show-stopper for HIVE-558.
The schema for describe and describe extended cannot be changed.

You can add NULLs at the beginning/end, but the number of columns has to be 
maintained.

> Fix describe [extended] column formatting
> -
>
> Key: HIVE-1658
> URL: https://issues.apache.org/jira/browse/HIVE-1658
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Paul Yang
>Assignee: Thiruvel Thirumoolan
>
> When displaying the column schema, the formatting should be
> name, type, comment
> to be in line with the previous formatting style for backward compatibility.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912845#action_12912845
 ] 

Namit Jain commented on HIVE-1534:
--

What I meant to say was the following:


People are running queries in the warehouse that rely on the current (wrong) 
semantics - if we suddenly fix this, those queries will break.
We need to give everyone some time to change their queries to use a 
sub-query if they want the filter to be pushed up.

Adding the above config parameter seems like the only choice - we can try to 
remove this parameter before 0.7 goes out 
(if everyone agrees), but we need it right now for deployment.

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
> patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912741#action_12912741
 ] 

Namit Jain commented on HIVE-1534:
--

The patch looks good - however, we have a deployment issue.

This is an incompatible change, and will change/break existing queries. I can't 
think of a great way of getting this in.
One option is to guard it via a configurable parameter (ON by default). 
For internal deployments (like Facebook),
we can turn it off, slowly find and convert all the bad queries, and 
only then enable this.


> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
> patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1497) support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912694#action_12912694
 ] 

Namit Jain commented on HIVE-1497:
--

btw, HIVE-558 just got committed.

> support COMMENT clause on CREATE INDEX, and add new commands for 
> SHOW/DESCRIBE indexes
> --
>
> Key: HIVE-1497
> URL: https://issues.apache.org/jira/browse/HIVE-1497
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Indexing
>Affects Versions: 0.7.0
>Reporter: John Sichi
>Assignee: Russell Melick
> Fix For: 0.7.0
>
>
> We need to work out the syntax for SHOW/DESCRIBE, taking partitioning into 
> account.




[jira] Resolved: (HIVE-558) describe extended table/partition output is cryptic

2010-09-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-558.
-

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Thiruvel

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Fix For: 0.7.0
>
> Attachments: HIVE-558.3.patch, HIVE-558.4.patch, HIVE-558.patch, 
> HIVE-558.patch, HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information in it is not easy to read or parse. The output should be easy to 
> read and simple to parse, so that programs can extract the table location, etc.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912590#action_12912590
 ] 

Namit Jain commented on HIVE-1534:
--

I will take a look again

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, 
> patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912585#action_12912585
 ] 

Namit Jain commented on HIVE-558:
-

Running tests again - will commit if they pass

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.4.patch, HIVE-558.patch, 
> HIVE-558.patch, HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information in it is not easy to read or parse. The output should be easy to 
> read and simple to parse, so that programs can extract the table location, etc.




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12911032#action_12911032
 ] 

Namit Jain commented on HIVE-558:
-

TestJdbcDriver is failing - I haven't debugged it yet. Can you take a look?
All other tests are passing.

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information in it is not easy to read or parse. The output should be easy to 
> read and simple to parse, so that programs can extract the table location, etc.




[jira] Updated: (HIVE-1361) table/partition level statistics

2010-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1361:
-

Status: Open  (was: Patch Available)

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions
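All of the proposed partition-level numbers can be derived from the partition's file sizes plus a row count. A minimal aggregation sketch (illustrative; the field names are assumptions, not the actual metastore schema):

```python
def partition_stats(file_sizes, num_rows):
    """Aggregate the proposed partition-level stats from a list of raw
    file sizes (in bytes) and a row count gathered at insert time."""
    return {
        "numRows": num_rows,
        "totalSize": sum(file_sizes),
        "numFiles": len(file_sizes),
        "maxFileSize": max(file_sizes),
        "minFileSize": min(file_sizes),
        "avgFileSize": sum(file_sizes) / len(file_sizes),
    }
```

Table-level stats for a partitioned table would then just fold these per-partition dicts together and add the partition count.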




[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910903#action_12910903
 ] 

Namit Jain commented on HIVE-1617:
--

Committed. Thanks Paul

> ScriptOperator's AutoProgressor can lead to an infinite loop
> 
>
> Key: HIVE-1617
> URL: https://issues.apache.org/jira/browse/HIVE-1617
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1617.1.patch
>
>
> In the default settings, the auto progressor can result in an infinite loop.
> There should be another configurable parameter which stops the auto progress 
> if the script has not made any progress.
> The default can be an hour or so - this way we will not get stuck indefinitely.
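The fix boils down to capping how long heartbeats are sent while the script makes no real progress. A deliberately simplified, tick-based model of that logic (not the actual AutoProgressor code; names and the tick abstraction are assumptions):

```python
def auto_progress(made_progress, max_idle_ticks):
    """Decide, tick by tick, whether to keep sending heartbeats.
    made_progress: per-tick booleans saying whether the script emitted
    any real output that tick. Heartbeats stop once more than
    max_idle_ticks pass with no progress, so a hung script eventually
    times out instead of being kept alive forever."""
    heartbeats = []
    idle = 0
    for progressed in made_progress:
        idle = 0 if progressed else idle + 1
        heartbeats.append(idle <= max_idle_ticks)
    return heartbeats
```

With ticks of, say, one minute and `max_idle_ticks=60`, this matches the "an hour or so" default suggested above: a script silent for over an hour stops receiving heartbeats and the task is allowed to fail.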




[jira] Commented: (HIVE-558) describe extended table/partition output is cryptic

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910795#action_12910795
 ] 

Namit Jain commented on HIVE-558:
-

Thanks Paul, I will commit it if the tests pass

> describe extended table/partition output is cryptic
> ---
>
> Key: HIVE-558
> URL: https://issues.apache.org/jira/browse/HIVE-558
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Prasad Chakka
>Assignee: Thiruvel Thirumoolan
> Attachments: HIVE-558.3.patch, HIVE-558.patch, HIVE-558.patch, 
> HIVE-558_PrelimPatch.patch, SampleOutputDescribe.txt
>
>
> describe extended table prints out the Thrift metadata object directly. The 
> information from it is not easy to read or parse. Output should be easily 
> read and can be simple parsed to get table location etc by programs.




[jira] Created: (HIVE-1653) Ability to enforce correct stats

2010-09-17 Thread Namit Jain (JIRA)
Ability to enforce correct stats


 Key: HIVE-1653
 URL: https://issues.apache.org/jira/browse/HIVE-1653
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


This is a follow-up for https://issues.apache.org/jira/browse/HIVE-1361.

If one of the mappers/reducers cannot publish stats, it may lead to wrong 
aggregated stats.
There should be a way to avoid this - at the least, a configuration variable 
which fails the task if stats cannot be published.
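The proposed configuration variable changes only the error path of stats publishing. A minimal sketch of the behavior (hypothetical names; `publisher` stands in for whatever backend the task publishes to):

```python
def publish_stats(publisher, stats, fail_on_error=True):
    """Publish a task's stats. With the proposed flag on, a publishing
    failure propagates and fails the task instead of silently producing
    wrong aggregated stats; with it off, the failure is swallowed."""
    try:
        publisher(stats)
        return True
    except Exception:
        if fail_on_error:
            raise   # surface the error so the task (and job) fails
        return False
```

The trade-off is explicit: with the flag on, stats are guaranteed correct or the job fails; with it off, the job always succeeds but aggregates may undercount.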




[jira] Created: (HIVE-1652) Delete temporary stats data after some time

2010-09-17 Thread Namit Jain (JIRA)
Delete temporary stats data after some time
---

 Key: HIVE-1652
 URL: https://issues.apache.org/jira/browse/HIVE-1652
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


This is a follow-up for https://issues.apache.org/jira/browse/HIVE-1361.
If the client dies after some stats have been published, there is no way to 
clean up that data.

A simple work-around might be to add the current timestamp to the data - and a 
background process
to clean up old stats. 
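The timestamp work-around is just an age-based purge. A sketch of what the background process would do on each pass (illustrative; row shape and field names are assumptions):

```python
def purge_stale_stats(rows, now, max_age):
    """Keep only stats rows published within max_age of now. Rows
    orphaned by a dead client never get refreshed, so they age out
    and are dropped on a later pass."""
    return [r for r in rows if now - r["ts"] <= max_age]
```

Any row older than `max_age` is assumed to belong to a client that died mid-job and is discarded; live jobs finish well within the window, so their rows survive until aggregation.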




[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1534:
-

Status: Open  (was: Patch Available)

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910763#action_12910763
 ] 

Namit Jain commented on HIVE-1534:
--

You can clean up the patch by not special-casing partitioned columns. 
Otherwise, the patch looks good.

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1651) ScriptOperator should not forward any output to downstream operators if an exception is happened

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910747#action_12910747
 ] 

Namit Jain commented on HIVE-1651:
--

+1

> ScriptOperator should not forward any output to downstream operators if an 
> exception is happened
> 
>
> Key: HIVE-1651
> URL: https://issues.apache.org/jira/browse/HIVE-1651
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1651.patch
>
>
> ScriptOperator spawns 2 threads for getting the stdout and stderr from the 
> script and then forwards the output from stdout to downstream operators. If 
> the script hits any exception (e.g., it gets killed), the ScriptOperator 
> gets an exception and throws it to upstream operators until MapOperator gets 
> it and calls close(abort). Before ScriptOperator.close() is called, the 
> script output stream can still forward output to downstream operators. We 
> should terminate it immediately.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910669#action_12910669
 ] 

Namit Jain commented on HIVE-1534:
--

bq. I think it makes sense to push the filters on partitioned columns and not 
output all the table for outer join. Patch pushes filters on partitioned 
columns, even for outer joins. Thoughts?

I don't think it is a good idea to special-case partitioned columns - can you 
treat them like any other column?



> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Updated: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1650:
-

Attachment: hive.1650.1.patch

> TestContribNegativeCliDriver fails
> --
>
> Key: HIVE-1650
> URL: https://issues.apache.org/jira/browse/HIVE-1650
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1650.1.patch
>
>





[jira] Updated: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1650:
-

Status: Patch Available  (was: Open)

> TestContribNegativeCliDriver fails
> --
>
> Key: HIVE-1650
> URL: https://issues.apache.org/jira/browse/HIVE-1650
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1650.1.patch
>
>





[jira] Created: (HIVE-1650) TestContribNegativeCliDriver fails

2010-09-16 Thread Namit Jain (JIRA)
TestContribNegativeCliDriver fails
--

 Key: HIVE-1650
 URL: https://issues.apache.org/jira/browse/HIVE-1650
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain







[jira] Updated: (HIVE-1616) Add ProtocolBuffersStructObjectInspector

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1616:
-

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

Committed. Thanks Johan

> Add ProtocolBuffersStructObjectInspector
> 
>
> Key: HIVE-1616
> URL: https://issues.apache.org/jira/browse/HIVE-1616
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Johan Oskarsson
>Assignee: Johan Oskarsson
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: HIVE-1616.patch
>
>
> Much like there is a ThriftStructObjectInspector that ignores the isset 
> booleans, there is a need for a ProtocolBuffersStructObjectInspector that 
> ignores the has* methods. This can then be used together with Twitter's 
> elephant-bird.




[jira] Commented: (HIVE-1361) table/partition level statistics

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910470#action_12910470
 ] 

Namit Jain commented on HIVE-1361:
--

Will take a look 

> table/partition level statistics
> 
>
> Key: HIVE-1361
> URL: https://issues.apache.org/jira/browse/HIVE-1361
> Project: Hadoop Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Ahmed M Aly
> Fix For: 0.7.0
>
> Attachments: HIVE-1361.java_only.patch, HIVE-1361.patch, stats0.patch
>
>
> As a first step, we gather table-level stats for non-partitioned tables and 
> partition-level stats for partitioned tables. Future work could extend the 
> table-level stats to partitioned tables as well. 
> There are 3 major milestones in this subtask: 
>  1) extend the insert statement to gather table/partition level stats 
> on-the-fly.
>  2) extend metastore API to support storing and retrieving stats for a 
> particular table/partition. 
>  3) add an ANALYZE TABLE [PARTITION] statement in Hive QL to gather stats for 
> existing tables/partitions. 
> The proposed stats are:
> Partition-level stats: 
>   - number of rows
>   - total size in bytes
>   - number of files
>   - max, min, average row sizes
>   - max, min, average file sizes
> Table-level stats in addition to partition level stats:
>   - number of partitions




[jira] Commented: (HIVE-1616) Add ProtocolBuffersStructObjectInspector

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910421#action_12910421
 ] 

Namit Jain commented on HIVE-1616:
--

Will commit if the tests pass

> Add ProtocolBuffersStructObjectInspector
> 
>
> Key: HIVE-1616
> URL: https://issues.apache.org/jira/browse/HIVE-1616
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Johan Oskarsson
>Assignee: Johan Oskarsson
>Priority: Minor
> Attachments: HIVE-1616.patch
>
>
> Much like there is a ThriftStructObjectInspector that ignores the isset 
> booleans, there is a need for a ProtocolBuffersStructObjectInspector that 
> ignores the has* methods. This can then be used together with Twitter's 
> elephant-bird.




[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910419#action_12910419
 ] 

Namit Jain commented on HIVE-1617:
--

Otherwise it looks good

> ScriptOperator's AutoProgressor can lead to an infinite loop
> 
>
> Key: HIVE-1617
> URL: https://issues.apache.org/jira/browse/HIVE-1617
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1617.1.patch
>
>
> With the default settings, the auto progressor can result in an infinite loop.
> There should be another configurable parameter which stops the auto progress 
> if the script has not made any progress.
> The default can be an hour or so - this way we will not get stuck indefinitely.
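A minimal sketch of the proposed safeguard, with invented names (`maxAutoProgressMs`, `shouldReportProgress`) rather than Hive's actual configuration or API: heartbeats continue only while the script has made real progress within the configured window, so a stuck script eventually stops reporting and is killed by the framework instead of looping forever.

```java
// Hedged sketch of an auto-progressor with a timeout cap; all names here
// are hypothetical, not Hive's real AutoProgressor implementation.
public class AutoProgressorSketch {
    private final long maxAutoProgressMs; // e.g. one hour by default
    private long lastRealProgressMs;

    public AutoProgressorSketch(long maxAutoProgressMs, long nowMs) {
        this.maxAutoProgressMs = maxAutoProgressMs;
        this.lastRealProgressMs = nowMs;
    }

    /** Called whenever the script actually emits a row. */
    public void noteRealProgress(long nowMs) {
        lastRealProgressMs = nowMs;
    }

    /** Keep heartbeating only while the script made progress recently. */
    public boolean shouldReportProgress(long nowMs) {
        return (nowMs - lastRealProgressMs) < maxAutoProgressMs;
    }

    public static void main(String[] args) {
        AutoProgressorSketch p = new AutoProgressorSketch(3_600_000L, 0L);
        System.out.println(p.shouldReportProgress(1_000L));     // still within the cap
        System.out.println(p.shouldReportProgress(4_000_000L)); // past the cap: stop heartbeating
    }
}
```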




[jira] Updated: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1617:
-

Status: Open  (was: Patch Available)

> ScriptOperator's AutoProgressor can lead to an infinite loop
> 
>
> Key: HIVE-1617
> URL: https://issues.apache.org/jira/browse/HIVE-1617
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1617.1.patch
>
>
> With the default settings, the auto progressor can result in an infinite loop.
> There should be another configurable parameter which stops the auto progress 
> if the script has not made any progress.
> The default can be an hour or so - this way we will not get stuck indefinitely.




[jira] Commented: (HIVE-1617) ScriptOperator's AutoProgressor can lead to an infinite loop

2010-09-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910386#action_12910386
 ] 

Namit Jain commented on HIVE-1617:
--

Can you add a negative testcase - one that times out?

> ScriptOperator's AutoProgressor can lead to an infinite loop
> 
>
> Key: HIVE-1617
> URL: https://issues.apache.org/jira/browse/HIVE-1617
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
> Fix For: 0.7.0
>
> Attachments: HIVE-1617.1.patch
>
>
> With the default settings, the auto progressor can result in an infinite loop.
> There should be another configurable parameter which stops the auto progress 
> if the script has not made any progress.
> The default can be an hour or so - this way we will not get stuck indefinitely.




[jira] Updated: (HIVE-675) add database/schema support Hive QL

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-675:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed in 0.6. Thanks Carl

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in a single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These namespaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in the metastore but the Hive query 
> parser should have this feature as well.




[jira] Updated: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1639:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Ning

> ExecDriver.addInputPaths() error if partition name contains a comma
> ---
>
> Key: HIVE-1639
> URL: https://issues.apache.org/jira/browse/HIVE-1639
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1639.2.patch, HIVE-1639.patch
>
>
> ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes 
> a comma-separated string representing a set of paths. If the path name of an 
> input file contains a comma, this code throws an exception: 
> java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string.
> Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths 
> should iterate over all paths and call FileInputFormat.addInputPath. 
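The failure mode is easy to reproduce without Hadoop: once paths are joined into one comma-separated string, a comma inside a partition value cannot be told apart from a separator. A self-contained sketch (illustrative only, not Hive code):

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates why collapsing paths into one comma-separated string (as
// FileInputFormat's comma-based API does) breaks partition names that
// themselves contain commas; adding each path individually avoids this.
public class CommaPathDemo {
    /** Round-trip the paths through a comma-joined string, as the buggy code path does. */
    public static String[] naiveSplit(List<String> paths) {
        return String.join(",", paths).split(",");
    }

    public static void main(String[] args) {
        // A dynamic-partition directory whose partition value contains a comma:
        List<String> paths = Arrays.asList("/warehouse/t/ds=2010-09-15",
                                           "/warehouse/t/tag=a,b");
        // The naive round-trip yields 3 fragments instead of the original
        // 2 paths, so the fix is one addInputPath call per path instead.
        System.out.println(naiveSplit(paths).length); // 3, not 2
    }
}
```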




[jira] Commented: (HIVE-675) add database/schema support Hive QL

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910020#action_12910020
 ] 

Namit Jain commented on HIVE-675:
-

I will take a look

> add database/schema support Hive QL
> ---
>
> Key: HIVE-675
> URL: https://issues.apache.org/jira/browse/HIVE-675
> Project: Hadoop Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor
>Reporter: Prasad Chakka
>Assignee: Carl Steinbach
> Fix For: 0.6.0, 0.7.0
>
> Attachments: hive-675-2009-9-16.patch, hive-675-2009-9-19.patch, 
> hive-675-2009-9-21.patch, hive-675-2009-9-23.patch, hive-675-2009-9-7.patch, 
> hive-675-2009-9-8.patch, HIVE-675-2010-08-16.patch.txt, 
> HIVE-675-2010-7-16.patch.txt, HIVE-675-2010-8-4.patch.txt, 
> HIVE-675-backport-v6.1.patch.txt, HIVE-675-backport-v6.2.patch.txt, 
> HIVE-675.10.patch.txt, HIVE-675.11.patch.txt, HIVE-675.12.patch.txt, 
> HIVE-675.13.patch.txt
>
>
> Currently all Hive tables reside in single namespace (default). Hive should 
> support multiple namespaces (databases or schemas) such that users can create 
> tables in their specific namespaces. These name spaces can have different 
> warehouse directories (with a default naming scheme) and possibly different 
> properties.
> There is already some support for this in metastore but Hive query parser 
> should have this feature as well.




[jira] Commented: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909982#action_12909982
 ] 

Namit Jain commented on HIVE-1645:
--

Added a new configuration parameter 'hive.zookeeper.namespace' under which all 
the locks will be created if zookeeper is being used as the lock manager.
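As a hedged illustration of how such a namespace prefix composes lock paths (only the parameter name comes from the comment above; the helper and path layout below are invented for the sketch):

```java
// Sketch of deriving lock znode paths under a configurable parent, so all
// Hive locks live under one subtree and the same ZooKeeper ensemble can be
// shared for other purposes. The layout shown is illustrative, not Hive's.
public class LockPathSketch {
    /** Build a lock path rooted at the configured namespace znode. */
    public static String lockPath(String namespace, String db, String table) {
        return "/" + namespace + "/" + db + "/" + table;
    }

    public static void main(String[] args) {
        // With hive.zookeeper.namespace set, every lock shares one parent znode:
        System.out.println(lockPath("hive_locks", "default", "t1"));
    }
}
```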


> ability to specify parent directory for zookeeper lock manager
> --
>
> Key: HIVE-1645
> URL: https://issues.apache.org/jira/browse/HIVE-1645
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1645.1.patch
>
>
> For concurrency support, it would be desirable if all the locks were created 
> under a common parent, so that zookeeper can be used
> for different purposes.




[jira] Updated: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1645:
-

Attachment: hive.1645.1.patch

> ability to specify parent directory for zookeeper lock manager
> --
>
> Key: HIVE-1645
> URL: https://issues.apache.org/jira/browse/HIVE-1645
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.1645.1.patch
>
>
> For concurrency support, it would be desirable if all the locks were created 
> under a common parent, so that zookeeper can be used
> for different purposes.




[jira] Created: (HIVE-1645) ability to specify parent directory for zookeeper lock manager

2010-09-15 Thread Namit Jain (JIRA)
ability to specify parent directory for zookeeper lock manager
--

 Key: HIVE-1645
 URL: https://issues.apache.org/jira/browse/HIVE-1645
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1645.1.patch

For concurrency support, it would be desirable if all the locks were created 
under a common parent, so that zookeeper can be used
for different purposes.





[jira] Commented: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909956#action_12909956
 ] 

Namit Jain commented on HIVE-1639:
--

+1

looks good - will commit if the tests pass

> ExecDriver.addInputPaths() error if partition name contains a comma
> ---
>
> Key: HIVE-1639
> URL: https://issues.apache.org/jira/browse/HIVE-1639
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1639.2.patch, HIVE-1639.patch
>
>
> ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes 
> a comma-separated string representing a set of paths. If the path name of an 
> input file contains a comma, this code throws an exception: 
> java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string.
> Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths 
> should iterate over all paths and call FileInputFormat.addInputPath. 




[jira] Assigned: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1642:


Assignee: Liyin Tang

> Convert join queries to map-join based on size of table/row
> ---
>
> Key: HIVE-1642
> URL: https://issues.apache.org/jira/browse/HIVE-1642
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> Based on the number of rows and size of each table, Hive should automatically 
> be able to convert a join into map-join.




[jira] Assigned: (HIVE-1641) add map joined table to distributed cache

2010-09-15 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1641:


Assignee: Liyin Tang

> add map joined table to distributed cache
> -
>
> Key: HIVE-1641
> URL: https://issues.apache.org/jira/browse/HIVE-1641
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Liyin Tang
> Fix For: 0.7.0
>
>
> Currently, the mappers directly read the map-joined table from HDFS, which 
> makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers goes beyond a 
> few thousand, due to the concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and 
> read from there instead.




[jira] Created: (HIVE-1642) Convert join queries to map-join based on size of table/row

2010-09-15 Thread Namit Jain (JIRA)
Convert join queries to map-join based on size of table/row
---

 Key: HIVE-1642
 URL: https://issues.apache.org/jira/browse/HIVE-1642
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
 Fix For: 0.7.0


Based on the number of rows and size of each table, Hive should automatically 
be able to convert a join into map-join.





[jira] Created: (HIVE-1641) add map joined table to distributed cache

2010-09-15 Thread Namit Jain (JIRA)
add map joined table to distributed cache
-

 Key: HIVE-1641
 URL: https://issues.apache.org/jira/browse/HIVE-1641
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
 Fix For: 0.7.0


Currently, the mappers directly read the map-joined table from HDFS, which 
makes it difficult to scale.
We end up getting lots of timeouts once the number of mappers goes beyond a few 
thousand, due to the concurrent mappers.

It would be a good idea to put the mapped file into the distributed cache and 
read from there instead.




[jira] Commented: (HIVE-1639) ExecDriver.addInputPaths() error if partition name contains a comma

2010-09-15 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909903#action_12909903
 ] 

Namit Jain commented on HIVE-1639:
--

Can you add a testcase?

> ExecDriver.addInputPaths() error if partition name contains a comma
> ---
>
> Key: HIVE-1639
> URL: https://issues.apache.org/jira/browse/HIVE-1639
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-1639.patch
>
>
> ExecDriver.addInputPaths() calls FileInputFormat.addPaths(), which takes 
> a comma-separated string representing a set of paths. If the path name of an 
> input file contains a comma, this code throws an exception: 
> java.lang.IllegalArgumentException: Can not create a Path from an empty 
> string.
> Instead of calling FileInputFormat.addPaths(), ExecDriver.addInputPaths 
> should iterate over all paths and call FileInputFormat.addInputPath. 




[jira] Created: (HIVE-1638) convert commonly used udfs to generic udfs

2010-09-15 Thread Namit Jain (JIRA)
convert commonly used udfs to generic udfs
--

 Key: HIVE-1638
 URL: https://issues.apache.org/jira/browse/HIVE-1638
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong


Copying a mail from Joy:


i did a little bit of profiling of a simple hive group by query today. i was 
surprised to see that one of the most expensive functions was in converting 
the equals udf (i had some simple string filters) to generic udfs. 
(primitiveobjectinspectorconverter.textconverter)

am i correct in thinking that the fix is to simply port some of the most 
popular udfs (string equality/comparison etc.) to generic udfs?





[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-14 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909539#action_12909539
 ] 

Namit Jain commented on HIVE-1534:
--

Did you run all the tests? Some of the tests should break - minimally, a change 
in the explain plans.
What about semi joins?

Why did you add a genExprNode()? Can't you re-use the one from SemanticAnalyzer?

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.
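For context, a hedged simulation of the semantics the first query should have (plain Java standing in for the join operator, not Hive's implementation): a T1 row that fails the ON-clause filter must still be emitted NULL-padded, which is exactly what naive push-down of the filter below the join would break.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates T1 LEFT OUTER JOIN T2 ON (T1.c1 = T2.c2 AND T1.c1 < 10).
// The filter belongs to the join condition, so T1 rows failing it must
// still appear NULL-padded; pushing the filter below the join drops them.
public class OuterJoinFilterDemo {
    public static List<String> leftOuterJoin(int[] t1, int[] t2) {
        List<String> out = new ArrayList<>();
        for (int c1 : t1) {
            boolean matched = false;
            for (int c2 : t2) {
                if (c1 == c2 && c1 < 10) { // filter evaluated as part of the join
                    out.add(c1 + "," + c2);
                    matched = true;
                }
            }
            if (!matched) out.add(c1 + ",NULL"); // preserved by the outer join
        }
        return out;
    }

    public static void main(String[] args) {
        // 20 fails the "< 10" filter but still survives with a NULL right side.
        System.out.println(leftOuterJoin(new int[]{5, 20}, new int[]{5, 20}));
    }
}
```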




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-14 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909388#action_12909388
 ] 

Namit Jain commented on HIVE-1534:
--

Since we are still pushing filters for non-outer joins, the assumption that the 
filters will always output a row holds, and therefore 
we don't need a progress report.

Cool, I will take a look at the patch again

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Created: (HIVE-1635) ability to check partition size for dynamic partitions

2010-09-14 Thread Namit Jain (JIRA)
ability to check partition size for dynamic partitions
-

 Key: HIVE-1635
 URL: https://issues.apache.org/jira/browse/HIVE-1635
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ning Zhang
 Fix For: 0.7.0


With dynamic partitions, it becomes very easy to create partitions.

We have seen some scenarios where a lot of partitions/files get created due to 
some corrupt data (1 corrupt row
can end up creating a partition with as many files as there are mappers, if 
merge is false).

This puts a lot of load on the cluster, and is a debugging nightmare.

It would be good to have a configuration parameter for the minimum number of 
rows in a partition.
If the number of rows is less than the threshold, the partition need not be 
created. The default value
of this parameter can be zero for backward compatibility.
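A minimal sketch of the proposed guard, with invented names (`keepPartitions`, `minRows`) rather than an actual Hive parameter: candidate dynamic partitions below the row threshold are simply not created, and a threshold of zero preserves today's behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of filtering candidate dynamic partitions by row count before
// creation; names and data layout are hypothetical, not Hive internals.
public class MinRowsPerPartitionSketch {
    /** Keep only partitions with at least minRows rows (minRows = 0 keeps all). */
    public static Map<String, Long> keepPartitions(Map<String, Long> rowCounts, long minRows) {
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : rowCounts.entrySet()) {
            if (e.getValue() >= minRows) kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("ds=2010-09-14", 120_000L); // real partition
        counts.put("ds=2O10-09-14", 1L);       // one corrupt row, garbled date
        // Only the real partition survives a threshold of 10 rows.
        System.out.println(keepPartitions(counts, 10L).keySet());
    }
}
```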




[jira] Commented: (HIVE-1622) Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908874#action_12908874
 ] 

Namit Jain commented on HIVE-1622:
--

Committed the new log for 0.17

> Use CombineHiveInputFormat for the merge job if hive.merge.mapredfiles=true
> ---
>
> Key: HIVE-1622
> URL: https://issues.apache.org/jira/browse/HIVE-1622
> Project: Hadoop Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.7.0
>
> Attachments: HIVE-1622.patch, HIVE-1622_0.17.patch
>
>
> Currently map-only merge (using CombineHiveInputFormat) is only enabled for 
> merging files generated by mappers. It should be used for files generated at 
> readers as well.




[jira] Commented: (HIVE-1621) Disable join filters for outer joins.

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908859#action_12908859
 ] 

Namit Jain commented on HIVE-1621:
--

See https://issues.apache.org/jira/browse/HIVE-1534 for the offending test case.

> Disable join filters for outer joins.
> -
>
> Key: HIVE-1621
> URL: https://issues.apache.org/jira/browse/HIVE-1621
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
>
> As suggested at [comment 
> |https://issues.apache.org/jira/browse/HIVE-1534?focusedCommentId=12907001&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12907001],
>  SemanticAnalyzer should give out error if join filter is specified for outer 
> joins.




[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908858#action_12908858
 ] 

Namit Jain commented on HIVE-1534:
--

The approach looks OK - I will look into the code for more detailed comments.

One general comment: you also need to account for progress if the join 
filters filter out all the rows.
Otherwise the task tracker may consider the task unresponsive. Look at the 
filter operator: we send a progress report
if 'n' consecutive rows are filtered out.
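The filter-operator heuristic described above can be sketched as follows (an invented class, not Hive's actual FilterOperator): report progress once for every 'n' consecutive filtered-out rows, so long stretches with no output still look alive to the task tracker.

```java
// Hedged sketch of the "progress every n filtered rows" heuristic; names
// are hypothetical and the heartbeat is modelled as a counter.
public class FilterProgressSketch {
    private final int n;
    private int consecutiveFiltered = 0;
    public int heartbeats = 0;

    public FilterProgressSketch(int n) { this.n = n; }

    public void onRow(boolean passedFilter) {
        if (passedFilter) {
            consecutiveFiltered = 0; // real output is itself progress
        } else if (++consecutiveFiltered == n) {
            heartbeats++;            // report progress, then restart the count
            consecutiveFiltered = 0;
        }
    }

    public static void main(String[] args) {
        FilterProgressSketch f = new FilterProgressSketch(3);
        for (int i = 0; i < 10; i++) f.onRow(false); // 10 filtered rows, n = 3
        System.out.println(f.heartbeats); // 3
    }
}
```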

> Join filters do not work correctly with outer joins
> ---
>
> Key: HIVE-1534
> URL: https://issues.apache.org/jira/browse/HIVE-1534
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.




[jira] Commented: (HIVE-1620) Patch to write directly to S3 from Hive

2010-09-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908854#action_12908854
 ] 

Namit Jain commented on HIVE-1620:
--

The approach looks good, but can you move all the checks to compile time 
instead?

I mean, while generating the plan, create an S3FileSinkOperator instead of a 
FileSinkOperator if the
destination under consideration is on the S3 FileSystem - there will be no move 
task, etc.
The explain output will show the correct plan.


The commit for S3FileSystem will be a no-op. That way, FileSinkOperator does 
not change much.

> Patch to write directly to S3 from Hive
> ---
>
> Key: HIVE-1620
> URL: https://issues.apache.org/jira/browse/HIVE-1620
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: Vaibhav Aggarwal
>Assignee: Vaibhav Aggarwal
> Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly 
> to S3.
> This patch allows the user to specify an S3 location as the table output 
> location and hence eliminates the need of copying data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.




[jira] Resolved: (HIVE-1630) bug in NO_DROP

2010-09-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-1630.
--

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Committed. Thanks Siying

> bug in NO_DROP
> --
>
> Key: HIVE-1630
> URL: https://issues.apache.org/jira/browse/HIVE-1630
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1630.2.patch
>
>
> If the table is marked NO_DROP, we should still be able to drop old 
> partitions.




[jira] Commented: (HIVE-1630) bug in NO_DROP

2010-09-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908483#action_12908483
 ] 

Namit Jain commented on HIVE-1630:
--

+1

will commit if the tests pass

> bug in NO_DROP
> --
>
> Key: HIVE-1630
> URL: https://issues.apache.org/jira/browse/HIVE-1630
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1630.2.patch
>
>
> If the table is marked NO_DROP, we should still be able to drop old 
> partitions.




[jira] Created: (HIVE-1630) bug in NO_DROP

2010-09-10 Thread Namit Jain (JIRA)
bug in NO_DROP
--

 Key: HIVE-1630
 URL: https://issues.apache.org/jira/browse/HIVE-1630
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Siying Dong
 Fix For: 0.7.0


If the table is marked NO_DROP, we should still be able to drop old partitions.



