[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-10-21 Thread Basab Maulik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923776#action_12923776
 ] 

Basab Maulik commented on HIVE-1634:


Re: Beyond the review comments I added, I do have some higher-level suggestions:

* For the column mapping, the reason I suggested "a:b:string" in the 
original JIRA description is that it's a pain to keep everything lined up by 
column position. It's already less than ideal that we do the column name 
mapping by position, so I don't think we should make it worse by having a 
separate property for type. Using the s/b shorthand is fine, and if you think 
that we shouldn't overload the colon, we can use a different separator, e.g. 
"cf:cq#s". Since the existing property name is hbase.columns.mapping, I don't 
think it will be confusing to roll in the (optional) type info as well.

I have adopted your suggestion of '#' as the separator for the storage 
information, and 'hbase.columns.mapping' now optionally carries that 
information. I have made a small change to allow any prefix of 
'string' or of 'binary' to be valid, i.e. s/b or str/bin or string/binary etc.
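
The prefix rule described above can be sketched as follows. This is only an illustration, not the code from the patch: the class, enum, and method names are hypothetical, and the case-insensitive matching and the "no suffix means table default" fallback are assumptions. Map-type pairs such as 's:b' are not handled here.

```java
import java.util.Locale;

/** Sketch only: resolves the optional storage suffix of one mapping entry. */
final class StorageTypeSketch {

    /** Hypothetical enum; the thread does not show the patch's own representation. */
    enum StorageType { STRING, BINARY, TABLE_DEFAULT }

    /** Splits "cf:cq#token" at the first '#'; no suffix means the table default. */
    static StorageType storageTypeOf(String mappingEntry) {
        int hash = mappingEntry.indexOf('#');
        if (hash < 0) {
            return StorageType.TABLE_DEFAULT;
        }
        return resolve(mappingEntry.substring(hash + 1));
    }

    /** Accepts any non-empty prefix of "string" or "binary" (case folding is assumed). */
    static StorageType resolve(String token) {
        String t = token.trim().toLowerCase(Locale.ROOT);
        if (t.isEmpty()) {
            return StorageType.TABLE_DEFAULT;
        }
        if ("string".startsWith(t)) {
            return StorageType.STRING;
        }
        if ("binary".startsWith(t)) {
            return StorageType.BINARY;
        }
        throw new IllegalArgumentException("Unknown storage type: " + token);
    }
}
```

Because "string" and "binary" share no common prefix, every non-empty prefix is unambiguous, which is what makes the shorthand safe.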

* I'm wondering whether we can just use the existing classes like 
LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of 
creating new ones. Or are these not compatible with hbase.utils.Bytes?

I think the incompatibility stems more from trying to stay within the 
serde2.lazy.Lazy family of objects, which HBaseSerDe, LazyHBaseRow, and 
LazyHBaseCellMap extend or depend on. It would be useful to make these two 
families of classes compatible (inheriting from a common base class). Small 
differences in the object inspector classes that type-parameterize these 
classes further complicate getting past the type system. It should be doable, 
but perhaps as a separate patch?

* For the tests, I noticed that you have attached 
TestHiveHBaseExternalTable. I think it would be a good idea if you can create 
and populate such a fixture table in HBaseTestSetup; that way it can be 
available (treated as read-only) to all of the HBase .q tests. Otherwise, it's 
hard to verify that we're compatible with a table created directly through 
HBase API's rather than Hive.

Done. Added tests to create a Hive external table associated with this HBase 
table and test queries.

* Also for the tests, it would be good if you can filter it down to only a 
small number of representative rows when pulling the initial test data set from 
the Hive src table. That way, we can keep the .q.out files smaller.

Done, the .out files are a lot smaller than in the initial patch.

* Once we get this one committed, be sure to update the wiki.

Will do once this is committed.


> Allow access to Primitive types stored in binary format in HBase
> 
>
> Key: HIVE-1634
> URL: https://issues.apache.org/jira/browse/HIVE-1634
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Basab Maulik
>Assignee: Basab Maulik
> Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
> specification of the storage option for the corresponding column in the serde 
> property "hbase.columns.mapping". Allowed values are '-' for table default, 
> 's' for standard string storage, and 'b' for binary storage as would be 
> obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families 
> use a colon separated pair such as 's:b' for the key and value part 
> specifiers respectively. See the test cases and queries for HBase handler for 
> additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" 
> to specify a table level default storage type. The other valid specification 
> is "binary". The table level default is overridden by a column level 
> specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, 
> float, and double primitive types. The attached patch also relaxes the 
> mapping of map types to HBase column families to allow any primitive type to 
> be the map key.
> Attached is a program for creating a table and populating it in HBase. The 
> external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
> > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
> >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
> >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> >  with serdeproperties ("hbase.columns.mapping" = 
> ":key,cf

[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-10-21 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923769#action_12923769
 ] 

HBase Review Board commented on HIVE-1634:
--

Message from: bkm.had...@gmail.com


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
499
bq.  > 
bq.  >
bq.  > Doesn't this error message need to change?

Updated the comment to "' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>, that is " + "the Key for the map should be of primitive type, but is ... "


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
623
bq.  > 
bq.  >
bq.  > I don't understand these TODO's.

Removed/updated comment.


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > 
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
76
bq.  > 
bq.  >
bq.  > We keep adding new List data members.  Probably time to move to a 
single List<ColumnMapping>, with a new class ColumnMapping with fields for 
familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, 
qualifierBinary.  That will be a lot cleaner and also allow you to avoid the 
boolean [] here, which is a little clumsy.

I have changed the code to use List<ColumnMapping> with the fields of interest 
as members of this data class.
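
A minimal sketch of what such a data class and its parser might look like is given here. It is not the class from the patch: the field names follow the review comment, the rowKey flag and the parse helper are hypothetical, and the two binary flags named in the review are left out because the thread does not define them. The optional '#' storage suffix is also not parsed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ColumnMappingSketch {

    /** One parsed entry of hbase.columns.mapping. */
    static final class ColumnMapping {
        final String familyName;
        final byte[] familyNameBytes;
        final String qualifierName;       // null when the entry names a whole family
        final byte[] qualifierNameBytes;  // null when qualifierName is null
        final boolean rowKey;             // hypothetical flag for the ":key" entry

        ColumnMapping(String familyName, String qualifierName, boolean rowKey) {
            this.familyName = familyName;
            this.familyNameBytes = toBytes(familyName);
            this.qualifierName = qualifierName;
            this.qualifierNameBytes = toBytes(qualifierName);
            this.rowKey = rowKey;
        }

        private static byte[] toBytes(String s) {
            return s == null ? null : s.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Parses a comma-separated mapping such as ":key,cf:boolean,cf:". */
    static List<ColumnMapping> parse(String mapping) {
        List<ColumnMapping> result = new ArrayList<ColumnMapping>();
        for (String raw : mapping.split(",")) {
            String entry = raw.trim();
            if (entry.equals(":key")) {
                result.add(new ColumnMapping(null, null, true));
                continue;
            }
            int colon = entry.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Expected family:qualifier, got: " + entry);
            }
            String family = entry.substring(0, colon);
            String qualifier = entry.substring(colon + 1);
            result.add(new ColumnMapping(family, qualifier.isEmpty() ? null : qualifier, false));
        }
        return result;
    }
}
```

Keeping one object per column means the name, its byte form, and any later per-column flags stay together, instead of being spread over parallel lists indexed by position.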


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > 
trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java, 
line 480
bq.  > 
bq.  >
bq.  > Why is this assertion commented out?

I have removed this test. We do have coverage from the .q files for this case. 
This was failing due to small differences in the byte arrays from 
DataOutputStream/DataInputStream vs o.a.h.hbase.utils.Bytes.
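
The thread does not say which values differed. As a starting point for checking, the sketch below prints what java.io.DataOutputStream writes for a boolean and an int, using only the JDK; the class and helper names are hypothetical. One plausible source of the mismatch, stated here as an assumption to verify against the HBase version in use, is that Bytes.toBytes(boolean) encodes true as 0xFF while DataOutputStream.writeBoolean encodes it as 0x01.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DataOutputEncodingSketch {

    /** Bytes produced by DataOutputStream.writeBoolean: a single byte, 1 or 0. */
    static byte[] encode(boolean value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buffer).writeBoolean(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Bytes produced by DataOutputStream.writeInt: four bytes, high byte first. */
    static byte[] encode(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buffer).writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Lower-case hex rendering, two digits per byte. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("true -> " + hex(encode(true)));
        System.out.println("1    -> " + hex(encode(1)));
    }
}
```

Comparing this output with the bytes stored by the HBase client for the same values would show whether the assertion failed on encoding or on something else.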


- bkm


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/#review1247
---





> Allow access to Primitive types stored in binary format in HBase
> 
>
> Key: HIVE-1634
> URL: https://issues.apache.org/jira/browse/HIVE-1634
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Basab Maulik
>Assignee: Basab Maulik
> Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
> specification of the storage option for the corresponding column in the serde 
> property "hbase.columns.mapping". Allowed values are '-' for table default, 
> 's' for standard string storage, and 'b' for binary storage as would be 
> obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families 
> use a colon separated pair such as 's:b' for the key and value part 
> specifiers respectively. See the test cases and queries for HBase handler for 
> additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" 
> to specify a table level default storage type. The other valid specification 
> is "binary". The table level default is overridden by a column level 
> specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, 
> float, and double primitive types. The attached patch also relaxes the 
> mapping of map types to HBase column families to allow any primitive type to 
> be the map key.
> Attached is a program for creating a table and populating it in HBase. The 
> external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
> > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
> >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
> >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> >  with serdeproperties ("hbase.columns.mapping" = 
> >  ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
> >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external ta

Re: Review Request: HIVE-1634 - Allow access to Primitive types stored in binary format in HBase

2010-10-21 Thread bkm . hadoop


> On 2010-09-16 13:28:48, John Sichi wrote:
> > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, 
> > line 499
> > 
> >
> > Doesn't this error message need to change?

Updated the comment to "' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>, that is " + "the Key for the map should be of primitive type, but is ... "


> On 2010-09-16 13:28:48, John Sichi wrote:
> > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, 
> > line 623
> > 
> >
> > I don't understand these TODO's.

Removed/updated comment.


> On 2010-09-16 13:28:48, John Sichi wrote:
> > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, 
> > line 76
> > 
> >
> > We keep adding new List data members.  Probably time to move to a 
> > single List<ColumnMapping>, with a new class ColumnMapping with fields for 
> > familyName, familyNameBytes, qualifierName, qualifierNameBytes, 
> > familyBinary, qualifierBinary.  That will be a lot cleaner and also allow 
> > you to avoid the boolean [] here, which is a little clumsy.

I have changed the code to use List<ColumnMapping> with the fields of interest 
as members of this data class.


> On 2010-09-16 13:28:48, John Sichi wrote:
> > trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java,
> >  line 480
> > 
> >
> > Why is this assertion commented out?

I have removed this test. We do have coverage from the .q files for this case. 
This was failing due to small differences in the byte arrays from 
DataOutputStream/DataInputStream vs o.a.h.hbase.utils.Bytes.


- bkm


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/#review1247
---


On 2010-10-21 20:11:06, bkm wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/826/
> ---
> 
> (Updated 2010-10-21 20:11:06)
> 
> 
> Review request for Hive Developers and John Sichi.
> 
> 
> Summary
> ---
> 
> This addresses HIVE-1245 in part, for atomic or primitive types.
> 
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
> specification of the storage option for the corresponding column in the serde 
> property "hbase.columns.mapping". Allowed values are '-' for table default, 
> 's' for standard string storage, and 'b' for binary storage as would be 
> obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families 
> use a colon separated pair such as 's:b' for the key and value part 
> specifiers respectively. See the test cases and queries for HBase handler for 
> additional examples.
> 
> There is also a table property "hbase.table.default.storage.type" = "string" 
> to specify a table level default storage type. The other valid specification 
> is "binary". The table level default is overridden by a column level 
> specification.
> 
> This control is available for the boolean, tinyint, smallint, int, bigint, 
> float, and double primitive types. The attached patch also relaxes the 
> mapping of map types to HBase column families to allow any primitive type to 
> be the map key.
> 
> 
> This addresses bug HIVE-1634.
> http://issues.apache.org/jira/browse/HIVE-1634
> 
> 
> Diffs
> -
> 
>   trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 
> 1023967 
>   
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
>  1023967 
>   
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
>  1023967 
>   
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
>  1023967 
>   
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
>  1023967 
>   
> trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
>  1023967 
>   trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 
> 1023967 
>   
> trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
> 1023967 
>   
> trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 
> 1023967 
>   
> trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
>  1023967 
>   trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q 
> PRE-CREATION 
>   trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q 
> PRE-CREATION 
>   trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q 
> PRE-CREATION 
>   
> trunk/hbase-handler/src/tes

[jira] Commented: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923759#action_12923759
 ] 

Zheng Shao commented on HIVE-1738:
--

+1. This is smart!

> public boolean areEqual(ArrayList ol0, ArrayList ol1) ...
Why do we need to care about the case that the 2 array lists are different in 
size / shorter than numFields?

> for (int i = 0; i < numFields; i++) {

We might want to try comparing the last field first.  The reason is that in 
sort-based aggregation, the last key is more likely to be different than the 
first key. Not sure the effect is big enough to be noticeable though.
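
The suggestion to compare the last field first can be sketched like this. It is not the code from the HIVE-1738 patch: the class name is hypothetical, Objects.equals stands in for whatever per-field comparison the patch uses, and both lists are assumed to hold at least numFields elements (the case the question above asks about).

```java
import java.util.List;
import java.util.Objects;

final class KeyCompareSketch {

    /**
     * Returns true when the first numFields elements of both lists are equal.
     * Fields are checked from last to first, so keys that differ only in the
     * trailing field are rejected after a single comparison.
     */
    static boolean areEqual(List<?> ol0, List<?> ol1, int numFields) {
        for (int i = numFields - 1; i >= 0; i--) {
            if (!Objects.equals(ol0.get(i), ol1.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The result is the same in either direction; only the number of comparisons made before a mismatch is found changes, which is why the gain depends on how often the trailing key is the one that differs.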


> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch, HIVE.1738.3.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That 
> method is written for generalized object comparison and is not optimized 
> for the group-by operator. By optimizing this logic, we expect to see 
> obvious improvements in GroupByOperator.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1738:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Siying




[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-10-21 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923746#action_12923746
 ] 

HBase Review Board commented on HIVE-1634:
--

Message from: bkm.had...@gmail.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/
---

(Updated 2010-10-21 20:11:06.837430)


Review request for Hive Developers and John Sichi.


Changes
---

The proposed serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" 
as a specification of the storage option for the corresponding column in the 
serde property "hbase.columns.mapping" has been removed. Instead the storage 
option is an optional part of the "hbase.columns.mapping" and is specified for 
a column using '#' as a separator following the column family/qualifier. 
Allowed values are '' for table default, a prefix of 'string' for standard 
string storage, and a prefix of 'binary' for binary storage as would be 
obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use 
a colon separated pair such as 'str:bin' or 's:b' for the key and value part 
specifiers respectively.

The tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and 
TestHBaseMinimrCliDriver pass.


Summary
---

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
specification of the storage option for the corresponding column in the serde 
property "hbase.columns.mapping". Allowed values are '-' for table default, 's' 
for standard string storage, and 'b' for binary storage as would be obtained 
from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon 
separated pair such as 's:b' for the key and value part specifiers 
respectively. See the test cases and queries for HBase handler for additional 
examples.

There is also a table property "hbase.table.default.storage.type" = "string" to 
specify a table level default storage type. The other valid specification is 
"binary". The table level default is overridden by a column level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, 
float, and double primitive types. The attached patch also relaxes the mapping 
of map types to HBase column families to allow any primitive type to be the map 
key.
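
The override rule in this summary (a column-level specifier wins over the table-level default) can be sketched as below. This is an illustration under stated assumptions rather than the patch's code: the class and method names are hypothetical, an unset table property is assumed to mean "string", and only the single-letter specifiers from the summary are accepted.

```java
final class EffectiveStorageSketch {

    /**
     * Resolves one column specifier ('-', 's' or 'b') against the table-level
     * default ("string" or "binary"); returns "string" or "binary".
     */
    static String effective(String columnSpec, String tableDefault) {
        // Assumption: a missing table property behaves like "string".
        String fallback = (tableDefault == null || tableDefault.isEmpty()) ? "string" : tableDefault;
        if (!fallback.equals("string") && !fallback.equals("binary")) {
            throw new IllegalArgumentException("Bad table default: " + tableDefault);
        }
        if (columnSpec == null || columnSpec.isEmpty() || columnSpec.equals("-")) {
            return fallback;
        }
        if (columnSpec.equals("s")) {
            return "string";
        }
        if (columnSpec.equals("b")) {
            return "binary";
        }
        throw new IllegalArgumentException("Bad column specifier: " + columnSpec);
    }

    /** Resolves a map-type pair such as "s:b" into {key storage, value storage}. */
    static String[] effectiveForMap(String pairSpec, String tableDefault) {
        String[] parts = pairSpec.split(":", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected key:value pair, got: " + pairSpec);
        }
        return new String[] {
            effective(parts[0], tableDefault),
            effective(parts[1], tableDefault)
        };
    }
}
```

For example, with a table default of "binary", the specifier '-' resolves to binary while 's' still resolves to string.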


This addresses bug HIVE-1634.
http://issues.apache.org/jira/browse/HIVE-1634


Diffs (updated)
-

  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 
1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 
1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 
1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 
1023967 
  
trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
 1023967 
  trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q 
PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q 
PRE-CREATION 
  
trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out 
PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out 
PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out 
PRE-CREATION 
  
trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 
1023967 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java 
PRE-CREATION 
  
trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 
1023967 

Diff: http://review.cloudera.or

Re: Review Request: HIVE-1634 - Allow access to Primitive types stored in binary format in HBase

2010-10-21 Thread bkm . hadoop

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/
---

(Updated 2010-10-21 20:11:06.837430)


Review request for Hive Developers and John Sichi.


Changes
---

The proposed serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" 
as a specification of the storage option for the corresponding column in the 
serde property "hbase.columns.mapping" has been removed. Instead the storage 
option is an optional part of the "hbase.columns.mapping" and is specified for 
a column using '#' as a separator following the column family/qualifier. 
Allowed values are '' for table default, a prefix of 'string' for standard 
string storage, and a prefix of 'binary' for binary storage as would be 
obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use 
a colon separated pair such as 'str:bin' or 's:b' for the key and value part 
specifiers respectively.

The tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and 
TestHBaseMinimrCliDriver pass.


Summary
---

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a 
specification of the storage option for the corresponding column in the serde 
property "hbase.columns.mapping". Allowed values are '-' for table default, 's' 
for standard string storage, and 'b' for binary storage as would be obtained 
from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon 
separated pair such as 's:b' for the key and value part specifiers 
respectively. See the test cases and queries for HBase handler for additional 
examples.

There is also a table property "hbase.table.default.storage.type" = "string" to 
specify a table level default storage type. The other valid specification is 
"binary". The table level default is overridden by a column level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, 
float, and double primitive types. The attached patch also relaxes the mapping 
of map types to HBase column families to allow any primitive type to be the map 
key.


This addresses bug HIVE-1634.
http://issues.apache.org/jira/browse/HIVE-1634


Diffs (updated)
-

  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 
1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 1023967 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 
1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 
1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 
1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 
1023967 
  
trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
 1023967 
  trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q 
PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q 
PRE-CREATION 
  
trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out 
PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out 
PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out 
PRE-CREATION 
  
trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 
1023967 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java 
PRE-CREATION 
  
trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java 
PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 
1023967 

Diff: http://review.cloudera.org/r/826/diff


Testing
---

The HBase handler tests TestHBaseSerDe, TestLazyHBaseObject, 
TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.

New tests have been added to TestHBaseSerDe and TestLazyHBaseObject to test 
this feature.

New queries which exercise this feature have been add

[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923734#action_12923734
 ] 

He Yongqiang commented on HIVE-78:
--

Bypassing HDFS permissions from the Hive layer is just one option. The 
implementation should also support setting user groups on the HDFS side and 
letting the MapReduce job run as the user.

Just a quick update about the authorization rule:

In the offline discussion we had internally this afternoon, removing DENY was 
another option we considered. We examined our use cases without DENY, and it 
works. Removing DENY from the authorization model will therefore simplify the 
implementation a lot.

Regarding views and indexes, we should not cover them in the first version. 
We can do them later, when we have a better understanding after implementing 
the first version.

> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore, Query Processor, Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Updated: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-78:
---

Component/s: Query Processor
 Metastore




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923733#action_12923733
 ] 

Carl Steinbach commented on HIVE-78:


The issue that Todd raised is pretty important and needs to be addressed in the 
proposal. My personal opinion is that running all queries as a "hive" 
super-user is the most practical approach and will also yield behavior that is 
familiar to users of traditional RDBMS systems (who I expect will increasingly 
define the average Hive user/administrator).

There are some other follow-on issues that need to be decided if we end up 
settling on this approach:

* This approach to authorization presupposes that users are accessing Hive 
through a HiveServer process. This follows from the fact that A) you want Hive 
to execute the query plans as the Hive superuser, and B) users can 
circumvent the authorization model if they are given direct access to the 
MetaStore DB. It would be nice if the proposal explicitly stated this 
requirement and mentioned some of the follow-on work that this necessitates, 
e.g. fixing concurrency issues in HiveServer, reducing the memory requirements 
of HiveServer, etc.

* We need to apply the authorization model to the '{{add [archive|file|jar]}}' 
commands as well as {{add temporary function}}. {{add jar}} and {{add file}} 
both currently allow the user to inject code into MR jobs, and {{add jar}} in 
conjunction with {{add temporary function}} allows the user to inject and 
execute arbitrary code within the HiveServer process. We may also want to add a 
new {{add executable}} command for adding executable scripts that has a 
different permission model than {{add file}}.

* I think there also may be security issues stemming from external tables, e.g. 
if I create an external table that points to another user's home directory and 
then run a query on it which executes with Hive's superuser permissions.

* Loading data into the Hive warehouse from an arbitrary HDFS location and 
exporting data to other locations in HDFS are two issues that need to be 
considered. In each case I think the correct behavior depends on both the Hive 
process's permissions and those of the user.







[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923719#action_12923719
 ] 

Todd Lipcon commented on HIVE-78:
-

I'm a little unclear on how the user identity is passed down to the MR layer. 
Carl and I had chatted about this a few weeks back -- is the idea now that all 
hive queries will run MR jobs as a "hive" user, rather than "todd"? If so, we 
need to add authorization control for UDFs and TRANSFORM as well, since a user 
could trivially take over the "hive" user credentials from within a UDF. If the 
MR jobs will continue to run as "todd", then I don't understand how we can 
apply any permissions model that is any different than HDFS permissions. More 
restrictive is impossible because I can just read the files myself, and less 
restrictive is impossible because HDFS is applying permissions based on the 
"todd" identity.

> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Commented: (HIVE-1597) Hive CLI returns MasterNotRunningException with HBase 0.89.x

2010-10-21 Thread Leo Alekseyev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923718#action_12923718
 ] 

Leo Alekseyev commented on HIVE-1597:
-

In our case the problem appeared to be related to Hive HBase handler being 
compiled with outdated jars.  The solution (for HBase 0.89.20100830) was to 
make sure $HIVE_SOURCE/lib contains both hbase-0.89.20100830.jar and 
hbase-0.89.20100830-tests.jar and no other versions of hbase jars.


> Hive CLI returns MasterNotRunningException with HBase 0.89.x
> 
>
> Key: HIVE-1597
> URL: https://issues.apache.org/jira/browse/HIVE-1597
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Basab Maulik
>
> This is a follow on task to HIVE-1512.
> hive> CREATE TABLE hbase_table_1(key int, value string)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
>> TBLPROPERTIES ("hbase.table.name" = "xyz");
> FAILED: Error in metadata:
> MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException:
> 10.2.128.92:6  at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:376)
> ...
> This reproduces in testing with CDH3 and with HBase 0.89.x snapshot/zookeeper 
> 3.3.1.
> Interesting, the tests TestHBaseSerDe, TestLazyHBaseObject, 
> TestHBaseCliDriver, and TestHBaseCliMinimrDriver pass using these upgraded 
> versions.




[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-10-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923715#action_12923715
 ] 

Todd Lipcon commented on HIVE-1526:
---

Hey Ning. Can you take a look at this change? It's no longer in sync with 
trunk, but I don't want to have to redo it twice (it's a pain since you have to 
regenerate all the files, etc). If the basics look OK I will resync with trunk 
and then we can commit soon after.

> Hive should depend on a release version of Thrift
> -
>
> Key: HIVE-1526
> URL: https://issues.apache.org/jira/browse/HIVE-1526
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure, Clients
>Reporter: Carl Steinbach
>Assignee: Todd Lipcon
> Fix For: 0.7.0
>
> Attachments: HIVE-1526.2.patch.txt, hive-1526.txt, libfb303.jar, 
> libthrift.jar
>
>
> Hive should depend on a release version of Thrift, and ideally it should use 
> Ivy to resolve this dependency.
> The Thrift folks are working on adding Thrift artifacts to a maven repository 
> here: https://issues.apache.org/jira/browse/THRIFT-363




[jira] Commented: (HIVE-1741) HiveInputFormat.readFields should print the cause when there's an exception

2010-10-21 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923687#action_12923687
 ] 

John Sichi commented on HIVE-1741:
--

Yeah, Hive exception-swallowing is notorious.  We really need a coding standard 
here...

> HiveInputFormat.readFields should print the cause when there's an exception
> ---
>
> Key: HIVE-1741
> URL: https://issues.apache.org/jira/browse/HIVE-1741
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Jean-Daniel Cryans
>Priority: Trivial
>
> Minor annoyance when it comes to debugging with exotic input formats: 
> currently, if you do something wrong while trying to get the HBase handler 
> working, you get something like this:
> {noformat}
> java.io.IOException: Cannot create an instance of InputSplit class = 
> org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:147)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:333)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> {noformat}
> It could be a lot more helpful to see the cause, in my case:
> {noformat}
> java.io.IOException: Cannot create an instance of InputSplit class = 
> org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:147)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:333)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.hbase.HBaseSplit
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> ...
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:247)
>   at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:144)
> {noformat}
> It's just a matter of doing this in readFields:
> {code}
> -    + inputSplitClassName + ":" + e.getMessage());
> +    + inputSplitClassName + ":" + e.getMessage(), e);
> {code}




[jira] Commented: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923684#action_12923684
 ] 

Namit Jain commented on HIVE-1738:
--

+1

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch, HIVE.1738.3.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That 
> method is written for generalized object comparison and is not optimized for 
> the group-by operator. By optimizing this logic, we expect to see noticeable 
> improvements in GroupByOperator performance.




[jira] Commented: (HIVE-1742) Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)

2010-10-21 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923677#action_12923677
 ] 

HBase Review Board commented on HIVE-1742:
--

Message from: "Carl Steinbach" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1066/
---

Review request for Hive Developers.


Summary
---

HIVE-1742: Fix Eclipse templates (and use Ivy metadata to generate Eclipse 
library dependencies)


This addresses bug HIVE-1742.
http://issues.apache.org/jira/browse/HIVE-1742


Diffs
-

  build.xml 1407bca 
  eclipse-templates/.classpath 48de61d 
  ivy/libraries.properties 615 
  metastore/ivy.xml e6057c0 

Diff: http://review.cloudera.org/r/1066/diff


Testing
---


Thanks,

Carl




> Fix Eclipse templates (and use Ivy metadata to generate Eclipse library 
> dependencies)
> -
>
> Key: HIVE-1742
> URL: https://issues.apache.org/jira/browse/HIVE-1742
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.7.0
>
> Attachments: HIVE-1742.1.patch.txt
>
>
> A previous commit broke the eclipse templates.
> Also, we should use the library version information in 
> ivy/libraries.properties in
> ivy.xml files as well as the eclipse .classpath file.




Review Request: HIVE-1742: Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)

2010-10-21 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1066/
---

Review request for Hive Developers.


Summary
---

HIVE-1742: Fix Eclipse templates (and use Ivy metadata to generate Eclipse 
library dependencies)


This addresses bug HIVE-1742.
http://issues.apache.org/jira/browse/HIVE-1742


Diffs
-

  build.xml 1407bca 
  eclipse-templates/.classpath 48de61d 
  ivy/libraries.properties 615 
  metastore/ivy.xml e6057c0 

Diff: http://review.cloudera.org/r/1066/diff


Testing
---


Thanks,

Carl



[jira] Updated: (HIVE-1742) Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)

2010-10-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1742:
-

Status: Patch Available  (was: Open)

> Fix Eclipse templates (and use Ivy metadata to generate Eclipse library 
> dependencies)
> -
>
> Key: HIVE-1742
> URL: https://issues.apache.org/jira/browse/HIVE-1742
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.7.0
>
> Attachments: HIVE-1742.1.patch.txt
>
>
> A previous commit broke the eclipse templates.
> Also, we should use the library version information in 
> ivy/libraries.properties in
> ivy.xml files as well as the eclipse .classpath file.




[jira] Created: (HIVE-1742) Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)

2010-10-21 Thread Carl Steinbach (JIRA)
Fix Eclipse templates (and use Ivy metadata to generate Eclipse library 
dependencies)
-

 Key: HIVE-1742
 URL: https://issues.apache.org/jira/browse/HIVE-1742
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0
 Attachments: HIVE-1742.1.patch.txt

A previous commit broke the eclipse templates.

Also, we should use the library version information in ivy/libraries.properties 
in
ivy.xml files as well as the eclipse .classpath file.





[jira] Updated: (HIVE-1742) Fix Eclipse templates (and use Ivy metadata to generate Eclipse library dependencies)

2010-10-21 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1742:
-

Attachment: HIVE-1742.1.patch.txt

> Fix Eclipse templates (and use Ivy metadata to generate Eclipse library 
> dependencies)
> -
>
> Key: HIVE-1742
> URL: https://issues.apache.org/jira/browse/HIVE-1742
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Fix For: 0.7.0
>
> Attachments: HIVE-1742.1.patch.txt
>
>
> A previous commit broke the eclipse templates.
> Also, we should use the library version information in 
> ivy/libraries.properties in
> ivy.xml files as well as the eclipse .classpath file.




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923671#action_12923671
 ] 

He Yongqiang commented on HIVE-78:
--

Sorry, a correction to my previous comment: by "one accept then accept; one 
deny then deny", I mean "Accept overrides deny: one accept then accept; no 
accept then deny".

> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923667#action_12923667
 ] 

He Yongqiang commented on HIVE-78:
--

The other option we came up with in offline discussion is the rule "one accept 
then accept", applied hierarchically. First check the privileges granted to the 
user and the user's groups: one accept then accept; one deny then deny. Then 
check role-level privileges: one accept then accept; one deny then deny.

We prefer to go with this rule. Please comment; if there are no concerns, I 
will update the wiki.

> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-21 Thread Mike Lewis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HIVE-1575:
-

Attachment: 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch

This patch actually works, although it is a bit sloppy.

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
> Attachments: 
> 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch
>
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.




[jira] Created: (HIVE-1741) HiveInputFormat.readFields should print the cause when there's an exception

2010-10-21 Thread Jean-Daniel Cryans (JIRA)
HiveInputFormat.readFields should print the cause when there's an exception
---

 Key: HIVE-1741
 URL: https://issues.apache.org/jira/browse/HIVE-1741
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Jean-Daniel Cryans
Priority: Trivial


Minor annoyance when it comes to debugging with exotic input formats: 
currently, if you do something wrong while trying to get the HBase handler 
working, you get something like this:

{noformat}
java.io.IOException: Cannot create an instance of InputSplit class = 
org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:147)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:333)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
{noformat}

It could be a lot more helpful to see the cause, in my case:

{noformat}
java.io.IOException: Cannot create an instance of InputSplit class = 
org.apache.hadoop.hive.hbase.HBaseSplit:org.apache.hadoop.hive.hbase.HBaseSplit
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:147)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:333)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.hbase.HBaseSplit
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
...
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:907)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:144)
{noformat}

It's just a matter of doing this in readFields:
{code}
-    + inputSplitClassName + ":" + e.getMessage());
+    + inputSplitClassName + ":" + e.getMessage(), e);
{code}
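
To illustrate the effect of that one-argument change, here is a minimal 
standalone sketch. It is not the actual Hive code: the class CauseDemo and its 
two methods are hypothetical, and the class name looked up is assumed to be 
absent from the classpath. The only API relied on is the JDK constructor 
IOException(String, Throwable). Passing e as the second argument keeps the 
original ClassNotFoundException attached as the cause, which is what produces 
the "Caused by:" section in the second trace.

```java
import java.io.IOException;

// Hypothetical demo class; it does not reproduce HiveInputFormat.
final class CauseDemo {

  // Mimics the current behaviour: only the message of the original exception is kept.
  static IOException withoutCause(String className) {
    try {
      Class.forName(className);
      return null; // class was found, nothing to report
    } catch (ClassNotFoundException e) {
      return new IOException("Cannot create an instance of InputSplit class = "
          + className + ":" + e.getMessage());
    }
  }

  // Mimics the proposed behaviour: the original exception is chained as the cause.
  static IOException withCause(String className) {
    try {
      Class.forName(className);
      return null; // class was found, nothing to report
    } catch (ClassNotFoundException e) {
      return new IOException("Cannot create an instance of InputSplit class = "
          + className + ":" + e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    String missing = "org.example.DoesNotExist"; // assumed not to be on the classpath
    System.out.println(withoutCause(missing).getCause()); // no cause was recorded
    withCause(missing).printStackTrace(); // trace includes a "Caused by:" section
  }
}
```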




[jira] Commented: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-21 Thread Mike Lewis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923644#action_12923644
 ] 

Mike Lewis commented on HIVE-1575:
--

Apologies.  The patch I submitted was broken.  I think I may have messed up the 
state of this issue :(

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.




[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-21 Thread Mike Lewis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HIVE-1575:
-

Attachment: (was: 
0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch)

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923642#action_12923642
 ] 

He Yongqiang commented on HIVE-78:
--

@dhruba
HDFS has its own authorization. So if we allow an access in the Hive layer and 
pass it on to HDFS (by setting the correct HDFS username and groups), the job 
can still fail with an HDFS permission problem. 
So we need to solve the problem of having two independent authorization layers.
One way is to allow all accesses at the HDFS level and let Hive do the 
authorization, so Hive runs as root in terms of HDFS.
The other way is to plug HDFS authorization into the Hive layer, and accept an 
access only if both Hive and HDFS say YES.  A user can belong to several Unix 
groups, and HDFS permissions are set based on the Unix group. [ I am not sure 
how many groups a user can have in terms of HDFS; I mean, how many group 
settings you can put on an HDFS file. Let's simply say I want these 2 groups to 
be able to read the file.]  Another problem is column-level privileges.
This is very open for discussion; please comment on it.


About the proposal, there is one authorization rule that we are not sure about. 
It's the simple rule: one deny then deny.

Take this example:
5.3.1 I want to grant everyone (new people may join at any time) access to 
db_name.*, and then later I want to protect one table, db_name.T, from ALL 
users but a few.
1) Add all users to a group 'users' (assumption: new users will automatically 
join this group), and grant 'users' ALL privileges on db_name.*
2) Add those few users to a new group 'users2', AND REMOVE them from 'users'
3) DENY 'users' access to db_name.T
4) Grant ALL on db_name.T to 'users2'

The main problem with this approach is that "REMOVE them from 'users'" is not 
practicable. 


The other option that we have thought about is a different rule.

First, try the user name.

Try to deny the access by looking up the deny tables by user name:

1. If there is an entry in 'user' that denies this access, return DENY
2. If there is an entry in 'db' that denies this access, return DENY
3. If there is an entry in 'table' that denies this access, return DENY
4. If there is an entry in 'column' that denies this access, return DENY

If we get one deny, we return DENY for this attempt.

If no deny is found, go through all privilege levels with the user name:

5. If there is an entry in 'user' that accepts this access, return ACCEPT
6. If there is an entry in 'db' that accepts this access, return ACCEPT
7. If there is an entry in 'table' that accepts this access, return ACCEPT
8. If there is an entry in 'column' that accepts this access, return ACCEPT


Second, try the user's group/role names one by one until we get an ACCEPT. If 
we get an ACCEPT from one group/role, we ACCEPT this access. Otherwise we deny.

For each role/group, we follow the same routine as for the user name.
The problem with this approach is that it is a little complex, and we did not 
find any system that uses it. MySQL has no deny. SQL Server uses one deny then 
deny.
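
The routine described above can be sketched roughly as follows. This is only 
an illustration under stated assumptions, not a proposed implementation: the 
class DenyFirstCheck, the "principal/level" string encoding, and the in-memory 
sets are all hypothetical and do not reflect the metastore schema. The sketch 
models a single access request (for example a read of db_name.T), so an entry 
simply records that a matching deny or accept exists at that level. It also 
assumes that a deny found for one group/role does not stop the search through 
the remaining groups/roles, which is one reading of "one by one until we get 
an ACCEPT".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the rule; not Hive's metastore schema.
final class DenyFirstCheck {

  enum Decision { ACCEPT, DENY, NONE }

  // Privilege levels, in the order they are consulted.
  static final List<String> LEVELS = Arrays.asList("user", "db", "table", "column");

  // Entries are encoded as "principal/level"; the encoding is an assumption.
  final Set<String> denyEntries = new HashSet<>();
  final Set<String> acceptEntries = new HashSet<>();

  // Routine for one principal: look for a deny at every level, then for an accept.
  Decision checkPrincipal(String principal) {
    for (String level : LEVELS) {
      if (denyEntries.contains(principal + "/" + level)) {
        return Decision.DENY;
      }
    }
    for (String level : LEVELS) {
      if (acceptEntries.contains(principal + "/" + level)) {
        return Decision.ACCEPT;
      }
    }
    return Decision.NONE; // no matching entry for this principal
  }

  // User name first, then groups/roles one by one until one of them accepts.
  Decision check(String user, List<String> groupsAndRoles) {
    Decision byUser = checkPrincipal(user);
    if (byUser != Decision.NONE) {
      return byUser;
    }
    for (String name : groupsAndRoles) {
      if (checkPrincipal(name) == Decision.ACCEPT) {
        return Decision.ACCEPT;
      }
    }
    return Decision.DENY; // nobody accepted
  }

  public static void main(String[] args) {
    DenyFirstCheck acl = new DenyFirstCheck();
    acl.acceptEntries.add("users/db");     // group 'users' may use the database
    acl.denyEntries.add("users/table");    // but is denied on the protected table
    acl.acceptEntries.add("users2/table"); // group 'users2' is granted the table
    System.out.println(acl.check("alice", Arrays.asList("users")));         // DENY
    System.out.println(acl.check("bob", Arrays.asList("users", "users2"))); // ACCEPT
  }
}
```

Under these assumptions, a member of both 'users' and 'users2' is accepted 
without having to be removed from 'users', which is the step the example 
above called impracticable.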


> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Commented: (HIVE-1740) support NOT IN and NOT LIKE syntax

2010-10-21 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923641#action_12923641
 ] 

John Sichi commented on HIVE-1740:
--

This may apply to more operators such as RLIKE too.

The current workaround is to use prefix NOT:

NOT(x LIKE p)
etc.


> support NOT IN and NOT LIKE syntax
> --
>
> Key: HIVE-1740
> URL: https://issues.apache.org/jira/browse/HIVE-1740
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.6.0
>Reporter: John Sichi
> Fix For: 0.7.0
>
>
> Hive should support standard SQL syntax
> x NOT LIKE p
> x NOT IN (...)




[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1738:
--

Attachment: HIVE.1738.3.patch

Modified according to Namit's comments.

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch, HIVE.1738.3.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That 
> method is written for generalized object comparison and is not optimized for 
> the group-by operator. By optimizing this logic, we expect to see noticeable 
> improvements in GroupByOperator performance.




[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1738:
--

Status: Patch Available  (was: Open)

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch, HIVE.1738.3.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That 
> method is written for generalized object comparison and is not optimized for 
> the group-by operator. By optimizing this logic, we expect to see noticeable 
> improvements in GroupByOperator performance.




[jira] Created: (HIVE-1740) support NOT IN and NOT LIKE syntax

2010-10-21 Thread John Sichi (JIRA)
support NOT IN and NOT LIKE syntax
--

 Key: HIVE-1740
 URL: https://issues.apache.org/jira/browse/HIVE-1740
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
 Fix For: 0.7.0


Hive should support standard SQL syntax

x NOT LIKE p
x NOT IN (...)





[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-21 Thread Mike Lewis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HIVE-1575:
-

Attachment: 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch

Sorry if this is a repeat post, wasn't sure how to attach the patch.

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
> Attachments: 
> 0001-Updated-UDFJson-to-allow-arrays-as-a-root-object.patch
>
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.




[jira] Updated: (HIVE-1575) get_json_object does not support JSON array at the root level

2010-10-21 Thread Mike Lewis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Lewis updated HIVE-1575:
-

Status: Patch Available  (was: Open)

Here's a quick patch I made to detect whether the root is an object or an 
array.  If it's an array, it creates a new JSONArray.
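
For readers without the attachment, the kind of root detection described here 
could look roughly like the sketch below. It is not the code from the patch: 
the class JsonRootKind and its method are hypothetical, and it uses only the 
JDK so that it stays self-contained. It classifies the root of a JSON text by 
its first non-whitespace character.

```java
// Hypothetical helper, not the code from the attached patch.
final class JsonRootKind {

  enum Kind { OBJECT, ARRAY, OTHER }

  // Classifies the root of a JSON text by its first non-whitespace character.
  static Kind of(String jsonText) {
    if (jsonText == null) {
      return Kind.OTHER;
    }
    for (int i = 0; i < jsonText.length(); i++) {
      char c = jsonText.charAt(i);
      if (Character.isWhitespace(c)) {
        continue; // skip leading whitespace
      }
      if (c == '{') {
        return Kind.OBJECT;
      }
      if (c == '[') {
        return Kind.ARRAY;
      }
      return Kind.OTHER; // string, number, true, false, null, or malformed input
    }
    return Kind.OTHER; // empty or whitespace-only text
  }

  public static void main(String[] args) {
    System.out.println(of("{\"a\": 1}"));   // OBJECT
    System.out.println(of("  [1, 2, 3]")); // ARRAY
    System.out.println(of("42"));          // OTHER
  }
}
```

A caller could then hand the text to the object parser or the array parser of 
whatever JSON library is in use, depending on the result.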

> get_json_object does not support JSON array at the root level
> -
>
> Key: HIVE-1575
> URL: https://issues.apache.org/jira/browse/HIVE-1575
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.7.0
>Reporter: Steven Wong
>
> Currently, get_json_object(json_txt, path) always returns null if json_txt is 
> not a JSON object (e.g. is a JSON array) at the root level.
> I have a table column of JSON arrays at the root level, but I can't parse it 
> because of that.
> get_json_object should accept any JSON value (string, number, object, array, 
> true, false, null), not just object, at the root level. In other words, it 
> should behave as if it were named get_json_value or simply get_json.




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923624#action_12923624
 ] 

dhruba borthakur commented on HIVE-78:
--

Can somebody please comment on how this ties in with HDFS permission/authorization? 
There is a small subsection in the doc about this issue, but I am unable to 
understand that part.

> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Updated: (HIVE-474) Support for distinct selection on two or more columns

2010-10-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-474:


Status: Open  (was: Patch Available)

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474.txt
>
>
> The ability to apply DISTINCT to several individual columns in one query, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user
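
To make the requested semantics concrete, here is a small sketch of what the query in the description is expected to return: one distinct count per column, computed independently over the same rows. The class MultiDistinctSketch and the row layout {user, session} are assumptions made for illustration; this says nothing about how Hive would plan or execute the query.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the expected result, not Hive code.
final class MultiDistinctSketch {
    // Each row is {user, session}. Returns
    // {count(distinct user), count(distinct session)}.
    static int[] countDistinct(String[][] rows) {
        Set<String> users = new HashSet<>();
        Set<String> sessions = new HashSet<>();
        for (String[] row : rows) {
            // SQL COUNT(DISTINCT x) ignores NULL values.
            if (row[0] != null) {
                users.add(row[0]);
            }
            if (row[1] != null) {
                sessions.add(row[1]);
            }
        }
        return new int[] { users.size(), sessions.size() };
    }
}
```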




[jira] Issue Comment Edited: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923568#action_12923568
 ] 

Namit Jain edited comment on HIVE-1738 at 10/21/10 3:41 PM:


 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?




ListObjectsEqualComparer

else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Don't assert same type?
Types can be different, although that won't happen for GroupBy.


Otherwise, it looks great

  was (Author: namit):
 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?






} else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Dont assert same type ?
types can be different - it wont happen for GroupBy


Otherwise, it looks great
  
> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Issue Comment Edited: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923568#action_12923568
 ] 

Namit Jain edited comment on HIVE-1738 at 10/21/10 3:40 PM:


 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?






   } else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Don't assert same type?
Types can be different, although that won't happen for GroupBy.


Otherwise, it looks great

  was (Author: namit):
 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?




   } else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Dont assert same type ?
types can be different - it wont happen for GroupBy


Otherwise, it looks great
  
> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Issue Comment Edited: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923568#action_12923568
 ] 

Namit Jain edited comment on HIVE-1738 at 10/21/10 3:40 PM:


 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?






} else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Don't assert same type?
Types can be different, although that won't happen for GroupBy.


Otherwise, it looks great

  was (Author: namit):
 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?






   } else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Dont assert same type ?
types can be different - it wont happen for GroupBy


Otherwise, it looks great
  
> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Commented: (HIVE-474) Support for distinct selection on two or more columns

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923602#action_12923602
 ] 

Namit Jain commented on HIVE-474:
-

1. add initEvaluators() in Operator.java instead of ReduceSinkOperator.java
2. ReduceSinkDesc: use numKeys and getNumKeys() or change numKeys to 
numDistributionKeys -
   You may run into problems with serialization/deserialization
3. Add some comments in initEvaluatorsAndReturnStruct in ReduceSinkOperator
   -- explain that it is same as parent in case of no union for groupby
4. Can you add more comments in GroupByOperator and SemanticAnalyzer also?
   It looks OK, but it will help if there are more comments.

> Support for distinct selection on two or more columns
> -
>
> Key: HIVE-474
> URL: https://issues.apache.org/jira/browse/HIVE-474
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Alexis Rondeau
>Assignee: Amareshwari Sriramadasu
> Attachments: hive-474.0.4.2rc.patch, patch-474.txt
>
>
> The ability to apply DISTINCT to several individual columns in one query, for example: 
> select count(distinct user), count(distinct session) from actions;   
> Currently returns the following failure: 
> FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns 
> not Supported user




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923598#action_12923598
 ] 

Namit Jain commented on HIVE-78:


Please comment - we would like to hear all use cases before finalizing the 
design.

> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Commented: (HIVE-78) Authorization infrastructure for Hive

2010-10-21 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923597#action_12923597
 ] 

Carl Steinbach commented on HIVE-78:


Authorization proposal on the wiki: http://wiki.apache.org/hadoop/Hive/AuthDev


> Authorization infrastructure for Hive
> -
>
> Key: HIVE-78
> URL: https://issues.apache.org/jira/browse/HIVE-78
> Project: Hive
>  Issue Type: New Feature
>  Components: Server Infrastructure
>Reporter: Ashish Thusoo
>Assignee: He Yongqiang
> Attachments: createuser-v1.patch, hive-78-metadata-v1.patch, 
> hive-78-syntax-v1.patch, hive-78.diff
>
>
> Allow hive to integrate with existing user repositories for authentication 
> and authorization information.




[jira] Commented: (HIVE-1597) Hive CLI returns MasterNotRunningException with HBase 0.89.x

2010-10-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923592#action_12923592
 ] 

Jean-Daniel Cryans commented on HIVE-1597:
--

I created HBASE-3143.

> Hive CLI returns MasterNotRunningException with HBase 0.89.x
> 
>
> Key: HIVE-1597
> URL: https://issues.apache.org/jira/browse/HIVE-1597
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Basab Maulik
>
> This is a follow on task to HIVE-1512.
> hive> CREATE TABLE hbase_table_1(key int, value string)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
>> TBLPROPERTIES ("hbase.table.name" = "xyz");
> FAILED: Error in metadata:
> MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException:
> 10.2.128.92:6  at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:376)
> ...
> This reproduces in testing with CDH3 and with HBase 0.89.x snapshot/zookeeper 
> 3.3.1.
> Interesting, the tests TestHBaseSerDe, TestLazyHBaseObject, 
> TestHBaseCliDriver, and TestHBaseCliMinimrDriver pass using these upgraded 
> versions.




Re: [VOTE] hive 0.6.0 release candidate 0

2010-10-21 Thread John Sichi
Yeah, the scripts should only be needed in configurations where JDO is told not 
to automatically update the schema.  This is recommended for production 
environments.

For this particular release, taking a downtime while running the scripts is a 
good idea due to the nature of the changes (e.g. altering the primary key on 
COLS).  That needn't be true in general for additive-only changes.

JVS

On Oct 21, 2010, at 12:14 PM, Edward Capriolo wrote:

> On Wed, Oct 20, 2010 at 6:38 PM, John Sichi  wrote:
>> The tarballs are at
>> 
>> http://people.apache.org/~jvs/hive-0.6.0-candidate-0
>> 
>> Carl did some sanity testing on it already, but any additional testing you 
>> can do before voting helps to ensure a quality release.
>> 
>> JVS
>> 
>> 
> 
> I am checking it out now. It seems like since i have used two trunk
> versions since hive the view related tables have already been created.
> I do not need the update script.



[jira] Commented: (HIVE-1597) Hive CLI returns MasterNotRunningException with HBase 0.89.x

2010-10-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923574#action_12923574
 ] 

Jean-Daniel Cryans commented on HIVE-1597:
--

Oh I see now, the hbase-site.xml for the unit tests wasn't included in the test 
jar before and now it seems it is.

> Hive CLI returns MasterNotRunningException with HBase 0.89.x
> 
>
> Key: HIVE-1597
> URL: https://issues.apache.org/jira/browse/HIVE-1597
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Basab Maulik
>
> This is a follow on task to HIVE-1512.
> hive> CREATE TABLE hbase_table_1(key int, value string)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
>> TBLPROPERTIES ("hbase.table.name" = "xyz");
> FAILED: Error in metadata:
> MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException:
> 10.2.128.92:6  at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:376)
> ...
> This reproduces in testing with CDH3 and with HBase 0.89.x snapshot/zookeeper 
> 3.3.1.
> Interesting, the tests TestHBaseSerDe, TestLazyHBaseObject, 
> TestHBaseCliDriver, and TestHBaseCliMinimrDriver pass using these upgraded 
> versions.




Re: [VOTE] hive 0.6.0 release candidate 0

2010-10-21 Thread Edward Capriolo
On Wed, Oct 20, 2010 at 6:38 PM, John Sichi  wrote:
> The tarballs are at
>
> http://people.apache.org/~jvs/hive-0.6.0-candidate-0
>
> Carl did some sanity testing on it already, but any additional testing you 
> can do before voting helps to ensure a quality release.
>
> JVS
>
>

I am checking it out now. It seems like since i have used two trunk
versions since hive the view related tables have already been created.
I do not need the update script.


[jira] Commented: (HIVE-1597) Hive CLI returns MasterNotRunningException with HBase 0.89.x

2010-10-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923569#action_12923569
 ] 

Jean-Daniel Cryans commented on HIVE-1597:
--

I just tried it with hbase trunk (which gives a ZooKeeperConnectionException 
instead of the MasterNotRunningException as we reworked that part). Looking at 
the hive.log file, I see that it tries to connect to a non-default port:

{noformat}
2010-10-21 11:49:21,871 DEBUG zookeeper.ZKUtil (ZKUtil.java:connect(94)) - 
hconnection opening connection to ZooKeeper with ensemble (localhost:21818)
{noformat}

This is because there's a hbase-site.xml file in hbase's src/test/resources 
that's used for tests which has the port 21818 and it seems to get picked up on 
the classpath. If I set hbase.zookeeper.client.port to 2181 in 
conf/hive-site.xml, I can create the table just fine. I believe this is an HBase 
issue, not a Hive one, and I'll make sure we fix this for 0.90.0, since I can easily 
see "normal" users getting this error.

> Hive CLI returns MasterNotRunningException with HBase 0.89.x
> 
>
> Key: HIVE-1597
> URL: https://issues.apache.org/jira/browse/HIVE-1597
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Basab Maulik
>
> This is a follow on task to HIVE-1512.
> hive> CREATE TABLE hbase_table_1(key int, value string)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
>> TBLPROPERTIES ("hbase.table.name" = "xyz");
> FAILED: Error in metadata:
> MetaException(message:org.apache.hadoop.hbase.MasterNotRunningException:
> 10.2.128.92:6  at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:376)
> ...
> This reproduces in testing with CDH3 and with HBase 0.89.x snapshot/zookeeper 
> 3.3.1.
> Interesting, the tests TestHBaseSerDe, TestLazyHBaseObject, 
> TestHBaseCliDriver, and TestHBaseCliMinimrDriver pass using these upgraded 
> versions.




[jira] Commented: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923568#action_12923568
 ] 

Namit Jain commented on HIVE-1738:
--

 * Also, for string and test elements, it performs slightly better than

spelling: (should be Text)




  public ListObjectsEqualComparer(ObjectInspector[] oi0, ObjectInspector[] oi1) 
{
assert(oi0.length == oi1.length);


Instead of asserting, can you throw an error ?




   } else {
 assert(type0.equals(type1));
 compareType = CompareType.SAME_TYPE;


Don't assert same type?
Types can be different, although that won't happen for GroupBy.


Otherwise, it looks great
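
The first suggestion (throw instead of assert) can be sketched as below. This is only an illustration under stated assumptions: ListEqualComparerSketch is a hypothetical stand-in for the ListObjectsEqualComparer in the patch, plain Object arrays replace Hive's ObjectInspector[] so the example compiles on its own, and IllegalArgumentException is one reasonable choice of error rather than the one the patch ended up using.

```java
import java.util.Objects;

// Hypothetical stand-in for the comparer under review; not the patch itself.
final class ListEqualComparerSketch {
    private final int numFields;

    ListEqualComparerSketch(Object[] oi0, Object[] oi1) {
        // Explicit check: unlike an assert statement, this still runs
        // when the JVM is started without -ea.
        if (oi0.length != oi1.length) {
            throw new IllegalArgumentException(
                "Sizes of the two inspector lists differ: "
                    + oi0.length + " vs " + oi1.length);
        }
        this.numFields = oi0.length;
    }

    // Element-wise equality over numFields entries.
    boolean areEqual(Object[] a, Object[] b) {
        if (a.length != numFields || b.length != numFields) {
            return false;
        }
        for (int i = 0; i < numFields; i++) {
            if (!Objects.equals(a[i], b[i])) {
                return false;
            }
        }
        return true;
    }
}
```

The point of the change is that Java assertions are disabled by default at runtime, so a length mismatch would otherwise go unnoticed in production and surface later as an unrelated failure.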

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1738:
--

Attachment: HIVE.1738.2.patch

Resolved conflicts with the previous check-in.

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch, HIVE.1738.2.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1738:
-

Status: Open  (was: Patch Available)

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Commented: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923523#action_12923523
 ] 

Namit Jain commented on HIVE-1738:
--

Can you regenerate the patch ? I am getting some conflicts

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siying Dong
>Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys, which is 
> written for generalized object comparisons, which is not optimized for 
> group-by operator. By optimizing this logic, we expect to see obvious 
> improvements in GroupByOperator.




[jira] Commented: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923520#action_12923520
 ] 

Namit Jain commented on HIVE-1672:
--

The changes look good, but they might conflict with 
https://issues.apache.org/jira/browse/HIVE-1641 which is a much bigger patch 
and is nearly ready.
So, I think we should hold off on HIVE-1672 until we are done with HIVE-1641.

> Complex Hive queries fails with Task timeouts when trying to do a table scan
> 
>
> Key: HIVE-1672
> URL: https://issues.apache.org/jira/browse/HIVE-1672
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Shrikrishna Lawande
>Assignee: Amareshwari Sriramadasu
> Attachments: patch-1672-1.txt, patch-1672.txt
>
>
> executing a join query where one of the tables is a fact table would fail 
> during table scan of the fact table. This usually happens when one of the 
> tasks is scanning large number of rows (say 200 thousand rows in my case) and 
> the task fails to respond in the timeout window.
> The workaround for this is to set a very large timeout for task. I could 
> manage to run the query by setting the timeout to 0. (infinite) 
> To repro :
> Run a join query with couple of tables of which one is a fact table. In my 
> env, the fact table has 40TB data with more than a Billion rows. Most of the 
> map tasks are processing over 200 thousand rows. 
> A few of the tasks take more than 30 min to respond and fail, since the default 
> task timeout is 10 min.




[jira] Updated: (HIVE-1737) Two Bugs for Estimating Row Sizes in GroupByOperator

2010-10-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1737:
-

   Resolution: Fixed
Fix Version/s: 0.7.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Siying

> Two Bugs for Estimating Row Sizes in GroupByOperator
> 
>
> Key: HIVE-1737
> URL: https://issues.apache.org/jira/browse/HIVE-1737
> Project: Hive
>  Issue Type: Bug
>Reporter: Siying Dong
>Assignee: Siying Dong
> Fix For: 0.7.0
>
> Attachments: HIVE-1737.1.patch
>
>
> Two bugs:
> 1. if UDAF uses string type, Group-by will break as it tries to insert an 
> ArrayList to a HashMap.
> 2. The code to sample size of keys only handles String type and Text type, 
> while in most cases, they are org.apache.hadoop.hive.serde2.lazy.LazyString, 
> so that 0 is always used.




[jira] Resolved: (HIVE-1739) Move forest from hadoop svn to hive svn

2010-10-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1739.
---

Resolution: Won't Fix

We will not need this. This will happen when the entire trunk is moved.

> Move forest from hadoop svn to hive svn
> ---
>
> Key: HIVE-1739
> URL: https://issues.apache.org/jira/browse/HIVE-1739
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Edward Capriolo
>
> Currently the hive forest is still inside hadoop. Before we move to our own 
> SVN we should move/copy the documentation from inside hadoop's svn to space 
> in our subtree.




[jira] Created: (HIVE-1739) Move forest from hadoop svn to hive svn

2010-10-21 Thread Edward Capriolo (JIRA)
Move forest from hadoop svn to hive svn
---

 Key: HIVE-1739
 URL: https://issues.apache.org/jira/browse/HIVE-1739
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Edward Capriolo


Currently the hive forest is still inside hadoop. Before we move to our own SVN 
we should move/copy the documentation from inside hadoop's svn to space in our 
subtree.




[jira] Commented: (HIVE-1612) Cannot build hive for hadoop 0.21.0

2010-10-21 Thread SingoWong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923487#action_12923487
 ] 

SingoWong commented on HIVE-1612:
-

Hi,

Can you execute the hive-ql in hive client? i got this error below:

$ ./hive
Exception in thread "main" java.lang.RuntimeException: Could not load shims in class null
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:86)
    at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:62)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:234)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.lang.NullPointerException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:83)
    ... 7 more

and I cannot run Hive. Does Hive not support Hadoop 0.21.0 yet?

Thanks & Regards,
Singo





> Cannot build hive for hadoop 0.21.0
> ---
>
> Key: HIVE-1612
> URL: https://issues.apache.org/jira/browse/HIVE-1612
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: AJ Pahl
> Attachments: HIVE-1612.patch
>
>
> Current trunk for 0.7.0 does not support building HIVE against the Hadoop 
> 0.21.0 release.




Re: [jira] Updated: (HIVE-1612) Cannot build hive for hadoop 0.21.0

2010-10-21 Thread SingoWong
Hi,

Can you execute the hive-ql in hive client? i got this error below:

$ ./hive
Exception in thread "main" java.lang.RuntimeException: Could not load shims in class null
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:86)
    at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:62)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:234)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.lang.NullPointerException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:83)
    ... 7 more

and I cannot run Hive. Does Hive not support Hadoop 0.21.0 yet?

Thanks & Regards,
Singo


On Wed, Oct 13, 2010 at 2:51 AM, Daisuke Fujiwara (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/HIVE-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Daisuke Fujiwara updated HIVE-1612:
> ---
>
>Attachment: HIVE-1612.patch
>
> Hi,
> I have managed to build Hive (trunk -r 1000628) against Hadoop 0.21.0.
>
> Following is what I had to do to make it work.
> 1. build Hive as instructed in Hive wiki (ant package).
> 2. Copy hadoop-0.21.0.tar.gz to ${HIVE_DIR}/build/hadoopcore, untar it, and
> create/touch the "hadoop-0.21.0.installed" file
> 3. Apply the patch
> 4. build Hive again by issuing "ant package -Dhadoop.version=0.21.0
> -Doffline=true"
>
> I understand that the patch and the build process are not in pristine
> form, but I would like to get some feedback and see if I can pursue this
> further.
>
> Thanks.
>
> > Cannot build hive for hadoop 0.21.0
> > ---
> >
> > Key: HIVE-1612
> > URL: https://issues.apache.org/jira/browse/HIVE-1612
> > Project: Hadoop Hive
> >  Issue Type: Bug
> >Affects Versions: 0.7.0
> >Reporter: AJ Pahl
> > Attachments: HIVE-1612.patch
> >
> >
> > Current trunk for 0.7.0 does not support building HIVE against the Hadoop
> 0.21.0 release.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


[jira] Commented: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923397#action_12923397
 ] 

Siying Dong commented on HIVE-1738:
---

One note: for the query above, the input format is SequenceFile, which is not 
friendly to this kind of query. When I convert the input to RCFile and run the 
same comparison against it, the map tasks' CPU_MILLISECONDS improves from about 
1,050,000 to about 965,000. 
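
For scale, the two figures quoted above work out to a reduction of roughly 8 percent in map-side CPU time: (1,050,000 - 965,000) / 1,050,000 is about 0.081. The helper below is a hypothetical name (CpuSaving) used only to show the arithmetic; both inputs are the approximate values from the comment.

```java
// Hypothetical helper showing the arithmetic behind the quoted figures.
final class CpuSaving {
    // Relative reduction, as a fraction of the "before" value.
    static double relativeReduction(long before, long after) {
        return (double) (before - after) / before;
    }
}
```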

> Optimize Key Comparison in GroupByOperator
> --
>
> Key: HIVE-1738
> URL: https://issues.apache.org/jira/browse/HIVE-1738
> Project: Hive
> Issue Type: Improvement
> Reporter: Siying Dong
> Assignee: Siying Dong
> Attachments: HIVE.1738.1.patch
>
>
> GroupByOperator uses ObjectInspectorUtils.compare() to compare keys. That 
> method is written for generalized object comparisons and is not optimized for 
> the group-by operator. By optimizing this logic, we expect to see a noticeable 
> improvement in GroupByOperator.
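
To illustrate the kind of saving the description is after, here is a minimal 
sketch in Python. It is not the code in HIVE.1738.1.patch and does not use any 
Hive API; generic_compare and make_key_comparator are hypothetical names. The 
generic version re-inspects the runtime type on every call, while the 
specialized version assumes a fixed number of non-null primitive key columns 
and skips that dispatch.

```python
def generic_compare(a, b):
    """Generic comparison: re-checks the runtime type on every call."""
    if a is None or b is None:
        # None sorts before any non-None value.
        return (a is not None) - (b is not None)
    if isinstance(a, (list, tuple)):
        # Compare element by element, recursing into nested values.
        for x, y in zip(a, b):
            c = generic_compare(x, y)
            if c != 0:
                return c
        return len(a) - len(b)
    return (a > b) - (a < b)


def make_key_comparator(num_columns):
    """Build a comparator once for a fixed key layout.

    Assumption (hypothetical): every key has exactly num_columns
    non-null primitive values, so no per-call type dispatch is needed.
    """
    def compare(k1, k2):
        for i in range(num_columns):
            a, b = k1[i], k2[i]
            if a != b:
                return -1 if a < b else 1
        return 0
    return compare


# Usage: decide whether the group-by key changed between two rows.
compare_keys = make_key_comparator(4)
previous_key = ("a", 1, 2.0, "x")
current_key = ("a", 1, 2.0, "y")
key_changed = compare_keys(previous_key, current_key) != 0
```

Both functions agree on such keys; the specialized one simply does less work 
per call, which matters when it runs once for every input row.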

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1738:
--

Status: Patch Available  (was: Open)




[jira] Updated: (HIVE-1738) Optimize Key Comparison in GroupByOperator

2010-10-21 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-1738:
--

Attachment: HIVE.1738.1.patch

I compared performance for a simple group-by query:

   SELECT col1,col2,col3,col4,count(1)
   FROM source_table
   GROUP BY col1, col2, col3, col4

which returns about 1000 rows.
The input has about 736M rows in about 19GB of compressed files. The query 
started 67 mappers.

The map tasks' CPU_MILLISECONDS dropped from about 2,950,000 to about 
2,800,000. I ran each query at least 5 times, and the improvement is consistent 
in every run.
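
For reference, the relative saving implied by these two figures can be worked 
out as below. This is only arithmetic on the approximate numbers quoted above; 
relative_improvement is a hypothetical helper name.

```python
def relative_improvement(before, after):
    """Fraction of the original cost that was saved."""
    return (before - after) / before


before_ms = 2_950_000  # map CPU_MILLISECONDS without the patch (approximate)
after_ms = 2_800_000   # map CPU_MILLISECONDS with the patch (approximate)

saved_ms = before_ms - after_ms
print(f"saved: {saved_ms} ms")
print(f"relative improvement: {relative_improvement(before_ms, after_ms):.1%}")
```

That is a saving of 150,000 ms, or roughly 5% of the map-side CPU time.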




[jira] Commented: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-21 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923370#action_12923370
 ] 

Amareshwari Sriramadasu commented on HIVE-1672:
---

> What is the size of small table? and the number of rows in small table?

The small table has around 500 million rows. The big table has around 1 million 
rows.

> Complex Hive queries fails with Task timeouts when trying to do a table scan
> 
>
> Key: HIVE-1672
> URL: https://issues.apache.org/jira/browse/HIVE-1672
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Shrikrishna Lawande
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-1672-1.txt, patch-1672.txt
>
>
> Executing a join query where one of the tables is a fact table fails during 
> the table scan of the fact table. This usually happens when one of the tasks 
> is scanning a large number of rows (say 200 thousand rows in my case) and the 
> task fails to respond within the timeout window.
> The workaround is to set a very large timeout for the task. I managed to run 
> the query by setting the timeout to 0 (infinite).
> To repro:
> Run a join query with a couple of tables, one of which is a fact table. In my 
> env, the fact table has 40TB of data with more than a billion rows. Most of 
> the map tasks are processing over 200 thousand rows.
> A few of the tasks take more than 30 min to respond and fail, since the 
> default task timeout is 10 min.




[jira] Commented: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-21 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923365#action_12923365
 ] 

He Yongqiang commented on HIVE-1672:


What is the size of the small table, and how many rows does it have?




[jira] Updated: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-21 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1672:
--

Attachment: patch-1672-1.txt

With a minor change from the earlier patch.




[jira] Updated: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-21 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1672:
--

Attachment: patch-1672.txt

I looked at Shrikrishna's query and the task logs. The mapper was spending time 
in processMapLocalWork() without reporting.
Attached patch fixes the problem.
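
The general idea of reporting progress from inside a long-running loop can be 
sketched as follows. This is an illustration in Python, not the code in 
patch-1672.txt; process_local_rows, handle_row, report_progress and 
report_every are hypothetical names, with report_progress standing in for 
whatever heartbeat call the framework provides.

```python
def process_local_rows(rows, handle_row, report_progress, report_every=1000):
    """Process rows, sending a heartbeat every `report_every` rows.

    Without the periodic report_progress() call, a long loop looks idle
    to the framework and the task can be killed once the timeout expires.
    """
    processed = 0
    for row in rows:
        handle_row(row)
        processed += 1
        if processed % report_every == 0:
            report_progress()
    return processed


# Usage with stand-in callbacks that just record what happened.
seen = []
heartbeats = []
total = process_local_rows(
    range(2500),
    seen.append,
    lambda: heartbeats.append(1),
    report_every=1000,
)
print(total, len(heartbeats))  # prints "2500 2"
```

Reporting every N rows rather than on every row keeps the overhead small while 
still signalling well within the timeout window.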

> Complex Hive queries fails with Task timeouts when trying to do a table scan
> 
>
> Key: HIVE-1672
> URL: https://issues.apache.org/jira/browse/HIVE-1672
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Shrikrishna Lawande
> Attachments: patch-1672.txt
>
>
> Executing a join query where one of the tables is a fact table fails during 
> the table scan of the fact table. This usually happens when one of the tasks 
> is scanning a large number of rows (say 200 thousand rows in my case) and the 
> task fails to respond within the timeout window.
> The workaround is to set a very large timeout for the task. I managed to run 
> the query by setting the timeout to 0 (infinite).
> To repro:
> Run a join query with a couple of tables, one of which is a fact table. In my 
> env, the fact table has 40TB of data with more than a billion rows. Most of 
> the map tasks are processing over 200 thousand rows.
> A few of the tasks take more than 30 min to respond and fail, since the 
> default task timeout is 10 min.




[jira] Updated: (HIVE-1672) Complex Hive queries fails with Task timeouts when trying to do a table scan

2010-10-21 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1672:
--

Assignee: Amareshwari Sriramadasu
  Status: Patch Available  (was: Open)

> Complex Hive queries fails with Task timeouts when trying to do a table scan
> 
>
> Key: HIVE-1672
> URL: https://issues.apache.org/jira/browse/HIVE-1672
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Shrikrishna Lawande
> Assignee: Amareshwari Sriramadasu
> Attachments: patch-1672.txt
>
>
> Executing a join query where one of the tables is a fact table fails during 
> the table scan of the fact table. This usually happens when one of the tasks 
> is scanning a large number of rows (say 200 thousand rows in my case) and the 
> task fails to respond within the timeout window.
> The workaround is to set a very large timeout for the task. I managed to run 
> the query by setting the timeout to 0 (infinite).
> To repro:
> Run a join query with a couple of tables, one of which is a fact table. In my 
> env, the fact table has 40TB of data with more than a billion rows. Most of 
> the map tasks are processing over 200 thousand rows.
> A few of the tasks take more than 30 min to respond and fail, since the 
> default task timeout is 10 min.
