[jira] Commented: (HIVE-1751) Optimize ColumnarStructObjectInspector.getStructFieldData()

2010-11-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928133#action_12928133
 ] 

Namit Jain commented on HIVE-1751:
--

+1

running tests.

 Optimize ColumnarStructObjectInspector.getStructFieldData()
 ---

 Key: HIVE-1751
 URL: https://issues.apache.org/jira/browse/HIVE-1751
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-1751.1.patch


 ColumnarStructObjectInspector.getStructFieldData() is a heavily used function 
 and is expensive.
 Optimizing this function, including ColumnarStruct.uncheckedGetField(), 
 which it calls, should benefit most queries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1497) support COMMENT clause on CREATE INDEX, and add new commands for SHOW/DESCRIBE indexes

2010-11-04 Thread Russell Melick (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928140#action_12928140
 ] 

Russell Melick commented on HIVE-1497:
--

Should also update the Index wiki:

http://wiki.apache.org/hadoop/Hive/IndexDev

 support COMMENT clause on CREATE INDEX, and add new commands for 
 SHOW/DESCRIBE indexes
 --

 Key: HIVE-1497
 URL: https://issues.apache.org/jira/browse/HIVE-1497
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Russell Melick
 Fix For: 0.7.0

 Attachments: HIVE-1497.4.patch, HIVE-1497.5.patch, 
 hive-1497.p1.patch, hive-1497.p2.patch, hive-1497.p3.patch


 We need to work out the syntax for SHOW/DESCRIBE, taking partitioning into 
 account.




[jira] Updated: (HIVE-1746) Support for using ALTER to set IDXPROPERTIES

2010-11-04 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1746:
---

Attachment: HIVE-1746.2.patch

New patch. Includes thrift generated files and should work now.

 Support for using ALTER to set IDXPROPERTIES
 

 Key: HIVE-1746
 URL: https://issues.apache.org/jira/browse/HIVE-1746
 Project: Hive
  Issue Type: Improvement
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: 1746.prelim.patch, HIVE-1746.2.patch


 HIVE-1498 has support for IDXPROPERTIES on index creation, so now we want to 
 support ALTERing those properties.




[jira] Commented: (HIVE-1754) Remove JDBM component from Map Join

2010-11-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928262#action_12928262
 ] 

Liyin Tang commented on HIVE-1754:
--

This patch has some potential bugs. I will fix them today and upload a new one.

 Remove JDBM component from Map Join
 ---

 Key: HIVE-1754
 URL: https://issues.apache.org/jira/browse/HIVE-1754
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1754.patch


 Right now, JDBM is the major performance bottleneck.
 As the small table grows, the PUT and GET operations take up most 
 of the execution time.
 Map Join is designed to load the small table's data into memory. 
 If the data is too large to hold in memory, then there is no need to use the 
 map join strategy.
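
 The in-memory strategy described above can be sketched roughly as follows. This is the general hash-join technique, not Hive's implementation: the class name MapJoinSketch, the two-column String[] row layout (join key, value), and the output format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a map-side join; rows are {joinKey, value}.
class MapJoinSketch {
    // Build phase: load the whole small table into an in-memory hash table.
    static Map<String, List<String>> build(List<String[]> smallTable) {
        Map<String, List<String>> hashTable = new HashMap<>();
        for (String[] row : smallTable) {
            hashTable.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return hashTable;
    }

    // Probe phase: stream the big table and look up each key in memory.
    static List<String> join(List<String[]> bigTable, Map<String, List<String>> hashTable) {
        List<String> joined = new ArrayList<>();
        for (String[] row : bigTable) {
            List<String> matches = hashTable.get(row[0]);
            if (matches == null) {
                continue; // inner join: rows without a match are dropped
            }
            for (String match : matches) {
                joined.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return joined;
    }
}
```

 The point of the sketch is that every lookup is a plain in-memory HashMap access, so there is no per-record PUT/GET against a persistent store.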




[jira] Commented: (HIVE-1750) Remove Partition Filtering Conditions when Possible

2010-11-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928282#action_12928282
 ] 

Namit Jain commented on HIVE-1750:
--

Do you think it might be simpler to update the opToParts list in 
PartitionPruner.prune() itself? That would reduce the possibility of bugs.
It can definitely be done in a follow-up.

 Remove Partition Filtering Conditions when Possible
 ---

 Key: HIVE-1750
 URL: https://issues.apache.org/jira/browse/HIVE-1750
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
 Attachments: HIVE-1750.1.patch, HIVE-1750.2.patch, HIVE-1750.3.patch, 
 HIVE-1750.4.patch


 For some simple queries, partition filtering constraints take 8% of CPU time 
 (now 16% since we filter twice) even if the result is always true. When 
 possible, we should remove these constraints to save CPU time.




[jira] Commented: (HIVE-1501) when generating reentrant INSERT for index rebuild, quote identifiers using backticks

2010-11-04 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928295#action_12928295
 ] 

John Sichi commented on HIVE-1501:
--

+1.  Will commit when tests pass.

 when generating reentrant INSERT for index rebuild, quote identifiers using 
 backticks
 -

 Key: HIVE-1501
 URL: https://issues.apache.org/jira/browse/HIVE-1501
 Project: Hive
  Issue Type: Bug
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Skye Berghel
 Fix For: 0.7.0

 Attachments: 1501.patch, 1501_new_tests.patch, 1501_with_tests.patch, 
 HIVE-1501.4.patch


 Yongqiang, you mentioned that you weren't able to do this due to SORT BY not 
 accepting them.  The SORT BY is gone now as of HIVE-1494 (and SORT BY needs 
 to be fixed anyway).




[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

2010-11-04 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928334#action_12928334
 ] 

John Sichi commented on HIVE-1634:
--

OK, I finally got some time to look into the Lazy* classes.  I see what you 
mean about the class hierarchy, and I agree that we can leave any refactoring 
of the existing classes for a followup patch.  Also, I was wrong to think that 
we could reuse the existing binary classes, since they do things such as VInt 
zero-compression, and that's incompatible with the HBase Bytes format.

However, for this patch, I want to at least get the new classes into their 
final destination with respect to package name and class name (so that we don't 
have to move them later, even if we adjust their inheritance).  To this end, I 
suggest a new package serde2.lazydio, and name the classes on the pattern 
LazyDioInteger.  The Dio is to indicate DataInput/DataOutput format.  (I was 
thinking of lazybytes and LazyByteInteger, to indicate HBase Bytes format, but 
then I saw that Byte is also one of the datatypes, and LazyBytesByte would be 
puzzling.)

Having both LazyIntegerBinary and LazyBinaryInteger, as in the current patch, 
would just be too confusing.

Also, regarding the implementation of the new classes, most of the init method 
code is duplicated from class to class.  The only thing specific to each class 
is the actual read+set.  Should we factor out a LazyDioObject (similar to the 
existing pattern for LazyObject and LazyBinaryObject)?  Likewise for 
LazyDioPrimitive and LazyDioNonPrimitive.
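
One possible shape for that refactoring is sketched below, assuming a template-method design. The class names carry a Sketch suffix because the signatures are hypothetical and do not mirror Hive's existing LazyObject API; the only grounded idea is that the shared init logic lives in a base class and each subclass supplies just the read.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical base class: the init logic shared by all types is written once.
abstract class LazyDioObjectSketch<T> {
    private T value;
    private boolean isNull = true;

    // Common init: wrap the byte range in a DataInputStream and delegate the read.
    final void init(byte[] bytes, int start, int length) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes, start, length))) {
            value = read(in);
            isNull = false;
        } catch (IOException e) {
            // Too few bytes for this type: treat the field as NULL.
            value = null;
            isNull = true;
        }
    }

    // The only type-specific part: the actual read.
    protected abstract T read(DataInputStream in) throws IOException;

    T get() {
        return isNull ? null : value;
    }
}

// Hypothetical integer subclass: 4 bytes, big-endian, as DataOutput.writeInt writes them.
class LazyDioIntegerSketch extends LazyDioObjectSketch<Integer> {
    @Override
    protected Integer read(DataInputStream in) throws IOException {
        return in.readInt();
    }
}
```

With this layout, adding another primitive would mean writing one short subclass that overrides read, for example with readLong or readDouble.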

I will ask some others to chime in on this as well.


 Allow access to Primitive types stored in binary format in HBase
 

 Key: HIVE-1634
 URL: https://issues.apache.org/jira/browse/HIVE-1634
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Basab Maulik
Assignee: Basab Maulik
 Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java


 This addresses HIVE-1245 in part, for atomic or primitive types.
 The serde property hbase.columns.storage.types = -,b,b,b,b,b,b,b,b is a 
 specification of the storage option for the corresponding column in the serde 
 property hbase.columns.mapping. Allowed values are '-' for table default, 
 's' for standard string storage, and 'b' for binary storage as would be 
 obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families 
 use a colon-separated pair such as 's:b' for the key and value part 
 specifiers respectively. See the test cases and queries for HBase handler for 
 additional examples.
 There is also a table property hbase.table.default.storage.type = string 
 to specify a table level default storage type. The other valid specification 
 is binary. The table level default is overridden by a column level 
 specification.
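
 The resolution rules above (column-level specifier, table-level default, colon-separated pairs for maps) can be sketched as follows. This is an illustration only, not the code in the attached patch: the class StorageTypeSketch and its method names are hypothetical, and the table default is passed in as an already-validated "string" or "binary".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a value such as "-,b,s:b".
class StorageTypeSketch {
    // Resolves one specifier; '-' falls back to the table-level default.
    static String resolve(String spec, String tableDefault) {
        switch (spec) {
            case "-": return tableDefault;
            case "s": return "string";
            case "b": return "binary";
            default:
                throw new IllegalArgumentException("Unknown storage type: " + spec);
        }
    }

    // One entry per column: length 1 for primitives, length 2 (key, value) for maps.
    static List<String[]> parse(String storageTypes, String tableDefault) {
        List<String[]> result = new ArrayList<>();
        for (String column : storageTypes.split(",")) {
            String[] parts = column.trim().split(":");
            if (parts.length > 2) {
                throw new IllegalArgumentException("Too many parts: " + column);
            }
            String[] resolved = new String[parts.length];
            for (int i = 0; i < parts.length; i++) {
                resolved[i] = resolve(parts[i], tableDefault);
            }
            result.add(resolved);
        }
        return result;
    }
}
```

 For example, parsing "-,b,s:b" with a table default of "string" would yield string storage for the first column, binary for the second, and a string key with a binary value for the third.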
 This control is available for the boolean, tinyint, smallint, int, bigint, 
 float, and double primitive types. The attached patch also relaxes the 
 mapping of map types to HBase column families to allow any primitive type to 
 be the map key.
 Attached is a program for creating a table and populating it in HBase. The 
 external table in Hive can access the data as shown in the example below.
 hive> create external table TestHiveHBaseExternalTable
     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
     > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
     > with serdeproperties ("hbase.columns.mapping" =
     >  ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
     > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
 OK
 Time taken: 0.691 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  NULL  NULL  NULL  NULL  NULL  Test-String  NULL  NULL
 Time taken: 0.346 seconds
 hive> drop table TestHiveHBaseExternalTable;
 OK
 Time taken: 0.139 seconds
 hive> create external table TestHiveHBaseExternalTable
     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
     > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
     > with serdeproperties (
     >  "hbase.columns.mapping" =
     >  ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
     > tblproperties (
     >  "hbase.table.name" = "TestHiveHBaseExternalTable",
     >  "hbase.table.default.storage.type" = "string");
 OK
 Time taken: 0.139 seconds
 hive> select * from TestHiveHBaseExternalTable;
 OK
 key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  

[jira] Commented: (HIVE-1767) Merge files does not work with dynamic partition

2010-11-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928382#action_12928382
 ] 

Namit Jain commented on HIVE-1767:
--

load data local inpath 
'/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket20.txt' 
INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);
load data local inpath 
'/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket21.txt' 
INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);
load data local inpath 
'/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket22.txt' 
INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);
load data local inpath 
'/Users/heyongqiang/Documents/workspace/Hive-Index/data/files/srcbucket23.txt' 
INTO TABLE srcpart_merge_dp partition(ds='2008-04-08', hr=11);


The test needs to be updated

 Merge files does not work with dynamic partition
 

 Key: HIVE-1767
 URL: https://issues.apache.org/jira/browse/HIVE-1767
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1767.1.patch







[jira] Updated: (HIVE-1767) Merge files does not work with dynamic partition

2010-11-04 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1767:
---

Attachment: HIVE-1767.1.patch

 Merge files does not work with dynamic partition
 

 Key: HIVE-1767
 URL: https://issues.apache.org/jira/browse/HIVE-1767
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1767.1.patch







[jira] Updated: (HIVE-1767) Merge files does not work with dynamic partition

2010-11-04 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1767:
---

Attachment: (was: HIVE-1767.1.patch)

 Merge files does not work with dynamic partition
 

 Key: HIVE-1767
 URL: https://issues.apache.org/jira/browse/HIVE-1767
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang






[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

2010-11-04 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HIVE-1754:
-

Attachment: Hive-1754_2.patch

Remove JDBM from Hive completely 

 Remove JDBM component from Map Join
 ---

 Key: HIVE-1754
 URL: https://issues.apache.org/jira/browse/HIVE-1754
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0, 0.7.0
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.7.0

 Attachments: Hive-1754.patch, Hive-1754_2.patch


 Right now, JDBM is the major performance bottleneck.
 As the small table grows, the PUT and GET operations take up most 
 of the execution time.
 Map Join is designed to load the small table's data into memory. 
 If the data is too large to hold in memory, then there is no need to use the 
 map join strategy.




[jira] Updated: (HIVE-1768) Update transient_lastDdlTime only if not specified

2010-11-04 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1768:


Attachment: HIVE-1768.1.patch

 Update transient_lastDdlTime only if not specified
 ---

 Key: HIVE-1768
 URL: https://issues.apache.org/jira/browse/HIVE-1768
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Paul Yang
Assignee: Paul Yang
 Attachments: HIVE-1768.1.patch


 Currently, whenever a table/partition is created/altered, the field 
 'transient_lastDdlTime' is updated with the current timestamp. For normal 
 operations, this is the desired behavior. However, for some housekeeping 
 tasks, it may be helpful if the user could keep the existing value (or set it 
 to something different).
 One example where this is useful is when a partition is copied between 
 clusters. If the last modified time were kept the same after the initial copy, 
 it would be easy to tell whether a partition had been overwritten/updated by 
 comparing timestamps.
 This patch alters the behavior of create/alter methods in the metastore API 
 to update the timestamp only if it is not specified in the object.
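
 A minimal sketch of that rule, assuming the timestamp is carried in a string-to-string parameter map under the key 'transient_lastDdlTime' and is expressed in seconds. The class DdlTimeSketch and its method are hypothetical and are not taken from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: set the DDL time only if the caller did not supply one.
class DdlTimeSketch {
    static final String DDL_TIME_KEY = "transient_lastDdlTime";

    // Leaves a caller-supplied timestamp untouched; otherwise stamps nowSeconds.
    static void setDdlTimeIfAbsent(Map<String, String> params, long nowSeconds) {
        String existing = params.get(DDL_TIME_KEY);
        if (existing == null || existing.isEmpty()) {
            params.put(DDL_TIME_KEY, Long.toString(nowSeconds));
        }
    }

    public static void main(String[] args) {
        Map<String, String> copiedPartition = new HashMap<>();
        copiedPartition.put(DDL_TIME_KEY, "1288828800"); // value carried over from the source cluster
        setDdlTimeIfAbsent(copiedPartition, System.currentTimeMillis() / 1000);
        System.out.println(copiedPartition.get(DDL_TIME_KEY)); // the carried-over value is kept
    }
}
```

 In the cross-cluster copy scenario, both clusters would then report the same timestamp until one side is modified through a call that omits the value.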




[jira] Updated: (HIVE-1648) Automatically gathering stats when reading a table/partition

2010-11-04 Thread Paul Butler (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Butler updated HIVE-1648:
--

Attachment: HIVE-1648.patch

 Automatically gathering stats when reading a table/partition
 

 Key: HIVE-1648
 URL: https://issues.apache.org/jira/browse/HIVE-1648
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
 Attachments: HIVE-1648.patch


 HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to 
 gather stats. This requires an additional scan of the data. Stats gathering 
 can be piggy-backed on TableScanOperator whenever a table/partition is 
 scanned (given no LIMIT operator). 
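
 The piggy-backing idea can be illustrated with a pass-through iterator that counts rows as they are read. This is a sketch under assumptions, not the attached patch and not TableScanOperator's API: rows are modelled as plain strings, and the class CountingScanSketch and its methods are hypothetical.

```java
import java.util.Iterator;

// Hypothetical scan wrapper: gathers counts while forwarding rows, with no second pass.
class CountingScanSketch implements Iterator<String> {
    private final Iterator<String> source;
    private long rowCount;
    private long charCount;

    CountingScanSketch(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String next() {
        String row = source.next();
        rowCount++;                 // update stats as a side effect of the scan
        charCount += row.length();
        return row;
    }

    long getRowCount() {
        return rowCount;
    }

    long getCharCount() {
        return charCount;
    }

    // The counts describe the whole input only if the scan ran to completion,
    // which is why a LIMIT that stops the scan early rules out this approach.
    boolean sawAllRows() {
        return !source.hasNext();
    }
}
```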
