[jira] [Updated] (HIVE-3834) Support hive: alter view ... as ..

2012-12-24 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3834:
-

Status: Open  (was: Patch Available)

comments

 Support hive: alter view ... as ..
 --

 Key: HIVE-3834
 URL: https://issues.apache.org/jira/browse/HIVE-3834
 Project: Hive
  Issue Type: New Feature
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3834.1.patch.txt, HIVE-3834.2.patch.txt


 Hive supports ALTER VIEW for setting properties, adding/dropping partitions, 
 etc., but not ALTER VIEW ... AS.
 If you want to change the AS part, you have to drop the view, recreate it, and 
 backfill partitions etc., which is pretty painful.
 It would be nice to support this. For reference, see the MySQL syntax: 
 http://dev.mysql.com/doc/refman/5.0/en/alter-view.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-24 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539247#comment-13539247
 ] 

Namit Jain commented on HIVE-3784:
--

[~vinodkv], I agree that with this change, there will be 2 map-only jobs instead of 1 
map job.
I haven't tested it myself, but it seems likely.

But the current code has far too much complexity to handle this special case. 
Ideally, this change
should be done as part of converting join tasks into conditional join tasks. 
That layer should be smart enough
to see that there is no need for a conditional task, and that a map-only task is 
possible. Also, another layer needs
to be written to merge consecutive map-only tasks.

Although we are taking a hit for this special case, I still believe this is 
the right long-term way to go.

 de-emphasize mapjoin hint
 -

 Key: HIVE-3784
 URL: https://issues.apache.org/jira/browse/HIVE-3784
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
 hive.3784.4.patch, hive.3784.5.patch


 hive.auto.convert.join has been around for a long time, and is pretty stable.
 When the mapjoin hint was created, the above parameter did not exist.
 Currently, the only reason for a user to specify a mapjoin hint is to have the
 join converted to a bucketed mapjoin or a sort-merge bucketed mapjoin.
 Eventually, that should also go away, but that may take some time to 
 stabilize.
 There are many rules in SemanticAnalyzer to handle the following trees:
 ReduceSink - MapJoin
 Union      - MapJoin
 MapJoin    - MapJoin
 These should not be supported anymore. In any of the above scenarios, the
 user can get the mapjoin behavior by setting hive.auto.convert.join to true
 and not specifying the hint. This will simplify the code a lot.
 What does everyone think?



[jira] [Commented] (HIVE-3834) Support ALTER VIEW AS SELECT in Hive

2012-12-24 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539282#comment-13539282
 ] 

Namit Jain commented on HIVE-3834:
--

+1

Running tests

 Support ALTER VIEW AS SELECT in Hive
 

 Key: HIVE-3834
 URL: https://issues.apache.org/jira/browse/HIVE-3834
 Project: Hive
  Issue Type: New Feature
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3834.1.patch.txt, HIVE-3834.2.patch.txt, 
 HIVE-3834.3.patch.txt


 Hive supports ALTER VIEW for setting properties, adding/dropping partitions, 
 etc., but not ALTER VIEW ... AS.
 If you want to change the AS part, you have to drop the view, recreate it, and 
 backfill partitions etc., which is pretty painful.
 It would be nice to support this. For reference, see the MySQL syntax: 
 http://dev.mysql.com/doc/refman/5.0/en/alter-view.html



[jira] [Commented] (HIVE-3834) Support ALTER VIEW AS SELECT in Hive

2012-12-24 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539357#comment-13539357
 ] 

Namit Jain commented on HIVE-3834:
--

Can you add documentation once it is committed ?

 Support ALTER VIEW AS SELECT in Hive
 

 Key: HIVE-3834
 URL: https://issues.apache.org/jira/browse/HIVE-3834
 Project: Hive
  Issue Type: New Feature
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3834.1.patch.txt, HIVE-3834.2.patch.txt, 
 HIVE-3834.3.patch.txt


 Hive supports ALTER VIEW for setting properties, adding/dropping partitions, 
 etc., but not ALTER VIEW ... AS.
 If you want to change the AS part, you have to drop the view, recreate it, and 
 backfill partitions etc., which is pretty painful.
 It would be nice to support this. For reference, see the MySQL syntax: 
 http://dev.mysql.com/doc/refman/5.0/en/alter-view.html



[jira] [Updated] (HIVE-3834) Support ALTER VIEW AS SELECT in Hive

2012-12-24 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3834:
-

Status: Open  (was: Patch Available)

The following tests failed:

create_view.q
create_or_replace_view.q

Can you take a look ?

 Support ALTER VIEW AS SELECT in Hive
 

 Key: HIVE-3834
 URL: https://issues.apache.org/jira/browse/HIVE-3834
 Project: Hive
  Issue Type: New Feature
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3834.1.patch.txt, HIVE-3834.2.patch.txt, 
 HIVE-3834.3.patch.txt


 Hive supports ALTER VIEW for setting properties, adding/dropping partitions, 
 etc., but not ALTER VIEW ... AS.
 If you want to change the AS part, you have to drop the view, recreate it, and 
 backfill partitions etc., which is pretty painful.
 It would be nice to support this. For reference, see the MySQL syntax: 
 http://dev.mysql.com/doc/refman/5.0/en/alter-view.html



[jira] [Updated] (HIVE-3829) Hive CLI needs UNSET TBLPROPERTY command

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3829:
-

Status: Open  (was: Patch Available)

some more clarifications

 Hive CLI needs UNSET TBLPROPERTY command
 

 Key: HIVE-3829
 URL: https://issues.apache.org/jira/browse/HIVE-3829
 Project: Hive
  Issue Type: Bug
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3829.1.patch.txt, HIVE-3829.2.patch.txt, 
 HIVE-3829.3.patch.txt, HIVE-3829.4.patch.txt


 The Hive CLI currently supports
 ALTER TABLE table SET TBLPROPERTIES ('key1' = 'value1', 'key2' = 'value2', 
 ...);
 To add/change the value of table properties.
 It would be really useful if Hive also supported
 ALTER TABLE table UNSET TBLPROPERTIES ('key1', 'key2', ...);
 Which would remove table properties



[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539053#comment-13539053
 ] 

Namit Jain commented on HIVE-3718:
--

For some reason, I could not run the parallel tests on the git patch, so I am applying a 
new patch to run the tests.

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
 HIVE-3718.3.patch.txt, hive.3718.4.patch






[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3718:
-

Attachment: hive.3718.4.patch

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
 HIVE-3718.3.patch.txt, hive.3718.4.patch






[jira] [Updated] (HIVE-3832) Insert overwrite doesn't create a dir if the skewed column position doesnt match

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3832:
-

Status: Open  (was: Patch Available)

minor comments on phabricator

 Insert overwrite doesn't create a dir if the skewed column position doesnt 
 match
 

 Key: HIVE-3832
 URL: https://issues.apache.org/jira/browse/HIVE-3832
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3832.patch.1


 If the skewed column doesn't match its position in the table columns, insert overwrite 
 doesn't create a sub-directory but puts all data into the default directory.



[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539056#comment-13539056
 ] 

Namit Jain commented on HIVE-3718:
--

drop_partitions_ignore_protection.q fails

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
 HIVE-3718.3.patch.txt, hive.3718.4.patch






[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3718:
-

Status: Open  (was: Patch Available)

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
 HIVE-3718.3.patch.txt, hive.3718.4.patch






[jira] [Commented] (HIVE-3832) Insert overwrite doesn't create a dir if the skewed column position doesnt match

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539169#comment-13539169
 ] 

Namit Jain commented on HIVE-3832:
--

+1

Running tests

 Insert overwrite doesn't create a dir if the skewed column position doesnt 
 match
 

 Key: HIVE-3832
 URL: https://issues.apache.org/jira/browse/HIVE-3832
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3832.patch.1, HIVE-3832.patch.2


 If the skewed column doesn't match its position in the table columns, insert overwrite 
 doesn't create a sub-directory but puts all data into the default directory.



[jira] [Created] (HIVE-3835) Add an option to run tests where testfiles can be specified as a regular expression

2012-12-23 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3835:


 Summary: Add an option to run tests where testfiles can be 
specified as a regular expression
 Key: HIVE-3835
 URL: https://issues.apache.org/jira/browse/HIVE-3835
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Namit Jain


For example, if I want to run all list bucketing tests, I should be able to say:

 ant test -Dtestcase=TestCliDriver -Dqfile=list_bucket_dml*.q

or something like that
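
A minimal sketch of the selection step being asked for, written in Python rather than in the ant build itself. It assumes the value passed as -Dqfile is treated as a shell-style wildcard (the issue title says "regular expression", but the example above looks like a glob). The function name select_qfiles and the sample file names are hypothetical and only serve as illustration.

```python
import fnmatch


def select_qfiles(pattern, available):
    """Return the query files whose names match a shell-style pattern, sorted."""
    return sorted(name for name in available if fnmatch.fnmatchcase(name, pattern))


# Hypothetical file names, for illustration only.
qfiles = ["list_bucket_dml_2.q", "create_view.q", "list_bucket_dml_1.q"]
print(select_qfiles("list_bucket_dml*.q", qfiles))
```

If true regular expressions were wanted instead, the same shape would work with re.fullmatch in place of fnmatch.fnmatchcase.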



[jira] [Commented] (HIVE-3829) Hive CLI needs UNSET TBLPROPERTY command

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539173#comment-13539173
 ] 

Namit Jain commented on HIVE-3829:
--

+1

 Hive CLI needs UNSET TBLPROPERTY command
 

 Key: HIVE-3829
 URL: https://issues.apache.org/jira/browse/HIVE-3829
 Project: Hive
  Issue Type: Bug
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3829.1.patch.txt, HIVE-3829.2.patch.txt, 
 HIVE-3829.3.patch.txt, HIVE-3829.4.patch.txt, HIVE-3829.5.patch.txt


 The Hive CLI currently supports
 ALTER TABLE table SET TBLPROPERTIES ('key1' = 'value1', 'key2' = 'value2', 
 ...);
 To add/change the value of table properties.
 It would be really useful if Hive also supported
 ALTER TABLE table UNSET TBLPROPERTIES ('key1', 'key2', ...);
 Which would remove table properties



[jira] [Updated] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3824:
-

Attachment: hive.3824.3.patch

 bug if different serdes are used for different partitions
 -

 Key: HIVE-3824
 URL: https://issues.apache.org/jira/browse/HIVE-3824
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
 Attachments: hive.3824.1.patch, hive.3824.3.patch


 Consider the following testcase:
 create table tst5 (key string, value string) partitioned by (ds string) 
 stored as rcfile;
 insert overwrite table tst5 partition (ds='1') select * from src;
 insert overwrite table tst5 partition (ds='2') select * from src;
 insert overwrite table tst5 partition (ds='3') select * from src;
 alter table tst5 stored as sequencefile; 
 insert overwrite table tst5 partition (ds='4') select * from src;
 insert overwrite table tst5 partition (ds='5') select * from src;
 insert overwrite table tst5 partition (ds='6') select * from src;  
 alter table tst5 set serde 
 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
 insert overwrite table tst5 partition (ds='7') select * from src;
 insert overwrite table tst5 partition (ds='8') select * from src;
 insert overwrite table tst5 partition (ds='9') select * from src;  
 The following query works fine:
  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
 since both the partitions use ColumnarSerDe
 But the following query fails:
 select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
 (ds='7'));
 since different serdes are used.



[jira] [Updated] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3824:
-

Assignee: Namit Jain
  Status: Patch Available  (was: Open)

 bug if different serdes are used for different partitions
 -

 Key: HIVE-3824
 URL: https://issues.apache.org/jira/browse/HIVE-3824
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3824.1.patch, hive.3824.3.patch


 Consider the following testcase:
 create table tst5 (key string, value string) partitioned by (ds string) 
 stored as rcfile;
 insert overwrite table tst5 partition (ds='1') select * from src;
 insert overwrite table tst5 partition (ds='2') select * from src;
 insert overwrite table tst5 partition (ds='3') select * from src;
 alter table tst5 stored as sequencefile; 
 insert overwrite table tst5 partition (ds='4') select * from src;
 insert overwrite table tst5 partition (ds='5') select * from src;
 insert overwrite table tst5 partition (ds='6') select * from src;  
 alter table tst5 set serde 
 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
 insert overwrite table tst5 partition (ds='7') select * from src;
 insert overwrite table tst5 partition (ds='8') select * from src;
 insert overwrite table tst5 partition (ds='9') select * from src;  
 The following query works fine:
  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
 since both the partitions use ColumnarSerDe
 But the following query fails:
 select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
 (ds='7'));
 since different serdes are used.



[jira] [Updated] (HIVE-3832) Insert overwrite doesn't create a dir if the skewed column position doesnt match

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3832:
-

   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Tim

 Insert overwrite doesn't create a dir if the skewed column position doesnt 
 match
 

 Key: HIVE-3832
 URL: https://issues.apache.org/jira/browse/HIVE-3832
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11

 Attachments: HIVE-3832.patch.1, HIVE-3832.patch.2


 If the skewed column doesn't match its position in the table columns, insert overwrite 
 doesn't create a sub-directory but puts all data into the default directory.



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539201#comment-13539201
 ] 

Namit Jain commented on HIVE-3833:
--

Consider the following test:

set hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

create table partition_test_partitioned(key string, value string) partitioned 
by (dt string) stored as rcfile;

alter table partition_test_partitioned set serde 
'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe';
insert overwrite table partition_test_partitioned partition(dt='1') select * 
from src where key = 238;

alter table partition_test_partitioned change key key int; 


The query:
select * from partition_test_partitioned where dt is not null;

returns:

50  val_238 1
50  val_238 1

This is because the key column was serialized as a string column 
and is now being read as an integer.

 object inspectors should be initialized based on partition metadata
 ---

 Key: HIVE-3833
 URL: https://issues.apache.org/jira/browse/HIVE-3833
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 Currently, different partitions can be picked up for the same input split 
 based on their
 serdes etc., and we don't allow the schema to be changed for 
 LazyBinaryColumnarSerDe.
 Instead, different partitions should be part of the same split only 
 if the
 partition schemas match exactly. The operator tree object inspectors should 
 be based
 on the partition schema. That would give greater flexibility and also help with 
 using a binary serde with rcfile.
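
The grouping rule proposed above can be sketched roughly as follows. This is an illustration in Python, not Hive's actual split-generation code: the function group_by_schema, the tuple layout (partition name, serde name, column list) and the sample partitions are all hypothetical. The only idea taken from the description is that two partitions may share a split only when their schemas match exactly.

```python
from collections import defaultdict


def group_by_schema(partitions):
    """Group partition names by (serde, columns).

    Each partition is a hypothetical (name, serde, columns) tuple, where
    columns is a list of (column_name, type_name) pairs. Partitions in the
    same group have identical schemas and could share an input split.
    """
    groups = defaultdict(list)
    for name, serde, columns in partitions:
        groups[(serde, tuple(columns))].append(name)
    return list(groups.values())


# Hypothetical partitions: dt=2 was written after key changed from string to int.
parts = [
    ("dt=1", "LazyBinaryColumnarSerDe", [("key", "string"), ("value", "string")]),
    ("dt=2", "LazyBinaryColumnarSerDe", [("key", "int"), ("value", "string")]),
    ("dt=3", "LazyBinaryColumnarSerDe", [("key", "string"), ("value", "string")]),
]
print(group_by_schema(parts))
```

Under this rule dt=1 and dt=3 end up together while dt=2 is kept apart, so each group can be read with object inspectors built from its own partition schema.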



[jira] [Updated] (HIVE-3834) Support hive: alter view ... as ..

2012-12-23 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3834:
-

Status: Open  (was: Patch Available)

comments on phabricator

 Support hive: alter view ... as ..
 --

 Key: HIVE-3834
 URL: https://issues.apache.org/jira/browse/HIVE-3834
 Project: Hive
  Issue Type: New Feature
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3834.1.patch.txt


 Hive supports ALTER VIEW for setting properties, adding/dropping partitions, 
 etc., but not ALTER VIEW ... AS.
 If you want to change the AS part, you have to drop the view, recreate it, and 
 backfill partitions etc., which is pretty painful.
 It would be nice to support this. For reference, see the MySQL syntax: 
 http://dev.mysql.com/doc/refman/5.0/en/alter-view.html



[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-23 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539203#comment-13539203
 ] 

Namit Jain commented on HIVE-3833:
--

The possible options are to not allow the schema to be changed with 
LazyColumnarSerDe (only allow additions),
or use partition metadata for inspectors.

 object inspectors should be initialized based on partition metadata
 ---

 Key: HIVE-3833
 URL: https://issues.apache.org/jira/browse/HIVE-3833
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 Currently, different partitions can be picked up for the same input split 
 based on their
 serdes etc., and we don't allow the schema to be changed for 
 LazyBinaryColumnarSerDe.
 Instead, different partitions should be part of the same split only 
 if the
 partition schemas match exactly. The operator tree object inspectors should 
 be based
 on the partition schema. That would give greater flexibility and also help with 
 using a binary serde with rcfile.



[jira] [Updated] (HIVE-933) Infer bucketing/sorting properties

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-933:


Status: Open  (was: Patch Available)

minor comments on phabricator

 Infer bucketing/sorting properties
 --

 Key: HIVE-933
 URL: https://issues.apache.org/jira/browse/HIVE-933
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Kevin Wilfong
 Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt, 
 HIVE-933.3.patch.txt, HIVE-933.4.patch.txt, HIVE-933.5.patch.txt, 
 HIVE-933.6.patch.txt


 This is a long-term plan, and may require major changes.
 From the query, we can figure out the sorting/bucketing properties, and 
 change the metadata of the destination at that time.
 However, this means that different partitions may have different metadata. 
 Currently, the query plan is the same for all the 
 partitions of the table. We can do the following:
 1. In the first cut, have a simple approach where you take the union of all 
 metadata, and create the most defensive plan.
 2. Enhance mapredWork() to include partition-specific operator trees.



[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3825:
-

Status: Open  (was: Patch Available)

Can you create a phabricator entry ?

A few comments:

1. Create an OperatorHookCtx. What will you pass to the hooks?
2. Add some tests, e.g. some dummy hooks.

 Add Operator level Hooks
 

 Key: HIVE-3825
 URL: https://issues.apache.org/jira/browse/HIVE-3825
 Project: Hive
  Issue Type: New Feature
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3825.txt






[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538840#comment-13538840
 ] 

Namit Jain commented on HIVE-3718:
--

+1

The patch file looks good. Can you refresh the phabricator entry also ?

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
 HIVE-3718.3.patch.txt






[jira] [Updated] (HIVE-3829) Hive CLI needs UNSET TBLPROPERTY command

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3829:
-

Status: Open  (was: Patch Available)

some more comments

 Hive CLI needs UNSET TBLPROPERTY command
 

 Key: HIVE-3829
 URL: https://issues.apache.org/jira/browse/HIVE-3829
 Project: Hive
  Issue Type: Bug
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3829.1.patch.txt, HIVE-3829.2.patch.txt


 The Hive CLI currently supports
 ALTER TABLE table SET TBLPROPERTIES ('key1' = 'value1', 'key2' = 'value2', 
 ...);
 To add/change the value of table properties.
 It would be really useful if Hive also supported
 ALTER TABLE table UNSET TBLPROPERTIES ('key1', 'key2', ...);
 Which would remove table properties



[jira] [Commented] (HIVE-2439) Upgrade antlr version to 3.4

2012-12-22 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538843#comment-13538843
 ] 

Namit Jain commented on HIVE-2439:
--

What is the reason for upgrading antlr ?

 Upgrade antlr version to 3.4
 

 Key: HIVE-2439
 URL: https://issues.apache.org/jira/browse/HIVE-2439
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.10.0, 0.9.1, 0.11

 Attachments: HIVE-2439_branch9_2.patch, HIVE-2439_branch9_3.patch, 
 HIVE-2439_branch9.patch, hive-2439_incomplete.patch, HIVE-2439_trunk.patch


 Upgrade antlr version to 3.4



[jira] [Updated] (HIVE-3236) allow column names to be prefixed by table alias in select all queries

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3236:
-

Status: Open  (was: Patch Available)

can you create a phabricator entry ?

 allow column names to be prefixed by table alias in select all queries
 --

 Key: HIVE-3236
 URL: https://issues.apache.org/jira/browse/HIVE-3236
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.1
Reporter: Keegan Mosley
Priority: Minor
 Attachments: HIVE-3236.1.patch.txt


 When using CREATE TABLE x AS SELECT ... where the select joins tables with 
 hundreds of columns it is not a simple task to resolve duplicate column name 
 exceptions (particularly with self-joins). The user must either manually 
 specify aliases for all duplicate columns (potentially hundreds) or write a 
 script to generate the data set in a separate select query, then create the 
 table and load the data in.
 There should be some conf flag that would allow queries like
 create table joined as select one.*, two.* from mytable one join mytable 
 two on (one.duplicate_field = two.duplicate_field1);
 to create a table with columns one_duplicate_field and two_duplicate_field.
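
A minimal sketch of the naming scheme requested above, written in Python rather than as Hive code. The function `prefixed_columns` and its input layout are hypothetical; the sketch only shows how prefixing each column with its table alias removes duplicate names in a self-join.

```python
def prefixed_columns(tables):
    """Return output column names of the form <alias>_<column>.

    `tables` maps a table alias to its list of column names. All names
    here are illustrative and are not part of Hive.
    """
    return [f"{alias}_{col}" for alias, cols in tables.items() for col in cols]


# Self-join of one table under the aliases "one" and "two".
cols = prefixed_columns({
    "one": ["duplicate_field", "value"],
    "two": ["duplicate_field", "value"],
})
print(cols)
# ['one_duplicate_field', 'one_value', 'two_duplicate_field', 'two_value']
```

Because every alias in a query is unique, the prefixed names cannot collide even when both sides of the join expose the same columns.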



[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3635:
-

Status: Open  (was: Patch Available)

Can you answer Edward's question, and also refresh the patch?

  allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for 
 the boolean hive type
 ---

 Key: HIVE-3635
 URL: https://issues.apache.org/jira/browse/HIVE-3635
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.9.0
Reporter: Alexander Alten-Lorenz
Assignee: Alexander Alten-Lorenz
 Attachments: HIVE-3635.patch


 interpret t as true and f as false for boolean types. PostgreSQL exports 
 represent it that way.
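
A minimal sketch of the requested mapping, in Python rather than Hive's SerDe code. The name `parse_boolean` is hypothetical, and the sketch covers only the six literals named in the issue title; how any other input is handled is outside its scope.

```python
TRUE_LITERALS = {"t", "T", "1"}
FALSE_LITERALS = {"f", "F", "0"}


def parse_boolean(text):
    """Map the six extra literals to booleans; return None for anything else."""
    if text in TRUE_LITERALS:
        return True
    if text in FALSE_LITERALS:
        return False
    return None  # not one of the literals covered by this sketch


print(parse_boolean("t"), parse_boolean("0"), parse_boolean("maybe"))
# True False None
```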



[jira] [Updated] (HIVE-3828) insert overwrite fails with stored-as-dir in cluster

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3828:
-

   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Tim

 insert overwrite fails with stored-as-dir in cluster
 

 Key: HIVE-3828
 URL: https://issues.apache.org/jira/browse/HIVE-3828
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11

 Attachments: HIVE-3828.patch.1


 The following query works fine in the Hive TestCliDriver test suite but not 
 in minimr, because a different Hadoop file system is used.
 The error is
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename 
 output from: .../_task_tmp.-ext-10002/key=103/_tmp.00_0 to: 
 .../_tmp.-ext-10002/key=103/00_0
 {code}



[jira] [Updated] (HIVE-3829) Hive CLI needs UNSET TBLPROPERTY command

2012-12-22 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3829:
-

Status: Open  (was: Patch Available)

very minor comments

 Hive CLI needs UNSET TBLPROPERTY command
 

 Key: HIVE-3829
 URL: https://issues.apache.org/jira/browse/HIVE-3829
 Project: Hive
  Issue Type: Bug
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Attachments: HIVE-3829.1.patch.txt, HIVE-3829.2.patch.txt, 
 HIVE-3829.3.patch.txt


 The Hive CLI currently supports
 ALTER TABLE table SET TBLPROPERTIES ('key1' = 'value1', 'key2' = 'value2', 
 ...);
 To add/change the value of table properties.
 It would be really useful if Hive also supported
 ALTER TABLE table UNSET TBLPROPERTIES ('key1', 'key2', ...);
 Which would remove table properties



[jira] [Created] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-22 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3833:


 Summary: object inspectors should be initialized based on 
partition metadata
 Key: HIVE-3833
 URL: https://issues.apache.org/jira/browse/HIVE-3833
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain


Currently, different partitions can be picked up for the same input split based
on their serdes etc., and we don't allow changing the schema for
LazyBinaryColumnarSerDe.
Instead, different partitions should be part of the same split only if the
partition schemas match exactly. The operator tree object inspectors should be
based on the partition schema. That would give greater flexibility and also help
with using a binary serde with RCFile.
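
The grouping rule proposed here can be sketched as follows. This is a Python illustration under stated assumptions, not Hive's split-generation code: the function `group_partitions_by_schema` and the tuple layout are hypothetical, and the serde names are used only as labels.

```python
from collections import defaultdict


def group_partitions_by_schema(partitions):
    """Group partition names so that only exactly matching schemas share a group.

    `partitions` is a list of (name, serde, columns) tuples. These names are
    illustrative; they are not Hive classes or APIs.
    """
    groups = defaultdict(list)
    for name, serde, columns in partitions:
        groups[(serde, tuple(columns))].append(name)
    return list(groups.values())


cols = [("key", "string"), ("value", "string")]
partitions = [
    ("ds=1", "ColumnarSerDe", cols),
    ("ds=4", "ColumnarSerDe", cols),
    ("ds=7", "LazySimpleSerDe", cols),
]
print(group_partitions_by_schema(partitions))
# [['ds=1', 'ds=4'], ['ds=7']]
```

Partitions that land in different groups would never share a split, so each split could get object inspectors built from a single partition schema.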



[jira] [Commented] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-21 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538660#comment-13538660
 ] 

Namit Jain commented on HIVE-3826:
--

+1

Great catch - running tests.

 Rollbacks and retries of drops cause 
 org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
 row)
 -

 Key: HIVE-3826
 URL: https://issues.apache.org/jira/browse/HIVE-3826
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3826.1.patch.txt


 I'm not sure if this is the only cause of the exception 
 org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
 row) from the metastore, but one cause seems to be related to a drop command 
 failing, and being retried by the client.
 Based on focusing on a single thread in the metastore with DEBUG level 
 logging, I was seeing the objects that were intended to be dropped remaining 
 in the PersistenceManager cache even after a rollback.  The steps seemed to 
 be as follows:
 1) First attempt to drop the table, the table is pulled into the 
 PersistenceManager cache for the purposes of dropping
 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
 causes a rollback of the transaction
 3) The drop is retried using a different thread on the metastore Thrift 
 server or a different server and succeeds
 4) Back on the original thread of the original Thrift server someone tries to 
 perform some write operation which produces a commit.  This causes those 
 detached objects related to the dropped table to attempt to reattach, causing 
 JDO to query the SQL backend for those objects which it can't find.  This 
 causes the exception.
 I was able to reproduce this regularly using the following sequence of 
 commands:
 Hive client 1 (Hive1): connected to a metastore Thrift server running a 
 single thread, I hard coded a RuntimeException into the code to drop a table 
 in the ObjectStore, specifically right before the commit in 
 preDropStorageDescriptor, to induce a rollback.  I also turned off all 
 retries at all layers of the metastore.
 Hive client 2 (Hive2): connected to a separate metastore Thrift server 
 running with standard configs and code
 1: On Hive1, CREATE TABLE t1 (c STRING);
 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
 3: On Hive2, DROP TABLE t1; // Succeeds
 4: On Hive1, CREATE DATABASE d1; // This database already existed. I'm not 
 sure why this was necessary, but it didn't work without it; it seemed to have 
 an effect on the order in which objects were committed in the next step
 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
 with the NucleusObjectNotFoundException
 The object that caused the exception varied; I saw the MTable, the 
 MSerDeInfo, and the MTablePrivilege of the table whose drop was attempted.
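
The sequence above can be mimicked with a toy model. This Python sketch is not DataNucleus or metastore code: `ToyStore`, its cache, and `ObjectNotFound` are hypothetical stand-ins, and the only assumption carried over from the report is that a rolled-back drop leaves the object in the per-server cache.

```python
class ObjectNotFound(Exception):
    """Stand-in for NucleusObjectNotFoundException in this toy model."""


class ToyStore:
    def __init__(self, backend):
        self.backend = backend  # shared set of table names (the SQL backend)
        self.cache = set()      # per-server object cache

    def drop(self, table, fail=False):
        self.cache.add(table)   # step 1: object is pulled into the cache
        if fail:
            return False        # step 2: rollback, but the cache entry stays
        self.backend.discard(table)
        self.cache.discard(table)
        return True

    def commit(self):
        # step 4: cached objects are looked up in the backend again on commit
        for table in self.cache:
            if table not in self.backend:
                raise ObjectNotFound(table)


backend = {"t1"}
server1, server2 = ToyStore(backend), ToyStore(backend)
server1.drop("t1", fail=True)   # failed drop on the first server
server2.drop("t1")              # step 3: retry succeeds on another server
try:
    server1.commit()            # later write on the first server
except ObjectNotFound as err:
    print("stale cached object:", err)
```

In this model, clearing the cache on rollback would make the final commit succeed; whether that matches the actual fix in the patch is not stated in the report.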



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Status: Patch Available  (was: Open)

Comments addressed.

Kevin, can you take a look ? This is a code only patch.
If this looks good, I will file a patch with the log file updates.

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch, hive.3803.5.patch


 It should also include tables whose partitions are being accessed



[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Status: Patch Available  (was: Open)

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
 hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
 hive.3552.9.patch


 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say '8' columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional mr job - in the first 
 mr job perform the group by assuming there was no cube. Add another mr job, 
 where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly, and the rows would appear in the 
 order of
 grouping keys which has a higher probability of hitting the hash table.
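
The figure of 256 rows follows from a cube grouping by every subset of its columns, so 8 columns give 2^8 = 256 grouping sets per input row. A short Python sketch of that count (the function name `cube_grouping_sets` and the column names are illustrative only):

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Enumerate every subset of `columns`; a cube groups by each of them."""
    return [
        subset
        for size in range(len(columns) + 1)
        for subset in combinations(columns, size)
    ]


columns = [f"c{i}" for i in range(8)]
grouping_sets = cube_grouping_sets(columns)
print(len(grouping_sets))  # 256, i.e. 2 ** 8
```

Each added cube column doubles the number of rows generated per input row, which is why reducing the data with a plain group by first can pay off.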



[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.11.patch

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
 hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
 hive.3552.9.patch


 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say '8' columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional mr job - in the first 
 mr job perform the group by assuming there was no cube. Add another mr job, 
 where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly, and the rows would appear in the 
 order of
 grouping keys which has a higher probability of hitting the hash table.



[jira] [Updated] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-21 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3826:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Kevin

 Rollbacks and retries of drops cause 
 org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
 row)
 -

 Key: HIVE-3826
 URL: https://issues.apache.org/jira/browse/HIVE-3826
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3826.1.patch.txt


 I'm not sure if this is the only cause of the exception 
 org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
 row) from the metastore, but one cause seems to be related to a drop command 
 failing, and being retried by the client.
 Based on focusing on a single thread in the metastore with DEBUG level 
 logging, I was seeing the objects that were intended to be dropped remaining 
 in the PersistenceManager cache even after a rollback.  The steps seemed to 
 be as follows:
 1) First attempt to drop the table, the table is pulled into the 
 PersistenceManager cache for the purposes of dropping
 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
 causes a rollback of the transaction
 3) The drop is retried using a different thread on the metastore Thrift 
 server or a different server and succeeds
 4) Back on the original thread of the original Thrift server someone tries to 
 perform some write operation which produces a commit.  This causes those 
 detached objects related to the dropped table to attempt to reattach, causing 
 JDO to query the SQL backend for those objects which it can't find.  This 
 causes the exception.
 I was able to reproduce this regularly using the following sequence of 
 commands:
 Hive client 1 (Hive1): connected to a metastore Thrift server running a 
 single thread, I hard coded a RuntimeException into the code to drop a table 
 in the ObjectStore, specifically right before the commit in 
 preDropStorageDescriptor, to induce a rollback.  I also turned off all 
 retries at all layers of the metastore.
 Hive client 2 (Hive2): connected to a separate metastore Thrift server 
 running with standard configs and code
 1: On Hive1, CREATE TABLE t1 (c STRING);
 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
 3: On Hive2, DROP TABLE t1; // Succeeds
 4: On Hive1, CREATE DATABASE d1; // This database already existed. I'm not 
 sure why this was necessary, but it didn't work without it; it seemed to have 
 an effect on the order in which objects were committed in the next step
 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
 with the NucleusObjectNotFoundException
 The object that caused the exception varied; I saw the MTable, the 
 MSerDeInfo, and the MTablePrivilege of the table whose drop was attempted.



[jira] [Created] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-20 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3824:


 Summary: bug if different serdes are used for different partitions
 Key: HIVE-3824
 URL: https://issues.apache.org/jira/browse/HIVE-3824
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain


Consider the following testcase:

create table tst5 (key string, value string) partitioned by (ds string) stored 
as rcfile;
insert overwrite table tst5 partition (ds='1') select * from src;
insert overwrite table tst5 partition (ds='2') select * from src;
insert overwrite table tst5 partition (ds='3') select * from src;

alter table tst5 stored as sequencefile; 

insert overwrite table tst5 partition (ds='4') select * from src;
insert overwrite table tst5 partition (ds='5') select * from src;
insert overwrite table tst5 partition (ds='6') select * from src;  

alter table tst5 set serde 
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 

insert overwrite table tst5 partition (ds='7') select * from src;
insert overwrite table tst5 partition (ds='8') select * from src;
insert overwrite table tst5 partition (ds='9') select * from src;  

The following query works fine:

 select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   

since both the partitions use ColumnarSerDe

But the following query fails:

select key + key, value from tst5 where ((ds = '4') or (ds = '1') or (ds='7'));

since different serdes are used.



[jira] [Commented] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536916#comment-13536916
 ] 

Namit Jain commented on HIVE-3824:
--

https://reviews.facebook.net/D7515

 bug if different serdes are used for different partitions
 -

 Key: HIVE-3824
 URL: https://issues.apache.org/jira/browse/HIVE-3824
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain

 Consider the following testcase:
 create table tst5 (key string, value string) partitioned by (ds string) 
 stored as rcfile;
 insert overwrite table tst5 partition (ds='1') select * from src;
 insert overwrite table tst5 partition (ds='2') select * from src;
 insert overwrite table tst5 partition (ds='3') select * from src;
 alter table tst5 stored as sequencefile; 
 insert overwrite table tst5 partition (ds='4') select * from src;
 insert overwrite table tst5 partition (ds='5') select * from src;
 insert overwrite table tst5 partition (ds='6') select * from src;  
 alter table tst5 set serde 
 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
 insert overwrite table tst5 partition (ds='7') select * from src;
 insert overwrite table tst5 partition (ds='8') select * from src;
 insert overwrite table tst5 partition (ds='9') select * from src;  
 The following query works fine:
  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
 since both the partitions use ColumnarSerDe
 But the following query fails:
 select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
 (ds='7'));
 since different serdes are used.



[jira] [Updated] (HIVE-3824) bug if different serdes are used for different partitions

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3824:
-

Attachment: hive.3824.1.patch

 bug if different serdes are used for different partitions
 -

 Key: HIVE-3824
 URL: https://issues.apache.org/jira/browse/HIVE-3824
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
 Attachments: hive.3824.1.patch


 Consider the following testcase:
 create table tst5 (key string, value string) partitioned by (ds string) 
 stored as rcfile;
 insert overwrite table tst5 partition (ds='1') select * from src;
 insert overwrite table tst5 partition (ds='2') select * from src;
 insert overwrite table tst5 partition (ds='3') select * from src;
 alter table tst5 stored as sequencefile; 
 insert overwrite table tst5 partition (ds='4') select * from src;
 insert overwrite table tst5 partition (ds='5') select * from src;
 insert overwrite table tst5 partition (ds='6') select * from src;  
 alter table tst5 set serde 
 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
 insert overwrite table tst5 partition (ds='7') select * from src;
 insert overwrite table tst5 partition (ds='8') select * from src;
 insert overwrite table tst5 partition (ds='9') select * from src;  
 The following query works fine:
  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
 since both the partitions use ColumnarSerDe
 But the following query fails:
 select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
 (ds='7'));
 since different serdes are used.



[jira] [Commented] (HIVE-3818) Identify VOID datatype in generated table and notify the user accordingly

2012-12-20 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536946#comment-13536946
 ] 

Namit Jain commented on HIVE-3818:
--

Can you create a phabricator entry ?
https://cwiki.apache.org/Hive/phabricatorcodereview.html


https://reviews.facebook.net/D7521

 Identify VOID datatype in generated table and notify the user accordingly
 -

 Key: HIVE-3818
 URL: https://issues.apache.org/jira/browse/HIVE-3818
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Gary Colman
Assignee: Gary Colman
Priority: Trivial
 Attachments: 
 Patch_to_throw_useful_error_msg_for_VOID_datatype_in_created_table.txt

   Original Estimate: 1h
  Remaining Estimate: 1h

 When using rcfile as a datastore, generating a table from a select statement 
 with a null field results in a void data type, and throws an exception 
 (Internal error: no LazyObject for VOID).
 eg.
   set hive.default.fileformat=RCFILE;
   CREATE TABLE test_table AS SELECT NULL, key FROM src;
 Make the message to the user a little more intuitive.
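
One way to picture the requested check is a validation pass over the target schema before the table is created. The sketch below is Python, not the attached patch: `check_no_void_columns`, the schema layout, the column name `col_a`, and the wording of the message are all assumptions made for illustration.

```python
def check_no_void_columns(schema):
    """Raise a descriptive error if any column has the VOID type.

    `schema` is a list of (column_name, type_name) pairs; illustrative only.
    """
    void_columns = [name for name, type_name in schema
                    if type_name.lower() == "void"]
    if void_columns:
        raise ValueError(
            "Columns with VOID type cannot be stored: "
            + ", ".join(void_columns)
            + ". Give the NULL value a concrete type, "
            + "for example CAST(NULL AS STRING)."
        )


schema = [("col_a", "void"), ("key", "string")]
try:
    check_no_void_columns(schema)
except ValueError as err:
    print(err)
```

Failing early with the column name gives the user something to act on, which the internal "no LazyObject for VOID" error does not.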



[jira] [Updated] (HIVE-3818) Identify VOID datatype in generated table and notify the user accordingly

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3818:
-

Status: Open  (was: Patch Available)

comments on phabricator

 Identify VOID datatype in generated table and notify the user accordingly
 -

 Key: HIVE-3818
 URL: https://issues.apache.org/jira/browse/HIVE-3818
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Gary Colman
Assignee: Gary Colman
Priority: Trivial
 Attachments: 
 Patch_to_throw_useful_error_msg_for_VOID_datatype_in_created_table.txt

   Original Estimate: 1h
  Remaining Estimate: 1h

 When using rcfile as a datastore, generating a table from a select statement 
 with a null field results in a void data type, and throws an exception 
 (Internal error: no LazyObject for VOID).
 eg.
   set hive.default.fileformat=RCFILE;
   CREATE TABLE test_table AS SELECT NULL, key FROM src;
 Make the message to the user a little more intuitive.



[jira] [Updated] (HIVE-3821) RCFile does not work with lazyBinarySerDe

2012-12-20 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3821:
-

Status: Patch Available  (was: Open)

 RCFile does not work with lazyBinarySerDe
 -

 Key: HIVE-3821
 URL: https://issues.apache.org/jira/browse/HIVE-3821
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain

 create table tst(key string, value string) row format serde 
 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' stored as rcfile;  
 insert overwrite table tst select * from src;
 gets an error:
 Caused by: java.lang.UnsupportedOperationException: Currently the writer can 
 only accept BytesRefArrayWritable
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:882)
 at 
 org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
 ... 5 more



[jira] [Commented] (HIVE-3821) RCFile does not work with lazyBinarySerDe

2012-12-19 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535837#comment-13535837
 ] 

Namit Jain commented on HIVE-3821:
--

A simpler way to handle this would be to let ColumnarSerDe write binary data

 RCFile does not work with lazyBinarySerDe
 -

 Key: HIVE-3821
 URL: https://issues.apache.org/jira/browse/HIVE-3821
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain

 create table tst(key string, value string) row format serde 
 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' stored as rcfile;  
 insert overwrite table tst select * from src;
 gets an error:
 Caused by: java.lang.UnsupportedOperationException: Currently the writer can 
 only accept BytesRefArrayWritable
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:882)
 at 
 org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
 ... 5 more



Re: [DISCUSS] HCatalog becoming a subproject of Hive

2012-12-19 Thread Namit Jain
I don’t agree with the proposal. It is impractical to have an HCat committer
with commit access to only the HCat portions of Hive. We cannot guarantee that
an HCat committer will become a Hive committer in 6-9 months; that depends on
what they do in the next 6-9 months.

The current HCat committers should spend more time reviewing patches,
work on non-HCat areas in Hive, and then gradually become Hive
committers. They should not be given any preferential treatment, and the
process should be the same as it is for any other Hive contributor
currently. Given the expertise of the HCat committers, they should
be in line to become Hive committers if they continue to work in Hive,
but that cannot be guaranteed. I agree that some Hive committers should try
to help with the existing HCat patches, but again that is voluntary, and
different committers cannot be assigned to different parts of the code.

Thanks,
-namit







On 12/20/12 1:03 AM, Carl Steinbach cwsteinb...@gmail.com wrote:

Alan's proposal sounds like a good idea to me.

+1

On Dec 18, 2012 5:36 PM, Travis Crawford traviscrawf...@gmail.com
wrote:

 Alan, I think your proposal sounds great.

 --travis

 On Tue, Dec 18, 2012 at 1:13 PM, Alan Gates ga...@hortonworks.com
wrote:
  Carl, speaking just for myself and not as a representative of the HCat
 PPMC at this point, I am coming to agree with you that HCat integrating
 with Hive fully makes more sense.
 
  However, this makes the committer question even thornier.  Travis and
 Namit, I think the shepherd proposal needs to lay out a clear and time
 bounded path to committership for HCat committers.  Having HCat
committers
 as second class Hive citizens for the long run will not be healthy.  I
 propose the following as a starting point for discussion:
 
  All active HCat committers (those who have contributed or committed a
 patch in the last 6 months) will be made committers in the HCat portion
 only of Hive.  In addition those committers will be assigned a
particular
 shepherd who is a current Hive committer and who will be responsible for
 mentoring them towards full Hive committership.  As a part of this
 mentorship the HCat committer will review patches of other contributors,
 contribute patches to Hive (both inside and outside of HCatalog),
respond
 to user issues on the mailing lists, etc.  It is intended that as a
result
 of this mentorship program HCat committers can become full Hive
committers
 in 6-9 months.  No new HCat only committers will be elected in Hive
after
 this.  All Hive committers will automatically also have commit rights on
 HCatalog.
 
  Alan.
 
  On Dec 14, 2012, at 10:05 AM, Carl Steinbach wrote:
 
  On a functional level I don't think there is going to be much of a
  difference between the subproject option proposed by Travis and the
  other option where HCatalog becomes a TLP. In both cases HCatalog and
  Hive will have separate committers, separate code repositories, separate
  release cycles, and separate project roadmaps. Aside from ASF
  bureaucracy, I think the only major difference between the two options
  is that the subproject route will give the rest of the community the
  false impression that the two projects have coordinated roadmaps and a
  process to prevent overlapping functionality from appearing in both
  projects. Consequently, if these are the only two options then I would
  prefer that HCatalog become a TLP.

  On the other hand, I also agree with many of the sentiments that have
  already been expressed in this thread, namely that the two projects are
  closely related and that it would benefit the community at large if the
  two projects could be brought closer together. Up to this point the
  major source of pain for the HCatalog team has been the frequent
  necessity of making changes on both the Hive and HCatalog sides when
  implementing new features in HCatalog. This situation is compounded by
  the ASF requirement that release artifacts may not depend on snapshot
  artifacts from other ASF projects. Furthermore, if Hive adds a
  dependency on HCatalog then it will be subject to these same problems
  (in addition to the gross circular dependency!).

  I think the best way to avoid these problems is for HCatalog to become a
  Hive submodule. In this scenario HCatalog would exist as a subdirectory
  in the Hive repository and would be distributed as a Hive artifact in
  future Hive releases. In addition to solving the problems I mentioned
  earlier, I think this would also help to assuage the concerns of many
  Hive committers who don't want to see the MetaStore split out into a
  separate project.
 
  Thanks.
 
  Carl
 
  On Thu, Dec 13, 2012 at 7:59 PM, Namit Jain nj...@fb.com wrote:
 
  I am fine with this. Any Hive committer who wants to volunteer to be
  an HCat shepherd is welcome.
 
 
 
  On 12/14/12 7:01 AM, Travis Crawford traviscrawf...@gmail.com wrote:
 
  Thanks for reviving this thread. Reviewing the comments everyone

[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-19 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536762#comment-13536762
 ] 

Namit Jain commented on HIVE-3552:
--

refreshed and attached latest patch.

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.1.patch, 
 hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
 hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up to HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Each input row would then generate 256
 (2^8) rows for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube; in the second MR job,
 perform the cube. The assumption is that the group by would have
 decreased the output data significantly, and that the rows would appear in the
 order of the grouping keys, which gives a higher probability of hitting the hash table.
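
The row blow-up described above can be sketched in a few lines of Python. This
is only an illustration of the idea, not the Hive implementation: the function
names (cube_keys, cube_direct, cube_two_stage) are hypothetical, the aggregate
is assumed to be a simple count, and None stands in for a rolled-up column.

```python
from collections import Counter
from itertools import combinations


def cube_keys(row):
    """Return all 2**n grouping-set keys for one row (None = rolled-up column)."""
    n = len(row)
    keys = []
    for size in range(n + 1):
        for kept in combinations(range(n), size):
            keys.append(tuple(row[i] if i in kept else None for i in range(n)))
    return keys


def cube_direct(rows):
    """Single stage: every input row is expanded into 2**n keys."""
    counts = Counter()
    emitted = 0
    for row in rows:
        for key in cube_keys(row):
            counts[key] += 1
            emitted += 1
    return counts, emitted


def cube_two_stage(rows):
    """Stage 1: plain group by. Stage 2: expand only the grouped rows."""
    grouped = Counter(rows)  # plain group by on all columns, with counts
    counts = Counter()
    emitted = 0
    for row, cnt in grouped.items():
        for key in cube_keys(row):
            counts[key] += cnt
            emitted += 1
    return counts, emitted
```

Both functions return the same counts, but the two-stage variant expands only
the distinct grouped rows, so the number of expanded rows depends on how much
the first group by shrinks the data.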



[jira] [Commented] (HIVE-3796) Multi-insert involving bucketed/sorted table turns off merging on all outputs

2012-12-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534768#comment-13534768
 ] 

Namit Jain commented on HIVE-3796:
--

+1

 Multi-insert involving bucketed/sorted table turns off merging on all outputs
 -

 Key: HIVE-3796
 URL: https://issues.apache.org/jira/browse/HIVE-3796
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3796.1.patch.txt, HIVE-3796.2.patch.txt, 
 HIVE-3796.3.patch.txt, HIVE-3796.4.patch.txt


 When a multi-insert query has at least one output that is bucketed, merging 
 is turned off for all outputs, rather than just the bucketed ones.
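
The difference between the reported behaviour and the desired one can be
sketched as a per-output decision. This is a hypothetical model, not Hive
code: the output dictionaries and the function names are assumptions made for
illustration only.

```python
def merge_flags_reported(outputs):
    """Reported behaviour: one bucketed output disables merging for all outputs."""
    any_bucketed = any(out["bucketed"] for out in outputs)
    return {out["name"]: not any_bucketed for out in outputs}


def merge_flags_desired(outputs):
    """Desired behaviour: merging is disabled only for the bucketed outputs."""
    return {out["name"]: not out["bucketed"] for out in outputs}
```

With one bucketed and one plain output, the first function disables merging
for both, while the second keeps it enabled for the plain output.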



[jira] [Updated] (HIVE-3785) Core hive changes for HiveServer2 implementation

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3785:
-

Status: Open  (was: Patch Available)

initial comments

 Core hive changes for HiveServer2 implementation
 

 Key: HIVE-3785
 URL: https://issues.apache.org/jira/browse/HIVE-3785
 Project: Hive
  Issue Type: Sub-task
  Components: Authentication, Build Infrastructure, Configuration, 
 Thrift API
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HS2-changed-files-only.patch


 The subtask to track changes in the core hive components for HiveServer2 
 implementation



[jira] [Assigned] (HIVE-3818) Identify VOID datatype in generated table and notify the user accordingly

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-3818:


Assignee: Gary Colman

 Identify VOID datatype in generated table and notify the user accordingly
 -

 Key: HIVE-3818
 URL: https://issues.apache.org/jira/browse/HIVE-3818
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Gary Colman
Assignee: Gary Colman
Priority: Trivial
   Original Estimate: 1h
  Remaining Estimate: 1h

 When using rcfile as a datastore, generating a table from a select statement 
 with a null field results in a void data type, and throws an exception 
 (Internal error: no LazyObject for VOID).
 eg.
   set hive.default.fileformat=RCFILE;
   CREATE TABLE test_table AS SELECT NULL, key FROM src;
 Make the message to the user a little more intuitive.
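
One way to picture the requested improvement is an up-front check on the
inferred column types that fails with a readable message instead of an
internal error. The sketch below is hypothetical: the function name, the
(name, type) representation, and the message wording are assumptions, not
Hive's actual code or output.

```python
def check_no_void_columns(columns, file_format):
    """Raise a readable error if any inferred column type is VOID.

    columns: list of (column_name, type_name) pairs inferred from the SELECT.
    """
    void_cols = [name for name, type_name in columns if type_name.lower() == "void"]
    if void_cols:
        raise ValueError(
            "Cannot create a %s table: column(s) %s have type VOID because "
            "they select an untyped NULL. Cast the NULL to a concrete type, "
            "for example CAST(NULL AS STRING)."
            % (file_format, ", ".join(void_cols))
        )
```

The point of the sketch is only that the check names the offending column and
suggests a fix; where such a check would live in Hive is left open.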



[jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.

2012-12-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535626#comment-13535626
 ] 

Namit Jain commented on HIVE-3752:
--

[~nitay], can you try now ?

 Add a non-sql API in hive to access data.
 -

 Key: HIVE-3752
 URL: https://issues.apache.org/jira/browse/HIVE-3752
 Project: Hive
  Issue Type: Improvement
Reporter: Nitay Joffe
Assignee: Nitay Joffe

 We would like to add an input/output format for accessing Hive data in Hadoop 
 directly without having to use e.g. a transform. Using a transform
 means having to do a whole map-reduce step with its own disk accesses and its 
 imposed structure. It also means needing to have Hive be the base 
 infrastructure for the entire system being developed which is not the right 
 fit as we only need a small part of it (access to the data).
 So we propose adding an API level InputFormat and OutputFormat to Hive that 
 will make it trivially easy to select a table with partition spec and read 
 from / write to it. We chose this design to make it compatible with Hadoop so 
 that existing systems that work with Hadoop's IO API will just work out of 
 the box.
 We need this system for the Giraph graph processing system 
 (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
 is a common use case.
 [~namitjain] [~aching] [~kevinwilfong] [~apresta]
 Input-side (HiveApiInputFormat) review: https://reviews.facebook.net/D7401



[jira] [Updated] (HIVE-3537) release locks at the end of move tasks

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3537:
-

Attachment: hive.3537.7.patch

 release locks at the end of move tasks
 --

 Key: HIVE-3537
 URL: https://issues.apache.org/jira/browse/HIVE-3537
 Project: Hive
  Issue Type: Bug
  Components: Locking, Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3537.1.patch, hive.3537.2.patch, hive.3537.3.patch, 
 hive.3537.4.patch, hive.3537.5.patch, hive.3537.6.patch, hive.3537.7.patch


 Look at HIVE-3106 for details.
 In order to make sure that concurrency is not an issue for multi-table 
 inserts, the current option is to introduce a dependency task, which thereby
 delays the creation of all partitions. It would be desirable to release the
 locks for the outputs as soon as the move task is completed. That way, for
 multi-table inserts, the concurrency can be enabled without delaying any 
 table.
 Currently, the move task contains an input/output, but they do not seem to be
 populated correctly.



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.5.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch, hive.3803.5.patch


 It should also include tables whose partitions are being accessed



[jira] [Commented] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535628#comment-13535628
 ] 

Namit Jain commented on HIVE-3803:
--

I have refreshed the above entry https://reviews.facebook.net/D7377 with the
code-only changes and explain_dependency.out. These are the most important to
review for correctness; the other changes in the patch files are log file
changes only.

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch, hive.3803.5.patch


 It should also include tables whose partitions are being accessed



[jira] [Updated] (HIVE-3796) Multi-insert involving bucketed/sorted table turns off merging on all outputs

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3796:
-

Attachment: hive.3796.5.patch

Refreshed the patch, adding a new patch file
(for the record).

 Multi-insert involving bucketed/sorted table turns off merging on all outputs
 -

 Key: HIVE-3796
 URL: https://issues.apache.org/jira/browse/HIVE-3796
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3796.1.patch.txt, HIVE-3796.2.patch.txt, 
 HIVE-3796.3.patch.txt, HIVE-3796.4.patch.txt, hive.3796.5.patch


 When a multi-insert query has at least one output that is bucketed, merging 
 is turned off for all outputs, rather than just the bucketed ones.



[jira] [Created] (HIVE-3821) RCFile does not work with lazyBinarySerDe

2012-12-18 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3821:


 Summary: RCFile does not work with lazyBinarySerDe
 Key: HIVE-3821
 URL: https://issues.apache.org/jira/browse/HIVE-3821
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain


create table tst(key string, value string) row format serde 
'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' stored as rcfile;  

insert overwrite table tst select * from src;

gets an error:

Caused by: java.lang.UnsupportedOperationException: Currently the writer can only accept BytesRefArrayWritable
    at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:882)
    at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:637)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
    ... 5 more



[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.8.patch

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch


 This is a follow-up to HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Each input row would then generate 256
 (2^8) rows for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube; in the second MR job,
 perform the cube. The assumption is that the group by would have
 decreased the output data significantly, and that the rows would appear in the
 order of the grouping keys, which gives a higher probability of hitting the hash table.



[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Attachment: hive.3633.10.patch

 sort-merge join does not work with sub-queries
 --

 Key: HIVE-3633
 URL: https://issues.apache.org/jira/browse/HIVE-3633
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3633.10.patch, hive.3633.1.patch, 
 hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, 
 hive.3633.6.patch, hive.3633.7.patch, hive.3633.8.patch, hive.3633.9.patch


 Consider the following query:
 create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 -- load the above tables
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 explain
 select count(*) from
 (
 select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, 
 b.value as value2
 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key)
 subq;
 The above query does not use sort-merge join. This would be very useful as we 
 automatically convert the queries to use sorting and bucketing properties for 
 join.
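
For readers unfamiliar with the technique, a sort-merge join over bucketed,
sorted inputs can be sketched as follows. This is a generic illustration, not
Hive's operator: the function names are hypothetical, and assigning a row to
bucket key % num_buckets is an assumption made to keep the example small.

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


def bucketed_sort_merge_join(a_rows, b_rows, num_buckets=6):
    """Join bucket i of a with bucket i of b; equal keys share a bucket."""
    def to_sorted_buckets(rows):
        buckets = [[] for _ in range(num_buckets)]
        for key, value in rows:
            buckets[key % num_buckets].append((key, value))
        return [sorted(b, key=lambda kv: kv[0]) for b in buckets]

    joined = []
    for a_bucket, b_bucket in zip(to_sorted_buckets(a_rows), to_sorted_buckets(b_rows)):
        joined.extend(sort_merge_join(a_bucket, b_bucket))
    return joined
```

Because both tables are clustered and sorted on the join key with the same
bucket count, each pair of buckets can be merged in a single pass without
building a hash table, which is why the plan is attractive for the sub-query
case as well.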



[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Status: Patch Available  (was: Open)

refreshed, and also attached the new file

 sort-merge join does not work with sub-queries
 --

 Key: HIVE-3633
 URL: https://issues.apache.org/jira/browse/HIVE-3633
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3633.10.patch, hive.3633.1.patch, 
 hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, hive.3633.5.patch, 
 hive.3633.6.patch, hive.3633.7.patch, hive.3633.8.patch, hive.3633.9.patch


 Consider the following query:
 create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 -- load the above tables
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 explain
 select count(*) from
 (
 select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, 
 b.value as value2
 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key)
 subq;
 The above query does not use sort-merge join. This would be very useful as we 
 automatically convert the queries to use sorting and bucketing properties for 
 join.



[jira] [Resolved] (HIVE-3646) Add 'IGNORE PROTECTION' predicate for dropping partitions

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-3646.
--

   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed

Committed. Thanks Andrew

 Add 'IGNORE PROTECTION' predicate for dropping partitions
 -

 Key: HIVE-3646
 URL: https://issues.apache.org/jira/browse/HIVE-3646
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Andrew Chalfant
Assignee: Andrew Chalfant
Priority: Minor
 Fix For: 0.11

 Attachments: HIVE-3646.1.patch.txt, HIVE-3646.2.patch.txt, 
 HIVE-3646.3.patch.txt

   Original Estimate: 1m
  Remaining Estimate: 1m

 There are cases where it is desirable to move partitions between clusters.
 Having to undo protection and then re-protect tables in order to delete
 partitions from a source is a multi-step process and can leave us in a
 failed-open state where partition and table metadata is dirty. By implementing
 'rm -rf'-like functionality, we can perform these operations atomically.



[jira] [Commented] (HIVE-3787) Regression introduced from HIVE-3401

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533856#comment-13533856
 ] 

Namit Jain commented on HIVE-3787:
--

+1

 Regression introduced from HIVE-3401
 

 Key: HIVE-3787
 URL: https://issues.apache.org/jira/browse/HIVE-3787
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3787.D7275.1.patch


 By HIVE-3562, split_sample_out_of_range.q and split_sample_wrong_format.q are 
 not showing valid 'line:loc' information for error messages.



[jira] [Updated] (HIVE-3300) LOAD DATA INPATH fails if a hdfs file with same name is added to table

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3300:
-

Status: Open  (was: Patch Available)

comments on phabricator

 LOAD DATA INPATH fails if a hdfs file with same name is added to table
 --

 Key: HIVE-3300
 URL: https://issues.apache.org/jira/browse/HIVE-3300
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 0.10.0
 Environment: ubuntu linux, hadoop 1.0.3, hive 0.9
Reporter: Bejoy KS
Assignee: Navis
 Attachments: HIVE-3300.1.patch.txt, HIVE-3300.D4383.3.patch


 If we load data from the local filesystem into a Hive table using 'LOAD DATA LOCAL
 INPATH' and a file with the same name already exists in the table's location,
 the new file is suffixed with _copy_1.
 But if we run 'LOAD DATA INPATH' for a file in HDFS, no rename happens;
 only a move task is triggered. Since a file with the same name already exists
 in the target HDFS location, the Hadoop fs move operation throws an error.
 hive> LOAD DATA INPATH '/userdata/bejoy/site.txt' INTO TABLE test.site;
 Loading data to table test.site
 Failed with exception null
 FAILED: Execution Error, return code 1 from
 org.apache.hadoop.hive.ql.exec.MoveTask
 hive>
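
The rename behaviour described for the local-file case could be applied to the
HDFS case along the following lines. This is a hedged sketch, not the patch:
resolve_destination is a hypothetical helper, the set of existing names stands
in for a directory listing, and placing the _copy_N suffix before the file
extension is an assumption.

```python
def resolve_destination(name, existing):
    """Return a file name that does not collide with any name in `existing`."""
    if name not in existing:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No extension: the whole name is the stem.
        stem, ext = name, ""
    n = 1
    while True:
        candidate = "%s_copy_%d" % (stem, n)
        if dot:
            candidate += "." + ext
        if candidate not in existing:
            return candidate
        n += 1
```

A move task that computed its target name this way would never ask the
filesystem to move a file onto an existing path, which is the failure shown
in the session above.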

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.3.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch


 It should also include tables whose partitions are being accessed



[jira] [Commented] (HIVE-3795) NPE in SELECT when WHERE-clause is an and/or/not operation involving null

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534623#comment-13534623
 ] 

Namit Jain commented on HIVE-3795:
--

+1

running tests

 NPE in SELECT when WHERE-clause is an and/or/not operation involving null
 -

 Key: HIVE-3795
 URL: https://issues.apache.org/jira/browse/HIVE-3795
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Xiao Jiang
Assignee: Xiao Jiang
Priority: Trivial
 Attachments: HIVE-3795.1.patch.txt, HIVE-3795.2.patch.txt, 
 HIVE-3795.3.patch.txt


 Sometimes users forget to quote date constants in queries. For example,
 SELECT * FROM some_table WHERE ds = 2012-12-10 and ds = 2012-12-12; . In
 such cases, if the WHERE clause contains an and/or/not operation, the query
 throws an NPE. That's because PcrExprProcFactory in ql/optimizer does not
 check for null.
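
Two things are going on in this report, and both can be sketched briefly.
First, an unquoted 2012-12-10 is read as integer subtraction rather than as a
date. Second, and/or/not over a possibly-null operand needs explicit null
handling. The code below is only an illustration with hypothetical function
names, using None for SQL NULL; it is not the PcrExprProcFactory fix.

```python
# What an expression parser sees for the unquoted constants.
unquoted_first = 2012 - 12 - 10
unquoted_second = 2012 - 12 - 12


def and3(a, b):
    """Three-valued AND: False wins, otherwise NULL (None) propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: True wins, otherwise NULL (None) propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Three-valued NOT: NULL (None) stays NULL."""
    return None if a is None else (not a)
```

Here unquoted_first is 1990 and unquoted_second is 1988, so neither side of
the WHERE clause compares against the intended date string. The three helper
functions show the kind of null check each boolean operator needs so that a
null operand yields a defined result instead of an exception.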



[jira] [Updated] (HIVE-3795) NPE in SELECT when WHERE-clause is an and/or/not operation involving null

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3795:
-

Attachment: hive.3795.4.patch

 NPE in SELECT when WHERE-clause is an and/or/not operation involving null
 -

 Key: HIVE-3795
 URL: https://issues.apache.org/jira/browse/HIVE-3795
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Xiao Jiang
Assignee: Xiao Jiang
Priority: Trivial
 Attachments: HIVE-3795.1.patch.txt, HIVE-3795.2.patch.txt, 
 HIVE-3795.3.patch.txt, hive.3795.4.patch


 Sometimes users forget to quote date constants in queries. For example,
 SELECT * FROM some_table WHERE ds = 2012-12-10 and ds = 2012-12-12; . In
 such cases, if the WHERE clause contains an and/or/not operation, the query
 throws an NPE. That's because PcrExprProcFactory in ql/optimizer does not
 check for null.



[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Attachment: hive.3633.11.patch

 sort-merge join does not work with sub-queries
 --

 Key: HIVE-3633
 URL: https://issues.apache.org/jira/browse/HIVE-3633
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3633.10.patch, hive.3633.11.patch, 
 hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, 
 hive.3633.5.patch, hive.3633.6.patch, hive.3633.7.patch, hive.3633.8.patch, 
 hive.3633.9.patch


 Consider the following query:
 create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 -- load the above tables
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 explain
 select count(*) from
 (
 select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, 
 b.value as value2
 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key)
 subq;
 The above query does not use sort-merge join. This would be very useful as we 
 automatically convert the queries to use sorting and bucketing properties for 
 join.



[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Status: Patch Available  (was: Open)

addressed comments - made new tests deterministic

 sort-merge join does not work with sub-queries
 --

 Key: HIVE-3633
 URL: https://issues.apache.org/jira/browse/HIVE-3633
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3633.10.patch, hive.3633.11.patch, 
 hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, hive.3633.4.patch, 
 hive.3633.5.patch, hive.3633.6.patch, hive.3633.7.patch, hive.3633.8.patch, 
 hive.3633.9.patch


 Consider the following query:
 create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 -- load the above tables
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 explain
 select count(*) from
 (
 select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, 
 b.value as value2
 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key)
 subq;
 The above query does not use sort-merge join. This would be very useful as we 
 automatically convert the queries to use sorting and bucketing properties for 
 join.



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.4.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch


 It should also include tables whose partitions are being accessed



[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.5.patch

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch


 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say '8' columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional mr job - in the first 
 mr job perform the group by assuming there was no cube. Add another mr job, 
 where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly, and the rows would appear in the 
 order of
 grouping keys which has a higher probability of hitting the hash table.
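The row blow-up mentioned above can be checked with a small sketch. The Python below is illustrative only and is not Hive code; the helper name `cube_grouping_sets` and the column names are assumptions. A cube over n columns produces one grouping set per subset of those columns, so each input row contributes 2^n entries to the map-side hash table.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every grouping set a CUBE over `columns` expands to.

    A cube groups by each subset of the columns, including the empty
    set (the grand total), so the result has 2 ** len(columns) entries.
    """
    sets = []
    for size in range(len(columns) + 1):
        sets.extend(combinations(columns, size))
    return sets


# Hypothetical column names, used only to count the expansions.
columns = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
grouping_sets = cube_grouping_sets(columns)
print(len(grouping_sets))
```

With 8 columns the count is 256, matching the figure in the description. Running a plain group by first shrinks the number of rows that are multiplied this way.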



[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Status: Patch Available  (was: Open)

comments addressed

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch


 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say '8' columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional mr job - in the first 
 mr job perform the group by assuming there was no cube. Add another mr job, 
 where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly, and the rows would appear in the 
 order of
 grouping keys which has a higher probability of hitting the hash table.



[jira] [Commented] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534647#comment-13534647
 ] 

Namit Jain commented on HIVE-3803:
--

I was not able to refresh the Phabricator entry because its length limit was 
exceeded (lots of log files).
The attached patch file contains all the changes - the code changes are present 
in the phabricator review.

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch


 It should also include tables whose partitions are being accessed



[jira] [Updated] (HIVE-3537) release locks at the end of move tasks

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3537:
-

Attachment: hive.3537.5.patch

 release locks at the end of move tasks
 --

 Key: HIVE-3537
 URL: https://issues.apache.org/jira/browse/HIVE-3537
 Project: Hive
  Issue Type: Bug
  Components: Locking, Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3537.1.patch, hive.3537.2.patch, hive.3537.3.patch, 
 hive.3537.4.patch, hive.3537.5.patch


 Look at HIVE-3106 for details.
 In order to make sure that concurrency is not an issue for multi-table 
 inserts, the current option is to introduce a dependency task, which thereby
 delays the creation of all partitions. It would be desirable to release the
 locks for the outputs as soon as the move task is completed. That way, for
 multi-table inserts, the concurrency can be enabled without delaying any 
 table.
 Currently, the movetask contains an input/output, but they do not seem to be
 populated correctly.



[jira] [Commented] (HIVE-3646) Add 'IGNORE PROTECTION' predicate for dropping partitions

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534660#comment-13534660
 ] 

Namit Jain commented on HIVE-3646:
--

[~chalfant], please add documentation for this change.

 Add 'IGNORE PROTECTION' predicate for dropping partitions
 -

 Key: HIVE-3646
 URL: https://issues.apache.org/jira/browse/HIVE-3646
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Reporter: Andrew Chalfant
Assignee: Andrew Chalfant
Priority: Minor
 Fix For: 0.11

 Attachments: HIVE-3646.1.patch.txt, HIVE-3646.2.patch.txt, 
 HIVE-3646.3.patch.txt

   Original Estimate: 1m
  Remaining Estimate: 1m

 There are cases where it is desirable to move partitions between clusters. 
 Having to undo protection and then re-protect tables in order to delete 
 partitions from a source is a multi-step process and can leave us in a 
 failed open state where partition and table metadata is dirty. By 
 implementing an 'rm -rf'-like functionality, we can perform these operations 
 atomically.



[jira] [Commented] (HIVE-3492) Provide ALTER for partition changing bucket number

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534661#comment-13534661
 ] 

Namit Jain commented on HIVE-3492:
--

[~navis], please add documentation for this change.

 Provide ALTER for partition changing bucket number 
 ---

 Key: HIVE-3492
 URL: https://issues.apache.org/jira/browse/HIVE-3492
 Project: Hive
  Issue Type: Improvement
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.11

 Attachments: HIVE-3492.1.patch.txt, HIVE-3492.2.patch.txt, 
 HIVE-3492.D5589.2.patch, HIVE-3492.D5589.3.patch


 As a follow-up to HIVE-3283, the bucket number of a partition could be 
 set/changed individually by a query like 'ALTER table srcpart 
 PARTITION(ds='1999') SET BUCKETNUM 5'.



[jira] [Commented] (HIVE-3401) Diversify grammar for split sampling

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534664#comment-13534664
 ] 

Namit Jain commented on HIVE-3401:
--

[~navis], please add documentation for this change.

 Diversify grammar for split sampling
 

 Key: HIVE-3401
 URL: https://issues.apache.org/jira/browse/HIVE-3401
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3401.D4821.2.patch, HIVE-3401.D4821.3.patch, 
 HIVE-3401.D4821.4.patch, HIVE-3401.D4821.5.patch, HIVE-3401.D4821.6.patch, 
 HIVE-3401.D4821.7.patch


 Current split sampling only supports grammar like TABLESAMPLE(n PERCENT). But 
 some users want to specify just the size of the input. It can be easily 
 calculated with a few commands, but it seemed good to support more grammars, 
 something like TABLESAMPLE(500M). 
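The manual calculation the description alludes to might look like the sketch below. It assumes the user already knows the table's total size in bytes (for example from the file system); the function name `percent_for_size` and the sizes are hypothetical, and this is not how the patch itself computes anything.

```python
def percent_for_size(target_bytes, table_bytes):
    """Return the TABLESAMPLE percentage that approximates target_bytes."""
    if table_bytes <= 0:
        raise ValueError("table_bytes must be positive")
    # A sample cannot exceed the whole table, so cap at 100 percent.
    return min(100.0, 100.0 * target_bytes / table_bytes)


MB = 1024 * 1024
# Hypothetical sizes: sample about 500 MB from a 20 GB table.
percent = percent_for_size(500 * MB, 20 * 1024 * MB)
print(percent)
```

A size-based grammar would spare users this step and keep queries valid as the table grows.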



[jira] [Commented] (HIVE-3796) Multi-insert involving bucketed/sorted table turns off merging on all outputs

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534665#comment-13534665
 ] 

Namit Jain commented on HIVE-3796:
--

Hmm, so you fixed a bug as a side effect.
Let me take another look.

 Multi-insert involving bucketed/sorted table turns off merging on all outputs
 -

 Key: HIVE-3796
 URL: https://issues.apache.org/jira/browse/HIVE-3796
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3796.1.patch.txt, HIVE-3796.2.patch.txt, 
 HIVE-3796.3.patch.txt


 When a multi-insert query has at least one output that is bucketed, merging 
 is turned off for all outputs, rather than just the bucketed ones.



[jira] [Commented] (HIVE-3715) float and double calculation is inaccurate in Hive

2012-12-17 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534708#comment-13534708
 ] 

Namit Jain commented on HIVE-3715:
--

Have you looked at ArciMath BigDecimal? I have not looked at it closely, but casual 
browsing suggests it might be faster than BigDecimal.

 float and double calculation is inaccurate in Hive
 --

 Key: HIVE-3715
 URL: https://issues.apache.org/jira/browse/HIVE-3715
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Johnny Zhang
Assignee: Johnny Zhang
 Attachments: HIVE-3715.patch.txt


 I found this while debugging the e2e test failures. Hive miscalculates 
 float and double values. Take a float calculation as an example:
 hive> select f from all100k limit 1;
 48308.98
 hive> select f/10 from all100k limit 1;
 4830.898046875   --added 046875 at the end
 hive> select f*1.01 from all100k limit 1;
 48792.0702734375  --should be 48792.0698
 It might be essentially the same problem as 
 http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm.
  But since the e2e tests compare the results with MySQL, and MySQL seems to get it 
 right, it is worth fixing in Hive.
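One plausible reading of the numbers above, offered as an assumption rather than a confirmed diagnosis: the column is a 32-bit float, and it is widened to a 64-bit double before the arithmetic, which exposes the float's rounding error. The Python sketch below reproduces that widening with the standard `struct` module and contrasts it with exact decimal arithmetic. The helper name `to_float32` is made up for the example.

```python
import struct
from decimal import Decimal


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# What a 32-bit float column can actually hold for the literal 48308.98.
stored = to_float32(48308.98)
print(stored)          # the widened value carries extra digits
print(stored / 10)
print(stored * 1.01)

# Exact decimal arithmetic on the literal, for comparison.
exact = Decimal("48308.98") * Decimal("1.01")
print(exact)
```

If this reading is right, the extra digits come from the representation of the stored value rather than from the division or multiplication itself, which is why a decimal type is the usual remedy.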



[jira] [Updated] (HIVE-3715) float and double calculation is inaccurate in Hive

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3715:
-

Status: Open  (was: Patch Available)

 float and double calculation is inaccurate in Hive
 --

 Key: HIVE-3715
 URL: https://issues.apache.org/jira/browse/HIVE-3715
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.10.0
Reporter: Johnny Zhang
Assignee: Johnny Zhang
 Attachments: HIVE-3715.patch.txt


 I found this while debugging the e2e test failures. Hive miscalculates 
 float and double values. Take a float calculation as an example:
 hive> select f from all100k limit 1;
 48308.98
 hive> select f/10 from all100k limit 1;
 4830.898046875   --added 046875 at the end
 hive> select f*1.01 from all100k limit 1;
 48792.0702734375  --should be 48792.0698
 It might be essentially the same problem as 
 http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm.
  But since the e2e tests compare the results with MySQL, and MySQL seems to get it 
 right, it is worth fixing in Hive.



[jira] [Updated] (HIVE-3537) release locks at the end of move tasks

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3537:
-

Attachment: hive.3537.6.patch

 release locks at the end of move tasks
 --

 Key: HIVE-3537
 URL: https://issues.apache.org/jira/browse/HIVE-3537
 Project: Hive
  Issue Type: Bug
  Components: Locking, Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3537.1.patch, hive.3537.2.patch, hive.3537.3.patch, 
 hive.3537.4.patch, hive.3537.5.patch, hive.3537.6.patch


 Look at HIVE-3106 for details.
 In order to make sure that concurrency is not an issue for multi-table 
 inserts, the current option is to introduce a dependency task, which thereby
 delays the creation of all partitions. It would be desirable to release the
 locks for the outputs as soon as the move task is completed. That way, for
 multi-table inserts, the concurrency can be enabled without delaying any 
 table.
 Currently, the movetask contains an input/output, but they do not seem to be
 populated correctly.



[jira] [Updated] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3552:
-

Attachment: hive.3552.6.patch

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch


 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say '8' columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional mr job - in the first 
 mr job perform the group by assuming there was no cube. Add another mr job, 
 where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly, and the rows would appear in the 
 order of
 grouping keys which has a higher probability of hitting the hash table.



[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2012-12-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533425#comment-13533425
 ] 

Namit Jain commented on HIVE-3778:
--

Code changes look good.

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3


 This is a follow-up to HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3806) Ptest failing due to Argument list too long errors

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3806:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Bhushan

 Ptest failing due to Argument list too long errors
 

 Key: HIVE-3806
 URL: https://issues.apache.org/jira/browse/HIVE-3806
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Minor
 Attachments: HIVE-3806.1.patch.txt


 ptest creates a really huge shell command to delete from each test host those 
 .q files that it should not be running. For TestCliDriver, the command has 
 become long enough that it is over the threshold allowed by the shell. We 
 should rewrite it so that the same semantics is captured in a shorter command.



[jira] [Commented] (HIVE-3796) Multi-insert involving bucketed/sorted table turns off merging on all outputs

2012-12-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533433#comment-13533433
 ] 

Namit Jain commented on HIVE-3796:
--

+1

 Multi-insert involving bucketed/sorted table turns off merging on all outputs
 -

 Key: HIVE-3796
 URL: https://issues.apache.org/jira/browse/HIVE-3796
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3796.1.patch.txt, HIVE-3796.2.patch.txt


 When a multi-insert query has at least one output that is bucketed, merging 
 is turned off for all outputs, rather than just the bucketed ones.



[jira] [Assigned] (HIVE-446) Implement TRUNCATE

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-446:
---

Assignee: Navis  (was: Andrew Chalfant)

 Implement TRUNCATE
 --

 Key: HIVE-446
 URL: https://issues.apache.org/jira/browse/HIVE-446
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Prasad Chakka
Assignee: Navis
 Attachments: HIVE-446.D7371.1.patch


 truncate the data but leave the table and metadata intact.



[jira] [Updated] (HIVE-446) Implement TRUNCATE

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-446:


Status: Open  (was: Patch Available)

comments on phabricator

 Implement TRUNCATE
 --

 Key: HIVE-446
 URL: https://issues.apache.org/jira/browse/HIVE-446
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Prasad Chakka
Assignee: Navis
 Attachments: HIVE-446.D7371.1.patch


 truncate the data but leave the table and metadata intact.



[jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.

2012-12-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533444#comment-13533444
 ] 

Namit Jain commented on HIVE-3752:
--

Nitay, along with the patch, can you create a document on the Apache Hive cwiki with 
the proposed API?
If you don't have wiki permissions, please create an account and send me your 
id.
I will give you the required permissions.

 Add a non-sql API in hive to access data.
 -

 Key: HIVE-3752
 URL: https://issues.apache.org/jira/browse/HIVE-3752
 Project: Hive
  Issue Type: Improvement
Reporter: Nitay Joffe
Assignee: Nitay Joffe

 We would like to add an input/output format for accessing Hive data in Hadoop 
 directly without having to use e.g. a transform. Using a transform
 means having to do a whole map-reduce step with its own disk accesses and its 
 imposed structure. It also means needing to have Hive be the base 
 infrastructure for the entire system being developed which is not the right 
 fit as we only need a small part of it (access to the data).
 So we propose adding an API level InputFormat and OutputFormat to Hive that 
 will make it trivially easy to select a table with partition spec and read 
 from / write to it. We chose this design to make it compatible with Hadoop so 
 that existing systems that work with Hadoop's IO API will just work out of 
 the box.
 We need this system for the Giraph graph processing system 
 (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
 is a common use case.
 [~namitjain] [~aching] [~kevinwilfong] [~apresta]
 Input-side (HiveApiInputFormat) review: https://reviews.facebook.net/D7401



[jira] [Commented] (HIVE-3778) Add MapJoinDesc.isBucketMapJoin() as part of explain plan

2012-12-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533630#comment-13533630
 ] 

Namit Jain commented on HIVE-3778:
--

[~gangtimliu], the tests passed fine.
I wanted to wait for HIVE-3784 before getting this in, since HIVE-3784 is a bigger 
patch and will conflict with this (log files).
If I get any comments on HIVE-3784 and need to refresh it anyway, I will commit 
HIVE-3778.

 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
 -

 Key: HIVE-3778
 URL: https://issues.apache.org/jira/browse/HIVE-3778
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3778.patch.3


 This is a follow-up to HIVE-3767:
 Add MapJoinDesc.isBucketMapJoin() as part of explain plan



[jira] [Updated] (HIVE-3492) Provide ALTER for partition changing bucket number

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3492:
-

   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Navis

 Provide ALTER for partition changing bucket number 
 ---

 Key: HIVE-3492
 URL: https://issues.apache.org/jira/browse/HIVE-3492
 Project: Hive
  Issue Type: Improvement
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.11

 Attachments: HIVE-3492.1.patch.txt, HIVE-3492.2.patch.txt, 
 HIVE-3492.D5589.2.patch, HIVE-3492.D5589.3.patch


 As a follow-up to HIVE-3283, the bucket number of a partition could be 
 set/changed individually by a query like 'ALTER table srcpart 
 PARTITION(ds='1999') SET BUCKETNUM 5'.



[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3562:
-

Status: Open  (was: Patch Available)

comments on phabricator

 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 produce the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
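The idea behind RS(TOP-N) can be sketched outside Hive as below. This is a simplified Python model under stated assumptions, not the patch: each mapper is a plain list of keys, the function names `map_side_top_n` and `reduce_side_limit` are invented for the example, and the ordering is ascending as in `order by key`.

```python
import heapq


def map_side_top_n(rows, n):
    """Keep only the n smallest keys seen by one mapper (the RS(TOP-N) step)."""
    return heapq.nsmallest(n, rows)


def reduce_side_limit(partial_results, n):
    """Merge the per-mapper candidates and apply the final order by / limit."""
    merged = [row for partial in partial_results for row in partial]
    return sorted(merged)[:n]


# Hypothetical input splits, one list of keys per mapper.
splits = [list(range(0, 1000, 3)), list(range(1, 1000, 3)), list(range(2, 1000, 3))]
partials = [map_side_top_n(split, 10) for split in splits]
result = reduce_side_limit(partials, 10)
shuffled = sum(len(p) for p in partials)
total = sum(len(s) for s in splits)
print(result)
print(shuffled, "rows shuffled instead of", total)
```

Any row outside a mapper's own top 10 cannot be in the global top 10, so dropping it before the shuffle does not change the answer. The reducer still applies LIMIT, which is why the operator remains in the tree.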



[jira] [Updated] (HIVE-3467) BucketMapJoinOptimizer should optimize joins on partition columns

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3467:
-

Fix Version/s: 0.11
Affects Version/s: (was: 0.10.0)
   Status: Open  (was: Patch Available)

comments on phabricator

 BucketMapJoinOptimizer should optimize joins on partition columns
 -

 Key: HIVE-3467
 URL: https://issues.apache.org/jira/browse/HIVE-3467
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Kevin Wilfong
Assignee: Zhenxiao Luo
 Fix For: 0.11

 Attachments: HIVE-3467.1.patch.txt, HIVE-3467.2.patch.txt, 
 HIVE-3467.3.patch.txt


 Consider the query:
 SELECT * FROM t1 JOIN t2 on t1.part = t2.part and t1.key = t2.key;
 Where t1 and t2 are partitioned by part and bucketed by key.
 Suppose part takes the values 1 and 2, and t1 and t2 are bucketed into 2 buckets.
 The bucket map join optimizer will put the first bucket of part=1 and part=2 
 partitions of t2 into the same mapper as that of part=1 partition of t1.  It 
 will do the same for the part=2 partition of t1.
 It could take advantage of the partition values and send the first bucket of 
 only the part=1 partitions of t1 and t2 into one mapper and the first bucket 
 of only the part=2 partitions into another.
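The difference between the current and the desired pairing can be written out for the example above. The Python below is only a model of the description: partitions and buckets are plain integers, buckets are numbered from 0, and the function names `current_inputs` and `proposed_inputs` are assumptions made for illustration.

```python
from itertools import product

partitions = [1, 2]   # values of the partition column `part`
buckets = [0, 1]      # two buckets per partition


def current_inputs(t1_part, bucket):
    """t2 inputs paired with one t1 (partition, bucket): every t2 partition."""
    return [(t2_part, bucket) for t2_part in partitions]


def proposed_inputs(t1_part, bucket):
    """With the join on `part` taken into account: only the matching partition."""
    return [(t1_part, bucket)]


for t1_part, bucket in product(partitions, buckets):
    print((t1_part, bucket), current_inputs(t1_part, bucket),
          proposed_inputs(t1_part, bucket))
```

In this model each mapper reads half as much of t2 under the proposed pairing, and the saving grows with the number of partitions.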



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.2.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch






[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Description: It should also include tables whose partitions are being 
accessed

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch


 It should also include tables whose partitions are being accessed



[jira] [Created] (HIVE-3811) explain dependency should work with views

2012-12-16 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3811:


 Summary: explain dependency should work with views
 Key: HIVE-3811
 URL: https://issues.apache.org/jira/browse/HIVE-3811
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain


View partitions should also show up



[jira] [Commented] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2012-12-14 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532183#comment-13532183
 ] 

Namit Jain commented on HIVE-3803:
--

https://reviews.facebook.net/D7377

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain





[jira] [Updated] (HIVE-3784) de-emphasize mapjoin hint

2012-12-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3784:
-

Attachment: hive.3784.4.patch

 de-emphasize mapjoin hint
 -

 Key: HIVE-3784
 URL: https://issues.apache.org/jira/browse/HIVE-3784
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
 hive.3784.4.patch


 hive.auto.convert.join has been around for a long time, and is pretty stable.
 When mapjoin hint was created, the above parameter did not exist.
 The only reason for the user to specify a mapjoin currently is if they want
 it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
 Eventually, that should also go away, but that may take some time to 
 stabilize.
 There are many rules in SemanticAnalyzer to handle the following trees:
 ReduceSink - MapJoin
 Union  - MapJoin
 MapJoin- MapJoin
 This should not be supported anymore. In any of the above scenarios, the
 user can get the mapjoin behavior by setting hive.auto.convert.join to true
 and not specifying the hint. This will simplify the code a lot.
 What does everyone think ?
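
 As a rough illustration of the two styles contrasted above, a hinted map join
 and the equivalent hint-free query with automatic conversion enabled might
 look like the sketch below. The table and column names (fact, dim, key, value)
 are hypothetical and do not come from this issue.

```sql
-- Hypothetical tables: fact (large) and dim (small enough to fit in memory).

-- Style 1: explicit mapjoin hint naming the small table.
SELECT /*+ MAPJOIN(d) */ f.key, d.value
FROM fact f JOIN dim d ON (f.key = d.key);

-- Style 2: no hint; let Hive decide whether to convert to a map join.
SET hive.auto.convert.join=true;
SELECT f.key, d.value
FROM fact f JOIN dim d ON (f.key = d.key);
```

 Whether the second query is actually converted depends on the table sizes and
 the related configuration in the Hive build being used.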



[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530798#comment-13530798
 ] 

Namit Jain commented on HIVE-3784:
--

This removes a lot of redundant code and also optimizes many queries, namely 
a mapjoin followed by a groupby.
As a result, there are a lot of plan changes.

 de-emphasize mapjoin hint
 -

 Key: HIVE-3784
 URL: https://issues.apache.org/jira/browse/HIVE-3784
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
 hive.3784.4.patch


 hive.auto.convert.join has been around for a long time, and is pretty stable.
 When mapjoin hint was created, the above parameter did not exist.
 The only reason for the user to specify a mapjoin currently is if they want
 it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
 Eventually, that should also go away, but that may take some time to 
 stabilize.
 There are many rules in SemanticAnalyzer to handle the following trees:
 ReduceSink - MapJoin
 Union  - MapJoin
 MapJoin- MapJoin
 This should not be supported anymore. In any of the above scenarios, the
 user can get the mapjoin behavior by setting hive.auto.convert.join to true
 and not specifying the hint. This will simplify the code a lot.
 What does everyone think ?



[jira] [Updated] (HIVE-3783) stats19.q is failing on trunk

2012-12-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3783:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks Kevin

 stats19.q is failing on trunk
 -

 Key: HIVE-3783
 URL: https://issues.apache.org/jira/browse/HIVE-3783
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Ashutosh Chauhan
Assignee: Kevin Wilfong
 Attachments: HIVE-3783.1.patch.txt


 This test case was introduced in HIVE-3750 and has been failing ever since it 
 was introduced.



[jira] [Commented] (HIVE-3784) de-emphasize mapjoin hint

2012-12-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530821#comment-13530821
 ] 

Namit Jain commented on HIVE-3784:
--

Verified that all the plan changes actually remove a redundant MR stage. There 
is no change in query results.

 de-emphasize mapjoin hint
 -

 Key: HIVE-3784
 URL: https://issues.apache.org/jira/browse/HIVE-3784
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
 hive.3784.4.patch


 hive.auto.convert.join has been around for a long time, and is pretty stable.
 When mapjoin hint was created, the above parameter did not exist.
 The only reason for the user to specify a mapjoin currently is if they want
 it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
 Eventually, that should also go away, but that may take some time to 
 stabilize.
 There are many rules in SemanticAnalyzer to handle the following trees:
 ReduceSink - MapJoin
 Union  - MapJoin
 MapJoin- MapJoin
 This should not be supported anymore. In any of the above scenarios, the
 user can get the mapjoin behavior by setting hive.auto.convert.join to true
 and not specifying the hint. This will simplify the code a lot.
 What does everyone think ?



[jira] [Updated] (HIVE-3784) de-emphasize mapjoin hint

2012-12-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3784:
-

Status: Patch Available  (was: Open)

 de-emphasize mapjoin hint
 -

 Key: HIVE-3784
 URL: https://issues.apache.org/jira/browse/HIVE-3784
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3784.1.patch, hive.3784.2.patch, hive.3784.3.patch, 
 hive.3784.4.patch


 hive.auto.convert.join has been around for a long time, and is pretty stable.
 When mapjoin hint was created, the above parameter did not exist.
 The only reason for the user to specify a mapjoin currently is if they want
 it to be converted to a bucketed-mapjoin or a sort-merge bucketed mapjoin.
 Eventually, that should also go away, but that may take some time to 
 stabilize.
 There are many rules in SemanticAnalyzer to handle the following trees:
 ReduceSink - MapJoin
 Union  - MapJoin
 MapJoin- MapJoin
 This should not be supported anymore. In any of the above scenarios, the
 user can get the mapjoin behavior by setting hive.auto.convert.join to true
 and not specifying the hint. This will simplify the code a lot.
 What does everyone think ?



[jira] [Commented] (HIVE-3401) Diversify grammar for split sampling

2012-12-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531231#comment-13531231
 ] 

Namit Jain commented on HIVE-3401:
--

I gave you permissions. You are the best person to take a first cut at the 
document, so please do that and post the link here.
We can always review it.

 Diversify grammar for split sampling
 

 Key: HIVE-3401
 URL: https://issues.apache.org/jira/browse/HIVE-3401
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3401.D4821.2.patch, HIVE-3401.D4821.3.patch, 
 HIVE-3401.D4821.4.patch, HIVE-3401.D4821.5.patch, HIVE-3401.D4821.6.patch, 
 HIVE-3401.D4821.7.patch


 Split sampling currently supports only grammar like TABLESAMPLE(n PERCENT). 
 But some users want to specify just the size of the input. That can be 
 calculated easily with a few commands, but it seems good to support more 
 grammar, something like TABLESAMPLE(500M).
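
 For illustration, the existing and proposed forms might be used as in the
 sketch below. The table name some_table and the alias t are placeholders, and
 the second query is the grammar proposed in this issue, so it assumes a Hive
 build that includes this change.

```sql
-- Existing form named in the issue: sample a percentage of the input.
SELECT * FROM some_table TABLESAMPLE(10 PERCENT) t;

-- Proposed form from the issue: sample roughly a fixed input size.
SELECT * FROM some_table TABLESAMPLE(500M) t;
```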



[jira] [Commented] (HIVE-3795) NPE in SELECT when WHERE-clause is an and/or/not operation involving null

2012-12-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531248#comment-13531248
 ] 

Namit Jain commented on HIVE-3795:
--

Yes, and click on 'Submit Patch' if it is ready for review.

 NPE in SELECT when WHERE-clause is an and/or/not operation involving null
 -

 Key: HIVE-3795
 URL: https://issues.apache.org/jira/browse/HIVE-3795
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Xiao Jiang
Assignee: Xiao Jiang
Priority: Trivial
 Attachments: HIVE-3795.1.patch.txt


 Sometimes users forget to quote date constants in queries. For example, 
 SELECT * FROM some_table WHERE ds = 2012-12-10 and ds = 2012-12-12; . In 
 such cases, if the WHERE clause contains an and/or/not operation, the query 
 throws an NPE. That is because PcrExprProcFactory in ql/optimizer does not 
 check for null.
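
 As a small sketch of the user-side mistake described above, the two queries
 below differ only in quoting. The table some_table and column ds are taken
 from the example in this issue; the single-predicate form is a simplification
 for illustration.

```sql
-- Unquoted: 2012-12-10 is parsed as integer arithmetic (2012 - 12 - 10),
-- not as a date string.
SELECT * FROM some_table WHERE ds = 2012-12-10;

-- Quoted: ds is compared against the intended string constant.
SELECT * FROM some_table WHERE ds = '2012-12-10';
```

 Quoting the constant avoids the problem on the query side; the patch itself
 addresses the missing null check in the optimizer.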



[jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.

2012-12-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13531283#comment-13531283
 ] 

Namit Jain commented on HIVE-3752:
--

For a simple use case like ours, we don't want to depend on additional layers.
It is a much simpler change to support this in Hive than to fix a lot more 
in HCatalog.

Eventually, if HCat moves into Hive, these two APIs should be merged. But that 
may take a long time, and it may be much easier for us to have a more 
lightweight solution in Hive rather than wait.


 Add a non-sql API in hive to access data.
 -

 Key: HIVE-3752
 URL: https://issues.apache.org/jira/browse/HIVE-3752
 Project: Hive
  Issue Type: Improvement
Reporter: Nitay Joffe

 We would like to add an input/output format for accessing Hive data in Hadoop 
 directly without having to use e.g. a transform. Using a transform
 means having to do a whole map-reduce step with its own disk accesses and its 
 imposed structure. It also means needing to have Hive be the base 
 infrastructure for the entire system being developed which is not the right 
 fit as we only need a small part of it (access to the data).
 So we propose adding an API level InputFormat and OutputFormat to Hive that 
 will make it trivially easy to select a table with partition spec and read 
 from / write to it. We chose this design to make it compatible with Hadoop so 
 that existing systems that work with Hadoop's IO API will just work out of 
 the box.
 We need this system for the Giraph graph processing system 
 (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
 is a common use case.
 [~namitjain] [~aching] [~kevinwilfong] [~apresta]



[jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.

2012-12-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532038#comment-13532038
 ] 

Namit Jain commented on HIVE-3752:
--

Nitay, can you add the patch with the API ?

 Add a non-sql API in hive to access data.
 -

 Key: HIVE-3752
 URL: https://issues.apache.org/jira/browse/HIVE-3752
 Project: Hive
  Issue Type: Improvement
Reporter: Nitay Joffe

 We would like to add an input/output format for accessing Hive data in Hadoop 
 directly without having to use e.g. a transform. Using a transform
 means having to do a whole map-reduce step with its own disk accesses and its 
 imposed structure. It also means needing to have Hive be the base 
 infrastructure for the entire system being developed which is not the right 
 fit as we only need a small part of it (access to the data).
 So we propose adding an API level InputFormat and OutputFormat to Hive that 
 will make it trivially easy to select a table with partition spec and read 
 from / write to it. We chose this design to make it compatible with Hadoop so 
 that existing systems that work with Hadoop's IO API will just work out of 
 the box.
 We need this system for the Giraph graph processing system 
 (http://giraph.apache.org/) as running graph jobs which read/write from Hive 
 is a common use case.
 [~namitjain] [~aching] [~kevinwilfong] [~apresta]


