[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552431#comment-13552431
 ] 

Namit Jain commented on HIVE-933:
-

some more minor comments.
Looks mostly good.


> Infer bucketing/sorting properties
> --
>
> Key: HIVE-933
> URL: https://issues.apache.org/jira/browse/HIVE-933
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Kevin Wilfong
> Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt, 
> HIVE-933.3.patch.txt, HIVE-933.4.patch.txt, HIVE-933.5.patch.txt, 
> HIVE-933.6.patch.txt, HIVE-933.7.patch.txt, HIVE-933.8.patch.txt, 
> HIVE-933.9.patch.txt
>
>
> This is a long-term plan, and may require major changes.
> From the query, we can figure out the sorting/bucketing properties and 
> change the metadata of the destination at that time.
> However, this means that different partitions may have different metadata. 
> Currently, the query plan is the same for all the partitions of the table. 
> We can do the following:
> 1. In the first cut, take a simple approach: take the union of all the 
> metadata and create the most defensive plan.
> 2. Enhance mapredWork() to include partition-specific operator trees.
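The "most defensive plan" of step 1 can be sketched as follows. This is a hypothetical Python illustration, not Hive code: the function name and metadata shape are invented. The idea is that a bucketing/sorting property survives into the plan only if every partition agrees on it.

```python
# Hypothetical sketch, not actual Hive code: names and metadata shape are
# invented for illustration. A plan property survives only if every
# partition agrees on it; otherwise fall back to the unbucketed plan.
def defensive_plan(partition_metas):
    """partition_metas: list of dicts like {'bucket_cols': ('key',), 'num_buckets': 32}."""
    fallback = {'bucket_cols': None, 'num_buckets': 0}
    if not partition_metas:
        return fallback
    first = partition_metas[0]
    for meta in partition_metas[1:]:
        if meta != first:
            # Partitions disagree: assume nothing about bucketing/sorting.
            return fallback
    return dict(first)

agreeing = [{'bucket_cols': ('key',), 'num_buckets': 32}] * 2
mixed = agreeing + [{'bucket_cols': None, 'num_buckets': 0}]
print(defensive_plan(agreeing))  # all partitions agree: keep bucketing
print(defensive_plan(mixed))     # disagreement: defensive unbucketed plan
```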

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-13 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552421#comment-13552421
 ] 

Kevin Wilfong commented on HIVE-933:


To be more explicit, the other benefits include being able to turn this on for 
unpartitioned tables and easier auditing.




[jira] [Updated] (HIVE-933) Infer bucketing/sorting properties

2013-01-13 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-933:
---

Status: Open  (was: Patch Available)




[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-13 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552420#comment-13552420
 ] 

Kevin Wilfong commented on HIVE-933:


I'd considered this and have been going back and forth on the idea; it would 
be useful for other reasons too.

I wouldn't want to do this as a partition parameter; it would have to be a 
new column in the SDS table.

I've mainly been concerned about whether or not it's worth it, given that we 
can approximate it by checking whether the partition is bucketed/sorted and 
the table is not. I suppose consistency in the scenario you described tips 
the scales; I'll add it.




[jira] [Updated] (HIVE-3650) Hive List Bucketing - validation

2013-01-13 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3650:
---

Issue Type: Improvement  (was: New Feature)

> Hive List Bucketing - validation
> 
>
> Key: HIVE-3650
> URL: https://issues.apache.org/jira/browse/HIVE-3650
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.10.0
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>Priority: Minor
>
> Many validations are done in each patch. This issue tracks what is left 
> over from the complete list:
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing



[jira] [Created] (HIVE-3890) Hive List Bucketing - merge per skewed dir

2013-01-13 Thread Gang Tim Liu (JIRA)
Gang Tim Liu created HIVE-3890:
--

 Summary: Hive List Bucketing - merge per skewed dir
 Key: HIVE-3890
 URL: https://issues.apache.org/jira/browse/HIVE-3890
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


Right now, in list bucketing DML, if a merge is involved, it uses one MR job 
for all skewed directories. If the number of files is big, it might trigger a 
Hive client-side OOM due to too many splits. Using one MR job per skewed 
directory will reduce the OOM risk.



[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552417#comment-13552417
 ] 

Namit Jain commented on HIVE-933:
-

[~kevinwilfong], do you think we should differentiate between bucketing that 
the user specified and bucketing that was inferred?
For example, merge cannot be performed on bucketed partitions, so a 
concatenate statement that works today might start throwing an error once 
this is deployed. The behavior should differ between the two scenarios: if 
bucketing/sorting was inferred, the merge should go through and 
bucketing/sorting should be turned off.
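The proposed behavior could look roughly like this. A hypothetical Python sketch: the function and flag names are invented, and the real logic would live in Hive's merge/concatenate path.

```python
# Hypothetical sketch of the proposed behavior (names invented; the real
# logic would live in Hive's merge/concatenate path). User-declared
# bucketing rejects the merge; inferred bucketing is dropped so the merge
# can proceed.
def plan_concatenate(is_bucketed, bucketing_was_inferred):
    if is_bucketed and not bucketing_was_inferred:
        # User explicitly declared bucketing: preserve it, refuse to merge.
        raise ValueError("concatenate is not supported on user-bucketed partitions")
    # Inferred (or absent) bucketing: turn it off and let the merge run.
    return {'merge': True, 'bucketed': False}
```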




[jira] [Commented] (HIVE-3824) bug if different serdes are used for different partitions

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552406#comment-13552406
 ] 

Namit Jain commented on HIVE-3824:
--

Nothing, just the test output.
Due to https://issues.apache.org/jira/browse/HIVE-3803, the test output changed.

The tests passed.
[~ashutoshc], should I commit this ?

> bug if different serdes are used for different partitions
> -
>
> Key: HIVE-3824
> URL: https://issues.apache.org/jira/browse/HIVE-3824
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3824.1.patch, hive.3824.3.patch, hive.3824.4.patch
>
>
> Consider the following testcase:
> create table tst5 (key string, value string) partitioned by (ds string) 
> stored as rcfile;
> insert overwrite table tst5 partition (ds='1') select * from src;
> insert overwrite table tst5 partition (ds='2') select * from src;
> insert overwrite table tst5 partition (ds='3') select * from src;
> alter table tst5 stored as sequencefile; 
> insert overwrite table tst5 partition (ds='4') select * from src;
> insert overwrite table tst5 partition (ds='5') select * from src;
> insert overwrite table tst5 partition (ds='6') select * from src;  
> alter table tst5 set serde 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
> insert overwrite table tst5 partition (ds='7') select * from src;
> insert overwrite table tst5 partition (ds='8') select * from src;
> insert overwrite table tst5 partition (ds='9') select * from src;  
> The following query works fine:
>  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
> since both the partitions use ColumnarSerDe
> But the following query fails:
> select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
> (ds='7'));
> since different serdes are used.
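The shape of the bug can be sketched in Python (a hypothetical illustration, names invented; per the testcase above, partitions ds=1..6 use ColumnarSerDe and ds=7..9 use LazySimpleSerDe): a correct plan must track the deserializer per partition rather than once per table.

```python
# Hypothetical sketch (not Hive's planner code): the deserializer must be
# chosen per partition, not once per table. Per the testcase, ds=1..6 use
# ColumnarSerDe and ds=7..9 use LazySimpleSerDe after the ALTER TABLE.
serde_by_partition = {
    '1': 'ColumnarSerDe', '2': 'ColumnarSerDe', '3': 'ColumnarSerDe',
    '4': 'ColumnarSerDe', '5': 'ColumnarSerDe', '6': 'ColumnarSerDe',
    '7': 'LazySimpleSerDe', '8': 'LazySimpleSerDe', '9': 'LazySimpleSerDe',
}

def serdes_needed(partitions):
    """The set of distinct deserializers a query over these partitions must handle."""
    return {serde_by_partition[p] for p in partitions}

print(serdes_needed(['1', '4']))       # one serde: a single-serde plan works
print(serdes_needed(['1', '4', '7']))  # two serdes: plan must dispatch per partition
```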



[jira] [Commented] (HIVE-3807) Hive authorization should use short username when Kerberos authentication

2013-01-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552385#comment-13552385
 ] 

Kai Zheng commented on HIVE-3807:
-

Ashutosh, thanks for your review. I agree we need to consider the 
incompatibility issue and will work on it as you suggested.

> Hive authorization should use short username when Kerberos authentication
> -
>
> Key: HIVE-3807
> URL: https://issues.apache.org/jira/browse/HIVE-3807
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HIVE-3807.patch
>
>
> Currently, when the authentication method is Kerberos, Hive authorization 
> uses the user's full principal name, for example j...@example.com 
> instead of john.
> It should use the short name instead. The benefits:
> 1. Consistency. Hadoop, HBase, etc. all use short names in related ACLs 
> and authorizations; for Hive authorization to work well with them, it 
> should too.
> 2. Convenience. It's very inconvenient to use the lengthy Kerberos 
> principal name when granting or revoking privileges via the Hive CLI.



[jira] [Updated] (HIVE-3137) Including row update and delete option in hive

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3137:
---

Fix Version/s: (was: 0.10.0)

> Including row update and delete option in hive
> --
>
> Key: HIVE-3137
> URL: https://issues.apache.org/jira/browse/HIVE-3137
> Project: Hive
>  Issue Type: New Feature
>  Components: Database/Schema
>Affects Versions: 0.8.0, 0.9.0
> Environment: Ubuntu
>Reporter: Unnikrishnan V T
>Priority: Trivial
>  Labels: hadoop
>




[jira] [Updated] (HIVE-1695) MapJoin followed by ReduceSink should be done as single MapReduce Job

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1695:
---

Fix Version/s: (was: 0.10.0)

> MapJoin followed by ReduceSink should be done as single MapReduce Job
> -
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Amareshwari Sriramadasu
>Assignee: Sreekanth Ramakrishnan
> Attachments: hive-1695-1.patch, hive-1695.patch
>
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map 
> only job followed by a Map-Reduce job. It can be combined into single 
> MapReduce Job.



[jira] [Updated] (HIVE-3598) physical optimizer changes for auto sort-merge join

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3598:
---

Fix Version/s: (was: 0.10.0)

> physical optimizer changes for auto sort-merge join
> ---
>
> Key: HIVE-3598
> URL: https://issues.apache.org/jira/browse/HIVE-3598
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Namit Jain
>




[jira] [Updated] (HIVE-2397) Support with rollup option for group by

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2397:
---

Fix Version/s: 0.10.0

> Support with rollup option for group by
> ---
>
> Key: HIVE-2397
> URL: https://issues.apache.org/jira/browse/HIVE-2397
> Project: Hive
>  Issue Type: New Feature
>Reporter: Kevin Wilfong
>Assignee: Namit Jain
> Fix For: 0.10.0
>
> Attachments: HIVE-2397.2.patch.txt, HIVE-2397.3.patch.txt, 
> HIVE-2397.4.patch.txt, HIVE-2397.5.patch.txt
>
>
> We should support the ROLLUP operator similar to the way MySQL implements 
> it.
> Excerpted from the MySQL documentation:
> mysql> SELECT year, country, product, SUM(profit)
> -> FROM sales
> -> GROUP BY year, country, product WITH ROLLUP;
> +------+---------+------------+-------------+
> | year | country | product    | SUM(profit) |
> +------+---------+------------+-------------+
> | 2000 | Finland | Computer   |        1500 |
> | 2000 | Finland | Phone      |         100 |
> | 2000 | Finland | NULL       |        1600 |
> | 2000 | India   | Calculator |         150 |
> | 2000 | India   | Computer   |        1200 |
> | 2000 | India   | NULL       |        1350 |
> | 2000 | USA     | Calculator |          75 |
> | 2000 | USA     | Computer   |        1500 |
> | 2000 | USA     | NULL       |        1575 |
> | 2000 | NULL    | NULL       |        4525 |
> | 2001 | Finland | Phone      |          10 |
> | 2001 | Finland | NULL       |          10 |
> | 2001 | USA     | Calculator |          50 |
> | 2001 | USA     | Computer   |        2700 |
> | 2001 | USA     | TV         |         250 |
> | 2001 | USA     | NULL       |        3000 |
> | 2001 | NULL    | NULL       |        3010 |
> | NULL | NULL    | NULL       |        7535 |
> +------+---------+------------+-------------+
> http://dev.mysql.com/doc/refman/5.0/en/group-by-modifiers.html
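The rollup semantics above can be sketched in Python (a toy illustration, not Hive or MySQL code): aggregate at every prefix of the GROUP BY keys, using None where SQL would emit NULL.

```python
# Toy sketch of WITH ROLLUP semantics (not Hive/MySQL code): each input row
# contributes to one aggregate per prefix of the GROUP BY keys, with None
# standing in for SQL NULL in the rolled-up positions.
from collections import defaultdict

def rollup(rows, keys, value):
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in keys)
        # Emit one aggregate per prefix level, padding the rest with None.
        for level in range(len(keys), -1, -1):
            totals[key[:level] + (None,) * (len(keys) - level)] += row[value]
    return dict(totals)

sales = [
    {'year': 2000, 'country': 'Finland', 'product': 'Computer', 'profit': 1500},
    {'year': 2000, 'country': 'Finland', 'product': 'Phone', 'profit': 100},
]
result = rollup(sales, ['year', 'country', 'product'], 'profit')
print(result[(2000, 'Finland', None)])  # 1600, matching the NULL subtotal row
```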



[jira] [Updated] (HIVE-3042) Thrift classes do not need to be passed to the mappers and reducers

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3042:
---

Fix Version/s: (was: 0.10.0)

> Thrift classes do not need to be passed to the mappers and reducers
> ---
>
> Key: HIVE-3042
> URL: https://issues.apache.org/jira/browse/HIVE-3042
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Namit Jain
>Assignee: Paul Yang
> Attachments: HIVE-3042.1.patch
>
>




[jira] [Commented] (HIVE-2693) Add DECIMAL data type

2013-01-13 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552312#comment-13552312
 ] 

Mark Grover commented on HIVE-2693:
---

Gunther: thanks for the feedback! I understand what you are saying about 
different representations of the same number. That's why I referred to the 
scale change as an inconsistency for floor/ceil but a bug for round. I called 
them inconsistencies for floor/ceil because they are exactly the same 
numbers, and, as we have discussed before, the decimal patch considers the 
same number with different representations equal (by using compareTo() 
instead of equals()). However, I considered it a bug for round because, by 
definition, round(x, d) rounds the number x to d decimal places. While 0.00, 
0, and 0E-99 are all the same number (with different representations), only 
one of them has the 2 decimal places expected of the result of 
round(1E-99, 2). Here is an example where it could cause problems: if someone 
was using the Thrift client to issue Hive queries from C++ and issued a query 
like this:
{code}
select round(mycol,2) from mytable;
{code}
and split the output on the decimal point to obtain the fractional part, they 
would expect the fractional part to fit in a string of length 2. However, 
given the present implementation, that's not the case.

Consistency with MySQL aside, I don't think we should be setting the value 
back to the original scale, especially in round.

Having said the above, I agree that they are just different representations 
of the same number, so if you feel strongly about not changing this, I'll 
happily +1 patch 21.
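The point can be illustrated with Python's decimal module (an analogue for illustration, not Hive's implementation): the two zeros compare equal as numbers, but they render with different numbers of fractional digits, which is exactly what breaks string-based parsing of the result.

```python
# Analogue of the round() scale discussion using Python's decimal module
# (not Hive's implementation): equal numbers, different representations.
from decimal import Decimal

a = Decimal("1E-99").quantize(Decimal("0.01"))  # round to 2 decimal places
b = Decimal("0E-99")

print(a == b)  # True: numerically the same value
print(str(a))  # '0.00'  -> two digits after the point, as round(x, 2) implies
print(str(b))  # '0E-99' -> scientific notation, nothing to split on '.'
```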

> Add DECIMAL data type
> -
>
> Key: HIVE-2693
> URL: https://issues.apache.org/jira/browse/HIVE-2693
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.10.0
>Reporter: Carl Steinbach
>Assignee: Prasad Mujumdar
> Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, 
> HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, 
> HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, 
> HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, 
> HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-20.patch, 
> HIVE-2693-21.patch, HIVE-2693-22.patch, HIVE-2693-all.patch, 
> HIVE-2693.D7683.1.patch, HIVE-2693-fix.patch, HIVE-2693.patch, 
> HIVE-2693-take3.patch, HIVE-2693-take4.patch
>
>
> Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
> template for how to do this.



[jira] [Commented] (HIVE-3824) bug if different serdes are used for different partitions

2013-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552301#comment-13552301
 ] 

Ashutosh Chauhan commented on HIVE-3824:


What's different between this patch and the previous one?




[jira] [Commented] (HIVE-3004) RegexSerDe should support other column types in addition to STRING

2013-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552299#comment-13552299
 ] 

Ashutosh Chauhan commented on HIVE-3004:


Left a comment on RB.

> RegexSerDe should support other column types in addition to STRING
> --
>
> Key: HIVE-3004
> URL: https://issues.apache.org/jira/browse/HIVE-3004
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Carl Steinbach
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-3004-1.patch, HIVE-3004.2.patch, 
> HIVE-3004.3.patch.txt
>
>




[jira] [Commented] (HIVE-2693) Add DECIMAL data type

2013-01-13 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552297#comment-13552297
 ] 

Gunther Hagleitner commented on HIVE-2693:
--

Mark: Scale is different from the return type. It seems both you and Namit 
are open to using decimal as the return type, with documentation. Given that 
decimal avoids overflow issues, I think we can settle on the current 
(patch .21) implementation for this.

As far as the scale/representation of the returned decimal goes: I don't see 
a bug. 0.00, 0, and 0E-99 are all the same number; it's just the 
representation that's different. And since all these numbers are considered 
the same, they will be handled correctly by the system (we don't consider 
identical numbers with different scales different). Keeping the 
representation in line with MySQL has other implications: for one, MySQL 
doesn't use scientific notation; also, MySQL decimals are defined with 
additional parameters, prec and scale, which control, among other things, 
the representation.

My proposal: keep HIVE-2693-21.patch as is and check it in. It passed muster 
with Ashutosh, Carl, and Namit, and gives correct results in Mark's cases as 
well. Open a new JIRA to extend the feature and give users control over the 
representation with optional prec/scale (i.e. decimal(5,2)).






Re: Review Request: HIVE-3004: RegexSerDe should support other column types in addition to STRING

2013-01-13 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8931/#review15305
---



serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java


Are we sure that reference equality here is guaranteed to work? In other 
words, do we not need .equals() instead?
Same question for all == comparisons on subsequent lines as well.


- Ashutosh Chauhan


On Jan. 12, 2013, 12:28 a.m., Shreepadma Venugopalan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8931/
> ---
> 
> (Updated Jan. 12, 2013, 12:28 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Description
> ---
> 
> This patch enhances regex serde to parse column types other than STRING. Only 
> primitive types are supported.
> 
> 
> This addresses bug HIVE-3004.
> https://issues.apache.org/jira/browse/HIVE-3004
> 
> 
> Diffs
> -
> 
>   ql/src/test/queries/clientnegative/serde_regex.q 6603b91 
>   ql/src/test/queries/clientpositive/serde_regex.q c6809cb 
>   ql/src/test/results/clientnegative/serde_regex.q.out 03fe907 
>   ql/src/test/results/clientpositive/serde_regex.q.out a8ce604 
>   serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java e728244 
> 
> Diff: https://reviews.apache.org/r/8931/diff/
> 
> 
> Testing
> ---
> 
> New test cases have been added and they pass.
> 
> 
> Thanks,
> 
> Shreepadma Venugopalan
> 
>



[jira] [Commented] (HIVE-1555) JDBC Storage Handler

2013-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552295#comment-13552295
 ] 

Ashutosh Chauhan commented on HIVE-1555:


Looking at the comments and the watchers list, it looks like there is a lot 
of interest in this, but I don't see any patch yet. Does someone want to take 
this up?

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Andrew Wilson
> Attachments: JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase storage handlers, I thought it would make 
> sense to include a generic JDBC RDBMS storage handler so that you could 
> import a standard DB table into Hive. Many people must want to perform 
> HiveQL joins, etc., against tables in other systems.



[jira] [Commented] (HIVE-3145) Lock support for Metastore calls

2013-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552294#comment-13552294
 ] 

Ashutosh Chauhan commented on HIVE-3145:


What's the motivation for this?

> Lock support for Metastore calls
> 
>
> Key: HIVE-3145
> URL: https://issues.apache.org/jira/browse/HIVE-3145
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking, Metastore
>Affects Versions: 0.10.0
>Reporter: David Goode
>Assignee: Andrew Chalfant
>Priority: Minor
> Attachments: HIVE3145_lock.diff
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Added locking to the metastore calls. Currently failing some unit tests, I 
> think due to improper configuration; this needs to be resolved and new unit 
> tests added. Some code cleanup may also be wanted.



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552290#comment-13552290
 ] 

Ashutosh Chauhan commented on HIVE-2206:


If Yin wants to provide a patch against a stable (or any) branch, that's his 
choice. But for a patch to get committed, it needs to be committed on trunk 
first.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
> HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
> HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries clause by clause, it will generate a MapReduce 
> job for every operation that may need to shuffle the data (e.g. join and 
> aggregation operations). However, those shuffle operations may involve the 
> correlations explained below and thus can be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR has job flow correlation (JFC) with one of its 
> child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized 
> if it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator which has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> Correlation optimizer is implemented as a logical optimizer. The main reasons 
> are that it only needs to manipulate the query plan tree and it can leverage 
> the existing component on generating MR jobs.
> Current implementation can serve as a framework for correlation related 
> optimizations. I think that it is better than adding individual optimizers. 
> There are several pieces of work that can be done in the future to improve 
> this optimizer. Here are three examples.
> # Support queries that only involve TC;
> # Support queries in which the input tables of correlated MR jobs involve 
> intermediate tables; and 
> # Optimize queries involving self join. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
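
As a concrete illustration (the table names below are made up), here is the shape of a query with job flow correlation: the reduce-side join and the aggregation both shuffle on the same key, so the two MR jobs Hive would otherwise generate can be merged into one:

```sql
-- Both the reduce-side join and the group-by partition on c.cust_id, so
-- the two MR jobs have job flow correlation and can share a single MR job.
SELECT c.cust_id, COUNT(*)
FROM customers c
JOIN orders o ON (c.cust_id = o.cust_id)
GROUP BY c.cust_id;
```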



[jira] [Updated] (HIVE-3807) Hive authorization should use short username when Kerberos authentication

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3807:
---

Fix Version/s: (was: 0.9.0)
Affects Version/s: 0.10.0
   Status: Open  (was: Patch Available)

Tests did pass. However, this looks like an incompatible change, doesn't it? 
These names are stored in the metastore. After the change, the short name will 
be compared against the long name for users who have already been granted 
privileges. That check will fail, and previously privileged users will not be 
allowed to perform actions. The workaround would be to grant privileges to such 
users again with short names. Kai, can you run some tests to verify whether the 
problem I identified exists and whether the workaround actually works? 

> Hive authorization should use short username when Kerberos authentication
> -
>
> Key: HIVE-3807
> URL: https://issues.apache.org/jira/browse/HIVE-3807
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Affects Versions: 0.9.0, 0.10.0
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Attachments: HIVE-3807.patch
>
>
> Currently, when the authentication method is Kerberos, Hive authorization 
> uses the user's full name as the privilege principal; for example, it uses 
> j...@example.com instead of john.
> It should use the short name instead. The benefits:
> 1. Consistency. Hadoop, HBase, and others all use short names in related ACLs 
> and authorizations; for Hive authorization to work well with them, it should 
> do the same.
> 2. Convenience. It's very inconvenient to use the lengthy Kerberos principal 
> name when granting or revoking privileges via the Hive CLI.
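
A sketch of the difference at the CLI (the table, user, and realm names below are made up for illustration):

```sql
-- With short names (as proposed), grants reference the simple user name:
GRANT SELECT ON TABLE sales TO USER john;
-- Today, under Kerberos, the stored principal is the full name, so the
-- grant would have to reference something like the principal
-- john@EXAMPLE.COM, which is both verbose and inconsistent with the
-- short names Hadoop and HBase use in their ACLs.
```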



Got a hadoop server IPC version mismatch ERROR in TestCliDriver avro_joins.q

2013-01-13 Thread Bing Li
Hi, guys
I applied the patches for HIVE-895 (add a SerDe for Avro-serialized data) and
HIVE-3273 (add Avro jars into the Hive execution classpath) on Hive 0.9.0.
And then I ran the following command with hadoop-1.0.3 and avro-1.6.3
 ant test -Dtestcase=TestCliDriver -Dqfile=avro_joins.q
-Dtest.silent=false

But I got an ERROR from Hadoop in the unit test. (I can run avro_joins.q
successfully on a real hadoop-1.0.3 cluster.)

I found that IPC version 7 is from Hadoop 2.x and version 4 is from
Hadoop 1.x, but I didn't set Hadoop 2.x in any properties files.
Do you know how this happened in the unit test?

Thanks,
- Bing

ERROR

[junit] Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC
version 7 cannot communicate with client version 4
[junit]  at org.apache.hadoop.ipc.Client.call(Client.java:740)
[junit]  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
[junit]  at $Proxy1.getProtocolVersion(Unknown Source)
[junit]  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
[junit]  at
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
[junit]  at
org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
[junit]  at
org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
[junit]  at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
[junit]  at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
[junit]  at
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
[junit]  at
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
[junit]  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
[junit]  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
[junit]  at
org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:367)
[junit]  ... 10 more
[junit] Job Submission failed with exception
'java.lang.RuntimeException(org.apache.hadoop.ipc.RemoteException: Server
IPC version 7 cannot communicate with client version 4)'


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-13 Thread David Inbar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552282#comment-13552282
 ] 

David Inbar commented on HIVE-2206:
---

I will be on vacation through January 14th, but will be checking email and 
voicemail periodically.

For all time-critical items, please call my mobile phone.

Many thanks,
David




> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
> HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
> HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-13 Thread Liu Zongquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552281#comment-13552281
 ] 

Liu Zongquan commented on HIVE-2206:


[~yhuai] I have a question: why not release a patch against a stable Hive 
release, e.g. branch hive-0.8-r2? Actually, I found that r1410581 is not a 
stable revision; I can't even run through "ant test -Dtestcase=TestCliDriver 
-Dqfile=show_functions.q -Doverwrite=true" on it. If this patch were based on a 
stable version, especially a stable branch, your work would benefit more 
people. Just a suggestion. 

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
> HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
> HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
> HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch
>



[jira] [Updated] (HIVE-2780) Implement more restrictive table sampler

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-2780:
---

Status: Open  (was: Patch Available)

My manually conflict-resolved patch resulted in failure in split_sample.q
{code}
[junit] java.lang.NullPointerException
[junit] at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$DefaultPercentSampler.sampling(CombineHiveInputFormat.java:596)
[junit] at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampling(CombineHiveInputFormat.java:496)
[junit] at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.sampleSplits(CombineHiveInputFormat.java:477)
[junit] at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:403)
[junit] at 
org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
[junit] at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
[junit] at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
[junit] at 
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
[junit] at 
org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[junit] Job Submission failed with exception 
'java.lang.NullPointerException(null)'
{code}
Either my resolution wasn't correct, or trunk has moved significantly. Navis, 
if you rebase the patch, I will take a look at this one quickly so that it 
doesn't go stale again.

> Implement more restrictive table sampler
> 
>
> Key: HIVE-2780
> URL: https://issues.apache.org/jira/browse/HIVE-2780
> Project: Hive
>  Issue Type: Improvement
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2780.D1623.2.patch
>
>
> Current table sampling scans whole block, making more rows included than 
> expected especially for small tables.
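
As an illustration of the current behavior (assuming the percent-based split sampling exercised by split_sample.q), a query like the following can return far more rows than requested on a small table, because the smallest unit the sampler can include is a whole block:

```sql
-- Requests 1 percent of the input, but because sampling works at block
-- granularity, a table smaller than one block is returned in full.
SELECT * FROM src TABLESAMPLE (1 PERCENT) s;
```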



[jira] [Updated] (HIVE-3876) call resetValid instead of ensureCapacity in the constructor of BytesRefArrayWritable

2013-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3876:
---

Status: Open  (was: Patch Available)

It turns out this results in many failures: 46 tests failed in CliDriver, and 
there were failures in other drivers as well.

> call resetValid instead of ensureCapacity in the constructor of 
> BytesRefArrayWritable
> -
>
> Key: HIVE-3876
> URL: https://issues.apache.org/jira/browse/HIVE-3876
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.10.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Minor
> Attachments: HIVE-3876.1.patch.txt
>
>
> In the constructor of BytesRefArrayWritable, ensureCapacity(capacity) is 
> called, but "valid" has not been adjusted accordingly. After a new 
> BytesRefArrayWritable has been created with an initial capacity of "x", if 
> resetValid() has not been called explicitly, the size returned is still 0.



[jira] [Commented] (HIVE-3537) release locks at the end of move tasks

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552255#comment-13552255
 ] 

Namit Jain commented on HIVE-3537:
--

[~ashutoshc], I was thinking about it more, and I think we should leave it in 
for some users.
What if I want atomicity for some reason? Without the dependency, it is 
possible that the query fails (for some reason) but some outputs get changed, 
and we don't even know which ones.

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.1.patch, hive.3537.2.patch, hive.3537.3.patch, 
> hive.3537.4.patch, hive.3537.5.patch, hive.3537.6.patch, hive.3537.7.patch, 
> hive.3537.8.patch, hive.3537.9.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the movetask contains a input/output, but they do not seem to be
> populated correctly.



[jira] [Commented] (HIVE-933) Infer bucketing/sorting properties

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552236#comment-13552236
 ] 

Namit Jain commented on HIVE-933:
-

[~kevinwilfong], did you refresh it after HIVE-3803? The test outputs would 
change.

> Infer bucketing/sorting properties
> --
>
> Key: HIVE-933
> URL: https://issues.apache.org/jira/browse/HIVE-933
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Kevin Wilfong
> Attachments: HIVE-933.1.patch.txt, HIVE-933.2.patch.txt, 
> HIVE-933.3.patch.txt, HIVE-933.4.patch.txt, HIVE-933.5.patch.txt, 
> HIVE-933.6.patch.txt, HIVE-933.7.patch.txt, HIVE-933.8.patch.txt, 
> HIVE-933.9.patch.txt
>
>
> This is a long-term plan, and may require major changes.
> From the query, we can figure out the sorting/bucketing properties, and 
> change the metadata of the destination at that time.
> However, this means that different partitions may have different metadata. 
> Currently, the query plan is same for all the 
> partitions of the table - we can do the following:
> 1. In the first cut, have a simple approach where you take the union all 
> metadata, and create the most defensive plan.
> 2. Enhance mapredWork() to include partition specific operator trees.
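
For example (a sketch of the idea, not necessarily the patch's exact behavior, with made-up table names), a query whose output ordering is visible in the plan would let the destination partition's metadata record the inferred properties:

```sql
-- The insert's output is clustered by key, so the inference could mark
-- partition ds='1' of dest as bucketed/sorted on key in the metastore,
-- while other partitions of dest keep whatever metadata they already have.
INSERT OVERWRITE TABLE dest PARTITION (ds = '1')
SELECT key, value FROM src
CLUSTER BY key;
```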



[jira] [Updated] (HIVE-3824) bug if different serdes are used for different partitions

2013-01-13 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3824:
-

Attachment: hive.3824.4.patch

> bug if different serdes are used for different partitions
> -
>
> Key: HIVE-3824
> URL: https://issues.apache.org/jira/browse/HIVE-3824
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3824.1.patch, hive.3824.3.patch, hive.3824.4.patch
>
>
> Consider the following testcase:
> create table tst5 (key string, value string) partitioned by (ds string) 
> stored as rcfile;
> insert overwrite table tst5 partition (ds='1') select * from src;
> insert overwrite table tst5 partition (ds='2') select * from src;
> insert overwrite table tst5 partition (ds='3') select * from src;
> alter table tst5 stored as sequencefile; 
> insert overwrite table tst5 partition (ds='4') select * from src;
> insert overwrite table tst5 partition (ds='5') select * from src;
> insert overwrite table tst5 partition (ds='6') select * from src;  
> alter table tst5 set serde 
> 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'; 
> insert overwrite table tst5 partition (ds='7') select * from src;
> insert overwrite table tst5 partition (ds='8') select * from src;
> insert overwrite table tst5 partition (ds='9') select * from src;  
> The following query works fine:
>  select key + key, value from tst5 where ((ds = '4') or (ds = '1'));   
> since both the partitions use ColumnarSerDe
> But the following query fails:
> select key + key, value from tst5 where ((ds = '4') or (ds = '1') or 
> (ds='7'));
> since different serdes are used.



[jira] [Commented] (HIVE-3824) bug if different serdes are used for different partitions

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552228#comment-13552228
 ] 

Namit Jain commented on HIVE-3824:
--

Will refresh and run tests.

> bug if different serdes are used for different partitions
> -
>
> Key: HIVE-3824
> URL: https://issues.apache.org/jira/browse/HIVE-3824
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3824.1.patch, hive.3824.3.patch
>
>



Hive-trunk-h0.21 - Build # 1910 - Still Failing

2013-01-13 Thread Apache Jenkins Server
Changes for Build #1907
[namit] HIVE-3888 wrong mapside groupby if no partition is being selected
(Namit Jain via Ashutosh and namit)


Changes for Build #1908

Changes for Build #1909
[kevinwilfong] HIVE-3803. explain dependency should show the dependencies 
hierarchically in presence of views. (njain via kevinwilfong)


Changes for Build #1910



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1910)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1910/ to 
view the results.

[jira] [Commented] (HIVE-2693) Add DECIMAL data type

2013-01-13 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552219#comment-13552219
 ] 

Namit Jain commented on HIVE-2693:
--

Either approach is fine, as long as we document it clearly.

Intuitively, returning a long makes sense, barring the corner cases.
But I am open.


> Add DECIMAL data type
> -
>
> Key: HIVE-2693
> URL: https://issues.apache.org/jira/browse/HIVE-2693
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.10.0
>Reporter: Carl Steinbach
>Assignee: Prasad Mujumdar
> Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, 
> HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, 
> HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, 
> HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, 
> HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-20.patch, 
> HIVE-2693-21.patch, HIVE-2693-22.patch, HIVE-2693-all.patch, 
> HIVE-2693.D7683.1.patch, HIVE-2693-fix.patch, HIVE-2693.patch, 
> HIVE-2693-take3.patch, HIVE-2693-take4.patch
>
>
> Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
> template for how to do this.
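
A minimal usage sketch of what the type would enable (the syntax here is illustrative; precision/scale parameters were not part of the initial proposal):

```sql
-- DECIMAL gives exact decimal arithmetic, avoiding the binary
-- floating-point rounding that DOUBLE introduces, e.g. for currency.
CREATE TABLE prices (item STRING, price DECIMAL);
SELECT item, price * 2 FROM prices;
```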



[jira] [Commented] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552190#comment-13552190
 ] 

Hudson commented on HIVE-3803:
--

Integrated in Hive-trunk-h0.21 #1909 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1909/])
HIVE-3803. explain dependency should show the dependencies hierarchically 
in presence of views. (njain via kevinwilfong) (Revision 1432543)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1432543
Files : 
* /hive/trunk/hbase-handler/src/test/results/positive/hbase_stats.q.out
* /hive/trunk/hbase-handler/src/test/results/positive/hbase_stats2.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
* /hive/trunk/ql/src/test/queries/clientpositive/explain_dependency.q
* /hive/trunk/ql/src/test/results/clientnegative/archive1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_insert1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_insert2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_insert3.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_insert4.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi3.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi4.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi5.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_multi7.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_partspec1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_partspec2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_partspec3.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_partspec4.q.out
* /hive/trunk/ql/src/test/results/clientnegative/archive_partspec5.q.out
* /hive/trunk/ql/src/test/results/clientnegative/authorization_part.q.out
* /hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view1.q.out
* /hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view2.q.out
* /hive/trunk/ql/src/test/results/clientnegative/lockneg3.q.out
* /hive/trunk/ql/src/test/results/clientnegative/lockneg4.q.out
* /hive/trunk/ql/src/test/results/clientnegative/protectmode_part1.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/alter_concatenate_indexed_table.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_merge_stats.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_view_rename.q.out
* /hive/trunk/ql/src/test/results/clientpositive/archive.q.out
* /hive/trunk/ql/src/test/results/clientpositive/archive_corrupt.q.out
* /hive/trunk/ql/src/test/results/clientpositive/archive_multi.q.out
* /hive/trunk/ql/src/test/results/clientpositive/authorization_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/authorization_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join14_hadoop20.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join19.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join25.q.out
* /hive/trunk/ql/src/test/results/clientpositive/auto_join9.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucket3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucket_groupby.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_6.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketcontext_8.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/bucketizedhiveinputformat_auto.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin10.q.out
* /hive/trunk/ql/src/test/results/clientpositive/bucketmapjoin11.q.out
* /hive/trunk/ql/src/test/results/client

Hive-trunk-h0.21 - Build # 1909 - Still Failing

2013-01-13 Thread Apache Jenkins Server
Changes for Build #1907
[namit] HIVE-3888 wrong mapside groupby if no partition is being selected
(Namit Jain via Ashutosh and namit)


Changes for Build #1908

Changes for Build #1909
[kevinwilfong] HIVE-3803. explain dependency should show the dependencies 
hierarchically in presence of views. (njain via kevinwilfong)




1 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1

Error Message:
Unexpected exception See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:47)
at 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.runTest(TestNegativeCliDriver.java:2316)
at 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1(TestNegativeCliDriver.java:1814)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1909)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1909/ to 
view the results.

[jira] [Commented] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552159#comment-13552159
 ] 

Hudson commented on HIVE-3803:
------------------------------

Integrated in Hive-trunk-hadoop2 #62 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/62/])
HIVE-3803. explain dependency should show the dependencies hierarchically 
in presence of views. (njain via kevinwilfong) (Revision 1432543)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1432543

[jira] [Commented] (HIVE-2693) Add DECIMAL data type

2013-01-13 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552151#comment-13552151
 ] 

Mark Grover commented on HIVE-2693:
-----------------------------------

[~hagleitn], I see what you mean. The reason I had changed the type was to be 
consistent with other methods in the UDF in terms of "scale" of the return 
type, albeit at the cost of losing precision. In any case, given what you found 
in documentation, I agree that it makes sense to retain the type to decimal. 

However, the scale of the return value in the floor/ceil/round UDFs for the 
decimal type seems incorrect to me. We should not be setting the return 
values back to the original scale.

This seems to lead to inconsistencies in ceil/floor and a bug in round UDF.
For example,
{code}
mysql> select ceil(-0.33) from t;
+-------------+
| ceil(-0.33) |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)
{code}
However, patch 21 gives ceil(-0.33) as 0.00. With the proposed change, the 
result in Hive would be consistent in representation with MySQL.

Here is what seems like a bug in the round UDF.
In MySQL:
round(1E-99,2) = 0.00
But Hive gives:
round(1E-99,2) = 0E-99
which seems to be an incorrect representation.

The proposed change will make Hive give the correct result (consistent with 
MySQL): round(1E-99,2) = 0.00

What do you think? If you agree, I can upload the new patch. Please let me 
know. Thanks!

> Add DECIMAL data type
> ---------------------
>
> Key: HIVE-2693
> URL: https://issues.apache.org/jira/browse/HIVE-2693
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.10.0
>Reporter: Carl Steinbach
>Assignee: Prasad Mujumdar
> Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, 
> HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, 
> HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, 
> HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, 
> HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-20.patch, 
> HIVE-2693-21.patch, HIVE-2693-22.patch, HIVE-2693-all.patch, 
> HIVE-2693.D7683.1.patch, HIVE-2693-fix.patch, HIVE-2693.patch, 
> HIVE-2693-take3.patch, HIVE-2693-take4.patch
>
>
> Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
> template for how to do this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira