[jira] [Resolved] (HIVE-1324) support more than 1 reducer in local test mode

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-1324.


Resolution: Duplicate

Seems like HIVE-117 is tracking the same thing.

> support more than 1 reducer in local test mode
> --
>
> Key: HIVE-1324
> URL: https://issues.apache.org/jira/browse/HIVE-1324
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 0.5.0
>Reporter: Ted Yu
>
> Currently only 1 reducer is supported in local test mode.
> In order to write unit tests that simulate a real production environment, it 
> would be desirable to support more than 1 reducer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Pamela Vagata (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540356#comment-13540356
 ] 

Pamela Vagata commented on HIVE-3718:
-

I ran all the tests; it looks like the test (sa_fail_hook3.q) included with the 
diff is actually failing. I'll fix it up.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>




[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540352#comment-13540352
 ] 

Namit Jain commented on HIVE-3718:
--

Did you run the following test: drop_partitions_ignore_protection.q?
It seems like it would fail.


> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>




[jira] [Updated] (HIVE-3809) Concurrency issue in RCFile: multiple threads can use the same decompressor

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3809:
---

Status: Patch Available  (was: Open)

+1

> Concurrency issue in RCFile: multiple threads can use the same decompressor
> ---
>
> Key: HIVE-3809
> URL: https://issues.apache.org/jira/browse/HIVE-3809
> Project: Hive
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>Priority: Critical
> Attachments: 
> 0001-HIVE-3809-Decompressors-should-only-be-returned-to-t.patch, D7419.1.patch
>
>
> RCFile is not thread-safe, even if each reader is only used by one thread as 
> intended, because it is possible to return decompressors to the pool multiple 
> times by calling close on the reader multiple times. Then, different threads 
> can pick up the same decompressor twice from the pool, resulting in 
> decompression failures.
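
A minimal sketch of the guard this description implies (illustrative only, not the actual RCFile reader code): the decompressor is handed back to Hadoop's CodecPool at most once, so calling close() repeatedly becomes harmless.

{code}
// Hypothetical helper, not a Hive class: return a pooled decompressor exactly once.
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Decompressor;

class PooledDecompressorHolder {
  private Decompressor decompressor;   // obtained earlier via CodecPool.getDecompressor(codec)
  private boolean returned = false;    // guards against a double release into the pool

  PooledDecompressorHolder(Decompressor d) {
    this.decompressor = d;
  }

  synchronized void close() {
    if (returned || decompressor == null) {
      return;                          // second and later close() calls are no-ops
    }
    CodecPool.returnDecompressor(decompressor);
    decompressor = null;
    returned = true;
  }
}
{code}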



[jira] [Updated] (HIVE-3809) Concurrency issue in RCFile: multiple threads can use the same decompressor

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3809:
---

Assignee: Mikhail Bautin

> Concurrency issue in RCFile: multiple threads can use the same decompressor
> ---
>
> Key: HIVE-3809
> URL: https://issues.apache.org/jira/browse/HIVE-3809
> Project: Hive
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
>Priority: Critical
> Attachments: 
> 0001-HIVE-3809-Decompressors-should-only-be-returned-to-t.patch, D7419.1.patch
>
>
> RCFile is not thread-safe, even if each reader is only used by one thread as 
> intended, because it is possible to return decompressors to the pool multiple 
> times by calling close on the reader multiple times. Then, different threads 
> can pick up the same decompressor twice from the pool, resulting in 
> decompression failures.



[jira] [Resolved] (HIVE-1571) hive.metastore.uris is missing its definition in hive-default.xml

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-1571.


Resolution: Not A Problem

There is no longer any hive-default.xml. Instead we have 
hive-default.xml.template, which does include this definition.

> hive.metastore.uris is missing its definition in hive-default.xml
> -
>
> Key: HIVE-1571
> URL: https://issues.apache.org/jira/browse/HIVE-1571
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Metastore
>Affects Versions: 0.6.0
>Reporter: John Sichi
>Assignee: Paul Yang
>Priority: Critical
>
> Need to add this here as doc.



[jira] [Commented] (HIVE-3272) RetryingRawStore will perform partial transaction on retry

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540333#comment-13540333
 ] 

Ashutosh Chauhan commented on HIVE-3272:


[~kevinwilfong]: I believe this is still an issue in spite of HIVE-3826.

> RetryingRawStore will perform partial transaction on retry
> --
>
> Key: HIVE-3272
> URL: https://issues.apache.org/jira/browse/HIVE-3272
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.10.0
>Reporter: Kevin Wilfong
>Priority: Critical
>
> By the time the RetryingRawStore retries a command, the transaction 
> encompassing it has already been rolled back. This means that it will 
> perform the remainder of the raw store commands outside of a transaction 
> (unless there is another one encapsulating it, which is definitely not always 
> the case), and then fail when it tries to commit the transaction because there is 
> none open.
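
A minimal sketch of the retry pattern the comment argues for, using hypothetical types rather than Hive's actual RetryingRawStore API: the whole transactional unit is retried, so each attempt runs inside its own freshly opened transaction instead of resuming one that has already been rolled back.

{code}
import java.util.concurrent.Callable;

// Hypothetical stand-in for a RawStore-like transactional API.
interface TxStore {
  void openTransaction();
  void commitTransaction();
  void rollbackTransaction();
}

final class TransactionalRetry {
  // Retries the entire unit of work; maxAttempts is assumed to be >= 1.
  static <T> T runWithRetries(TxStore store, Callable<T> work, int maxAttempts) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      store.openTransaction();
      try {
        T result = work.call();        // every raw-store call happens inside this attempt's transaction
        store.commitTransaction();
        return result;
      } catch (Exception e) {
        store.rollbackTransaction();   // roll back, then start a fresh transaction on the next attempt
        last = e;
      }
    }
    throw last;
  }
}
{code}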



[jira] [Updated] (HIVE-3826) Rollbacks and retries of drops cause org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row)

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3826:
---

Fix Version/s: 0.11.0

> Rollbacks and retries of drops cause 
> org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)
> -
>
> Key: HIVE-3826
> URL: https://issues.apache.org/jira/browse/HIVE-3826
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.11.0
>
> Attachments: HIVE-3826.1.patch.txt
>
>
> I'm not sure if this is the only cause of the exception 
> "org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database 
> row)" from the metastore, but one cause seems to be related to a drop command 
> failing, and being retried by the client.
> Based on focusing on a single thread in the metastore with DEBUG level 
> logging, I was seeing the objects that were intended to be dropped remaining 
> in the PersistenceManager cache even after a rollback.  The steps seemed to 
> be as follows:
> 1) First attempt to drop the table, the table is pulled into the 
> PersistenceManager cache for the purposes of dropping
> 2) The drop fails, e.g. due to a lock wait timeout on the SQL backend, this 
> causes a rollback of the transaction
> 3) The drop is retried using a different thread on the metastore Thrift 
> server or a different server and succeeds
> 4) Back on the original thread of the original Thrift server, someone tries to 
> perform some write operation, which produces a commit. This causes those 
> detached objects related to the dropped table to attempt to reattach, causing 
> JDO to query the SQL backend for those objects which it can't find.  This 
> causes the exception.
> I was able to reproduce this regularly using the following sequence of 
> commands:
> Hive client 1 (Hive1): connected to a metastore Thrift server running a 
> single thread, I hard coded a RuntimeException into the code to drop a table 
> in the ObjectStore, specifically right before the commit in 
> preDropStorageDescriptor, to induce a rollback.  I also turned off all 
> retries at all layers of the metastore.
> Hive client 2 (Hive2): connected to a separate metastore Thrift server 
> running with standard configs and code
> 1: On Hive1, CREATE TABLE t1 (c STRING);
> 2: On Hive1, DROP TABLE t1; // This failed due to the hard coded exception
> 3: On Hive2, DROP TABLE t1; // Succeeds
> 4: On Hive1, CREATE DATABASE d1; // This database already existed. I'm not 
> sure why this was necessary, but it didn't work without it; it seemed to have 
> an effect on the order in which objects were committed in the next step
> 5: On Hive1, CREATE DATABASE d2; // This database didn't exist, it would fail 
> with the NucleusObjectNotFoundException
> The object that would cause the exception varied; I saw the MTable, the 
> MSerDeInfo, and the MTablePrivilege from the table that was being dropped.
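
An illustrative sketch (not the attached patch) of the cleanup this analysis suggests: after a rollback, evict the PersistenceManager's cached instances so that a later commit does not try to re-attach objects that another metastore server has since dropped. Only the standard javax.jdo API is assumed here.

{code}
import javax.jdo.PersistenceManager;
import javax.jdo.Transaction;

final class RollbackHelper {
  // Roll back the active transaction and clear cached instances (e.g. MTable,
  // MSerDeInfo) so they cannot be re-attached on a later commit.
  static void rollbackAndEvict(PersistenceManager pm) {
    Transaction tx = pm.currentTransaction();
    if (tx.isActive()) {
      tx.rollback();
    }
    pm.evictAll();
  }
}
{code}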



[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Pamela Vagata (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540326#comment-13540326
 ] 

Pamela Vagata commented on HIVE-3718:
-

updated https://reviews.facebook.net/D6783

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>




[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540320#comment-13540320
 ] 

Namit Jain commented on HIVE-3718:
--

Can you create a Phabricator entry also?
It is much easier to review.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>




[jira] [Resolved] (HIVE-401) Reduce the ant test time to under 15 minutes

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-401.
---

Resolution: Fixed

https://cwiki.apache.org/confluence/display/Hive/Unit+Test+Parallel+Execution 
documents how to run tests in parallel.

> Reduce the ant test time to under 15 minutes
> 
>
> Key: HIVE-401
> URL: https://issues.apache.org/jira/browse/HIVE-401
> Project: Hive
>  Issue Type: Wish
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Attachments: hive_parallel_test.sh
>
>
> "ant test" is taking too long. This is a big overhead for development since 
> we need to do context switching all the time.
> We should bring the time back to under 15 minutes.



[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3718:
-

Status: Open  (was: Patch Available)

HIVE-3718.5.patch.txt and HIVE-3718.3.patch.txt are exactly the same

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>




[jira] [Commented] (HIVE-3838) Add input table name to MetaStoreEndFunctionContext for logging purposes

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540289#comment-13540289
 ] 

Namit Jain commented on HIVE-3838:
--

+1

> Add input table name to MetaStoreEndFunctionContext for logging purposes
> 
>
> Key: HIVE-3838
> URL: https://issues.apache.org/jira/browse/HIVE-3838
> Project: Hive
>  Issue Type: Task
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3838.1.patch.txt
>
>




[jira] [Updated] (HIVE-3813) Allow publishing artifacts to an arbitrary remote repository

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3813:
---

Assignee: Mikhail Bautin

> Allow publishing artifacts to an arbitrary remote repository
> 
>
> Key: HIVE-3813
> URL: https://issues.apache.org/jira/browse/HIVE-3813
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-HIVE-3813-Allow-publishing-artifacts-to-an-arbitrary.patch, D7455.1.patch
>
>
> Allow publishing artifacts to an arbitrary remote repository by specifying 
> -Dmvn.publish.repoUrl on the command line (patch by Thomas Dudziak).



[jira] [Commented] (HIVE-3813) Allow publishing artifacts to an arbitrary remote repository

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540280#comment-13540280
 ] 

Ashutosh Chauhan commented on HIVE-3813:


Patch looks good. Is there any easy way to manually test this before I commit 
it?

> Allow publishing artifacts to an arbitrary remote repository
> 
>
> Key: HIVE-3813
> URL: https://issues.apache.org/jira/browse/HIVE-3813
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-HIVE-3813-Allow-publishing-artifacts-to-an-arbitrary.patch, D7455.1.patch
>
>
> Allow publishing artifacts to an arbitrary remote repository by specifying 
> -Dmvn.publish.repoUrl on the command line (patch by Thomas Dudziak).



[jira] [Commented] (HIVE-3446) PrimitiveObjectInspector doesn't handle timestamps properly

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540293#comment-13540293
 ] 

Namit Jain commented on HIVE-3446:
--

Ashutosh, I am assuming you are committing it.
Can you file a follow-up JIRA for a test for this one?

> PrimitiveObjectInspector doesn't handle timestamps properly
> ---
>
> Key: HIVE-3446
> URL: https://issues.apache.org/jira/browse/HIVE-3446
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Attachments: HIVE-3446.1.patch.txt
>
>
> Getting java.sql.Timestamp from a TimestampWritable is broken due to an 
> incorrect mapping in PrimitiveObjectInspectorUtils.
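
For illustration only, with hypothetical names rather than the real PrimitiveObjectInspectorUtils internals: the kind of primitive-category-to-Java-class mapping the report says is wrong, where TIMESTAMP has to map to java.sql.Timestamp for TimestampWritable values to convert correctly.

{code}
import java.sql.Timestamp;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified mapping table; not Hive's actual code.
enum PrimitiveCategory { BOOLEAN, INT, LONG, DOUBLE, STRING, TIMESTAMP }

final class PrimitiveJavaTypes {
  static final Map<PrimitiveCategory, Class<?>> JAVA_CLASS =
      new EnumMap<PrimitiveCategory, Class<?>>(PrimitiveCategory.class);
  static {
    JAVA_CLASS.put(PrimitiveCategory.STRING, String.class);
    JAVA_CLASS.put(PrimitiveCategory.LONG, Long.class);
    // The reported bug is an incorrect entry of this kind: TIMESTAMP must map to
    // java.sql.Timestamp, otherwise TimestampWritable values cannot be retrieved.
    JAVA_CLASS.put(PrimitiveCategory.TIMESTAMP, Timestamp.class);
  }
}
{code}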



[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition

2012-12-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540294#comment-13540294
 ] 

Phabricator commented on HIVE-3286:
---

njain has commented on the revision "HIVE-3286 [jira] Explicit skew join on 
user provided condition".

  Cool, can you move it to an optimization step?
  That way, we can also drive it from the table metadata.

INLINE COMMENTS
  ql/src/test/queries/clientpositive/skewjoin_explict.q:4 The user should not 
be setting the partitioner.

  for SKEWED ON syntax, the partitioner should be automatically chosen
  ql/src/test/queries/clientpositive/skewjoin_explict.q:63 Add some sub-queries 
in the tests

REVISION DETAIL
  https://reviews.facebook.net/D4287

To: JIRA, navis
Cc: njain


> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But mostly we already know that, and we even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, the total 
> execution time could be greatly shortened.
> As a start, I've extended the join grammar with something like this:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means that if the above query is executed by 20 reducers, one reducer handles 
> a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= 
> a.key < 150, and 17 reducers handle the others (this could be extended to assign more 
> than one reducer later).
> This can only be used with common inner equi-joins, and the skew condition should 
> be composed of join keys only.
> The work done so far will be updated shortly, after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and the first 'true' one decides the skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in the group are 
> assigned.
> The number of partition slots reserved for each group is also decided at 
> runtime by a simple percentage calculation. If a skew group is "CLUSTER BY 
> 20 PERCENT" and the total number of partition slots (= number of reducers) is 20, 
> that group will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group are dispersed over the range of 
> reserved slots (if there is only one slot for a group, this is meaningless).
> Currently, three distribution policies are available: RANDOM, KEYS, and 
> expression.
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the 
> non-driver alias are duplicated across all the slots (default if not specified)
> 2. KEYS : determined by the hash value of the keys (same as before)
> 3. expression : determined by the hash of the object evaluated by the user-provided 
> expression
> Only possible with inner, equi, common joins. Join tree merging is not yet 
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY".
> If column statistics exist for the key, this could be applied 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>    a.key = '0' CLUSTER BY 10 PERCENT,
>    b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>    cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by the hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by the 
> hashcode of the join key : 12 + (hash(key) % 8)
> For a row with key='200', the row does not belong to any skew group : hash(key) % 6
> *expressions in the skew condition :
> 1. all expressions should be made of expressions in the join condition, which 
> means that if the join condition is "a.key=b.key", the user can make any expression with 
> "a.key" or "b.key". But if the join condition is a.key+1=b.key, the user cannot make 
> an expression with "a.key" alone (it should be an expression with "a.key+1").
> 2. all expressions should reference one and only one side of the aliases. For 
> example,

[jira] [Updated] (HIVE-3813) Allow publishing artifacts to an arbitrary remote repository

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3813:
---

Status: Patch Available  (was: Open)

> Allow publishing artifacts to an arbitrary remote repository
> 
>
> Key: HIVE-3813
> URL: https://issues.apache.org/jira/browse/HIVE-3813
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-HIVE-3813-Allow-publishing-artifacts-to-an-arbitrary.patch, D7455.1.patch
>
>
> Allow publishing artifacts to an arbitrary remote repository by specifying 
> -Dmvn.publish.repoUrl on the command line (patch by Thomas Dudziak).



[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Pamela Vagata (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540300#comment-13540300
 ] 

Pamela Vagata commented on HIVE-3718:
-

There isn't a code difference; hive-3718.3.patch.txt was produced so that 
the file names had prefixes on them, which is why you couldn't run the tests. I 
ran git diff --no-prefix to produce hive-3718.5.patch.txt and ran the tests to 
make sure it would work.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>




[jira] [Commented] (HIVE-3835) Add an option to run tests where testfiles can be specified as a regular expression

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540275#comment-13540275
 ] 

Namit Jain commented on HIVE-3835:
--

yes

> Add an option to run tests where testfiles can be specified as a regular 
> expression
> ---
>
> Key: HIVE-3835
> URL: https://issues.apache.org/jira/browse/HIVE-3835
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>
> For example, if I want to run all list bucketing tests, I should be able to say:
>  ant test -Dtestcase=TestCliDriver -Dqfile=list_bucket_dml*.q
> or something like that



[jira] [Commented] (HIVE-3838) Add input table name to MetaStoreEndFunctionContext for logging purposes

2012-12-27 Thread Pamela Vagata (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540273#comment-13540273
 ] 

Pamela Vagata commented on HIVE-3838:
-

see https://reviews.facebook.net/D7677

> Add input table name to MetaStoreEndFunctionContext for logging purposes
> 
>
> Key: HIVE-3838
> URL: https://issues.apache.org/jira/browse/HIVE-3838
> Project: Hive
>  Issue Type: Task
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3838.1.patch.txt
>
>




[jira] [Updated] (HIVE-3838) Add input table name to MetaStoreEndFunctionContext for logging purposes

2012-12-27 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3838:


Status: Patch Available  (was: Open)

> Add input table name to MetaStoreEndFunctionContext for logging purposes
> 
>
> Key: HIVE-3838
> URL: https://issues.apache.org/jira/browse/HIVE-3838
> Project: Hive
>  Issue Type: Task
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3838.1.patch.txt
>
>




[jira] [Updated] (HIVE-3838) Add input table name to MetaStoreEndFunctionContext for logging purposes

2012-12-27 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3838:


Attachment: HIVE-3838.1.patch.txt

> Add input table name to MetaStoreEndFunctionContext for logging purposes
> 
>
> Key: HIVE-3838
> URL: https://issues.apache.org/jira/browse/HIVE-3838
> Project: Hive
>  Issue Type: Task
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3838.1.patch.txt
>
>




[jira] [Created] (HIVE-3842) Remove redundant test codes

2012-12-27 Thread Navis (JIRA)
Navis created HIVE-3842:
---

 Summary: Remove redundant test codes
 Key: HIVE-3842
 URL: https://issues.apache.org/jira/browse/HIVE-3842
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Trivial


Currently Hive writes the same test code again and again for each test, making the 
test class huge (50k lines for ql).
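
A generic illustration of the direction (not Navis's patch): a parameterized JUnit runner lets one small test class drive many .q files, so the harness logic lives in one place instead of being repeated for every test.

{code}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class QFileDriverTest {          // hypothetical example, not TestCliDriver
  private final String qfile;

  public QFileDriverTest(String qfile) {
    this.qfile = qfile;
  }

  @Parameters
  public static Collection<Object[]> qfiles() {
    // In a real harness these would be discovered from the queries directory.
    return Arrays.asList(new Object[][] { {"join1.q"}, {"groupby1.q"}, {"skewjoin.q"} });
  }

  @Test
  public void runQFile() {
    // A real driver would compile, run and diff the query file here; the point
    // is that this single method replaces one generated method per file.
    System.out.println("would run " + qfile);
  }
}
{code}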



[jira] [Commented] (HIVE-3835) Add an option to run tests where testfiles can be specified as a regular expression

2012-12-27 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540269#comment-13540269
 ] 

Navis commented on HIVE-3835:
-

You mean -qfile_regex=java_regex?

> Add an option to run tests where testfiles can be specified as a regular 
> expression
> ---
>
> Key: HIVE-3835
> URL: https://issues.apache.org/jira/browse/HIVE-3835
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Namit Jain
>
> For example, if I want to run all list bucketing tests, I should be able to say:
>  ant test -Dtestcase=TestCliDriver -Dqfile=list_bucket_dml*.q
> or something like that



[jira] [Resolved] (HIVE-3631) script_pipe.q fails when using JDK7

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3631.


   Resolution: Fixed
Fix Version/s: 0.11

> script_pipe.q fails when using JDK7
> ---
>
> Key: HIVE-3631
> URL: https://issues.apache.org/jira/browse/HIVE-3631
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Chris Drome
>Assignee: Chris Drome
> Fix For: 0.11.0
>
> Attachments: HIVE-3631-0.9.patch, HIVE-3631-trunk.patch
>
>
> Hive Runtime Error while closing operators: Hit error while closing ..
> The MR job fails on this test. Unfortunately, the exception is not all that 
> helpful.
> I tracked this down to a class which attempts to close a stream that is 
> already closed. Broken pipe exceptions are caught and not propagated further, 
> but stream closed exceptions are not caught.
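
A sketch of the defensive close the description implies, with hypothetical helper names rather than the actual fix: a "Stream closed" IOException on an already-closed stream is tolerated the same way a broken pipe is, instead of failing the operator's close path.

{code}
import java.io.IOException;
import java.io.OutputStream;

final class ScriptStreamUtils {
  static void closeToScript(OutputStream out) {
    if (out == null) {
      return;
    }
    try {
      out.close();
    } catch (IOException e) {
      // Under JDK7 a close()/flush() on an already-closed stream can surface as
      // "Stream closed" rather than a broken pipe; both just mean the child
      // process side is gone, so neither should fail the query.
      String msg = e.getMessage();
      if (msg != null && (msg.contains("Broken pipe") || msg.contains("Stream closed"))) {
        return;
      }
      throw new RuntimeException(e);
    }
  }
}
{code}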



[jira] [Updated] (HIVE-3446) PrimitiveObjectInspector doesn't handle timestamps properly

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3446:
---

Assignee: Sam Tunnicliffe

> PrimitiveObjectInspector doesn't handle timestamps properly
> ---
>
> Key: HIVE-3446
> URL: https://issues.apache.org/jira/browse/HIVE-3446
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Attachments: HIVE-3446.1.patch.txt
>
>
> Getting java.sql.Timestamp from a TimestampWritable is broken due to an 
> incorrect mapping in PrimitiveObjectInspectorUtils.



[jira] [Commented] (HIVE-3446) PrimitiveObjectInspector doesn't handle timestamps properly

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540262#comment-13540262
 ] 

Ashutosh Chauhan commented on HIVE-3446:


Cool. Trunk mostly.

> PrimitiveObjectInspector doesn't handle timestamps properly
> ---
>
> Key: HIVE-3446
> URL: https://issues.apache.org/jira/browse/HIVE-3446
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Sam Tunnicliffe
> Attachments: HIVE-3446.1.patch.txt
>
>
> Getting java.sql.Timestamp from a TimestampWritable is broken due to an 
> incorrect mapping in PrimitiveObjectInspectorUtils.



[jira] [Updated] (HIVE-3841) Sampling in previous MR for range partitioning of next RS

2012-12-27 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3841:
--

Attachment: HIVE-3841.D7671.1.patch

navis requested code review of "HIVE-3841 [jira] Sampling in previous MR for 
range partitioning of next RS".
Reviewers: JIRA

  DPAL-1945 Sampling in previous MR for range partitioning of next RS

  Currently Hive enforces a single reducer for the order by clause, which can be a 
performance bottleneck.

  If sampling could be done on the ordering key at the previous MR stage, multiple 
reducers could be assigned to it.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D7671

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionSampler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SampleMerger.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplingOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/SamplingContext.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/18381/

To: JIRA, navis


> Sampling in previous MR for range partitioning of next RS
> -
>
> Key: HIVE-3841
> URL: https://issues.apache.org/jira/browse/HIVE-3841
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3841.D7671.1.patch
>
>
> Currently Hive enforces a single reducer for the order by clause, which can be a 
> performance bottleneck.
> If sampling could be done on the ordering key at the previous MR stage, multiple 
> reducers could be assigned to it.
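
For orientation only, this is not Hive's implementation: the idea maps onto standard Hadoop building blocks, where keys sampled from the previous stage feed a partition file and TotalOrderPartitioner range-partitions the next job across several reducers.

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

final class RangePartitionSetup {
  static void configure(Job job, Path partitionFile) throws Exception {
    job.setNumReduceTasks(20);                     // multiple reducers for a total order
    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionFile);
    // Sample the ordering keys (here: ~1% of records, at most 10000 samples from
    // up to 10 splits) to pick the reducer boundary keys.
    InputSampler.writePartitionFile(job,
        new InputSampler.RandomSampler<Object, Object>(0.01, 10000, 10));
  }
}
{code}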



[jira] [Updated] (HIVE-3837) Three Table BucketMapJoin is failing

2012-12-27 Thread Zhenxiao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenxiao Luo updated HIVE-3837:
---

Description: 
The following three-table bucketmapjoin query returns 0 as the result:

set hive.optimize.bucketmapjoin = true;
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

CREATE TABLE t1 (key1 int, value1 string) partitioned by (ds1 string) CLUSTERED 
BY (key1) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t1 
partition(ds1='part1');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t1 
partition(ds1='part1');
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t1 
partition(ds1='part2');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t1 
partition(ds1='part2');

CREATE TABLE t2 (key2 int, value2 string) partitioned by (ds2 string) CLUSTERED 
BY (key2) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t2 
partition(ds2='part1');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t2 
partition(ds2='part1');
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t2 
partition(ds2='part2');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t2 
partition(ds2='part2');

CREATE TABLE t3 (key3 int, value3 string) partitioned by (ds3 string) CLUSTERED 
BY (key3) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t3 
partition(ds3='part1');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t3 
partition(ds3='part1');

-- Three Tables Join
explain extended
select /*+mapjoin(b,c)*/ count(*)
from t1 a join t2 b on (a.key1=b.key2 and a.ds1=b.ds2) join t3 c on 
(a.key1=c.key3 and a.ds1=c.ds3);

select /*+mapjoin(b,c)*/ count(*)
from t1 a join t2 b on (a.key1=b.key2 and a.ds1=b.ds2) join t3 c on 
(a.key1=c.key3 and a.ds1=c.ds3);

It should return 1114 (if we run a join without mapjoin).

  was:
The following testcase shows that three table BucketMapJoin is failing:


set hive.optimize.bucketmapjoin = true;

CREATE TABLE t1 (key1 int, value1 string) partitioned by (ds1 string) CLUSTERED 
BY (key1) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t1 
partition(ds1='part1');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t1 
partition(ds1='part1');
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t1 
partition(ds1='part2');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t1 
partition(ds1='part2');

CREATE TABLE t2 (key2 int, value2 string) partitioned by (ds2 string) CLUSTERED 
BY (key2) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t2 
partition(ds2='part1');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t2 
partition(ds2='part1');
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t2 
partition(ds2='part2');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t2 
partition(ds2='part2');

CREATE TABLE t3 (key3 int, value3 string) partitioned by (ds3 string) CLUSTERED 
BY (key3) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../data/files/srcbucket20.txt' INTO TABLE t3 
partition(ds3='part1');
load data local inpath '../data/files/srcbucket21.txt' INTO TABLE t3 
partition(ds3='part1');

-- Three Tables Join
explain extended
select /*+mapjoin(b,c)*/ count(*)
from t1 a join t2 b on (a.key1=b.key2 and a.ds1=b.ds2) join t3 c on 
(b.key2=c.key3 and b.ds2=c.ds3);

select /*+mapjoin(b,c)*/ count(*)
from t1 a join t2 b on (a.key1=b.key2 and a.ds1=b.ds2) join t3 c on 
(b.key2=c.key3 and b.ds2=c.ds3);

select count(*)
from t1 a join t2 b on (a.key1=b.key2 and a.ds1=b.ds2) join t3 c on 
(b.key2=c.key3 and b.ds2=c.ds3);


The result is:

PREHOOK: query: CREATE TABLE t1 (key1 int, value1 string) partitioned by (ds1 
string) CLUSTERED BY (key1) INTO 2 BUCKETS STORED AS TEXTFILE
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE TABLE t1 (key1 int, value1 string) partitioned by (ds1 
string) CLUSTERED BY (key1) INTO 2 BUCKETS STORED AS TEXTFILE
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@t1
PREHOOK: query: load data local inpath '../data/files/srcbucket20.txt' INTO 
TABLE t1 partition(ds1='part1')
PREHOOK: type: LOAD
PREHOOK: Output: default@t1
POSTHOOK: query: load data local inpath '../data/files/srcbucket20.txt' INTO 
TABLE t1 partition(ds1='part1')
POSTHOOK: type: LOAD
POSTHOOK: Output: default@t1
POSTHOOK: Output: default@t1@ds1=part1
PREHOOK: query: load data local inpath '../data/files/srcbucket21.txt' INTO 
TABLE t1 partition(ds1='part1')
PREHOOK: type: LOAD
PREHOOK: Output: default@t1@ds1=part1
POSTHOOK: query: load data local inpath '../data/files/srcbucket21.txt' INTO 
TABLE t1 partition(ds1='part1')
POSTHOOK: type: LO

[jira] [Created] (HIVE-3841) Sampling in previous MR for range partitioning of next RS

2012-12-27 Thread Navis (JIRA)
Navis created HIVE-3841:
---

 Summary: Sampling in previous MR for range partitioning of next RS
 Key: HIVE-3841
 URL: https://issues.apache.org/jira/browse/HIVE-3841
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor


Currently Hive enforces a single reducer for the order by clause, which can be a 
performance bottleneck.

If sampling could be done on the ordering key at the previous MR stage, multiple 
reducers could be assigned to it.



[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition

2012-12-27 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3286:
--

Attachment: HIVE-3286.D4287.6.patch

navis updated the revision "HIVE-3286 [jira] Explicit skew join on user 
provided condition".
Reviewers: JIRA

  Rebased to trunk
  Removed explicit assigning


REVISION DETAIL
  https://reviews.facebook.net/D4287

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
  ql/src/java/org/apache/hadoop/hive/ql/io/SkewedKeyPartitioner.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/QBJoinTree.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/SkewContext.java
  ql/src/test/queries/clientpositive/skewjoin_explict.q
  ql/src/test/results/clientpositive/skewjoin_explict.q.out

To: JIRA, navis
Cc: njain


> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But mostly we already know that, and we even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, the total 
> execution time could be greatly shortened.
> As a start, I've extended the join grammar with something like this:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means that if the above query is executed by 20 reducers, one reducer handles 
> a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= 
> a.key < 150, and 17 reducers handle the others (this could be extended to assign more 
> than one reducer later).
> This can only be used with common inner equi-joins, and the skew condition should 
> be composed of join keys only.
> The work done so far will be updated shortly, after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and the first 'true' one decides the skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in the group are 
> assigned.
> The number of partition slots reserved for each group is also decided at 
> runtime by a simple percentage calculation. If a skew group is "CLUSTER BY 
> 20 PERCENT" and the total number of partition slots (= number of reducers) is 20, 
> that group will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group are dispersed over the range of 
> reserved slots (if there is only one slot for a group, this is meaningless).
> Currently, three distribution policies are available: RANDOM, KEYS, and 
> expression.
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the 
> non-driver alias are duplicated across all the slots (default if not specified)
> 2. KEYS : determined by the hash value of the keys (same as before)
> 3. expression : determined by the hash of the object evaluated by the user-provided 
> expression
> Only possible with inner, equi, common joins. Join tree merging is not yet 
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY".
> If column statistics exist for the key, this could be applied 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>    a.key = '0' CLUSTER BY 10 PERCENT,
>    b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>    cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by the hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by the 
> hashcode of the join key : 12 + (hash(key) % 8)
> For a row with key='200', the row does not belong to any skew group : hash(key) % 6
> *expressions i

[jira] [Commented] (HIVE-3446) PrimitiveObjectInspector doesn't handle timestamps properly

2012-12-27 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540259#comment-13540259
 ] 

Edward Capriolo commented on HIVE-3446:
---

Wow, I totally lost track of this one. Yeah, let's do it. Are we committing on the 0.10 
release at this point or just trunk?

> PrimitiveObjectInspector doesn't handle timestamps properly
> ---
>
> Key: HIVE-3446
> URL: https://issues.apache.org/jira/browse/HIVE-3446
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Sam Tunnicliffe
> Attachments: HIVE-3446.1.patch.txt
>
>
> Getting java.sql.Timestamp from a TimestampWritable is broken due to an 
> incorrect mapping in PrimitiveObjectInspectorUtils.



[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition

2012-12-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540252#comment-13540252
 ] 

Phabricator commented on HIVE-3286:
---

navis has commented on the revision "HIVE-3286 [jira] Explicit skew join on 
user provided condition".

  RS.dedup is not applied to the JOIN-RS case either. Then, could it be enough to 
remove explicit assignment of the partition number (a.key = 0 CLUSTER BY 2 PARTITIONS, 
as you mentioned)?

  And.. creating an optimizer for skew join would be a really good thing (and I 
also had the intent to do it). I think the current code base could simply be copied to 
the optimizer, and it seems not so hard.

REVISION DETAIL
  https://reviews.facebook.net/D4287

To: JIRA, navis
Cc: njain


> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3286.D4287.5.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But mostly we already know that, and we even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, the total 
> execution time could be greatly shortened.
> As a start, I've extended the join grammar with something like this:
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means that if the above query is executed by 20 reducers, one reducer handles 
> a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= 
> a.key < 150, and 17 reducers handle the others (this could be extended to assign more 
> than one reducer later).
> This can only be used with common inner equi-joins, and the skew condition should 
> be composed of join keys only.
> The work done so far will be updated shortly, after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and the first 'true' one decides the skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in the group are 
> assigned.
> The number of partition slots reserved for each group is also decided at 
> runtime by a simple percentage calculation. If a skew group is "CLUSTER BY 
> 20 PERCENT" and the total number of partition slots (= number of reducers) is 20, 
> that group will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group are dispersed over the range of 
> reserved slots (if there is only one slot for a group, this is meaningless).
> Currently, three distribution policies are available: RANDOM, KEYS, and 
> expression.
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the 
> non-driver alias are duplicated across all the slots (default if not specified)
> 2. KEYS : determined by the hash value of the keys (same as before)
> 3. expression : determined by the hash of the object evaluated by the user-provided 
> expression
> Only possible with inner, equi, common joins. Join tree merging is not yet 
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY".
> If column statistics exist for the key, this could be applied 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>    a.key = '0' CLUSTER BY 10 PERCENT,
>    b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>    cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by the hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by the 
> hashcode of the join key : 12 + (hash(key) % 8)
> For a row with key='200', the row does not belong to any skew group : hash(key) % 6
> *expressions in the skew condition :
> 1. all expressions should be made of expressions in the join condition, which 
> means that if the join condition is "a.key=b.key", the user can make any expression with 
> "a.key" or "b.key". But if the join condition is a.key+1=b.key, the user cannot make 
> an expression with "a.key" alone (it should be an expression with "a.key+1").
> 2. all expressions should reference one and only one side of the aliases. For 
> example, simple constant expressions or expression

[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition

2012-12-27 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540247#comment-13540247
 ] 

Phabricator commented on HIVE-3286:
---

navis has commented on the revision "HIVE-3286 [jira] Explicit skew join on 
user provided condition".

  1. I think this is a kind of join hint, so it is just disabled if it's not possible 
(outer join, invalid expression, etc.).
  2. RS dedup is not applied when the child RS is for GBY or JOIN. A test for the 
JOIN+SORTBY case will be added.

REVISION DETAIL
  https://reviews.facebook.net/D4287

To: JIRA, navis
Cc: njain


> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3286.D4287.5.patch
>
>
> Join operation on table with skewed data takes most of execution time 
> handling the skewed keys. But mostly we already know about that and even know 
> what is look like the skewed keys.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As for a start, I've extended join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can be only used with common-inner-equi joins. And skew condition should 
> be composed of join keys only.
> Work till done now will be updated shortly after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and first 'true' one decides skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in a group would be 
> assigned. 
> The number of partition slot reserved for each group is decided also at 
> runtime by simple calculation of percentage. If a skew group is "CLUSTER BY 
> 20 PERCENT" and total partition slot (=number of reducer) is 20, that group 
> will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group is dispersed in the range of 
> reserved slots (If there is only one slot for a group, this is meaningless). 
> Currently, three distribution policies are available: RANDOM, KEYS, 
> . 
> 1. RANDOM : rows of driver** alias are dispersed by random and rows of 
> non-driver alias are duplicated for all the slots (default if not specified)
> 2. KEYS : determined by hash value of keys (same with previous)
> 3. expression : determined by hash of object evaluated by user-provided 
> expression
> Only possible with inner, equi, common joins. Join tree merging is not yet 
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY"
> If there exists column statistics for the key, it could be possible to apply 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>a.key = '0' CLUSTER BY 10 PERCENT,
>b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by 
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', the row does not belong to any skew group : hash(key) % 6
> *expressions in skew condition : 
> 1. all expressions should be built from expressions in the join condition, which 
> means if the join condition is "a.key=b.key", the user can build any expression with 
> "a.key" or "b.key". But if the join condition is a.key+1=b.key, the user cannot build 
> an expression with "a.key" alone (it should build the expression with "a.key+1"). 
> 2. all expressions should reference one and only one side of the aliases. For 
> example, simple constant expressions or expressions referencing both sides of 
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all functions in the expressions should be deterministic and stateless.
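
The slot arithmetic in the example above can be sketched in plain Java. This is only an 
illustrative sketch, not code from the attached patch; the class and method names are 
hypothetical, and the numbers simply mirror the 20-reducer, 10/20/40 percent layout 
quoted above.

{code}
// Illustrative sketch only: reproduces the slot layout described above
// (20 reducers; skew groups of 10, 20 and 40 percent). Names are hypothetical.
public class SkewSlotSketch {

  // Number of slots a skew group reserves: a percentage of the total reducers.
  static int slotsFor(int totalReducers, int percent) {
    return totalReducers * percent / 100;                 // 2, 4 and 8 slots here
  }

  // Slot for a skewed row distributed by a hash value inside its reserved range.
  static int skewedSlot(int groupOffset, int groupSlots, int hash) {
    return groupOffset + (Math.abs(hash) % groupSlots);   // e.g. 8 + (hash(upper(key)) % 4)
  }

  // Slot for a row that matches no skew group: hashed over the remaining slots.
  static int defaultSlot(int defaultSlots, int hash) {
    return Math.abs(hash) % defaultSlots;                 // e.g. hash(key) % 6
  }

  public static void main(String[] args) {
    int reducers = 20;
    int g0 = slotsFor(reducers, 10);                      // 2 slots -> 6~7
    int g1 = slotsFor(reducers, 20);                      // 4 slots -> 8~11
    int g2 = slotsFor(reducers, 40);                      // 8 slots -> 12~19
    int others = reducers - g0 - g1 - g2;                 // 6 slots -> 0~5
    System.out.println("key='50'  -> slot " + skewedSlot(others + g0, g1, "50".hashCode()));
    System.out.println("key='200' -> slot " + defaultSlot(others, "200".hashCode()));
  }
}
{code}

With 20 reducers this gives the layout quoted above: slots 0~5 for non-skewed rows, 
6~7 for group-0, 8~11 for group-1, and 12~19 for group-2.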

[jira] [Commented] (HIVE-3805) Resolve TODO in TUGIBasedProcessor

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540245#comment-13540245
 ] 

Ashutosh Chauhan commented on HIVE-3805:


Ok. I am fine with the current patch. We can track the correct fix in a separate 
JIRA. 

> Resolve TODO in TUGIBasedProcessor
> --
>
> Key: HIVE-3805
> URL: https://issues.apache.org/jira/browse/HIVE-3805
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 0.11
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3805.1.patch.txt
>
>
> There's a TODO in TUGIBasedProcessor
> // TODO get rid of following reflection after THRIFT-1465 is fixed.
> Now that we have upgraded to Thrift 9 THRIFT-1465 is available.
> This will also fix an issue where fb303 counters cannot be collected if the 
> TUGIBasedProcessor is used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3446) PrimitiveObjectInspector doesn't handle timestamps properly

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540243#comment-13540243
 ] 

Ashutosh Chauhan commented on HIVE-3446:


Pinging [~appodictic]

> PrimitiveObjectInspector doesn't handle timestamps properly
> ---
>
> Key: HIVE-3446
> URL: https://issues.apache.org/jira/browse/HIVE-3446
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Sam Tunnicliffe
> Attachments: HIVE-3446.1.patch.txt
>
>
> Getting java.sql.Timestamp from a TimestampWritable is broken due to an 
> incorrect mapping in PrimitiveObjectInspectorUtils.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #16

2012-12-27 Thread Apache Jenkins Server
See 

--
[...truncated 41935 lines...]
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2012-12-27 15:42:08,779 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] Execution completed successfully
[junit] Mapred Local Task Succeeded . Convert the Join into MapJoin
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 

[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] Copying file: 

[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] Table default.testhivedrivertable stats: [num_partitions: 0, 
num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 

[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 

[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivert

Hive-trunk-h0.21 - Build # 1882 - Fixed

2012-12-27 Thread Apache Jenkins Server
Changes for Build #1881
[namit] HIVE-581 improve group by syntax
(Zhenxiao Luo via namit)


Changes for Build #1882
[hashutosh] HIVE-3802 : testCliDriver_input39 fails on hadoop-1 (Gunther 
Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-3801 : testCliDriver_loadpart_err fails on hadoop-1 (Gunther 
Hagleitner via Ashutosh Chauhan)

[hashutosh] HIVE-3800 : testCliDriver_combine2 fails on hadoop-1 (Gunther 
Hagleitner via Ashutosh Chauhan)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1882)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1882/ to 
view the results.

[jira] [Commented] (HIVE-3802) testCliDriver_input39 fails on hadoop-1

2012-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540216#comment-13540216
 ] 

Hudson commented on HIVE-3802:
--

Integrated in Hive-trunk-h0.21 #1882 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1882/])
HIVE-3802 : testCliDriver_input39 fails on hadoop-1 (Gunther Hagleitner via 
Ashutosh Chauhan) (Revision 1426265)

 Result = SUCCESS
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426265
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/input39.q
* /hive/trunk/ql/src/test/queries/clientpositive/input39_hadoop20.q
* /hive/trunk/ql/src/test/results/clientpositive/input39.q.out
* /hive/trunk/ql/src/test/results/clientpositive/input39_hadoop20.q.out


> testCliDriver_input39 fails on hadoop-1
> ---
>
> Key: HIVE-3802
> URL: https://issues.apache.org/jira/browse/HIVE-3802
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.11
>
> Attachments: HIVE-3802.patch
>
>
> This test is marked as flaky and disabled for all versions, but hadoop-1 was 
> missed in that list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3800) testCliDriver_combine2 fails on hadoop-1

2012-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540214#comment-13540214
 ] 

Hudson commented on HIVE-3800:
--

Integrated in Hive-trunk-h0.21 #1882 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1882/])
HIVE-3800 : testCliDriver_combine2 fails on hadoop-1 (Gunther Hagleitner 
via Ashutosh Chauhan) (Revision 1426261)

 Result = SUCCESS
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426261
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/combine2.q
* /hive/trunk/ql/src/test/queries/clientpositive/combine2_hadoop20.q
* /hive/trunk/ql/src/test/results/clientpositive/combine2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/combine2_hadoop20.q.out


> testCliDriver_combine2 fails on hadoop-1
> 
>
> Key: HIVE-3800
> URL: https://issues.apache.org/jira/browse/HIVE-3800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.11
>
> Attachments: HIVE-3800.2.patch, HIVE-3800.patch
>
>
> Actually, the functionality is working correctly, but an incorrect include/exclude 
> macro causes the wrong query file to be run.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3801) testCliDriver_loadpart_err fails on hadoop-1

2012-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540215#comment-13540215
 ] 

Hudson commented on HIVE-3801:
--

Integrated in Hive-trunk-h0.21 #1882 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1882/])
HIVE-3801 : testCliDriver_loadpart_err fails on hadoop-1 (Gunther 
Hagleitner via Ashutosh Chauhan) (Revision 1426263)

 Result = SUCCESS
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426263
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/loadpart_err.q


> testCliDriver_loadpart_err fails on hadoop-1
> 
>
> Key: HIVE-3801
> URL: https://issues.apache.org/jira/browse/HIVE-3801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.11
>
> Attachments: HIVE-3801.patch
>
>
> This test is marked as flaky and disabled for all versions, but hadoop-1 was 
> missed in that list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3817) Adding the name space for the maven task for the maven-publish target.

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540205#comment-13540205
 ] 

Ashutosh Chauhan commented on HIVE-3817:


+1

> Adding the name space for the maven task for the maven-publish target.
> --
>
> Key: HIVE-3817
> URL: https://issues.apache.org/jira/browse/HIVE-3817
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.10.0
>Reporter: Ashish Singh
>Assignee: Ashish Singh
> Attachments: HIVE-3817.patch
>
>
> maven task for the maven-publish target is missing from the build.xml.
> This is causing maven deploy issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3840) Printing NULL valued columns for different types in Hive cli

2012-12-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540198#comment-13540198
 ] 

Ashutosh Chauhan commented on HIVE-3840:


Currently, the Hive CLI behavior is the following:
* It prints null valued primitive types (except binary) as *NULL*
* Null binary column is printed as *null*
* All null complex types are printed as *null*
* All null primitive types within non-null complex types are printed as *null*

This Hive CLI behavior is internally inconsistent. MySQL prints *NULL* for all 
column types, and we should do the same in all cases.
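
As a rough illustration of the proposed behavior, a single rendering helper that treats 
every null uniformly could look like the sketch below. This is not Hive's actual CLI 
formatting code; the class and method names are hypothetical.

{code}
// Illustrative sketch only: render a null column value of any type as NULL,
// matching the MySQL-style behavior proposed above.
public final class NullRendering {

  private static final String NULL_TOKEN = "NULL";

  // Returns NULL for null values of any type, otherwise the value's string form.
  static String render(Object columnValue) {
    if (columnValue == null) {
      return NULL_TOKEN;
    }
    if (columnValue instanceof byte[]) {                          // binary columns
      return new String((byte[]) columnValue, java.nio.charset.StandardCharsets.UTF_8);
    }
    return columnValue.toString();                                // primitives and complex types
  }

  public static void main(String[] args) {
    System.out.println(render(null));                             // NULL, regardless of type
    System.out.println(render(42));                               // 42
    System.out.println(render(new byte[] {'a', 'b'}));            // ab
  }
}
{code}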

> Printing NULL valued columns for different types in Hive cli
> 
>
> Key: HIVE-3840
> URL: https://issues.apache.org/jira/browse/HIVE-3840
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3840) Printing NULL valued columns for different types in Hive cli

2012-12-27 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-3840:
--

 Summary: Printing NULL valued columns for different types in Hive 
cli
 Key: HIVE-3840
 URL: https://issues.apache.org/jira/browse/HIVE-3840
 Project: Hive
  Issue Type: Bug
Reporter: Ashutosh Chauhan




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #242

2012-12-27 Thread Apache Jenkins Server
See 

--
[...truncated 36422 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2012-12-27_14-15-07_370_3694730629556279578/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2012-12-27_14-15-12_215_924069779776175308/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2012-12-27_14-15-12_215_924069779776175308/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (key int, val

[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Status: Patch Available  (was: Open)

Sorry about the previous patch - I forgot to run git diff with the --no-prefix 
option. I ran the tests with this patch and they all pass.

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: HIVE-3718.5.patch.txt

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch, HIVE-3718.5.patch.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-12-27 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: (was: HIVE-3718.4.patch.txt)

> Add check to determine whether partition can be dropped at Semantic Analysis 
> time
> -
>
> Key: HIVE-3718
> URL: https://issues.apache.org/jira/browse/HIVE-3718
> Project: Hive
>  Issue Type: Task
>  Components: CLI
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3718.1.patch.txt, HIVE-3718.2.patch.txt, 
> HIVE-3718.3.patch.txt, hive.3718.4.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3839) adding .gitattributes file for normalizing line endings during cross platform development

2012-12-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-3839:


Status: Patch Available  (was: Open)

> adding .gitattributes file for normalizing line endings during cross platform 
> development
> -
>
> Key: HIVE-3839
> URL: https://issues.apache.org/jira/browse/HIVE-3839
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: HIVE-3839.1.patch
>
>
> On the lines of HADOOP-8912 .
> Many developers clone the apache/hive git repository to make changes.
> Adding a .gitattributes file will help in doing the right thing while 
> checking out files on Windows (eg- adding \r\n on checkout of most text 
> files, preserving \n in case of *.sh files ), and replacing \r\n with \n 
> while checking in code back into a git repository.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3839) adding .gitattributes file for normalizing line endings during cross platform development

2012-12-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-3839:


Attachment: HIVE-3839.1.patch

> adding .gitattributes file for normalizing line endings during cross platform 
> development
> -
>
> Key: HIVE-3839
> URL: https://issues.apache.org/jira/browse/HIVE-3839
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: HIVE-3839.1.patch
>
>
> On the lines of HADOOP-8912 .
> Many developers clone the apache/hive git repository to make changes.
> Adding a .gitattributes file will help in doing the right thing while 
> checking out files on Windows (eg- adding \r\n on checkout of most test 
> files, preserving \n in case of *.sh files ), and replacing \r\n with \n 
> while checking in code back into a git repository.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3839) adding .gitattributes file for normalizing line endings during cross platform development

2012-12-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-3839:


Description: 
On the lines of HADOOP-8912 .
Many developers clone the apache/hive git repository to make changes.
Adding a .gitattributes file will help in doing the right thing while checking 
out files on Windows (eg- adding \r\n on checkout of most text files, 
preserving \n in case of *.sh files ), and replacing \r\n with \n while 
checking in code back into a git repository.


  was:
On the lines of HADOOP-8912 .
Many developers clone the apache/hive git repository to make changes.
Adding a .gitattributes file will help in doing the right thing while checking 
out files on Windows (eg- adding \r\n on checkout of most test files, 
preserving \n in case of *.sh files ), and replacing \r\n with \n while 
checking in code back into a git repository.



> adding .gitattributes file for normalizing line endings during cross platform 
> development
> -
>
> Key: HIVE-3839
> URL: https://issues.apache.org/jira/browse/HIVE-3839
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: HIVE-3839.1.patch
>
>
> On the lines of HADOOP-8912 .
> Many developers clone the apache/hive git repository to make changes.
> Adding a .gitattributes file will help in doing the right thing while 
> checking out files on Windows (eg- adding \r\n on checkout of most text 
> files, preserving \n in case of *.sh files ), and replacing \r\n with \n 
> while checking in code back into a git repository.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3839) adding .gitattributes file for normalizing line endings during cross platform development

2012-12-27 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-3839:
---

 Summary: adding .gitattributes file for normalizing line endings 
during cross platform development
 Key: HIVE-3839
 URL: https://issues.apache.org/jira/browse/HIVE-3839
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11


On the lines of HADOOP-8912 .
Many developers clone the apache/hive git repository to make changes.
Adding a .gitattributes file will help in doing the right thing while checking 
out files on Windows (eg- adding \r\n on checkout of most test files, 
preserving \n in case of *.sh files ), and replacing \r\n with \n while 
checking in code back into a git repository.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3838) Add input table name to MetaStoreEndFunctionContext for logging purposes

2012-12-27 Thread Pamela Vagata (JIRA)
Pamela Vagata created HIVE-3838:
---

 Summary: Add input table name to MetaStoreEndFunctionContext for 
logging purposes
 Key: HIVE-3838
 URL: https://issues.apache.org/jira/browse/HIVE-3838
 Project: Hive
  Issue Type: Task
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1881 - Failure

2012-12-27 Thread Apache Jenkins Server
Changes for Build #1881
[namit] HIVE-581 improve group by syntax
(Zhenxiao Luo via namit)




1 tests failed.
REGRESSION:  
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization

Error Message:
null

Stack Trace:
org.apache.thrift.transport.TTransportException
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
at 
org.apache.hadoop.hive.metastore.TestMetaStoreAuthorization.testMetaStoreAuthorization(TestMetaStoreAuthorization.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1881)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1881/ to 
view the results.

[jira] [Commented] (HIVE-581) improve group by syntax

2012-12-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540079#comment-13540079
 ] 

Hudson commented on HIVE-581:
-

Integrated in Hive-trunk-h0.21 #1881 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1881/])
HIVE-581 improve group by syntax
(Zhenxiao Luo via namit) (Revision 1426165)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1426165
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_invalid_position.q
* /hive/trunk/ql/src/test/queries/clientnegative/orderby_invalid_position.q
* /hive/trunk/ql/src/test/queries/clientnegative/orderby_position_unsupported.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_position.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_invalid_position.q.out
* /hive/trunk/ql/src/test/results/clientnegative/orderby_invalid_position.q.out
* 
/hive/trunk/ql/src/test/results/clientnegative/orderby_position_unsupported.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_position.q.out


> improve group by syntax
> ---
>
> Key: HIVE-581
> URL: https://issues.apache.org/jira/browse/HIVE-581
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, Query Processor
>Reporter: Larry Ogrodnek
>Assignee: Zhenxiao Luo
>  Labels: features
> Fix For: 0.11
>
> Attachments: HIVE-581.1.patch.txt, HIVE-581.2.patch.txt
>
>
> It would be nice if group by allowed either column aliases or column position 
> (like mysql).
> It can be burdensome to have to repeat UDFs both in the select and in the 
> group by.
> e.g. instead of:
> select f1(col1), f2(col2), f3(col3), count(1) group by f1(col1), f2(col2), 
> f3(col3);
> it would allow:
> select f1(col1), f2(col2), f3(col3), count(1) group by 1, 2, 3;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3802) testCliDriver_input39 fails on hadoop-1

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3802:
---

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gunther!

> testCliDriver_input39 fails on hadoop-1
> ---
>
> Key: HIVE-3802
> URL: https://issues.apache.org/jira/browse/HIVE-3802
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.11
>
> Attachments: HIVE-3802.patch
>
>
> This test is marked as flaky and disabled for all versions, but hadoop-1 was 
> missed in that list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3801) testCliDriver_loadpart_err fails on hadoop-1

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3801:
---

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gunther!

> testCliDriver_loadpart_err fails on hadoop-1
> 
>
> Key: HIVE-3801
> URL: https://issues.apache.org/jira/browse/HIVE-3801
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.11
>
> Attachments: HIVE-3801.patch
>
>
> This test is marked as flaky and disabled for all versions, but hadoop-1 was 
> missed in that list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3800) testCliDriver_combine2 fails on hadoop-1

2012-12-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3800:
---

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gunther!

> testCliDriver_combine2 fails on hadoop-1
> 
>
> Key: HIVE-3800
> URL: https://issues.apache.org/jira/browse/HIVE-3800
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.11
>
> Attachments: HIVE-3800.2.patch, HIVE-3800.patch
>
>
> Actually, the functionality is working correctly, but an incorrect include/exclude 
> macro causes the wrong query file to be run.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #242

2012-12-27 Thread Apache Jenkins Server
See 


--
[...truncated 10060 lines...]

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 

[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 


test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 

 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 

[ivy:report] Processing 

 to 


ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.20.2 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 1.0.0 
(

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 


ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build_shims:
 [echo] Project: shims
 [echo] Compiling 

 against hadoop 0.23.3 
(

Re: where should I start from if I want to customize my own hive release?

2012-12-27 Thread Mark Grover
Zongquan,
The developer guide would be a good starting place:
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide

Mark

On Wed, Dec 26, 2012 at 2:59 AM, Liu,Zongquan  wrote:
> Hi all,
>
> I am new to Hive development, and I want to know where I should start if I 
> want to customize Hive with my own features.
> So, could anybody be kind enough to provide me with some advice?
> Thanks a lot!
>
> -zongquan
>
>


[jira] [Updated] (HIVE-581) improve group by syntax

2012-12-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-581:


   Resolution: Fixed
Fix Version/s: 0.11
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Zhenxiao

> improve group by syntax
> ---
>
> Key: HIVE-581
> URL: https://issues.apache.org/jira/browse/HIVE-581
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, Query Processor
>Reporter: Larry Ogrodnek
>Assignee: Zhenxiao Luo
>  Labels: features
> Fix For: 0.11
>
> Attachments: HIVE-581.1.patch.txt, HIVE-581.2.patch.txt
>
>
> It would be nice if group by allowed either column aliases or column position 
> (like mysql).
> It can be burdensome to have to repeat UDFs both in the select and in the 
> group by.
> e.g. instead of:
> select f1(col1), f2(col2), f3(col3), count(1) group by f1(col1), f2(col2), 
> f3(col3);
> it would allow:
> select f1(col1), f2(col2), f3(col3), count(1) group by 1, 2, 3;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539888#comment-13539888
 ] 

Namit Jain commented on HIVE-3833:
--

https://reviews.facebook.net/D7653

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.1.patch
>
>
> Currently, different partitions can be picked up for the same input split 
> based on the serdes etc., and we don't allow changing the schema for 
> LazyColumnarBinarySerDe.
> Instead of that, different partitions should be part of the same split only 
> if the partition schemas exactly match. The operator tree object inspectors 
> should be based on the partition schema. That would give greater flexibility 
> and also help with using the binary serde with RCFile.
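
A minimal sketch of the grouping idea in this description, assuming a hypothetical 
PartitionInfo holder carrying an exact schema signature; this is not the actual 
split-generation code, only an illustration of "same split only if the partition 
schemas exactly match".

{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: group partitions by an exact schema signature so that
// a combined split never mixes partitions with different schemas. PartitionInfo
// and schemaSignature are hypothetical, not Hive classes.
public class SchemaAwareGrouping {

  static class PartitionInfo {
    final String path;
    final String schemaSignature;   // e.g. column names + types + serde class
    PartitionInfo(String path, String schemaSignature) {
      this.path = path;
      this.schemaSignature = schemaSignature;
    }
  }

  // Partitions that may share a split, keyed by their exact schema signature.
  static Map<String, List<PartitionInfo>> groupBySchema(List<PartitionInfo> partitions) {
    Map<String, List<PartitionInfo>> groups = new LinkedHashMap<String, List<PartitionInfo>>();
    for (PartitionInfo p : partitions) {
      List<PartitionInfo> group = groups.get(p.schemaSignature);
      if (group == null) {
        group = new ArrayList<PartitionInfo>();
        groups.put(p.schemaSignature, group);
      }
      group.add(p);
    }
    return groups;
  }
}
{code}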

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3833) object inspectors should be initialized based on partition metadata

2012-12-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3833:
-

Attachment: hive.3833.1.patch

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.1.patch
>
>
> Currently, different partitions can be picked up for the same input split 
> based on the serdes etc., and we don't allow changing the schema for 
> LazyColumnarBinarySerDe.
> Instead of that, different partitions should be part of the same split only 
> if the partition schemas exactly match. The operator tree object inspectors 
> should be based on the partition schema. That would give greater flexibility 
> and also help with using the binary serde with RCFile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539872#comment-13539872
 ] 

Namit Jain commented on HIVE-3286:
--

[~gangtimliu], it would be useful if you could come up with a way to store the 
histogram data for a table.
The skew join should then be able to use it automatically.

> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3286.D4287.5.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But mostly we already know that, and we even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As for a start, I've extended join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can be only used with common-inner-equi joins. And skew condition should 
> be composed of join keys only.
> Work till done now will be updated shortly after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and first 'true' one decides skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in a group would be 
> assigned. 
> The number of partition slot reserved for each group is decided also at 
> runtime by simple calculation of percentage. If a skew group is "CLUSTER BY 
> 20 PERCENT" and total partition slot (=number of reducer) is 20, that group 
> will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group is dispersed in the range of 
> reserved slots (If there is only one slot for a group, this is meaningless). 
> Currently, three distribution policies are available: RANDOM, KEYS, 
> . 
> 1. RANDOM : rows of driver** alias are dispersed by random and rows of 
> non-driver alias are duplicated for all the slots (default if not specified)
> 2. KEYS : determined by hash value of keys (same with previous)
> 3. expression : determined by hash of object evaluated by user-provided 
> expression
> Only possible with inner, equi, common joins. Join tree merging is not yet 
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY"
> If there exists column statistics for the key, it could be possible to apply 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>a.key = '0' CLUSTER BY 10 PERCENT,
>b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by 
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', the row does not belong to any skew group : hash(key) % 6
> *expressions in skew condition : 
> 1. all expressions should be built from expressions in the join condition, which 
> means if the join condition is "a.key=b.key", the user can build any expression with 
> "a.key" or "b.key". But if the join condition is a.key+1=b.key, the user cannot build 
> an expression with "a.key" alone (it should build the expression with "a.key+1"). 
> 2. all expressions should reference one and only one side of the aliases. For 
> example, simple constant expressions or expressions referencing both sides of 
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all functions in the expressions should be deterministic and stateless.
> 4. if "DISTRIBUTE BY expression" is used, the distribution expression should also 
> have the same alias as the skew expression.
> **driver alias :
> 1. driver alias means the sole referenced alias from skew expression, which 
> is important for RANDOM distribut

[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition

2012-12-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3286:
-

Status: Open  (was: Patch Available)

Comments on Phabricator -

[~navis], can you refresh, do some cleanups, and address the comments?
This would be really useful.

remove the syntax:

a.key = 0 CLUSTER BY 2 PARTITIONS,



What does the above mean -- not clear.

Also, can you restructure the code in such a way that, in the future, if histogram 
data is available for a table (like skewed data), we would be able to convert the 
join to use it? I mean, this data, instead of coming from the query, could come from 
the table metadata.


> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-3286.D4287.5.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But mostly we already know that, and we even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As for a start, I've extended join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can be only used with common-inner-equi joins. And skew condition should 
> be composed of join keys only.
> Work till done now will be updated shortly after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and first 'true' one decides skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in a group would be 
> assigned. 
> The number of partition slot reserved for each group is decided also at 
> runtime by simple calculation of percentage. If a skew group is "CLUSTER BY 
> 20 PERCENT" and total partition slot (=number of reducer) is 20, that group 
> will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group is dispersed in the range of 
> reserved slots (If there is only one slot for a group, this is meaningless). 
> Currently, three distribution policies are available: RANDOM, KEYS, 
> . 
> 1. RANDOM : rows of driver** alias are dispersed by random and rows of 
> non-driver alias are duplicated for all the slots (default if not specified)
> 2. KEYS : determined by hash value of keys (same with previous)
> 3. expression : determined by hash of object evaluated by user-provided 
> expression
> Only possible with inner, equi, common joins. Join tree merging is not yet 
> supported.
> Might be used by other RS users like "SORT BY" or "GROUP BY"
> If there exists column statistics for the key, it could be possible to apply 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>a.key = '0' CLUSTER BY 10 PERCENT,
>b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by 
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', the row does not belong to any skew group : hash(key) % 6
> *expressions in skew condition : 
> 1. all expressions should be built from expressions in the join condition, which 
> means if the join condition is "a.key=b.key", the user can build any expression with 
> "a.key" or "b.key". But if the join condition is a.key+1=b.key, the user cannot build 
> an expression with "a.key" alone (it should build the expression with "a.key+1"). 
> 2. all expressions should reference one and only one side of the aliases. For 
> example, simple constant expressions or expressions referencing both sides of 
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all funct

[jira] [Commented] (HIVE-581) improve group by syntax

2012-12-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539869#comment-13539869
 ] 

Namit Jain commented on HIVE-581:
-

+1

Running tests

> improve group by syntax
> ---
>
> Key: HIVE-581
> URL: https://issues.apache.org/jira/browse/HIVE-581
> Project: Hive
>  Issue Type: Improvement
>  Components: Clients, Query Processor
>Reporter: Larry Ogrodnek
>Assignee: Zhenxiao Luo
>  Labels: features
> Attachments: HIVE-581.1.patch.txt, HIVE-581.2.patch.txt
>
>
> It would be nice if group by allowed either column aliases or column position 
> (like mysql).
> It can be burdensome to have to repeat UDFs both in the select and in the 
> group by.
> e.g. instead of:
> select f1(col1), f2(col2), f3(col3), count(1) group by f1(col1), f2(col2), 
> f3(col3);
> it would allow:
> select f1(col1), f2(col2), f3(col3), count(1) group by 1, 2, 3;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira