[jira] [Updated] (HIVE-2077) Allow HBaseStorageHandler to work with hbase 0.90.1

2011-03-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-2077:
-

Description: 
Currently the HBase handler works with HBase 0.89.
We should make it work with 0.90.1 and utilize the new features of 0.90.1.

  was:
Currently the HBase handler works with HBase 0.89.
We should make it work with 0.90.1 as well.


> Allow HBaseStorageHandler to work with hbase 0.90.1
> ---
>
> Key: HIVE-2077
> URL: https://issues.apache.org/jira/browse/HIVE-2077
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 0.7.0
>Reporter: Ted Yu
> Fix For: 0.8.0
>
>
> Currently the HBase handler works with HBase 0.89.
> We should make it work with 0.90.1 and utilize the new features of 0.90.1.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-2077) Allow HBaseStorageHandler to work with hbase 0.90.1

2011-03-25 Thread Ted Yu (JIRA)
Allow HBaseStorageHandler to work with hbase 0.90.1
---

 Key: HIVE-2077
 URL: https://issues.apache.org/jira/browse/HIVE-2077
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.7.0
Reporter: Ted Yu
 Fix For: 0.8.0


Currently the HBase handler works with HBase 0.89.
We should make it work with 0.90.1 as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user

2011-03-25 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011538#comment-13011538
 ] 

Devaraj Das commented on HIVE-1988:
---

This patch also does some refactoring of the audit logging part. It moves a 
class related to audit logging into the 20S shim code.

> Make the delegation token issued by the MetaStore owned by the right user
> -
>
> Key: HIVE-1988
> URL: https://issues.apache.org/jira/browse/HIVE-1988
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security, Server Infrastructure
>Affects Versions: 0.7.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.8.0
>
> Attachments: hive-1988-3.patch, hive-1988.patch
>
>
> The 'owner' of any delegation token issued by the MetaStore is set to the 
> requesting user. When the token is requested by the user himself during job 
> submission, this is fine. However, when the token is requested by a service 
> (e.g., Oozie) on behalf of the user, the token's owner is set to the user the 
> service is running as. Later on, when the token is used by a MapReduce task, 
> the MetaStore treats the incoming request as coming from Oozie and performs 
> operations as Oozie. This means any new directories created on HDFS by the 
> MetaStore (e.g., via create_table) will end up with Oozie as the owner.
> Also, the MetaStore doesn't check whether a user asking for a token on behalf 
> of some other user is actually authorized to act on behalf of that other 
> user. We should start using the ProxyUser authorization in the MetaStore 
> (HADOOP-6510's APIs).
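
For illustration, a rough sketch of what such a ProxyUser check could look like 
in the MetaStore, assuming HADOOP-6510's ProxyUsers API (the helper class, its 
name, and the exact signature below are illustrative assumptions, not code from 
any attached patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AuthorizationException;
import org.apache.hadoop.security.authorize.ProxyUsers;

public class DelegationTokenProxyCheck {
  // Hypothetical helper: before issuing a token owned by 'tokenOwner', verify
  // that the connecting (real) user is allowed to act on that user's behalf.
  public static void checkProxyAccess(String tokenOwner, String remoteAddress,
      Configuration conf) throws Exception {
    UserGroupInformation realUser = UserGroupInformation.getCurrentUser();
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser(tokenOwner, realUser);
    try {
      // Checks the hadoop.proxyuser.<user>.hosts/groups ACLs from conf.
      ProxyUsers.authorize(proxyUser, remoteAddress, conf);
    } catch (AuthorizationException e) {
      throw new Exception(realUser.getShortUserName()
          + " is not authorized to request a token for " + tokenOwner, e);
    }
  }
}
{code}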

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user

2011-03-25 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HIVE-1988:
--

Attachment: hive-1988-3.patch

https://reviews.apache.org/r/528/ is the reviewboard page. 

> Make the delegation token issued by the MetaStore owned by the right user
> -
>
> Key: HIVE-1988
> URL: https://issues.apache.org/jira/browse/HIVE-1988
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security, Server Infrastructure
>Affects Versions: 0.7.0
>Reporter: Devaraj Das
> Fix For: 0.8.0
>
> Attachments: hive-1988-3.patch, hive-1988.patch
>
>
> The 'owner' of any delegation token issued by the MetaStore is set to the 
> requesting user. When the token is requested by the user himself during job 
> submission, this is fine. However, when the token is requested by a service 
> (e.g., Oozie) on behalf of the user, the token's owner is set to the user the 
> service is running as. Later on, when the token is used by a MapReduce task, 
> the MetaStore treats the incoming request as coming from Oozie and performs 
> operations as Oozie. This means any new directories created on HDFS by the 
> MetaStore (e.g., via create_table) will end up with Oozie as the owner.
> Also, the MetaStore doesn't check whether a user asking for a token on behalf 
> of some other user is actually authorized to act on behalf of that other 
> user. We should start using the ProxyUser authorization in the MetaStore 
> (HADOOP-6510's APIs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user

2011-03-25 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das reassigned HIVE-1988:
-

Assignee: Devaraj Das

> Make the delegation token issued by the MetaStore owned by the right user
> -
>
> Key: HIVE-1988
> URL: https://issues.apache.org/jira/browse/HIVE-1988
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security, Server Infrastructure
>Affects Versions: 0.7.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.8.0
>
> Attachments: hive-1988-3.patch, hive-1988.patch
>
>
> The 'owner' of any delegation token issued by the MetaStore is set to the 
> requesting user. When the token is requested by the user himself during job 
> submission, this is fine. However, when the token is requested by a service 
> (e.g., Oozie) on behalf of the user, the token's owner is set to the user the 
> service is running as. Later on, when the token is used by a MapReduce task, 
> the MetaStore treats the incoming request as coming from Oozie and performs 
> operations as Oozie. This means any new directories created on HDFS by the 
> MetaStore (e.g., via create_table) will end up with Oozie as the owner.
> Also, the MetaStore doesn't check whether a user asking for a token on behalf 
> of some other user is actually authorized to act on behalf of that other 
> user. We should start using the ProxyUser authorization in the MetaStore 
> (HADOOP-6510's APIs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1988) Make the delegation token issued by the MetaStore owned by the right user

2011-03-25 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HIVE-1988:
--

Fix Version/s: 0.8.0
   Status: Patch Available  (was: Open)

> Make the delegation token issued by the MetaStore owned by the right user
> -
>
> Key: HIVE-1988
> URL: https://issues.apache.org/jira/browse/HIVE-1988
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security, Server Infrastructure
>Affects Versions: 0.7.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.8.0
>
> Attachments: hive-1988-3.patch, hive-1988.patch
>
>
> The 'owner' of any delegation token issued by the MetaStore is set to the 
> requesting user. When the token is requested by the user himself during job 
> submission, this is fine. However, when the token is requested by a service 
> (e.g., Oozie) on behalf of the user, the token's owner is set to the user the 
> service is running as. Later on, when the token is used by a MapReduce task, 
> the MetaStore treats the incoming request as coming from Oozie and performs 
> operations as Oozie. This means any new directories created on HDFS by the 
> MetaStore (e.g., via create_table) will end up with Oozie as the owner.
> Also, the MetaStore doesn't check whether a user asking for a token on behalf 
> of some other user is actually authorized to act on behalf of that other 
> user. We should start using the ProxyUser authorization in the MetaStore 
> (HADOOP-6510's APIs).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1095) Hive in Maven

2011-03-25 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011527#comment-13011527
 ] 

Carl Steinbach commented on HIVE-1095:
--

@Giridharan: 0.8.0 is the current trunk development branch. We haven't voted to 
release this version yet, so we shouldn't publish anything other than 0.8.0 
SNAPSHOTs to the snapshot repository. If you want to publish artifacts to the 
release repo then we need to backport this patch to branch-0.7.


> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
> HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, hiveReleasedToMaven.tar.gz
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1095) Hive in Maven

2011-03-25 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011521#comment-13011521
 ] 

Giridharan Kesavan commented on HIVE-1095:
--

Someone with Hive committer access should try this to see if we are able to 
publish to the Nexus staging repo.

ant make-maven -Dversion=0.8.0
ant maven-publish -Dversion=0.8.0



> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
> HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, hiveReleasedToMaven.tar.gz
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2050) batch processing partition pruning process

2011-03-25 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-2050:
-

Status: Open  (was: Patch Available)

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the 
> Hive API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It's easy to come up with a range, and it is guaranteed that the JDO range 
> query results are equivalent to those of the query with a list of partition 
> names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2050) batch processing partition pruning process

2011-03-25 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011518#comment-13011518
 ] 

Namit Jain commented on HIVE-2050:
--

Based on an offline review, this may increase memory usage; we need to return the
partition names in batches in order to put a bound on memory.
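
For illustration, a minimal sketch of the kind of bounded batching being 
suggested here (the client interface and method names below are hypothetical 
placeholders, not taken from the attached patch):
{code}
import java.util.List;

public class BatchedPartitionFetch {
  // Hypothetical stand-in for the new metastore call that returns Partition
  // objects for an explicit list of partition names.
  interface PartitionBatchClient {
    List<Object> getPartitionsByNames(String db, String table,
        List<String> names) throws Exception;
  }

  // Walk the pruned partition names in fixed-size chunks so that only
  // 'batchSize' Partition objects are materialized at any one time.
  static void processInBatches(PartitionBatchClient client, String db,
      String table, List<String> prunedNames, int batchSize) throws Exception {
    for (int i = 0; i < prunedNames.size(); i += batchSize) {
      int end = Math.min(i + batchSize, prunedNames.size());
      List<String> chunk = prunedNames.subList(i, end);
      List<Object> partitions = client.getPartitionsByNames(db, table, chunk);
      // ... hand this chunk to the planner, then drop the references ...
    }
  }
}
{code}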

> batch processing partition pruning process
> --
>
> Key: HIVE-2050
> URL: https://issues.apache.org/jira/browse/HIVE-2050
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Attachments: HIVE-2050.patch
>
>
> For partition predicates that cannot be pushed down to JDO filtering 
> (HIVE-2049), we should fall back to the old approach of listing all partition 
> names first and using Hive's expression evaluation engine to select the correct 
> partitions. Then the partition pruner should hand Hive a list of partition 
> names and get back a list of Partition objects (this should be added to the 
> Hive API). 
> A possible optimization is that the partition pruner should give Hive a 
> set of ranges of partition names (say [ts=01, ts=11], [ts=20, ts=24]), and 
> the JDO query should be formulated as range queries. Range queries are 
> possible because the first step lists all partition names in sorted order. 
> It's easy to come up with a range, and it is guaranteed that the JDO range 
> query results are equivalent to those of the query with a list of partition 
> names. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1095) Hive in Maven

2011-03-25 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011514#comment-13011514
 ] 

Giridharan Kesavan commented on HIVE-1095:
--

I got the same error as Amareshwari mentioned in the previous comment, but I 
fixed it locally by changing the dependency of the make-pom target, i.e. I made 
the make-pom target depend on check-ivy instead of ivy-init.
{code:xml}build-common.xml 
- 
+ 
{code}

This seems to work fine. 

About snapshot versioning:
I think we cannot publish snapshots to the staging/release repository. 
Snapshots can only go to the snapshots repo. 
We need to fix the version string so that we can publish to the staging/release 
repo.

> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
> HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, hiveReleasedToMaven.tar.gz
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Hive 0.7.0 Release Candidate 1

2011-03-25 Thread Carl Steinbach
Thanks for voting everyone!

RC1 has been approved as the official Hive 0.7.0 release after receiving
five +1 votes from Hive PMC members.

I'll send out an official announcement in 24 hours.

Thanks.

Carl


On Fri, Mar 25, 2011 at 3:32 PM, Namit Jain  wrote:

> +1 - go ahead
>
>
> On 3/25/11 12:53 PM, "John Sichi"  wrote:
>
> >+1, ship it!
> >
> >JVS
> >
> >On Mar 20, 2011, at 2:32 AM, Carl Steinbach wrote:
> >
> >> Hive 0.7.0 Release Candidate 1 is available here:
> >>
> >> http://people.apache.org/~cws/hive-0.7.0-candidate-1
> >>
> >> We need 3 +1 votes from Hive PMC members in order to release. Please
> >>vote.
> >
>
>


Re: [VOTE] Hive 0.7.0 Release Candidate 1

2011-03-25 Thread Namit Jain
+1 - go ahead


On 3/25/11 12:53 PM, "John Sichi"  wrote:

>+1, ship it!
>
>JVS
>
>On Mar 20, 2011, at 2:32 AM, Carl Steinbach wrote:
>
>> Hive 0.7.0 Release Candidate 1 is available here:
>> 
>> http://people.apache.org/~cws/hive-0.7.0-candidate-1
>> 
>> We need 3 +1 votes from Hive PMC members in order to release. Please
>>vote.
>



Re: [VOTE] Hive 0.7.0 Release Candidate 1

2011-03-25 Thread Ashish Thusoo
+1 

Thanks Carl!

Ashish
On Mar 25, 2011, at 3:08 PM, Ning Zhang wrote:

> +1.
> 
> Thanks Carl!
> 
> On Mar 20, 2011, at 2:32 AM, Carl Steinbach wrote:
> 
>> Hive 0.7.0 Release Candidate 1 is available here:
>> 
>> http://people.apache.org/~cws/hive-0.7.0-candidate-1
>> 
>> We need 3 +1 votes from Hive PMC members in order to release. Please vote.
> 



Re: [VOTE] Hive 0.7.0 Release Candidate 1

2011-03-25 Thread Ning Zhang
+1.

Thanks Carl!

On Mar 20, 2011, at 2:32 AM, Carl Steinbach wrote:

> Hive 0.7.0 Release Candidate 1 is available here:
> 
> http://people.apache.org/~cws/hive-0.7.0-candidate-1
> 
> We need 3 +1 votes from Hive PMC members in order to release. Please vote.



[jira] [Updated] (HIVE-1434) Cassandra Storage Handler

2011-03-25 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1434:
-

Status: Open  (was: Patch Available)

Some other things which need to be addressed:

* Apache headers are missing on many new files
* all commented-out code should be removed
* new classes (e.g. CassandraStorageHandler) should have Javadoc (and for ones 
that have it, like CassandraQTestUtil, eliminate copy-and-paste evidence)
* there is a file in the patch with the name 
cassandra-handler/src/test/results/cassandra_queries; I don't think it's 
supposed to be there (there should only be the .q.out file)

For the HBase handler, there's a wiki page; it would be good to have one here 
too.

Also, for HBase, we originally had some bugs with joins against tables with 
different schemas (and for joining HBase vs non-HBase tables), so you probably 
want to add some tests for those similar to the ones in hbase_queries.q and 
hbase_joins.q.


> Cassandra Storage Handler
> -
>
> Key: HIVE-1434
> URL: https://issues.apache.org/jira/browse/HIVE-1434
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
> Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
> hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
> hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
> hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
> hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
> hive-cassandra.2011-02-25.txt, hive.diff
>
>
> Add a cassandra storage handler.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request: HIVE-2050. batch processing partition pruning process

2011-03-25 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/522/
---

(Updated 2011-03-25 13:50:22.615065)


Review request for hive.


Changes
---

The previous patch was too large because it included Thrift-generated files. This 
is a Java-only patch with all Thrift-generated files removed.


Summary
---

Introducing a new metastore API to retrieve a list of partitions in batch. 


Diffs (updated)
-

  trunk/metastore/if/hive_metastore.thrift 108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
108 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 108 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
108 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
108 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 108 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartExprEvalUtils.java
 108 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
108 

Diff: https://reviews.apache.org/r/522/diff


Testing
---


Thanks,

Ning



Re: [VOTE] Hive 0.7.0 Release Candidate 1

2011-03-25 Thread John Sichi
+1, ship it!

JVS

On Mar 20, 2011, at 2:32 AM, Carl Steinbach wrote:

> Hive 0.7.0 Release Candidate 1 is available here:
> 
> http://people.apache.org/~cws/hive-0.7.0-candidate-1
> 
> We need 3 +1 votes from Hive PMC members in order to release. Please vote.



[jira] [Commented] (HIVE-1803) Implement bitmap indexing in Hive

2011-03-25 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011380#comment-13011380
 ] 

John Sichi commented on HIVE-1803:
--

Right, without row-level skipping, the main use case is AND/OR for block 
filtering.

I'd suggest we get this committed without row-level skipping, and then create a 
followup for that.  Besides AND/OR, having the bitmap index build/access code 
committed will be useful for others working on related issues such as automatic 
usage in the WHERE clause.
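
For illustration, a toy sketch of that AND/OR block-filtering use case built on 
the JavaEWAH bitmaps attached to this issue (treat the exact package and method 
names as assumptions; the block numbers are made up):
{code}
import com.googlecode.javaewah.EWAHCompressedBitmap;

public class BitmapBlockFilterDemo {
  public static void main(String[] args) {
    // One bit per block: bit i set means block i may contain matching rows.
    EWAHCompressedBitmap colAEquals1 = new EWAHCompressedBitmap();
    colAEquals1.set(2); colAEquals1.set(5); colAEquals1.set(9);

    EWAHCompressedBitmap colBEquals7 = new EWAHCompressedBitmap();
    colBEquals7.set(5); colBEquals7.set(9); colBEquals7.set(12);

    // WHERE a = 1 AND b = 7: only blocks present in both bitmaps are read.
    EWAHCompressedBitmap andBlocks = colAEquals1.and(colBEquals7);
    // WHERE a = 1 OR b = 7: the union of candidate blocks.
    EWAHCompressedBitmap orBlocks = colAEquals1.or(colBEquals7);

    System.out.println("AND candidate blocks: " + andBlocks); // blocks 5, 9
    System.out.println("OR candidate blocks:  " + orBlocks);  // blocks 2, 5, 9, 12
  }
}
{code}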


> Implement bitmap indexing in Hive
> -
>
> Key: HIVE-1803
> URL: https://issues.apache.org/jira/browse/HIVE-1803
> Project: Hive
>  Issue Type: New Feature
>  Components: Indexing
>Reporter: Marquis Wang
>Assignee: Marquis Wang
> Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, 
> HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, HIVE-1803.7.patch, 
> JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, 
> javaewah.jar
>
>
> Implement bitmap index handler to complement compact indexing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-trunk-h0.20 #642

2011-03-25 Thread Apache Hudson Server
See 

--
[...truncated 3 lines...]
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-25_12-14-32_149_5296886701504161620/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=
[junit] Job running in-process (local Hadoop)
[junit] Hadoop job information for null: number of mappers: 0; number of 
reducers: 0
[junit] 2011-03-25 12:14:35,165 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-25_12-14-32_149_5296886701504161620/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-25_12-14-36_712_4202806813542269420/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-25_12-14-36_712_4202806813542269420/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[j

Build failed in Jenkins: Hive-0.7.0-h0.20 #53

2011-03-25 Thread Apache Hudson Server
See 

--
[...truncated 26900 lines...]
[junit] Loading data to table default.srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] Copying file: 

[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_sequencefile
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_sequencefile
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_sequencefile
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_sequencefile
[junit] OK
[junit] Copying file: 

[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_thrift
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.src_thrift
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_thrift
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_thrift
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_json
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Copying file: 

[junit] Loading data to table default.src_json
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'
 INTO TABLE src_json
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src_json
[junit] OK
[junit] diff 

 

[junit] Done query: wrong_distinct1.q
[junit] Hive history 
file=
[junit] Begin query: wrong_distinct2.q
[junit] Hive history 
file=
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'
 OVERWRITE INTO TABLE srcpart PARTITION (ds='2008-04-08',hr='11')
[junit] PREHOOK: type: LOAD
[junit] Copying data from 


[jira] [Commented] (HIVE-1199) configure total number of mappers

2011-03-25 Thread Adam Kramer (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011302#comment-13011302
 ] 

Adam Kramer commented on HIVE-1199:
---

+1. This is also a bigger issue for automation of jobs that require tweaking 
the amount of resources. I have a job right now that needs about 10x the number 
of mappers to run smoothly, and I would like to pipeline it, but the data size 
is growing...so if I configure the split sizes, I need to do so based on 
today's size of the table. That should be handled by Hive.

Ideally, this would mean that the split.sizes are generated or recomputed 
dynamically. One variable, mapred.map.tasks.approx, could be set or 
unset...then Hive could do some quick math based on the size of the table and 
dynamically set its own mapred.max.split.size and min.split.size to get 
approximately the desired number of mappers. Doesn't have to be perfect in 
order to be useful!
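
For illustration, the "quick math" described above could be as simple as the 
following sketch (mapred.map.tasks.approx is the hypothetical knob proposed in 
this comment, not an existing Hive or Hadoop setting):
{code}
public class ApproxMapperSplitMath {
  // Given today's total input size and a desired (approximate) mapper count,
  // derive a max split size that yields roughly that many splits.
  static long splitSizeFor(long totalInputBytes, int desiredMappers) {
    return Math.max(totalInputBytes / Math.max(desiredMappers, 1), 1L);
  }

  public static void main(String[] args) {
    long tableBytes = 500L * 1024 * 1024 * 1024; // e.g. the table is 500 GB today
    int desiredMappers = 2000;                   // what the user would configure
    long maxSplit = splitSizeFor(tableBytes, desiredMappers);
    // Hive would then set mapred.max.split.size (and the min) to roughly this
    // value at plan time, so the mapper count tracks the growing data size.
    System.out.println("mapred.max.split.size ~= " + maxSplit + " bytes");
  }
}
{code}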

> configure total number of mappers
> -
>
> Key: HIVE-1199
> URL: https://issues.apache.org/jira/browse/HIVE-1199
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>
> For users, it might be very difficult to control the number of mappers. There 
> are many parameters which confuse users - 
> for CombineHiveInputFormat, a different set of parameters is required to 
> control the number of mappers.
> In general, users should have a way to specify the total number of mappers, 
> which should be obeyed. This will be very difficult
> to guarantee, since the query might be reading from a large number of 
> partitions, where a mapper can only span one partition.
> What if the number of mappers that the user wants is less than the total 
> number of partitions?
> It would be a very useful heuristic to have - a simple use case that Joy had is 
> as follows:
> A query needs to be run on one table, which has a lot of small files - it 
> will be easy for him to specify the total number of mappers
> rather than the various rack-local/node-local CombineFileInputFormat 
> parameters.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2076) Provide Metastore upgrade scripts and default schemas for PostgreSQL

2011-03-25 Thread Yuanjun Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011182#comment-13011182
 ] 

Yuanjun Li commented on HIVE-2076:
--

This patch can be applied to the 0.7 branch too. Would it be better to set the 
fix versions to [0.7.0, 0.8.0]?

> Provide Metastore upgrade scripts and default schemas for PostgreSQL
> 
>
> Key: HIVE-2076
> URL: https://issues.apache.org/jira/browse/HIVE-2076
> Project: Hive
>  Issue Type: Task
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Yuanjun Li
> Fix For: 0.8.0
>
> Attachments: HIVE-2011.postgres.1.patch
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2015) Eliminate bogus Datanucleus.Plugin Bundle ERROR log messages

2011-03-25 Thread Andy Jefferson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011155#comment-13011155
 ] 

Andy Jefferson commented on HIVE-2015:
--

Or just use a recent DataNucleus (3.0 Mx), which, by default, omits checks on OSGi 
dependencies.

PS: if you're having such issues with third-party software, I'd expect people to 
go to that third-party project and register an issue there so something can be 
turned off etc., rather than rely on that project's developers just happening 
across issues like this in a web trawl.

> Eliminate bogus Datanucleus.Plugin Bundle ERROR log messages
> 
>
> Key: HIVE-2015
> URL: https://issues.apache.org/jira/browse/HIVE-2015
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability, Metastore
>Reporter: Carl Steinbach
>
> Every time I start up the Hive CLI with logging enabled I'm treated to the 
> following ERROR log messages courtesy of DataNucleus:
> {code}
> DEBUG metastore.ObjectStore: datanucleus.plugin.pluginRegistryBundleCheck = 
> LOG 
> ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core" requires 
> "org.eclipse.core.resources" but it cannot be resolved. 
> ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core" requires 
> "org.eclipse.core.runtime" but it cannot be resolved. 
> ERROR DataNucleus.Plugin: Bundle "org.eclipse.jdt.core" requires 
> "org.eclipse.text" but it cannot be resolved.
> {code}
> Here's where this comes from:
> * The bin/hive scripts cause Hive to inherit Hadoop's classpath.
> * Hadoop's classpath includes $HADOOP_HOME/lib/core-3.1.1.jar, an Eclipse 
> library.
> * core-3.1.1.jar includes a plugin.xml file defining an OSGI plugin
> * At startup, Datanucleus scans the classpath looking for OSGI plugins, and 
> will attempt to initialize any that it finds, including the Eclipse OSGI 
> plugins located in core-3.1.1.jar
> * Initialization of the OSGI plugin in core-3.1.1.jar fails because of 
> unresolved dependencies.
> * We see an ERROR message telling us that Datanucleus failed to initialize a 
> plugin that we don't care about in the first place.
> I can think of two options for solving this problem:
> # Rewrite the scripts in $HIVE_HOME/bin so that they don't inherit ALL of 
> Hadoop's CLASSPATH.
> # Replace DataNucleus's NonManagedPluginRegistry with our own implementation 
> that does nothing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-1884) Potential risk of resource leaks in Hive

2011-03-25 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam reassigned HIVE-1884:
--

Assignee: Chinna Rao Lalam  (was: Mohit Sikri)

> Potential risk of resource leaks in Hive
> 
>
> Key: HIVE-1884
> URL: https://issues.apache.org/jira/browse/HIVE-1884
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Metastore, Query Processor, Server Infrastructure
>Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0
> Environment: Hive 0.6.0, Hadoop 0.20.1
> SUSE Linux Enterprise Server 11 (i586)
>Reporter: Mohit Sikri
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-1884.1.PATCH
>
>
> h3. There are a couple of resource leaks.
> h4. For example,
> In CliDriver.java, method processReader(), the buffered reader is not closed.
> h3. There is also a risk of resources getting leaked; in such cases we need to 
> refactor the code to move the closing of resources into a finally block.
> h4. For example:
> In Throttle.java, method checkJobTracker(), the following code snippet 
> might cause a resource leak.
> {code}
> InputStream in = url.openStream();
> in.read(buffer);
> in.close();
> {code}
> Ideally, and as per best coding practices, it should be like the following:
> {code}
> InputStream in = null;
> try {
>   in = url.openStream();
>   int numRead = in.read(buffer);
> } finally {
>   IOUtils.closeStream(in);
> }
> {code}
> Similar cases were found in ExplainTask.java, DDLTask.java, etc. We need to 
> refactor all such occurrences.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Map-side aggregates

2011-03-25 Thread Joerg Schad

Hi,
is there any more documentation on a) the details of usage/advantages and b) the 
implementation of Map-side aggregates?
Thanks a lot for your answers.
Jörg