[jira] [Updated] (HIVE-1095) Hive in Maven

2011-05-18 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1095:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch 0.7.

Thanks, Gerrit and Carl!


 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Fix For: 0.7.1, 0.8.0

 Attachments: HIVE-1095-trunk.patch, HIVE-1095.7.patch.txt, 
 HIVE-1095.v2.PATCH, HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, 
 HIVE-1095.v5.PATCH, HIVE-1095.v6.patch, hiveReleasedToMaven.tar.gz, 
 make-maven.log


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1095) Hive in Maven

2011-05-18 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035278#comment-13035278
 ] 

Gerrit Jansen van Vuuren commented on HIVE-1095:


Great

Thanks, Carl and Amareshwari, for seeing this through.



Jenkins build is still unstable: Hive-trunk-h0.21 #736

2011-05-18 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hive-trunk-h0.21/changes




[jira] [Commented] (HIVE-1095) Hive in Maven

2011-05-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035417#comment-13035417
 ] 

Hudson commented on HIVE-1095:
--

Integrated in Hive-trunk-h0.21 #736 (See 
[https://builds.apache.org/hudson/job/Hive-trunk-h0.21/736/])
HIVE-1095. Hive in Maven. Contributed by Gerrit Jansen van Vuuren, 
Amareshwari Sriramadasu and Carl Steinbach.

amareshwari : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1124164
Files : 
* /hive/trunk/ant/ivy.xml
* /hive/trunk/ivy.xml
* /hive/trunk/jdbc/ivy.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/build.xml
* /hive/trunk/service/ivy.xml
* /hive/trunk/hbase-handler/ivy.xml
* /hive/trunk/contrib/ivy.xml
* /hive/trunk/shims/ivy.xml
* /hive/trunk/hwi/ivy.xml
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/metastore/ivy.xml
* /hive/trunk/cli/ivy.xml
* /hive/trunk/serde/ivy.xml
* /hive/trunk/common/ivy.xml
* /hive/trunk/build-common.xml




[jira] [Commented] (HIVE-2161) Remaining patch for HIVE-2148

2011-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035462#comment-13035462
 ] 

Ashutosh Chauhan commented on HIVE-2161:


Can someone commit this one? It has already been discussed at HIVE-2148.

 Remaining patch for HIVE-2148
 -

 Key: HIVE-2161
 URL: https://issues.apache.org/jira/browse/HIVE-2161
 Project: Hive
  Issue Type: Task
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: hive_2161.patch


 Follow-up jira for HIVE-2148.



[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-18 Thread Tomasz Nykiel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035493#comment-13035493
 ] 

Tomasz Nykiel commented on HIVE-2144:
-

Currently the schema of the stat table is the following:

PARTITION_STAT_TBL ( ID VARCHAR(255), ROW_COUNT BIGINT )

and it does not have any integrity constraints declared.

We can amend it to:

PARTITION_STAT_TBL ( ID VARCHAR(255) UNIQUE, ROW_COUNT BIGINT )

Then, instead of executing two queries per row inserted, we can execute only 
the INSERT query that we already issue, without the preceding SELECT.
If the INSERT violates the integrity constraint (enforced by the unique 
index), the violation surfaces as an exception that we can catch, and we then 
perform a single UPDATE query.
The UPDATE query needs to check whether the stats being inserted are newer 
than the ones already in the table:

UPDATE PARTITION_STAT_TBL SET ROW_COUNT = new_value
WHERE ID = rowID AND
(0) new_value >
(1) (SELECT TEMP.ROW_COUNT FROM
(2) (SELECT ROW_COUNT FROM PARTITION_STAT_TBL WHERE ID = rowID) TEMP )

--(0) is a condition that checks if the newly inserted value is greater than 
the one we already have.
--(1) and (2) are a work-around for MySQL, which does not allow a subquery to 
refer to the table that is being updated. Here, we basically materialize the 
value that we need for comparison.
--(1) should theoretically have LIMIT 1 to choose exactly one tuple; however, 
Derby does not support it, and given the unique constraint and the fact that 
the insert failed, exactly one tuple matches the ID predicate.

To summarize, for non-existing rows, only one INSERT query will be executed 
instead of two.
For existing rows, which seem to occur very infrequently, two queries will be 
executed instead of three.
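
The insert-first flow proposed in this comment could be sketched roughly as 
follows. This is only an illustration, not the patch: it uses Python's 
built-in sqlite3 module as a stand-in for the MySQL/Derby connection, the 
function name publish_stat is hypothetical, and the derived-table alias is 
renamed to PREV. The exception type raised on a constraint violation will 
differ under JDBC.

```python
import sqlite3

# In-memory stand-in for the stats database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PARTITION_STAT_TBL "
    "(ID VARCHAR(255) UNIQUE, ROW_COUNT BIGINT)"
)


def publish_stat(conn, row_id, new_value):
    """Insert first; on a uniqueness violation, fall back to one UPDATE.

    Returns the number of statements issued (1 or 2). The name and the
    return value are hypothetical, chosen only to make the cost visible.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO PARTITION_STAT_TBL (ID, ROW_COUNT) VALUES (?, ?)",
                (row_id, new_value),
            )
        return 1
    except sqlite3.IntegrityError:
        # The ID already exists: overwrite only if the new value is larger.
        # The nested SELECT mirrors the materialization work-around that
        # the comment describes for MySQL.
        with conn:
            conn.execute(
                "UPDATE PARTITION_STAT_TBL SET ROW_COUNT = ? "
                "WHERE ID = ? AND ? > "
                "(SELECT PREV.ROW_COUNT FROM "
                "(SELECT ROW_COUNT FROM PARTITION_STAT_TBL WHERE ID = ?) PREV)",
                (new_value, row_id, new_value, row_id),
            )
        return 2
```

Calling publish_stat for a new ID issues one statement. Calling it again for 
the same ID issues two, and the stored ROW_COUNT changes only when the new 
value is larger.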


 reduce workload generated by JDBCStatsPublisher
 ---

 Key: HIVE-2144
 URL: https://issues.apache.org/jira/browse/HIVE-2144
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Tomasz Nykiel

 In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID 
 was inserted by another task (most likely a speculative or previously 
 failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
 is issued. So there are basically 2x queries per row inserted into the 
 intermediate stats table. This workload could be reduced to 1/2 if we insert 
 it anyway (it is very rare that IDs are duplicated) and use a different SQL 
 query in the aggregation phase to dedup the ID (e.g., using group-by and 
 max()). The benefit is that even though the aggregation query is more 
 expensive, it is only run once per query.
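
 The dedup-at-aggregation idea from this description might look roughly like 
 the sketch below. It is an illustration under assumptions, not Hive code: 
 Python's built-in sqlite3 module stands in for the JDBC stats database, the 
 function names publish and aggregate are hypothetical, and summing the 
 per-ID maxima is an assumed form of the aggregation.

```python
import sqlite3

# In-memory stand-in for the stats database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
# No uniqueness constraint: duplicate IDs are accepted at publish time.
conn.execute(
    "CREATE TABLE PARTITION_STAT_TBL (ID VARCHAR(255), ROW_COUNT BIGINT)"
)


def publish(conn, row_id, row_count):
    """Issue exactly one INSERT per row, with no preceding SELECT."""
    with conn:
        conn.execute(
            "INSERT INTO PARTITION_STAT_TBL (ID, ROW_COUNT) VALUES (?, ?)",
            (row_id, row_count),
        )


def aggregate(conn):
    """Keep the largest ROW_COUNT per ID, then total the surviving values."""
    row = conn.execute(
        "SELECT COALESCE(SUM(D.ROW_COUNT), 0) FROM "
        "(SELECT ID, MAX(ROW_COUNT) AS ROW_COUNT "
        "FROM PARTITION_STAT_TBL GROUP BY ID) D"
    ).fetchone()
    return row[0]
```

 With this layout the publish path costs one statement per row, and the extra 
 GROUP BY work is paid once, when aggregate runs.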



[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035612#comment-13035612
 ] 

Ning Zhang commented on HIVE-2144:
--

Great! I like the idea. 

One comment about the primary key constraint: I'm not sure UNIQUE is the 
standard way to specify a primary key constraint. There are people using 
Oracle/MS SQL Server/Postgres as the metastore, so we should use a standard 
way. I think 'id varchar(255) PRIMARY KEY' is more widely supported. Can you 
double-check with MySQL and Derby?



Jenkins build is still unstable: Hive-trunk-h0.21 #737

2011-05-18 Thread Apache Jenkins Server
See https://builds.apache.org/hudson/job/Hive-trunk-h0.21/changes




[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-18 Thread Tomasz Nykiel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035634#comment-13035634
 ] 

Tomasz Nykiel commented on HIVE-2144:
-

Yes, I agree. There are some subtle differences between UNIQUE and PK in Derby 
and MySQL (e.g., in MySQL the unique index allows null values, and in Derby it 
does not). So in general, a PK constraint will be more suitable.

CREATE TABLE PARTITION_STAT_TBL ( ID VARCHAR(255) PRIMARY KEY, ROW_COUNT 
BIGINT ) works for both Derby and MySQL.
After a quick check, it seems that it's supported by Oracle/MSSQL as well.





[jira] [Commented] (HIVE-2096) throw a error if the input is larger than a threshold for index input format

2011-05-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035965#comment-13035965
 ] 

He Yongqiang commented on HIVE-2096:


will commit after tests pass.

 throw a error if the input is larger than a threshold for index input format
 

 Key: HIVE-2096
 URL: https://issues.apache.org/jira/browse/HIVE-2096
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Namit Jain
 Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, 
 HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt


 This can hang forever.
