[jira] [Updated] (HIVE-2184) Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()

2011-05-25 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2184:
---

Attachment: HIVE-2184.patch

> Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()
> ---
>
> Key: HIVE-2184
> URL: https://issues.apache.org/jira/browse/HIVE-2184
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.5.0, 0.8.0
> Environment: Hadoop 0.20.1, Hive 0.8.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2184.patch
>
>
> 1) Hive.close() calls HiveMetaStoreClient.close(); in that method the 
> variable "standAloneClient" never becomes true, so client.shutdown() is 
> never called.
> 2) In Hive.close(), after calling metaStoreClient.close(), metaStoreClient 
> needs to be set to null.
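The two fixes can be sketched with stand-in classes. This is an illustrative outline of the intended behavior only; HiveLike and ClientLike are hypothetical stand-ins, not Hive's actual Hive and HiveMetaStoreClient classes.

```java
// Stand-in for HiveMetaStoreClient: shutdown() must actually be reachable.
class ClientLike {
    boolean shutdownCalled = false;
    void close() { /* release per-connection resources */ }
    void shutdown() { shutdownCalled = true; }
}

// Stand-in for Hive: close() shuts the client down and nulls the field so a
// later call cannot touch an already-closed client.
class HiveLike {
    ClientLike metaStoreClient = new ClientLike();

    void close() {
        if (metaStoreClient != null) {
            metaStoreClient.close();
            metaStoreClient.shutdown();
            metaStoreClient = null;   // fix (2): drop the stale reference
        }
    }
}
```

With this guard, a second close() is a harmless no-op instead of operating on a closed client.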

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request: extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)

2011-05-25 Thread Tomasz Nykiel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/785/
---

Review request for hive.


Summary
---

Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands we 
collect statistics about the number of rows per partition/table. 
Other statistics (e.g., total table/partition size) are derived from the file 
system.

We introduce a new feature for collecting information about the sizes of 
uncompressed data, to be able to determine the efficiency of compression.
On top of adding the new statistic collected, this patch extends the stats 
collection mechanism, so any new statistics could be added easily.

1. serializer/deserializer classes are amended to accommodate collecting sizes 
of uncompressed data, when serializing/deserializing objects.
We support:

Columnar SerDe
LazySimpleSerDe
LazyBinarySerDe

For other SerDe classes the uncompressed size will be 0.

2. StatsPublisher / StatsAggregator interfaces are extended to support 
multi-stats collection for both JDBC and HBase.

3. For both INSERT OVERWRITE and ANALYZE statements, FileSinkOperator and 
TableScanOperator respectively are extended to support multi-stats collection.

(2) and (3) enable easy extension for other types of statistics.

4. Collecting uncompressed size can be disabled by setting:

hive.stats.collect.uncompressedsize = false
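The per-SerDe accounting in item (1) can be sketched in plain Java. This is an illustrative stand-in, not Hive's actual SerDe code: the serializer simply accumulates the raw byte size of each row before any compression is applied downstream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the idea in item (1): track uncompressed bytes
// while serializing, so compression efficiency can later be computed as
// compressedSize / uncompressedSize.
class CountingSerializer {
    private long uncompressedBytes = 0;

    byte[] serialize(String row) {
        byte[] raw = row.getBytes(StandardCharsets.UTF_8);
        uncompressedBytes += raw.length;  // size before any compression
        return raw;
    }

    long getUncompressedBytes() { return uncompressedBytes; }
}
```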


This addresses bug HIVE-2185.
https://issues.apache.org/jira/browse/HIVE-2185


Diffs
-

  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1127756 
  trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java 
1127756 
  
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/TypedBytesSerDe.java
 1127756 
  
trunk/contrib/src/java/org/apache/hadoop/hive/contrib/serde2/s3/S3LogDeserializer.java
 1127756 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 
1127756 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
 1127756 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
 1127756 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsSetupConstants.java
 1127756 
  
trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsUtils.java 
PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_stats.q 1127756 
  trunk/hbase-handler/src/test/results/hbase_stats.q.out 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 
1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Stat.java 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 
1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 
1127756 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsAggregator.java 
1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsPublisher.java 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsSetupConst.java 
1127756 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java 
1127756 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 
1127756 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java
 1127756 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 
PRE-CREATION 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisher.java 
1127756 
  
trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestStatsPublisherEnhanced.java
 PRE-CREATION 
  trunk/ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java 1127756 
  trunk/ql/src/test/queries/clientpositive/stats14.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/stats15.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/bucketmapjoin1.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/bucketmapjoin2.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/bucketmapjoin3.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/bucketmapjoin4.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/bucketmapjoin5.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/combine2.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/filter_join_breaktask.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/join_map_ppr.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/merge3.q.out 1127756 
  trunk/ql/src/test/results/clientpositive/merge4.q.out

[jira] [Updated] (HIVE-2185) extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)

2011-05-25 Thread Tomasz Nykiel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HIVE-2185:


Attachment: HIVE-2185.patch

> extend table statistics to store the size of uncompressed data (+extend 
> interfaces for collecting other types of statistics)
> 
>
> Key: HIVE-2185
> URL: https://issues.apache.org/jira/browse/HIVE-2185
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers, Statistics
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: HIVE-2185.patch
>
>
> Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands we 
> collect statistics about the number of rows per partition/table. Other 
> statistics (e.g., total table/partition size) are derived from the file 
> system. 
> Here, we want to collect information about the sizes of uncompressed data, to 
> be able to determine the efficiency of compression.
> Currently, a large part of the statistics collection mechanism is hardcoded 
> and not easily extensible for other statistics.
> On top of adding the new statistic collected, it would be desirable to extend 
> the collection mechanism, so any new statistics could be added easily.



[jira] [Commented] (HIVE-2185) extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)

2011-05-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039493#comment-13039493
 ] 

jirapos...@reviews.apache.org commented on HIVE-2185:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/785/
---


[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-25 Thread Tomasz Nykiel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039417#comment-13039417
 ] 

Tomasz Nykiel commented on HIVE-2144:
-

IMPORTANT NOTE!

Before deployment, the primary key constraint needs to be added manually on the 
ID column of PARTITION_STAT_TBL, if the table already exists.
Otherwise, the statistics might be duplicated for some entries, and the 
aggregated statistics will be silently incorrect.

If the table does not exist, it will be created in the proper format.



> reduce workload generated by JDBCStatsPublisher
> ---
>
> Key: HIVE-2144
> URL: https://issues.apache.org/jira/browse/HIVE-2144
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Tomasz Nykiel
> Fix For: 0.8.0
>
> Attachments: HIVE-2144.1.patch, HIVE-2144.2.patch, HIVE-2144.patch
>
>
> In JDBCStatsPublisher, we first run a SELECT query to see if the specific ID 
> was already inserted by another task (most likely a speculative or previously 
> failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
> is issued, so there are basically 2x queries per row inserted into the 
> intermediate stats table. This workload could be cut in half if we insert 
> unconditionally (it is very rare that IDs are duplicated) and use a different 
> SQL query in the aggregation phase to dedup the IDs (e.g., using group-by and 
> max()). The benefit is that even though the aggregation query is more 
> expensive, it runs only once per query. 
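The insert-always-then-dedup idea can be simulated in plain Java. This is an illustrative stand-in, not Hive's actual JDBCStatsAggregator: tasks insert (id, rowCount) pairs unconditionally, so a speculative re-execution may duplicate an id, and the aggregation phase dedups with the equivalent of a group-by/max subquery.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the aggregation-phase dedup; roughly the SQL
//   SELECT SUM(c) FROM (SELECT MAX(ROW_COUNT) AS c FROM stats GROUP BY ID) t
class StatsDedup {
    static long aggregate(List<Map.Entry<String, Long>> rows) {
        // group by id, keep the max count per id (dedups speculative inserts)
        Map<String, Long> maxPerId = rows.stream().collect(
            Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Math::max));
        // then sum across ids
        return maxPerId.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

The merge function (Math::max) plays the role of MAX(ROW_COUNT): duplicated IDs collapse to a single row before summing, which is why the duplicate inserts are harmless.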



[jira] [Updated] (HIVE-2171) Allow custom serdes to set field comments

2011-05-25 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HIVE-2171:
--

Attachment: HIVE-2171.patch

Patch:
* Adds a comment field to the StructField interface and implements reasonable 
versions in each of its implementations.
* Adds overloaded versions of each of the struct-based ObjectInspector 
factories to allow the comments to be set.
* Adjusts MetastoreUtils to check whether the comment of the field is null; if 
so, it maintains the previous behavior, else it uses the comment.
* Adds a new unit test for MetastoreUtils.  For this, mockito was added as a 
dependency.  Right now it looks like Hive's Ivy conf isn't set up to include 
only some jars in the package.  If this patch goes in, I'll open another 
jira to make sure the mockito and other test-related jars aren't included in 
jars they don't need to be in.
* Refactors the TestStandardObjectInspectors test to test both with and without 
comments.
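The first and third bullets can be sketched together. These are hypothetical stand-ins (StructFieldLike, MetastoreUtilsLike), not the patch's exact classes: the interface gains a possibly-null comment accessor, and the metastore-side helper falls back to the old boilerplate when no comment was provided.

```java
// Stand-in for the extended StructField interface.
interface StructFieldLike {
    String getFieldName();
    String getFieldComment();   // may be null when the serde sets no comment
}

// Stand-in for the MetastoreUtils adjustment: null keeps previous behavior.
class MetastoreUtilsLike {
    static final String DEFAULT_COMMENT = "from deserializer";

    static String commentFor(StructFieldLike f) {
        String c = f.getFieldComment();
        return c == null ? DEFAULT_COMMENT : c;
    }
}
```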

After this patch, a serde that wants to specify comments can do so and have 
them show up in the table description. For example, a table kst created by a 
SerDe implementation with a field of each type (the comments are all separate, 
and all just boring: "this is field BLAH") can now set the field comments:
{noformat}hive> describe kst;
OK
string1 string  this field is string1
string2 string  this field is string2
int1int this field is int1
boolean1boolean this field is boolean1
long1   bigint  this field is long1
float1  float   this field is float1
double1 double  this field is double1
inner_record1   struct 
this field is inner_record1
enum1   string  this field is enum1
array1  array   this field is array1
map1map  this field is map1
union1  uniontype this field is union1
fixed1  array  this field is fixed1
null1   voidthis field is null1
unionnullintint this field is UnionNullInt
bytes1  array  this field is bytes1
ds  string
Time taken: 0.286 seconds{noformat}

One thing I noticed is that these field comments on structs should extend to 
substructures, and with this new patch they do for custom serdes:
{noformat}hive> describe kst.inner_record1;
OK
int_in_inner_record1int this field is int_in_inner_record1
string_in_inner_record1 string  this field is string_in_inner_record1
Time taken: 0.113 seconds{noformat}

However, this doesn't work correctly with built-in serdes:

{noformat}hive> create table test_table(a STRUCT COMMENT 'comment for a');
OK
Time taken: 2.565 seconds
hive> describe test_table;
OK
a   struct  comment for a
Time taken: 0.139 seconds
hive> describe test_table.a;
OK
z   string  from deserializer
x   int from deserializer
Time taken: 0.096 seconds
hive> describe test_table.a.z;
OK
z   string  from deserializer
Time taken: 0.089 seconds
hive>{noformat}

The comment for field z is lost, replaced by the boilerplate text "from 
deserializer" and can't be retrieved from the CLI.  I'll open a JIRA for this.

This is my first Hive patch, so please check to see if I missed anything.

> Allow custom serdes to set field comments
> -
>
> Key: HIVE-2171
> URL: https://issues.apache.org/jira/browse/HIVE-2171
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.7.1
>
> Attachments: HIVE-2171.patch
>
>
> Currently, while serde implementations can set a field's name, they can't set 
> its comment.  These are set in the metastore utils to {{(from 
> deserializer)}}.  For those serdes that can provide meaningful comments for a 
> field, they should be propagated to the table description.  These 
> serde-provided comments could be prepended to "(from deserializer)" if others 
> feel that's a meaningful distinction.  This change involves updating 
> {{StructField}} to support a (possibly null) comment field and then 
> propagating this change out to the myriad places {{StructField}} is thrown 
> around.



[jira] [Updated] (HIVE-2171) Allow custom serdes to set field comments

2011-05-25 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HIVE-2171:
--

Fix Version/s: 0.7.1
Affects Version/s: 0.7.0
   Status: Patch Available  (was: Open)

> Allow custom serdes to set field comments
> -
>
> Key: HIVE-2171
> URL: https://issues.apache.org/jira/browse/HIVE-2171
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.7.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.7.1
>
> Attachments: HIVE-2171.patch
>
>
> Currently, while serde implementations can set a field's name, they can't set 
> its comment.  These are set in the metastore utils to {{(from 
> deserializer)}}.  For those serdes that can provide meaningful comments for a 
> field, they should be propagated to the table description.  These 
> serde-provided comments could be prepended to "(from deserializer)" if others 
> feel that's a meaningful distinction.  This change involves updating 
> {{StructField}} to support a (possibly null) comment field and then 
> propagating this change out to the myriad places {{StructField}} is thrown 
> around.



[jira] [Created] (HIVE-2185) extend table statistics to store the size of uncompressed data (+extend interfaces for collecting other types of statistics)

2011-05-25 Thread Tomasz Nykiel (JIRA)
extend table statistics to store the size of uncompressed data (+extend 
interfaces for collecting other types of statistics)


 Key: HIVE-2185
 URL: https://issues.apache.org/jira/browse/HIVE-2185
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers, Statistics
Reporter: Tomasz Nykiel
Assignee: Tomasz Nykiel


Currently, when executing INSERT OVERWRITE and ANALYZE TABLE commands we 
collect statistics about the number of rows per partition/table. Other 
statistics (e.g., total table/partition size) are derived from the file system. 

Here, we want to collect information about the sizes of uncompressed data, to 
be able to determine the efficiency of compression.
Currently, a large part of the statistics collection mechanism is hardcoded 
and not easily extensible for other statistics.
On top of adding the new statistic collected, it would be desirable to extend 
the collection mechanism, so any new statistics could be added easily.



Build failed in Jenkins: Hive-trunk-h0.21 #749

2011-05-25 Thread Apache Jenkins Server
See 

--
[...truncated 32138 lines...]
 [echo]  Writing POM to 

No ivy:settings found for the default reference 'ivy.instance'.  A default 
instance will be used
no settings file found, using default...
:: loading settings :: url = 
jar:file:/home/hudson/.ant/lib/ivy-2.0.0-rc2.jar!/org/apache/ivy/core/settings/ivysettings.xml

ivy-init-dirs:

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
  [get] To: 

  [get] Not modified - so not downloaded

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

check-ivy:

create-dirs:

compile-ant-tasks:

create-dirs:

init:

compile:
 [echo] Compiling: anttasks
[javac] 
:40:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

deploy-ant-tasks:

create-dirs:

init:

compile:
 [echo] Compiling: anttasks
[javac] 
:40:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

jar:

init:

install-hadoopcore:

install-hadoopcore-default:

ivy-init-dirs:

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
  [get] To: 

  [get] Not modified - so not downloaded

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-retrieve-hadoop-source:
:: loading settings :: file = 

[ivy:retrieve] :: resolving dependencies :: 
org.apache.hive#hive-hwi;0.8.0-SNAPSHOT
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  found hadoop#core;0.20.1 in hadoop-source
[ivy:retrieve] :: resolution report :: resolve 661ms :: artifacts dl 1ms
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  default |   1   |   0   |   0   |   0   ||   1   |   0   |
-
[ivy:retrieve] :: retrieving :: org.apache.hive#hive-hwi
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  0 artifacts copied, 1 already retrieved (0kB/1ms)

install-hadoopcore-internal:

setup:

war:

compile:
 [echo] Compiling: hwi
[javac] 
:71:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

jar:
 [echo] Jar: hwi

make-pom:
 [echo]  Writing POM to 

No ivy:settings found for the default reference 'ivy.instance'.  A default 
instance will be used
no settings file found, using default...
:: loading settings :: url = 
jar:file:/home/hudson/.ant/lib/ivy-2.0.0-rc2.jar!/org/apache/ivy/core/settings/ivysettings.xml

ivy-init-dirs:

ivy-download:
  [get] Getting: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
  [get] To: 

  [get] Not modified - so not downloaded

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

check-ivy:

create-dirs:

compile-ant-tasks:

create-dirs:

init:

compile:
 [echo] Compiling: anttasks
[javac] 
:40:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

deploy-ant-tasks:

create-dirs:

init:

compile:
 [echo] Compiling: anttasks
[javac] 
:40:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds

jar:

init:

setup:

compile:
 [echo] Compiling: hbase-handler
[javac] 
:299:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
 [copy] Warning: 

 does not exist.

jar:
 [echo] Jar: hbase-handler

make-pom:
 [echo]  Writing POM to

Build failed in Jenkins: Hive-branch-0.7.1-h0.21 #4

2011-05-25 Thread Apache Jenkins Server
See 

--
[...truncated 27403 lines...]
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-05-25_12-09-41_516_5538876000417618343/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=
[junit] Job running in-process (local Hadoop)
[junit] 2011-05-25 12:09:44,599 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-05-25_12-09-41_516_5538876000417618343/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-05-25_12-09-46_073_8776542399968628601/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-05-25_12-09-46_073_8776542399968628601/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=

[jira] [Commented] (HIVE-2147) Add api to send / receive message to metastore

2011-05-25 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039249#comment-13039249
 ] 

jirapos...@reviews.apache.org commented on HIVE-2147:
-



bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > trunk/metastore/if/hive_metastore.thrift, line 347
bq.  > 
bq.  >
bq.  > Having separate calls for sending request and response messages 
looks unnecessary. A sendMessage() function with separate request and response 
message types should work just as well, and will help to avoid confusion -- 
otherwise I think people will assume that receiveMessage is a polling call.
bq.  > 
bq.  > This is starting to look like a general purpose messaging/rpc 
framework. Is that the intent?
bq.  >

bq. > A sendMessage() function with separate request and response message types 
should work just as well.
That is correct, but semantically they are different: in sendMessage() the user 
is just notifying the Metastore of an event and does not care about the return 
value, while in recvMessage() the user is asking for a response to his message. 
This distinction is further enforced by the return types. We could have just 
one sendMessage() api for both as you suggested, but having distinct apis for 
sending and receiving makes the semantics easier for clients to understand.

bq. > This is starting to look like a general purpose messaging/rpc framework. 
Well, a general-purpose rpc framework would be much more sophisticated. I am 
not aiming for that. 


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > trunk/metastore/if/hive_metastore.thrift, line 348
bq.  > 
bq.  >
bq.  > Identifying the message type using an integer seems brittle. This 
won't work if you have more than one application that is firing events at the 
metastore.

There are two other alternatives that I thought of before settling on this one.
1) Add specific apis for different message types. This would make the generic 
api redundant, but it would also put application-specific apis into the 
metastore. E.g., in HCatalog we want to send a message for a "set of 
partitions" telling the Metastore to mark them as done. What would 
finalizePartition() mean in the metastore api, when the Metastore itself is 
not aware of this application-specific concept? That would be confusing.
2) Use enums instead of an integer. This has a similar problem, though on a 
smaller scale: enums give compile-time safety, so we would have to define them 
in the Metastore code, and defining application-specific enums there doesn't 
look like a good idea for similar reasons. 


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > 
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
line 3126
bq.  > 
bq.  >
bq.  > So the event model is that each event may be handled by at most one 
event handler?

Yes.
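That at-most-one-handler model can be sketched as a first-match dispatch loop. Names here (MessageListener, Dispatcher, canProcess, process) are hypothetical illustrations, not the actual metastore listener API:

```java
import java.util.*;

// Each event is offered to listeners in registration order; the first one
// whose canProcess() accepts the type consumes it, and no other listener
// ever sees it.
interface MessageListener {
    boolean canProcess(int messageType);
    String process(String payload);
}

class Dispatcher {
    private final List<MessageListener> listeners = new ArrayList<>();

    void register(MessageListener l) { listeners.add(l); }

    String dispatch(int messageType, String payload) {
        for (MessageListener l : listeners) {
            if (l.canProcess(messageType)) {
                return l.process(payload);   // at most one handler runs
            }
        }
        return null;  // unserviceable event (a spot where logging would go)
    }
}
```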


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > 
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
line 3134
bq.  > 
bq.  >
bq.  > Please add some DEBUG or TRACE level logging here that indicates 
which handler consumed a particular event, or if an event was unserviceable.

Will add logging.


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > 
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
line 3149
bq.  > 
bq.  >
bq.  > Semantically this function looks more like "sendRequest" than 
"receiveMessage" (and "sendMessage" looks more like "fireEvent").

Same as my very first comment.


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > 
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, 
line 3151
bq.  > 
bq.  >
bq.  > Checkstyle: you need a space between control flow tokens and open 
parens.
bq.  >

will roll this in.


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > 
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java,
 line 640
bq.  > 
bq.  >
bq.  > Nice to have: javadoc.

Will add.


bq.  On 2011-05-25 03:43:30, Carl Steinbach wrote:
bq.  > 
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java,
 line 86
bq.  > 
bq.  >
bq.  > canProcessSendMessage() looks like a redundant call. Is there any 
reason that this can't be be rolled into processSendMessage()?
bq.  >

Event 

Re: Review Request: HIVE-2147 : Add api to send / receive message to metastore

2011-05-25 Thread Ashutosh Chauhan


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/if/hive_metastore.thrift, line 347
> > 
> >
> > Having separate calls for sending request and response messages looks 
> > unnecessary. A sendMessage() function with separate request and response 
> > message types should work just as well, and will help to avoid confusion -- 
> > otherwise I think people will assume that receiveMessage is a polling call.
> > 
> > This is starting to look like a general purpose messaging/rpc 
> > framework. Is that the intent?
> >

>> A sendMessage() function with separate request and response message types 
>> should work just as well.
That is correct, but semantically they are different. In sendMessage() the user is 
just notifying the Metastore of an event and is not concerned with the return value; 
in recvMessage() the user is asking for a response to their message. This distinction 
is further enforced by the return types. We could have a single sendMessage() API 
for both, as you suggested, but having distinct APIs for sending and receiving 
makes it easier for the client to understand the semantics.

>> This is starting to look like a general purpose messaging/rpc framework. 
A general-purpose RPC framework would be much more sophisticated. I am not 
aiming for that. 
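The send/recv distinction argued for above can be sketched with a minimal in-memory stand-in. The names sendMessage()/recvMessage() and the int message tag follow the thread's wording; the class, payloads, and reply format below are invented for illustration, not the actual HIVE-2147 patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: sendMessage() is fire-and-forget (void return),
// recvMessage() is request/response. The return types themselves encode
// the semantic difference the reply describes.
public class MessagingSketch {
    private final List<String> eventLog = new ArrayList<>();

    // Notify-only: the caller does not care about a return value.
    public void sendMessage(int messageType, String payload) {
        eventLog.add(messageType + ":" + payload);
    }

    // Request/response: the caller expects a reply for its message.
    public String recvMessage(int messageType, String request) {
        sendMessage(messageType, request);      // still recorded as an event
        return "ack(" + messageType + ")";      // but a response comes back
    }

    public static void main(String[] args) {
        MessagingSketch client = new MessagingSketch();
        client.sendMessage(1, "partition_added");   // no response expected
        System.out.println(client.recvMessage(2, "mark_partitions_done"));
    }
}
```

Collapsing both into one call would force notify-only callers to ignore a meaningless return value, which is the confusion the distinct APIs avoid.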


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/if/hive_metastore.thrift, line 348
> > 
> >
> > Identifying the message type using an integer seems brittle. This won't 
> > work if you have more than one application that is firing events at the 
> > metastore.

There are two other alternatives that I considered before settling on this one.
1) Add specific APIs for different message types. This would make the generic 
API redundant, but it would result in application-specific APIs in the 
metastore. E.g., in HCatalog we want to send a message for a "set of 
partitions" telling the Metastore to mark them as done. What would 
finalizePartition() mean in the metastore API when the Metastore itself is not 
aware of this concept, since it is application specific? That would be confusing.
2) Use enums instead of an integer. This results in a similar problem, though 
on a smaller scale. Enums give compile-time safety, so we would have to define 
them in Metastore code, and defining application-specific enums there is not a 
good idea for the same reasons. 
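The trade-off above (opaque integer tag versus per-application APIs or enums) can be sketched as follows. The constant names and values are hypothetical, invented for illustration; the point is that the application owns its tags and the metastore core never interprets them.

```java
// Hypothetical sketch of the chosen int-tag approach: HCatalog defines its
// own message types without the metastore (or the thrift IDL) knowing about
// them. Only the HCatalog-supplied listener gives these numbers meaning.
public final class HCatMessageTypes {
    private HCatMessageTypes() {}   // constants only, no instances

    // Application-owned tags, passed opaquely through sendMessage(int, ...).
    public static final int MARK_PARTITIONS_DONE    = 1001;
    public static final int PARTITION_SET_ABANDONED = 1002;
}
```

With enums or dedicated APIs these definitions would have to live in metastore code, which is exactly the coupling the reply argues against.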


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java,
> >  line 3126
> > 
> >
> > So the event model is that each event may be handled by at most one 
> > event handler?

Yes.
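The "at most one handler" model confirmed here can be sketched as a first-match dispatch loop. The listener interface and method names below are invented for illustration (they echo canProcessSendMessage()/processSendMessage() from later in the thread), not the actual HiveMetaStore code.

```java
import java.util.List;

// Hypothetical sketch: the metastore walks its listeners and hands the
// event to the first one that claims it, so at most one handler runs.
public class EventDispatcher {
    public interface Listener {
        boolean canProcess(int messageType);          // cheap capability check
        String process(int messageType, String msg);  // real work plus reply
    }

    private final List<Listener> listeners;

    public EventDispatcher(List<Listener> listeners) {
        this.listeners = listeners;
    }

    // Returns the first willing handler's reply, or null if the event
    // was unserviceable (no registered handler claimed it).
    public String dispatch(int messageType, String msg) {
        for (Listener l : listeners) {
            if (l.canProcess(messageType)) {
                return l.process(messageType, msg);
            }
        }
        return null;
    }
}
```

Separating the capability check from the processing is also what lets the dispatcher return the handler's actual reply rather than a serviced/not-serviced boolean.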


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java,
> >  line 3134
> > 
> >
> > Please add some DEBUG or TRACE level logging here that indicates which 
> > handler consumed a particular event, or if an event was unserviceable.

Will add logging.


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java,
> >  line 3149
> > 
> >
> > Semantically this function looks more like "sendRequest" than 
> > "receiveMessage" (and "sendMessage" looks more like "fireEvent").

Same as my very first comment.


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java,
> >  line 3151
> > 
> >
> > Checkstyle: you need a space between control flow tokens and open 
> > parens.
> >

Will roll this in.


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java,
> >  line 640
> > 
> >
> > Nice to have: javadoc.

Will add.


> On 2011-05-25 03:43:30, Carl Steinbach wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java,
> >  line 86
> > 
> >
> > canProcessSendMessage() looks like a redundant call. Is there any 
> > reason that this can't be rolled into processSendMessage()?
> >

The event model is that every event is handled by at most one handler. If we 
rolled this into processSendMsg(), we would have to make the method return a 
boolean telling whether the event got serviced by this handler or not. But then 
how would it communicate back the actual return value? In the case of sendMsg() 
this is fine, but recvMsg() returns a valid value which is then