[jira] Created: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load

2011-03-08 Thread Chinna Rao Lalam (JIRA)
Correct the exception message for the better traceability for the scenario load 
into the partitioned table having 2  partitions by specifying only one 
partition in the load statement. 


 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise Server 
10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam


 A load into a table partitioned on two columns fails when the load statement 
specifies only one of the partition columns, and the following exception 
message is logged.

{noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not 
found '21Oct'
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:685)
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
at 
org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
at 
org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
at 
org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
{noformat}

The message should be corrected so that it states the actual root cause.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-1976) Exception should be thrown when invalid jar,file,archive is given to add command

2011-03-08 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-1976:
---

Attachment: HIVE-1976.3.patch

 Exception should be thrown when invalid jar,file,archive is given to add 
 command
 

 Key: HIVE-1976
 URL: https://issues.apache.org/jira/browse/HIVE-1976
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.7.0
 Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1976.2.patch, HIVE-1976.3.patch, HIVE-1976.patch


 When the add command is executed with a nonexistent jar, it should throw an 
 exception through HiveStatement.
 Ex:
 {noformat}
   add jar /root/invalidpath/testjar.jar
 {noformat}
 Here testjar.jar does not exist, so an exception should be thrown.



[jira] Updated: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load

2011-03-08 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2031:
---

Attachment: HIVE-2031.patch

 Correct the exception message for the better traceability for the scenario 
 load into the partitioned table having 2  partitions by specifying only one 
 partition in the load statement. 
 

 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2031.patch





[jira] Commented: (HIVE-1976) Exception should be thrown when invalid jar,file,archive is given to add command

2011-03-08 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13003918#comment-13003918
 ] 

Chinna Rao Lalam commented on HIVE-1976:


Yes, sorry, I missed the behavior change in the CLI scenario.

I have now attached a patch with a new solution: instead of throwing a 
RuntimeException, it returns a CommandProcessorResponse with a non-zero 
responseCode.

HiveServer.java (HiveServer mode), CliDriver.java (CLI mode), and 
HWISessionItem.java (Hive Web UI mode) will respond based on the responseCode 
of the CommandProcessorResponse.

This way there is no behavior change. Please review this and give your 
comments.
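
The approach described in this comment, reporting failure through a response 
code rather than an exception, might look roughly like the sketch below. The 
Response class and the addResource() method are hypothetical stand-ins written 
for illustration; they are not Hive's CommandProcessorResponse or the code in 
the attached patch.

```java
import java.io.File;

/** Hypothetical stand-in for a command processor; not Hive's actual classes. */
class AddResourceSketch {

    /** Minimal result type carrying a status code and an optional error message. */
    static final class Response {
        final int responseCode;
        final String errorMessage;

        Response(int responseCode, String errorMessage) {
            this.responseCode = responseCode;
            this.errorMessage = errorMessage;
        }
    }

    /** Reports a missing resource through the response code rather than by throwing. */
    static Response addResource(String path) {
        if (!new File(path).exists()) {
            return new Response(1, path + " does not exist");
        }
        return new Response(0, null);
    }

    public static void main(String[] args) {
        Response response = addResource("/root/invalidpath/testjar.jar");
        // Each caller (server, CLI, web UI) decides how to surface a non-zero code.
        if (response.responseCode != 0) {
            System.err.println("FAILED: " + response.errorMessage);
        }
    }
}
```

With this shape, a server-side caller can turn a non-zero code into an 
exception for its clients while the CLI keeps printing an error and carrying 
on, which is how the behavior change would be avoided.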

 Exception should be thrown when invalid jar,file,archive is given to add 
 command
 

 Key: HIVE-1976
 URL: https://issues.apache.org/jira/browse/HIVE-1976
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.7.0
 Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1976.2.patch, HIVE-1976.3.patch, HIVE-1976.patch


 When the add command is executed with a nonexistent jar, it should throw an 
 exception through HiveStatement.
 Ex:
 {noformat}
   add jar /root/invalidpath/testjar.jar
 {noformat}
 Here testjar.jar does not exist, so an exception should be thrown.



[jira] Commented: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the lo

2011-03-08 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13003951#comment-13003951
 ] 

Chinna Rao Lalam commented on HIVE-2031:




{noformat}

create table sampletable (a string,b string) PARTITIONED BY(dt STRING, country 
STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '@';

LOAD DATA INPATH '/user/root/mytable/joindata.txt' OVERWRITE INTO TABLE 
sampletable partition (dt='21Oct');

{noformat}

The above query fails because the load statement does not specify both 
partition columns. If the log message looked like the following, it would be 
easy to debug:

{noformat}
2011-03-08 17:13:20,901 ERROR ql.Driver (SessionState.java:printError(365)) - 
FAILED: Error in semantic analysis: line 1:91 Partition not found '21Oct'
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not 
found '21Oct'
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.init(BaseSemanticAnalyzer.java:685)
.
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: table is 
partitioned but partition spec is not specified or does not fully match table 
partitioning: {dt=21Oct}
at org.apache.hadoop.hive.ql.metadata.Table.isValidSpec(Table.java:341)
... 11 more
{noformat}
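
A check that yields this kind of root-cause message could be sketched as 
follows. This is only an illustration under assumptions: PartitionSpecCheck 
and describeMismatch() are hypothetical names, not Hive's Table.isValidSpec() 
or the code in HIVE-2031.patch, and the message wording is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not Hive's actual Table.isValidSpec implementation. */
class PartitionSpecCheck {

    /**
     * Returns null when the spec names every partition column, otherwise a
     * message that lists the columns the spec leaves out.
     */
    static String describeMismatch(List<String> partitionColumns, Map<String, String> spec) {
        List<String> missing = new ArrayList<>();
        for (String column : partitionColumns) {
            if (!spec.containsKey(column)) {
                missing.add(column);
            }
        }
        if (missing.isEmpty()) {
            return null;
        }
        return "table is partitioned by " + partitionColumns
            + " but partition spec " + spec + " is missing " + missing;
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "21Oct");
        // sampletable in the example above is partitioned by dt and country.
        System.out.println(describeMismatch(Arrays.asList("dt", "country"), spec));
    }
}
```

Naming the missing column directly would point the user at the incomplete 
partition clause instead of at a partition that appears not to exist.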

 Correct the exception message for the better traceability for the scenario 
 load into the partitioned table having 2  partitions by specifying only one 
 partition in the load statement. 
 

 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2031.patch





[jira] Updated: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load

2011-03-08 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2031:
---

Status: Patch Available  (was: Open)

 Correct the exception message for the better traceability for the scenario 
 load into the partitioned table having 2  partitions by specifying only one 
 partition in the load statement. 
 

 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2031.patch





[jira] Commented: (HIVE-1555) JDBC Storage Handler

2011-03-08 Thread Andrew Wilson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004129#comment-13004129
 ] 

Andrew Wilson commented on HIVE-1555:
-

Hi,

Can I get this issue assigned to me? I have a basic implementation working, 
which I'd like to contribute. 

It wraps the DBInputFormat and DBOutputFormat classes. 

It expects values for the DBConfiguration properties to be provided through the 
SERDEPROPERTIES block in the create table statement. The 
configureTableJobProperties() method copies these properties out of the table 
description and into each job context.
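
The copying step described above might look roughly like this sketch. It 
assumes the relevant keys share a common prefix; the class name, the 
copyByPrefix() helper, and the example keys and values are illustrative 
assumptions, not the storage handler's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch; not the storage handler's actual code. */
class JobPropertyCopySketch {

    /** Copies every table property whose key starts with the prefix into the job properties. */
    static void copyByPrefix(Properties tableProperties, String prefix,
                             Map<String, String> jobProperties) {
        for (String key : tableProperties.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                jobProperties.put(key, tableProperties.getProperty(key));
            }
        }
    }

    public static void main(String[] args) {
        Properties table = new Properties();
        // Illustrative keys and values only.
        table.setProperty("mapred.jdbc.url", "jdbc:mysql://localhost/test");
        table.setProperty("columns", "id,name");

        Map<String, String> job = new HashMap<>();
        copyByPrefix(table, "mapred.jdbc.", job);
        System.out.println(job);
    }
}
```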

It also allows users to set SerDe properties which will cause the 
DBOutputFormat to generate UPSERT SQL statements or DELETE SQL statements 
instead of the vanilla INSERT SQL generated by default. Right now this feature 
has a MySQL bias. I am still trying to decide on the best way to make this 
more database-vendor agnostic.

Andrew


 JDBC Storage Handler
 

 Key: HIVE-1555
 URL: https://issues.apache.org/jira/browse/HIVE-1555
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Reporter: Bob Robertson
   Original Estimate: 24h
  Remaining Estimate: 24h

 With the Cassandra and HBase Storage Handlers I thought it would make sense 
 to include a generic JDBC RDBMS Storage Handler so that you could import a 
 standard DB table into Hive. Many people must want to perform HiveQL joins, 
 etc against tables in other systems etc.



[jira] Assigned: (HIVE-1555) JDBC Storage Handler

2011-03-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1555:


Assignee: Andrew Wilson

 JDBC Storage Handler
 

 Key: HIVE-1555
 URL: https://issues.apache.org/jira/browse/HIVE-1555
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Reporter: Bob Robertson
Assignee: Andrew Wilson
   Original Estimate: 24h
  Remaining Estimate: 24h

 With the Cassandra and HBase Storage Handlers I thought it would make sense 
 to include a generic JDBC RDBMS Storage Handler so that you could import a 
 standard DB table into Hive. Many people must want to perform HiveQL joins, 
 etc against tables in other systems etc.



[jira] Created: (HIVE-2032) create database does not honour warehouse.dir in dbproperties

2011-03-08 Thread Thiruvel Thirumoolan (JIRA)
create database does not honour warehouse.dir in dbproperties
-

 Key: HIVE-2032
 URL: https://issues.apache.org/jira/browse/HIVE-2032
 Project: Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.7.0, 0.8.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.8.0


# create database db with dbproperties ('hive.metastore.warehouse.dir' = 'loc');

The above command does not set the location of 'db' to 'loc'. Instead, it 
creates 'db.db' under the warehouse directory configured in the CLI's 
hive-site.xml. This appears to conflict with HIVE-1820's expectation. If a 
scratch dir is specified here, it is honoured.
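
The behaviour the report expects can be sketched as a small resolution rule: 
use the database property when it is set, otherwise fall back to the default 
warehouse directory. The class and method names below are hypothetical and 
the logic is an assumption about the intended behaviour, not code from Hive.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the expected location rule; not Hive's metastore code. */
class DatabaseLocationSketch {

    static final String WAREHOUSE_KEY = "hive.metastore.warehouse.dir";

    /** Prefers the per-database property; otherwise uses <default warehouse>/<db>.db. */
    static String resolve(String dbName, Map<String, String> dbProperties,
                          String defaultWarehouseDir) {
        String override = dbProperties.get(WAREHOUSE_KEY);
        if (override != null && !override.isEmpty()) {
            return override;
        }
        return defaultWarehouseDir + "/" + dbName + ".db";
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(WAREHOUSE_KEY, "loc");
        // Expected by the report: 'loc'. Observed: the second form, under the default dir.
        System.out.println(resolve("db", props, "/user/hive/warehouse"));
        System.out.println(resolve("db", new HashMap<>(), "/user/hive/warehouse"));
    }
}
```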



[jira] Created: (HIVE-2033) A database's warehouse.dir is not used for tables created in it.

2011-03-08 Thread Thiruvel Thirumoolan (JIRA)
A database's warehouse.dir is not used for tables created in it.


 Key: HIVE-2033
 URL: https://issues.apache.org/jira/browse/HIVE-2033
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
Affects Versions: 0.7.0, 0.8.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.8.0


$ create database db with dbproperties ('hive.metastore.warehouse.dir' = 'loc');
$ use db;
$ create table test(name string);

The location of table 'test' is not under 'loc'. Instead, it is under the 
warehouse dir from hive-site.xml.



[jira] Updated: (HIVE-2033) A database's warehouse.dir is not used for tables created in it.

2011-03-08 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-2033:
---

Attachment: HIVE-2033_prelim.patch

Preliminary patch. Test cases being added.

 A database's warehouse.dir is not used for tables created in it.
 

 Key: HIVE-2033
 URL: https://issues.apache.org/jira/browse/HIVE-2033
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
Affects Versions: 0.7.0, 0.8.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.8.0

 Attachments: HIVE-2033_prelim.patch


 $ create database db with dbproperties ('hive.metastore.warehouse.dir' = 
 'loc');
 $ use db;
 $ create table test(name string);
 The location of table 'test' is not under 'loc'. Instead, it is under the 
 warehouse dir from hive-site.xml.



[jira] Commented: (HIVE-1555) JDBC Storage Handler

2011-03-08 Thread Tim Perkins (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004148#comment-13004148
 ] 

Tim Perkins commented on HIVE-1555:
---

hey... you need to get off this email address.  I don't know who on your
team is improperly claiming this address as their own, but they're mistaken.

Please remove this address from your system.




 JDBC Storage Handler
 

 Key: HIVE-1555
 URL: https://issues.apache.org/jira/browse/HIVE-1555
 Project: Hive
  Issue Type: New Feature
  Components: JDBC
Reporter: Bob Robertson
Assignee: Andrew Wilson
   Original Estimate: 24h
  Remaining Estimate: 24h

 With the Cassandra and HBase Storage Handlers I thought it would make sense 
 to include a generic JDBC RDBMS Storage Handler so that you could import a 
 standard DB table into Hive. Many people must want to perform HiveQL joins, 
 etc against tables in other systems etc.



Build failed in Jenkins: Hive-0.7.0-h0.20 #31

2011-03-08 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/31/

--
[...truncated 26800 lines...]
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103081145_409011186.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-08_11-45-08_932_3545880278145320390/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] 2011-03-08 11:45:11,974 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-08_11-45-08_932_3545880278145320390/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103081145_648816479.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-08_11-45-13_557_6849149356289617543/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-08_11-45-13_557_6849149356289617543/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table 

[jira] Commented: (HIVE-1991) Hive Shell to output number of mappers and number of reducers

2011-03-08 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004226#comment-13004226
 ] 

Siying Dong commented on HIVE-1991:
---

This change was overridden by HIVE-1950.

 Hive Shell to output number of mappers and number of reducers
 -

 Key: HIVE-1991
 URL: https://issues.apache.org/jira/browse/HIVE-1991
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Trivial
 Fix For: 0.8.0

 Attachments: HIVE-1991.1.patch, HIVE-1991.2.patch


 The number of mappers and the number of reducers are useful information to 
 output for users.



[jira] Updated: (HIVE-2034) Backport HIVE-1991 after overridden by HIVE-1950

2011-03-08 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2034:
--

Attachment: HIVE-2034.1.patch

 Backport HIVE-1991 after overridden by HIVE-1950
 

 Key: HIVE-2034
 URL: https://issues.apache.org/jira/browse/HIVE-2034
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Trivial
 Attachments: HIVE-2034.1.patch






[jira] Updated: (HIVE-2034) Backport HIVE-1991 after overridden by HIVE-1950

2011-03-08 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2034:
--

Status: Patch Available  (was: Open)

 Backport HIVE-1991 after overridden by HIVE-1950
 

 Key: HIVE-2034
 URL: https://issues.apache.org/jira/browse/HIVE-2034
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Trivial
 Attachments: HIVE-2034.1.patch






[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

2011-03-08 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004253#comment-13004253
 ] 

He Yongqiang commented on HIVE-2030:


The ContentSummary is not guaranteed to be populated. Even if it is, this 
information does not seem to be passed to the child process, so the cache is 
non-empty only when executing in local mode.

 isEmptyPath() to use ContentSummary cache
 -

 Key: HIVE-2030
 URL: https://issues.apache.org/jira/browse/HIVE-2030
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2030.1.patch


 addInputPaths() calls isEmptyPath() for every input path. Currently every call 
 is a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, 
 we should be able to avoid some namenode calls and reduce latency in the case 
 of multiple partitions.



[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

2011-03-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-2030:
---

Status: Open  (was: Patch Available)

 isEmptyPath() to use ContentSummary cache
 -

 Key: HIVE-2030
 URL: https://issues.apache.org/jira/browse/HIVE-2030
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2030.1.patch


 addInputPaths() calls isEmptyPath() for every input path. Currently every call 
 is a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, 
 we should be able to avoid some namenode calls and reduce latency in the case 
 of multiple partitions.



hooks in metastore functions

2011-03-08 Thread Ashutosh Chauhan
Hi all,

I have a requirement that some logic must run every time a change takes
place in the metastore. For example, when a new table is created in the
metastore, I want to send a message to a message bus. The easiest way to
do this is to add the logic to createTable(), control it with a hiveConf
param, and turn it off by default. The alternative is to use hooks: put
the extra logic in a hook, then load and fire the hook if it is
available. Does anyone have an opinion on which of the two is preferable?
The second one requires new hook loading and execution logic. I am
currently interested in four functions: createTable(), dropTable(),
addPartition(), and dropPartition(). The current HiveMetaHook, which
exists in createTable(), doesn't quite fit the bill, since it is fired
only when the user requests it in the create table statement (i.e., by
specifying a storage handler). Instead, I want this logic to run always.
If this is unclear, let me know and I can post code that demonstrates
my use case.
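
For discussion, the hook alternative could be sketched along these lines: a 
listener interface, a loader that instantiates the classes named in a 
configuration value, and a fire step at the end of each metastore function. 
Every name here (MetaStoreListenerSketch, ListenerLoaderSketch, 
PrintingListenerSketch) is hypothetical and does not exist in Hive; only the 
createTable() case is shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader; none of these names exist in Hive. */
class ListenerLoaderSketch {

    /** Instantiates listeners from a comma-separated class list; null or empty yields none. */
    static List<MetaStoreListenerSketch> load(String classNames) {
        List<MetaStoreListenerSketch> listeners = new ArrayList<>();
        if (classNames == null || classNames.trim().isEmpty()) {
            return listeners; // nothing configured, so the feature is off
        }
        for (String name : classNames.split(",")) {
            try {
                Class<?> cls = Class.forName(name.trim());
                listeners.add((MetaStoreListenerSketch) cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load listener " + name, e);
            }
        }
        return listeners;
    }

    /** Would be called at the end of createTable() to notify every listener. */
    static void fireCreateTable(List<MetaStoreListenerSketch> listeners, String tableName) {
        for (MetaStoreListenerSketch listener : listeners) {
            listener.onCreateTable(tableName);
        }
    }

    public static void main(String[] args) {
        List<MetaStoreListenerSketch> listeners = load(PrintingListenerSketch.class.getName());
        fireCreateTable(listeners, "sampletable");
    }
}

/** Callback invoked after a table is created; drop and partition events would be analogous. */
interface MetaStoreListenerSketch {
    void onCreateTable(String tableName);
}

/** Example listener that only prints; a real one might publish to a message bus. */
class PrintingListenerSketch implements MetaStoreListenerSketch {
    @Override
    public void onCreateTable(String tableName) {
        System.out.println("created table " + tableName);
    }
}
```

Because an empty configuration value yields an empty listener list, this 
variant is also off by default, like the hiveConf-param approach.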

Ashutosh


[jira] Commented: (HIVE-2034) Backport HIVE-1991 after overridden by HIVE-1950

2011-03-08 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004257#comment-13004257
 ] 

Ning Zhang commented on HIVE-2034:
--

+1. Will commit if tests pass. 

 Backport HIVE-1991 after overridden by HIVE-1950
 

 Key: HIVE-2034
 URL: https://issues.apache.org/jira/browse/HIVE-2034
 Project: Hive
  Issue Type: Bug
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Trivial
 Attachments: HIVE-2034.1.patch






[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

2011-03-08 Thread Siying Dong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siying Dong updated HIVE-2030:
--

Attachment: HIVE-2030.2.patch

If an exception occurs, we don't populate the cache. This makes sure the cache 
never holds a wrong value.

 isEmptyPath() to use ContentSummary cache
 -

 Key: HIVE-2030
 URL: https://issues.apache.org/jira/browse/HIVE-2030
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2030.1.patch, HIVE-2030.2.patch


 addInputPaths() calls isEmptyPath() for every input path. Currently every call is a 
 DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we 
 should be able to avoid some namenode calls and reduce latency in the case of 
 multiple partitions.



[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

2011-03-08 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004260#comment-13004260
 ] 

Siying Dong commented on HIVE-2030:
---

Yongqiang, I don't quite understand your comment. If there is a cache miss, we 
call the original method. We never make things worse.

 isEmptyPath() to use ContentSummary cache
 -

 Key: HIVE-2030
 URL: https://issues.apache.org/jira/browse/HIVE-2030
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2030.1.patch, HIVE-2030.2.patch


 addInputPaths() calls isEmptyPath() for every input path. Currently every call is a 
 DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we 
 should be able to avoid some namenode calls and reduce latency in the case of 
 multiple partitions.



[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive

2011-03-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1803:
-

Attachment: JavaEWAH_20110304.zip

Uploading a .zip of the source for reference.

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, 
 HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, 
 JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, 
 javaewah.jar


 Implement bitmap index handler to complement compact indexing.



Review Request: HIVE-1803: Implement bitmap indexing in Hive (new review starting from patch 6)

2011-03-08 Thread John Sichi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/481/
---

Review request for hive.


Summary
---

Review Board was giving me grief when I tried to update the old patch, so I'm 
creating a fresh review request for HIVE-1803.6.


This addresses bug HIVE-1803.
https://issues.apache.org/jira/browse/HIVE-1803


Diffs
-

  lib/README 1c2f0b1 
  lib/javaewah-0.2.jar PRE-CREATION 
  ql/build.xml 50c604e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java ba222f3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java ff74f08 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndex.java 308d985 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeTask.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexMetadataChangeWork.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectInput.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapObjectOutput.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1f01446 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexInputFormat.java 6c320c5 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/HiveCompactIndexResult.java 0c9ccea 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeTask.java eac168f 
  ql/src/java/org/apache/hadoop/hive/ql/index/compact/IndexMetadataChangeWork.java 26beb4e 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 391e5de 
  ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 77220a1 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/VirtualColumn.java 30714b8 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapOp.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_bitmap3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/index_compact.q 6547a52 
  ql/src/test/queries/clientpositive/index_compact_1.q 6d59353 
  ql/src/test/queries/clientpositive/index_compact_2.q 358b5e9 
  ql/src/test/queries/clientpositive/index_compact_3.q ee8abda 
  ql/src/test/queries/clientpositive/udf_bitmap_and.q PRE-CREATION 
  ql/src/test/queries/clientpositive/udf_bitmap_or.q PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/index_bitmap3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_bitmap_and.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/udf_bitmap_or.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/481/diff


Testing
---


Thanks,

John



[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive

2011-03-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1803:
-

Status: Open  (was: Patch Available)

New review board entry (I failed trying to update the old one with the new 
patch):

https://reviews.apache.org/r/481/


 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, HIVE-1803.3.patch, 
 HIVE-1803.4.patch, HIVE-1803.5.patch, HIVE-1803.6.patch, 
 JavaEWAH_20110304.zip, bitmap_index_1.png, bitmap_index_2.png, javaewah.jar, 
 javaewah.jar


 Implement bitmap index handler to complement compact indexing.



[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

2011-03-08 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004317#comment-13004317
 ] 

He Yongqiang commented on HIVE-2030:


okay, will test and commit.

 isEmptyPath() to use ContentSummary cache
 -

 Key: HIVE-2030
 URL: https://issues.apache.org/jira/browse/HIVE-2030
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2030.1.patch, HIVE-2030.2.patch


 addInputPaths() calls isEmptyPath() for every input path. Currently every call is a 
 DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we 
 should be able to avoid some namenode calls and reduce latency in the case of 
 multiple partitions.



[jira] Updated: (HIVE-1644) use filter pushdown for automatically accessing indexes

2011-03-08 Thread Russell Melick (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Melick updated HIVE-1644:
-

Attachment: HIVE-1644.7.patch

HIVE-1644.7.patch:

@Yongqiang, I fixed the problems in GenMRTableScan1.java, and I think I have 
dealt with most of your comments.  I'm confused about what you mean regarding 
CombineHiveInputFormat.

@John, I made a first attempt at factoring SemanticAnalyzer calls into the 
ParseContext, but would appreciate your input.  This patch will also fail the 
unit test index_opt_where_simple.q as it stands.  However, if you remove the 
lines that attempt to use manual indexing, it succeeds.  The test that succeeds 
looks like

{code:sql}
CREATE INDEX src_index ON TABLE src(key) as 'COMPACT' WITH DEFERRED REBUILD;
ALTER INDEX src_index ON src REBUILD;

SET hive.optimize.autoindex=true;
EXPLAIN SELECT key, value FROM src WHERE key=86 ORDER BY key;
SELECT key, value FROM src WHERE key=86 ORDER BY key;

DROP INDEX src_index on src;
{code}

It appears as if our regular expression that identifies WHERE clauses by 
looking for FIL operators (filters) may not be specific enough.  I think the 
remaining errors might be caused by trying to generate index queries for both 
the {{{SELECT ... FROM src}}} (as desired), and the {{{SELECT ... FROM 
default__src_src_index__}}} that we generated, which is a problem.
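
A rough sketch of the suspected problem, not the code in the patch. It assumes 
operator stacks are matched as concatenated short names such as "TS%FIL%SEL%", 
and that generated index tables follow the naming seen above 
(default__src_src_index__). The class FilterRuleSketch and both patterns are 
hypothetical.

```java
import java.util.regex.Pattern;

class FilterRuleSketch {
    // Loose rule: fires on any operator stack that contains a filter (FIL) operator.
    static final Pattern ANY_FILTER = Pattern.compile(".*FIL%.*");

    // Assumed naming convention for generated index tables,
    // inferred from the name default__src_src_index__.
    static final Pattern INDEX_TABLE = Pattern.compile("default__.+__");

    // Rewrite a query to use an index only when it has a filter and does not
    // itself scan a generated index table.
    static boolean shouldRewrite(String operatorStack, String tableName) {
        return ANY_FILTER.matcher(operatorStack).matches()
            && !INDEX_TABLE.matcher(tableName).matches();
    }
}
```

The filter pattern alone matches the scan of src and the scan of the 
generated index table alike; the extra table-name check is one possible way to 
keep the rewrite from being applied to the index query itself.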

 use filter pushdown for automatically accessing indexes
 ---

 Key: HIVE-1644
 URL: https://issues.apache.org/jira/browse/HIVE-1644
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0
Reporter: John Sichi
Assignee: Russell Melick
 Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
 HIVE-1644.4.patch, HIVE-1644.5.patch, HIVE-1644.6.patch, HIVE-1644.7.patch


 HIVE-1226 provides utilities for analyzing filters which have been pushed 
 down to a table scan.  The next step is to use these for selecting available 
 indexes and generating access plans for those indexes.



[jira] Created: (HIVE-2037) Merge result file size should honor hive.merge.size.per.task

2011-03-08 Thread Ning Zhang (JIRA)
Merge result file size should honor hive.merge.size.per.task


 Key: HIVE-2037
 URL: https://issues.apache.org/jira/browse/HIVE-2037
 Project: Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2037.patch

The merge job sets mapred.min.split.size to the value of 
hive.merge.size.per.task, which roughly equals the output file size. However, 
the input split size is also determined by mapred.min.split.size.per.node, 
mapred.min.split.size.per.rack, and mapred.max.split.size. They should be set 
to the same value as hive.merge.size.per.task as well.
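
A minimal sketch of the proposed change, not the attached patch. It uses 
java.util.Properties as a stand-in for the job configuration, and the class 
MergeSplitSettings is hypothetical; only the property names come from the 
description above.

```java
import java.util.Properties;

class MergeSplitSettings {
    static final String MERGE_SIZE = "hive.merge.size.per.task";

    // All four split-size settings that influence the merge job's input splits.
    static final String[] SPLIT_KEYS = {
        "mapred.min.split.size",
        "mapred.min.split.size.per.node",
        "mapred.min.split.size.per.rack",
        "mapred.max.split.size",
    };

    // Copies the merge size into every split-size setting.
    // Leaves the configuration untouched if no merge size is set.
    static Properties apply(Properties conf) {
        String mergeSize = conf.getProperty(MERGE_SIZE);
        if (mergeSize == null) {
            return conf;
        }
        for (String key : SPLIT_KEYS) {
            conf.setProperty(key, mergeSize);
        }
        return conf;
    }
}
```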



[jira] Updated: (HIVE-2037) Merge result file size should honor hive.merge.size.per.task

2011-03-08 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2037:
-

Status: Patch Available  (was: Open)

 Merge result file size should honor hive.merge.size.per.task
 

 Key: HIVE-2037
 URL: https://issues.apache.org/jira/browse/HIVE-2037
 Project: Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2037.patch


 The merge job sets mapred.min.split.size to the value of 
 hive.merge.size.per.task, which roughly equals the output file size. 
 However, the input split size is also determined by 
 mapred.min.split.size.per.node, mapred.min.split.size.per.rack, and 
 mapred.max.split.size. They should be set to the same value as 
 hive.merge.size.per.task as well.



[jira] Updated: (HIVE-2037) Merge result file size should honor hive.merge.size.per.task

2011-03-08 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2037:
-

Attachment: HIVE-2037.patch

 Merge result file size should honor hive.merge.size.per.task
 

 Key: HIVE-2037
 URL: https://issues.apache.org/jira/browse/HIVE-2037
 Project: Hive
  Issue Type: Bug
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2037.patch


 The merge job sets mapred.min.split.size to the value of 
 hive.merge.size.per.task, which roughly equals the output file size. 
 However, the input split size is also determined by 
 mapred.min.split.size.per.node, mapred.min.split.size.per.rack, and 
 mapred.max.split.size. They should be set to the same value as 
 hive.merge.size.per.task as well.
