[jira] Updated: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Aaron Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Guo updated HIVE-2053:


Attachment: patch-1.patch

 Hive can't find the Plan
 

 Key: HIVE-2053
 URL: https://issues.apache.org/jira/browse/HIVE-2053
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Aaron Guo
Priority: Critical
 Fix For: 0.7.0

 Attachments: patch-1.patch


 When I execute this SQL: select count(1) from table1;
 the MR job can't run because it can't find the plan file in the local path.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Aaron Guo (JIRA)
Hive can't find the Plan


 Key: HIVE-2053
 URL: https://issues.apache.org/jira/browse/HIVE-2053
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Aaron Guo
Priority: Critical
 Fix For: 0.7.0
 Attachments: patch-1.patch

When I execute this SQL: select count(1) from table1;
the MR job can't run because it can't find the plan file in the local path.



[jira] Reopened: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Aaron Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Guo reopened HIVE-2053:
-


 Hive can't find the Plan
 

 Key: HIVE-2053
 URL: https://issues.apache.org/jira/browse/HIVE-2053
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Aaron Guo
Priority: Critical
 Fix For: 0.7.0

 Attachments: patch-1.patch


 When I execute this SQL: select count(1) from table1;
 the MR job can't run because it can't find the plan file in the local path.



[jira] Resolved: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Aaron Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Guo resolved HIVE-2053.
-

Resolution: Fixed

 Hive can't find the Plan
 

 Key: HIVE-2053
 URL: https://issues.apache.org/jira/browse/HIVE-2053
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Aaron Guo
Priority: Critical
 Fix For: 0.7.0

 Attachments: patch-1.patch


 When I execute this SQL: select count(1) from table1;
 the MR job can't run because it can't find the plan file in the local path.



[jira] Commented: (HIVE-1095) Hive in Maven

2011-03-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006847#comment-13006847
 ] 

Amareshwari Sriramadasu commented on HIVE-1095:
---

Gerrit, did you get a chance to do this? As it is a little urgent for us, I would 
like to take this up if you cannot find time to finish it.

 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: HIVE-1095-trunk.patch, hiveReleasedToMaven.tar.gz


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] Created: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-15 Thread Bennie Schut (JIRA)
Exception on windows when using the jdbc driver. IOException: The system 
cannot find the path specified
-

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Bennie Schut
Priority: Minor


It seems something recently changed on the jdbc driver which causes this 
IOException on windows.

java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
    at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:73)
    at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)




[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-15 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006899#comment-13006899
 ] 

Bennie Schut commented on HIVE-2054:


This seems to happen because the jdbc driver uses the same SessionState class 
as the cli, which now includes references to some temporary output files and a 
history file.
It's rather trivial to remove the SessionState from the jdbc driver to make it 
work again (I just tried this a few minutes ago). We currently have a 
JdbcSessionState which extends SessionState, but I don't see a need for the 
JdbcSessionState either. It seems to be there as a placeholder but is not 
actually used. 

 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Bennie Schut
Priority: Minor

 It seems something recently changed on the jdbc driver which causes this 
 IOException on windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
    at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:73)
    at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)



status of 0.7.0

2011-03-15 Thread Bill Au
What's the status of 0.7.0?  I noticed that rc0 was made available back on
2/18, but there has been no vote on it at all.  Is it safe to use?

Bill


[jira] Updated: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-15 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-2054:
---

Attachment: HIVE-2054.1.patch.txt

Removing SessionState so the jdbc works correctly on windows.

 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Reporter: Bennie Schut
Priority: Minor
 Attachments: HIVE-2054.1.patch.txt


 It seems something recently changed on the jdbc driver which causes this 
 IOException on windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
    at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:73)
    at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006926#comment-13006926
 ] 

Edward Capriolo commented on HIVE-1434:
---

That makes sense. I was thinking about this some more, and someone might want to 
try doing the locking with C*. Along those lines, I figured: why put it in one 
place now just to move it later?

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
 hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
 hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.




[jira] Created: (HIVE-2055) Hive HBase Integration issue

2011-03-15 Thread sajith v (JIRA)
Hive HBase Integration issue


 Key: HIVE-2055
 URL: https://issues.apache.org/jira/browse/HIVE-2055
 Project: Hive
  Issue Type: Bug
Reporter: sajith v


I created an external table in Hive that points to an HBase table. When I tried 
to query a column by its column name in the select clause, I got the following 
exception: (java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat), errorCode:12, 
SQLState:42000)



[jira] Updated: (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java

2011-03-15 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2042:
---

Attachment: HIVE-2042.2.Patch

 In error scenario some opened streams may not closed in ExplainTask.java and 
 Throttle.java
 --

 Key: HIVE-2042
 URL: https://issues.apache.org/jira/browse/HIVE-2042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch


 1) In an error scenario, the PrintStream may not be closed in execute() of 
 ExplainTask.java
 2) In an error scenario, the InputStream may not be closed in checkJobTracker() 
 of Throttle.java 
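
The usual remedy for this class of leak is to close the stream in a finally block. The following is a minimal sketch of that pattern only; it is not the code from ExplainTask.java, Throttle.java, or the attached patches. The names StreamCloseSketch, TrackingStream, and writePlan are hypothetical and exist purely for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical illustration; not taken from the Hive source tree.
class StreamCloseSketch {

  // Helper sink that records whether close() was called on it.
  static class TrackingStream extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  // Writes a plan to the sink. The finally block closes the PrintStream
  // (and therefore the underlying sink) whether or not the body throws.
  static void writePlan(OutputStream sink, String plan, boolean failMidway) {
    PrintStream out = null;
    try {
      out = new PrintStream(sink);
      out.println(plan);
      if (failMidway) {
        throw new IllegalStateException("simulated failure while writing the plan");
      }
    } finally {
      if (out != null) {
        out.close();
      }
    }
  }

  public static void main(String[] args) {
    TrackingStream sink = new TrackingStream();
    try {
      writePlan(sink, "STAGE PLANS", true);
    } catch (IllegalStateException expected) {
      // The error still propagates to the caller after the stream is closed.
    }
    System.out.println("closed after failure: " + sink.closed);
  }
}
```

On Java 7 and later, a try-with-resources statement would express the same guarantee more compactly.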



[jira] Updated: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load

2011-03-15 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2031:
---

Attachment: HIVE-2031.2.patch

 Correct the exception message for the better traceability for the scenario 
 load into the partitioned table having 2  partitions by specifying only one 
 partition in the load statement. 
 

 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2031.2.patch, HIVE-2031.patch


  Load into the partitioned table having 2 partitions by specifying only one 
 partition in the load statement is failing and logging the following 
 exception message.
 {noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not found '21Oct'
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:685)
    at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
    at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
 {noformat}
 The exception message needs to be corrected so that it reports the actual 
 root cause of the failure.



[jira] Updated: (HIVE-2031) Correct the exception message for the better traceability for the scenario load into the partitioned table having 2 partitions by specifying only one partition in the load

2011-03-15 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2031:
---

Status: Patch Available  (was: Open)

 Correct the exception message for the better traceability for the scenario 
 load into the partitioned table having 2  partitions by specifying only one 
 partition in the load statement. 
 

 Key: HIVE-2031
 URL: https://issues.apache.org/jira/browse/HIVE-2031
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2031.2.patch, HIVE-2031.patch


  Load into the partitioned table having 2 partitions by specifying only one 
 partition in the load statement is failing and logging the following 
 exception message.
 {noformat}
 org.apache.hadoop.hive.ql.parse.SemanticException: line 1:91 Partition not found '21Oct'
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:685)
    at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:196)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
    at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:151)
    at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:764)
    at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:742)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
 {noformat}
 The exception message needs to be corrected so that it reports the actual 
 root cause of the failure.



[jira] Commented: (HIVE-2042) In error scenario some opened streams may not closed in ExplainTask.java and Throttle.java

2011-03-15 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006963#comment-13006963
 ] 

Chinna Rao Lalam commented on HIVE-2042:


Sure, Amareshwari. I uploaded a single patch that combines the issues HIVE-2042, 
HIVE-2043, HIVE-2044 and HIVE-2046.

From next time onwards I will logically group the comments before uploading the 
patch.

 In error scenario some opened streams may not closed in ExplainTask.java and 
 Throttle.java
 --

 Key: HIVE-2042
 URL: https://issues.apache.org/jira/browse/HIVE-2042
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.1, Hive0.7.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2042.2.Patch, HIVE-2042.Patch


 1) In an error scenario, the PrintStream may not be closed in execute() of 
 ExplainTask.java
 2) In an error scenario, the InputStream may not be closed in checkJobTracker() 
 of Throttle.java 



[jira] Updated: (HIVE-1095) Hive in Maven

2011-03-15 Thread Gerrit Jansen van Vuuren (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerrit Jansen van Vuuren updated HIVE-1095:
---

Attachment: HIVE-1095.v2.PATCH

Patch for generating maven artifacts for hive.

 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
 hiveReleasedToMaven.tar.gz


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] Commented: (HIVE-1095) Hive in Maven

2011-03-15 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006971#comment-13006971
 ] 

Gerrit Jansen van Vuuren commented on HIVE-1095:


Hi,

The above patch compiles and generates the maven artifacts to: build/maven
Three directories are generated: jars, licences, and poms.

Notes:
The poms are automatically generated from the ivy dependencies. 
md5 and sha1 checksums are generated for each artifact (jar, pom, and licence).

Target to run:
From the trunk/build.xml the target make-maven can be used.
From each sub project the target make-pom can be used.

As far as I could read this should be enough to deploy to a maven repository. 
The groupId would be org.apache.hadoop.hive, this is what was in the ivy files.

I have added another target called  maven-publish  in the trunk/build.xml
This has not been tested and is supposed to make deployment to a maven 
repository easier.
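
For readers unfamiliar with the checksum files a maven repository expects next to each artifact, here is a minimal sketch of how an md5 or sha1 hex digest can be computed with the JDK's java.security.MessageDigest. It is an illustration only and does not reproduce what the make-maven target in the patch does; the class name ChecksumSketch and the method hexDigest are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not part of the HIVE-1095 patch.
class ChecksumSketch {

  // Returns the lowercase hex digest of the data for the given algorithm,
  // for example "MD5" or "SHA-1".
  static String hexDigest(String algorithm, byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance(algorithm);
      byte[] digest = md.digest(data);
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("digest algorithm not available: " + algorithm, e);
    }
  }

  public static void main(String[] args) {
    byte[] artifact = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println("md5:  " + hexDigest("MD5", artifact));
    System.out.println("sha1: " + hexDigest("SHA-1", artifact));
  }
}
```

In a real build the bytes would come from the jar or pom file, and each digest would be written to a sibling file with the .md5 or .sha1 suffix.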



 

 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
 hiveReleasedToMaven.tar.gz


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] Created: (HIVE-2056) Generate single MR job for multi groupby query.

2011-03-15 Thread Amareshwari Sriramadasu (JIRA)
Generate single MR job for multi groupby query.
---

 Key: HIVE-2056
 URL: https://issues.apache.org/jira/browse/HIVE-2056
 Project: Hive
  Issue Type: Improvement
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu






[jira] Commented: (HIVE-2056) Generate single MR job for multi groupby query.

2011-03-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006995#comment-13006995
 ] 

Amareshwari Sriramadasu commented on HIVE-2056:
---

Here is a request from one of our customers:

Here is a real example of the need for a multi group by in 1 M/R job. If you 
look at the query below, we have two aggregates being generated out of a 
single fact table. The 1st aggregate generates a unique count by date and the 
2nd one generates a unique count by date and gender. We have a lot of these 
aggregates to build. We would like this to be done in 1 M/R job rather than 
the three below. Is it possible to do this in Hive?

// created two intermediate tables

hive> create table test_1 (dt string, bc_cnt bigint);
OK
Time taken: 9.004 seconds

hive> create table test_2 (dt string, gender string, bc_cnt bigint);
OK

// multi group by in insert statement

hive> from fact_table f
 insert overwrite table test_1 select dt, count(distinct id) group by dt
 insert overwrite table test_2 select dt,gender,count(distinct id) group by dt,gender;

Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>



Thanks

Sudhish
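
To make the request concrete: both aggregates can in principle be derived from one scan of the fact rows, because each row contributes to the (dt) group and to the (dt, gender) group at the same time. The sketch below shows that idea in plain, in-memory Java. It says nothing about how Hive plans or would plan this query; the class MultiGroupBySketch and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration; not Hive's execution plan.
class MultiGroupBySketch {
  // Distinct ids seen per dt, and per (dt, gender).
  private final Map<String, Set<String>> idsByDt = new HashMap<String, Set<String>>();
  private final Map<String, Set<String>> idsByDtGender = new HashMap<String, Set<String>>();

  // Consumes one fact row and updates both groupings in the same pass.
  void accept(String dt, String gender, String id) {
    add(idsByDt, dt, id);
    add(idsByDtGender, dt + "\t" + gender, id);
  }

  // count(distinct id) group by dt
  int distinctByDt(String dt) {
    return size(idsByDt.get(dt));
  }

  // count(distinct id) group by dt, gender
  int distinctByDtGender(String dt, String gender) {
    return size(idsByDtGender.get(dt + "\t" + gender));
  }

  private static void add(Map<String, Set<String>> groups, String key, String id) {
    Set<String> ids = groups.get(key);
    if (ids == null) {
      ids = new HashSet<String>();
      groups.put(key, ids);
    }
    ids.add(id);
  }

  private static int size(Set<String> ids) {
    return ids == null ? 0 : ids.size();
  }

  public static void main(String[] args) {
    MultiGroupBySketch agg = new MultiGroupBySketch();
    agg.accept("2011-03-14", "F", "u1");
    agg.accept("2011-03-14", "M", "u2");
    agg.accept("2011-03-14", "F", "u1");
    System.out.println("by dt: " + agg.distinctByDt("2011-03-14"));
    System.out.println("by dt, gender F: " + agg.distinctByDtGender("2011-03-14", "F"));
  }
}
```

A distributed version has to decide how rows are partitioned across reducers so that distinct ids are not double counted, which this sketch does not attempt to address.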



 Generate single MR job for multi groupby query.
 ---

 Key: HIVE-2056
 URL: https://issues.apache.org/jira/browse/HIVE-2056
 Project: Hive
  Issue Type: Improvement
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu





[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007022#comment-13007022
 ] 

John Sichi commented on HIVE-1434:
--

If that happened we would want to go the other direction (make the lock manager 
harness used by the unit test framework pluggable, and pull zk out of ql) 
rather than dragging more stuff into ql.

Let's try to keep cassandra-handler as low-impact as possible.


 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
 hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
 hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.



[jira] Commented: (HIVE-1095) Hive in Maven

2011-03-15 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007144#comment-13007144
 ] 

Carl Steinbach commented on HIVE-1095:
--

Since Hive is now a TLP I think the groupId should be org.apache.hive


 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
 hiveReleasedToMaven.tar.gz


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system

2011-03-15 Thread Paul Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Yang updated HIVE-1918:


   Resolution: Fixed
Fix Version/s: 0.8.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks Krishna!

 Add export/import facilities to the hive system
 ---

 Key: HIVE-1918
 URL: https://issues.apache.org/jira/browse/HIVE-1918
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
 Fix For: 0.8.0

 Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, 
 HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, 
 HIVE-1918.patch.txt, hive-metastore-er.pdf


 This is an enhancement request to add export/import features to hive.
 With this language extension, the user can export the data of the table - 
 which may be located in different hdfs locations in case of a partitioned 
 table - as well as the metadata of the table into a specified output 
 location. This output location can then be moved over to another different 
 hadoop/hive instance and imported there.  
 This should work independent of the source and target metastore dbms used; 
 for instance, between derby and mysql.
 For partitioned tables, the ability to export/import a subset of the 
 partition must be supported.
 Howl will add more features on top of this: The ability to create/use the 
 exported data even in the absence of hive, using MR or Pig. Please see 
 http://wiki.apache.org/pig/Howl/HowlImportExport for these details.



[jira] Updated: (HIVE-1867) Add mechanism for disabling tests with intermittent failures

2011-03-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1867:
-

Summary: Add mechanism for disabling tests with intermittent failures  
(was: Fix intermittent failures in TestNegativeCliDriver/dyn_part_empty.q)

 Add mechanism for disabling tests with intermittent failures
 

 Key: HIVE-1867
 URL: https://issues.apache.org/jira/browse/HIVE-1867
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Marcel Kornacker
 Attachments: HIVE-1867.1.patch


 {code}
 [junit] Begin query: dyn_part_empty.q
 [junit] Running org.apache.hadoop.hive.cli.TestNegativeCliDriver
 [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
 [junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED 
 (crashed)
 {code}
 dyn_part_empty.q has been intermittently failing on Hudson. I was able to 
 reproduce locally,
 and with different versions of JUnit (3.8.1, 4.5, 4.8.2).



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007169#comment-13007169
 ] 

Edward Capriolo commented on HIVE-1434:
---

OK, so be it; moving this is easy. Is this the only comment? (I am not going to 
regenerate the patch if more review is pending.)

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
 hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
 hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.



[jira] Updated: (HIVE-1095) Hive in Maven

2011-03-15 Thread Gerrit Jansen van Vuuren (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerrit Jansen van Vuuren updated HIVE-1095:
---

Attachment: HIVE-1095.v3.PATCH

True, it should be org.apache.hive.
Here is the updated patch.

(I also applied some space formatting to the build.xml file).

 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
 HIVE-1095.v3.PATCH, hiveReleasedToMaven.tar.gz


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] Updated: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-15 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-2054:
---

Fix Version/s: 0.8.0
 Assignee: Bennie Schut
Affects Version/s: 0.8.0
 Release Note: Fix for IOException on the jdbc driver on windows.
   Status: Patch Available  (was: Open)

https://reviews.apache.org/r/513/

 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2054.1.patch.txt


 It seems something recently changed on the jdbc driver which causes this 
 IOException on windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the path specified
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
    at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:73)
    at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)



Review Request: HIVE-2054: fix for IOException on the jdbc driver on windows.

2011-03-15 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/513/
---

Review request for hive.


Summary
---

HIVE-2054: fix for IOException on the jdbc driver on windows.


This addresses bug HIVE-2054.
https://issues.apache.org/jira/browse/HIVE-2054


Diffs
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 1081782 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HivePreparedStatement.java 
1081782 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1081782 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/JdbcSessionState.java 1081782 

Diff: https://reviews.apache.org/r/513/diff


Testing
---


Thanks,

Bennie



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007209#comment-13007209
 ] 

John Sichi commented on HIVE-1434:
--

The test is passing for me now with the latest patch.  I haven't looked at the 
latest code much yet.

In CassandraTestSetup.java, I see

{noformat}
FramedConnWrapper wrap = new FramedConnWrapper("127.0.0.1", 9170, 5000);
{noformat}

Does that mean a listening port is being used?  If so, please change it to use 
a dynamic port like I did for the HBase tests; otherwise we'll get sporadic 
conflicts with other services.
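A minimal sketch of the dynamic-port idea, assuming plain JDK sockets: binding a ServerSocket to port 0 lets the operating system pick a currently unused port, which the test can then hand to the service it starts. The class name FreePort is hypothetical and is not taken from the Hive or HBase test code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration; not part of Hive, HBase, or Cassandra.
final class FreePort {
    // Binding to port 0 asks the OS to choose any unused ephemeral port.
    static int find() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("free port: " + find());
    }
}
```

The socket is closed before the port is reused, so another process could still grab the port in between; this narrows the window for conflicts rather than eliminating it.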

Also, what's up with adding 
org/apache/cassandra/contrib/utils/service/CassandraServiceDataCleaner.java 
into the Hive codebase?  I don't think we want to do that.


 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, cass_handler.diff, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-2011-02-26.patch.txt, 
 hive-1434-2011-03-07.patch.txt, hive-1434-2011-03-07.patch.txt, 
 hive-1434-2011-03-14.patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt, 
 hive-1434-5.patch.txt, hive-1434.2011-02-27.diff.txt, 
 hive-cassandra.2011-02-25.txt, hive.diff


 Add a cassandra storage handler.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007211#comment-13007211
 ] 

Edward Capriolo commented on HIVE-1434:
---

CassandraServiceDataCleaner.java is a glorified 'rm -rf' that lives in contrib
and, as far as I know, does not get packaged in Maven.

As for dynamic listening ports, this is not as easy as it is for HBase.
How Cassandra reads its configuration is more of a black box. You can use
properties to point at different folders, but when Cassandra initializes, the
first thing that happens is that the configuration file is read.

AFAIK the only way to do this is to dynamically generate its YAML file. This
is going to be ugly.
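One way the YAML generation could look, as a rough sketch only: read a template configuration as text, replace the numeric value of a port key, and write the result where the embedded server will load it. The class name YamlPortRewriter is hypothetical, and the key name used in the usage note (rpc_port) is an assumption about the configuration file, not something taken from the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the HIVE-1434 patch.
final class YamlPortRewriter {
    // Rewrites a top-level "key: <number>" line so that it carries the given port.
    static String withPort(String yaml, String key, int port) {
        Pattern line = Pattern.compile(
            "(?m)^(?<prefix>" + Pattern.quote(key) + "\\s*:\\s*)\\d+[ \\t]*$");
        Matcher m = line.matcher(yaml);
        if (!m.find()) {
            throw new IllegalArgumentException("key not found: " + key);
        }
        // The named group keeps the original "key: " prefix and only swaps the number.
        return m.replaceAll("${prefix}" + port);
    }

    public static void main(String[] args) {
        String template = "rpc_port: 9160\nstorage_port: 7000\n";
        System.out.print(withPort(template, "rpc_port", 9170));
    }
}
```

A test setup could call withPort(template, "rpc_port", port) with a port chosen at run time and write the returned text to a temporary file before starting Cassandra.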



[jira] Created: (HIVE-2057) eliminate parser warning for Identifier DOT Identifier

2011-03-15 Thread John Sichi (JIRA)
eliminate parser warning for Identifier DOT Identifier


 Key: HIVE-2057
 URL: https://issues.apache.org/jira/browse/HIVE-2057
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: John Sichi


I noticed this warning in recent builds:

{noformat}
build-grammar:
 [echo] Building Grammar 
/data/users/jsichi/open/hive-trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g
  
 [java] ANTLR Parser Generator  Version 3.0.1 (August 13, 2007)  1989-2007
 [java] warning(200): 
/data/users/jsichi/open/hive-trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:1503:5:
 Decision can match input such as Identifier DOT Identifier using multiple 
alternatives: 1, 2
 [java] As a result, alternative(s) 2 were disabled for that input
{noformat}

This was introduced by HIVE-1517.  Is there a way to get rid of it?




[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007225#comment-13007225
 ] 

Edward Capriolo commented on HIVE-1434:
---

Older versions of the Cassandra embedded server had init() and then start(). If
we went back to a model like that, some code could change the loaded
configuration after the load.

I do not think the port is a serious blocker. What, one out of every 65K tests
will fail with a "port already in use" exception?



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2011-03-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007231#comment-13007231
 ] 

John Sichi commented on HIVE-1434:
--

It is a blocker.  The HBase problems were part of what kept Hive continuous
integration broken for many weeks.  The failure frequency was very high
due to conflicts with unrelated port-hungry services running on committer dev
boxes.




[jira] Updated: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2053:
-

Fix Version/s: (was: 0.7.0)

 Hive can't find the Plan
 

 Key: HIVE-2053
 URL: https://issues.apache.org/jira/browse/HIVE-2053
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.0
Reporter: Aaron Guo
Priority: Critical
 Attachments: patch-1.patch


 When I execute this SQL: select count(1) from table1;
 the MR job can't run because it can't find the plan file in the local path.



Re: status of 0.7.0

2011-03-15 Thread Carl Steinbach
Hi Bill,

There are two open blocker tickets related to bugs in the metastore upgrade
scripts (which are present in rc0). Once these are resolved we'll be ready
to vote on a new release candidate.

Thanks.

Carl


On Tue, Mar 15, 2011 at 7:08 AM, Bill Au bill.w...@gmail.com wrote:

 What's the status of 0.7.0?  I noticed that rc0 was made available back on
 2/18, but there has been no vote on it at all.  Is it safe to use?

 Bill



[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007240#comment-13007240
 ] 

Ning Zhang commented on HIVE-2054:
--

Bennie, do you know which changes in SessionState cause JDBC to fail on Windows?

We recently committed HIVE-818, which changes HiveServer and some of the
behavior for passing end-of-file from the server side to the client side.
Previously the server just sent an empty string; now the server throws a
HiveServerException with ErrorCode = 0. I think that may be the reason. Can you
try your test case with and without HIVE-818 to verify?
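To make the end-of-file convention described above concrete, here is a rough sketch of a client loop that treats an exception with error code 0 as "no more rows" and rethrows anything else. ServerException and RowSource are hypothetical stand-ins defined only for this example; they are not the real HiveServerException or Thrift client classes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; every type here is a hypothetical stand-in.
final class EofDemo {
    // Stand-in for a server exception that carries a numeric error code.
    static final class ServerException extends Exception {
        final int errorCode;
        ServerException(String message, int errorCode) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    // Stand-in for a row source that signals end-of-data by throwing.
    interface RowSource {
        String fetchOne() throws ServerException;
    }

    // Collects rows until the source throws with error code 0.
    static List<String> drain(RowSource source) {
        List<String> rows = new ArrayList<>();
        while (true) {
            try {
                rows.add(source.fetchOne());
            } catch (ServerException e) {
                if (e.errorCode == 0) {
                    return rows; // error code 0 marks the normal end of data
                }
                throw new IllegalStateException(e); // any other code is a real failure
            }
        }
    }
}
```

A client written for the older behavior, which waited for an empty string instead, would see the new exception as a failure, and that is the kind of mismatch the comment asks Bennie to rule out.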



[jira] Commented: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007242#comment-13007242
 ] 

Ning Zhang commented on HIVE-2053:
--

Aaron, did you see this error in trunk or an older release? I think we fixed it 
in some JIRA a while ago. Can you provide a reproducible test case?



[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-15 Thread Siying Dong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007293#comment-13007293
 ] 

Siying Dong commented on HIVE-2051:
---

@Carl?

 getInputSummary() to call FileSystem.getContentSummary() in parallel
 

 Key: HIVE-2051
 URL: https://issues.apache.org/jira/browse/HIVE-2051
 Project: Hive
  Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
 Attachments: HIVE-2051.1.patch, HIVE-2051.2.patch, HIVE-2051.3.patch


 getInputSummary() currently calls FileSystem.getContentSummary() on each input
 path one by one, which can be extremely slow when the number of input paths is
 huge. By making those calls in parallel, we can cut latency in most cases.
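The general shape of the change can be sketched with a fixed thread pool: submit one task per input path, then add up the results. This is not the code from the attached patches. The class name ParallelSummary is hypothetical, and the lengthOf function stands in for the real FileSystem.getContentSummary() call so that the example runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical sketch; not the implementation in HIVE-2051.
final class ParallelSummary {
    // Sums lengthOf(path) over all paths, running the lookups on a thread pool.
    static long totalLength(List<String> paths, ToLongFunction<String> lengthOf, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<Long> task = () -> lengthOf.applyAsLong(path);
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // blocks until that lookup has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With slow per-path lookups, the wall-clock time approaches the slowest batch of lookups rather than the sum of all of them, which is where the latency gain would come from.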



Build failed in Jenkins: Hive-0.7.0-h0.20 #39

2011-03-15 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/39/changes

Changes:

[cws] HIVE-1867 Add mechanism for disabling tests with intermittent failures 
(Marcel Kornacker via cws)

--
[...truncated 27355 lines...]
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103151750_2136657594.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-15_17-50-18_342_6473265109270766332/-mr-1
[junit] Total MapReduce jobs = 1
[junit] Launching Job 1 out of 1
[junit] Number of reduce tasks determined at compile time: 1
[junit] In order to change the average load for a reducer (in bytes):
[junit]   set hive.exec.reducers.bytes.per.reducer=number
[junit] In order to limit the maximum number of reducers:
[junit]   set hive.exec.reducers.max=number
[junit] In order to set a constant number of reducers:
[junit]   set mapred.reduce.tasks=number
[junit] Job running in-process (local Hadoop)
[junit] 2011-03-15 17:50:21,390 null map = 100%,  reduce = 100%
[junit] Ended Job = job_local_0001
[junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/hudson/hive_2011-03-15_17-50-18_342_6473265109270766332/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/build/service/tmp/hive_job_log_hudson_201103151750_729047961.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/ws/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: QUERY
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/hudson/hive_2011-03-15_17-50-23_152_4705455110613204180/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: QUERY
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 

Re: Review Request: HIVE-2051: getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-15 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/491/
---

(Updated 2011-03-15 17:53:42.754109)


Review request for hive.


Changes
---

Updated patch with 
https://issues.apache.org/jira/secure/attachment/12473628/HIVE-2051.3.patch


Summary
---

Review request for HIVE-2051.


This addresses bug HIVE-2051.
https://issues.apache.org/jira/browse/HIVE-2051


Diffs (updated)
-

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1081571 

Diff: https://reviews.apache.org/r/491/diff


Testing
---


Thanks,

Carl



[jira] Commented: (HIVE-2053) Hive can't find the Plan

2011-03-15 Thread Aaron Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007311#comment-13007311
 ] 

Aaron Guo commented on HIVE-2053:
-

Hi Zhangning, I see this error in the current trunk. I will provide more
information later.



[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-15 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007323#comment-13007323
 ] 

Joydeep Sen Sarma commented on HIVE-2051:
-

I looked at the latest patch from Carl. I don't get it: why should we pay the
cost of creating a thread when one is not required?
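The concern above could be addressed with a fast path along these lines: compute the result on the calling thread when there is at most one path, and fan out only otherwise. This is a sketch, not what any of the attached patches does. SummaryDispatch is a hypothetical name, lengthOf stands in for the per-path summary call, and a parallel stream is swapped in for an explicit thread pool to keep the example short.

```java
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch of a "no extra thread for a single path" fast path.
final class SummaryDispatch {
    static long totalLength(List<String> paths, ToLongFunction<String> lengthOf) {
        if (paths.size() <= 1) {
            // Zero or one path: do the work on the caller's thread.
            long total = 0;
            for (String path : paths) {
                total += lengthOf.applyAsLong(path);
            }
            return total;
        }
        // Several paths: let the lookups run concurrently.
        return paths.parallelStream().mapToLong(lengthOf).sum();
    }
}
```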



[jira] Commented: (HIVE-2051) getInputSummary() to call FileSystem.getContentSummary() in parallel

2011-03-15 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007327#comment-13007327
 ] 

Carl Steinbach commented on HIVE-2051:
--

Just to be clear, I updated the reviewboard ticket with the latest version of
Siying's patch. Also, the comments on reviewboard are from M IS, not me.



[jira] Commented: (HIVE-1095) Hive in Maven

2011-03-15 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007334#comment-13007334
 ] 

Amareshwari Sriramadasu commented on HIVE-1095:
---

Also, the homepage should be http://hive.apache.org, not
http://hadoop.apache.org/hive.



[jira] Commented: (HIVE-1095) Hive in Maven

2011-03-15 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007344#comment-13007344
 ] 

Giridharan Kesavan commented on HIVE-1095:
--

v3 version of patch fails with a conflict on build-common.xml

Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file build-common.xml.rej





[jira] Updated: (HIVE-1538) FilterOperator is applied twice with ppd on.

2011-03-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1538:
--

Status: Open  (was: Patch Available)

 FilterOperator is applied twice with ppd on.
 

 Key: HIVE-1538
 URL: https://issues.apache.org/jira/browse/HIVE-1538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1538.txt


 With hive.optimize.ppd set to true, FilterOperator is applied twice, and it
 seems the second operator always filters zero rows.
