[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-16 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007366#comment-13007366
 ] 

Bennie Schut commented on HIVE-2054:


Yes, it was this code block:

  try {
    File tmpFile = File.createTempFile(sessionID, ".pipeout", tmpDir);
    tmpFile.deleteOnExit();
    startSs.setTmpOutputFile(tmpFile);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }

So you are correct, it's related to the changes from HIVE-818.
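For anyone else hitting this: File.createTempFile throws exactly this IOException when tmpDir does not exist, which is easy to trigger on Windows if the scratch directory is configured as a Unix-style path. A minimal sketch of a defensive variant (hypothetical code, not the attached HIVE-2054.1.patch.txt):

```java
import java.io.File;
import java.io.IOException;

public class TmpFileDemo {

    // Defensive variant of the block above: create the directory first and
    // include the offending path in the error instead of a bare IOException.
    static File createPipeOutFile(String sessionID, File tmpDir) {
        try {
            if (!tmpDir.exists() && !tmpDir.mkdirs()) {
                throw new IOException("could not create tmp dir: " + tmpDir);
            }
            File tmpFile = File.createTempFile(sessionID, ".pipeout", tmpDir);
            tmpFile.deleteOnExit();
            return tmpFile;
        } catch (IOException e) {
            throw new RuntimeException("failed to create .pipeout file in " + tmpDir, e);
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "hive-2054-demo");
        File f = createPipeOutFile("session123", dir);
        System.out.println(f.getName().endsWith(".pipeout"));
    }
}
```

The point is only that failing with the path in the message turns "The system cannot find the path specified" into something diagnosable.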

 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2054.1.patch.txt


 It seems something recently changed in the jdbc driver which causes this 
 IOException on Windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the 
 path specified
   at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
   at 
 org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:73)
   at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-16 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1815:
---

Attachment: HIVE-1815.1.patch.txt

This is the simplest implementation I could do. I changed fetchOne to fetchN 
and return a buffered result on each next() call until the list is empty, then 
do another fetchN. We've used this for a week and the performance increase on 
large resultsets is significant. You could also run the fetchN on a separate 
thread to keep the queue full, but that's a bit more work for just a little 
more gain.

I've added one small test that calls setFetchSize and getFetchSize, but the 
existing jdbc tests should all pass unchanged since the functionality doesn't 
change.
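The pattern is simple enough to sketch in a few lines (illustrative names only, not the actual HiveQueryResultSet code): next() hands out rows from a local buffer and only goes back to the server with a fetchN-style call when the buffer is empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchFetchDemo {
    private final List<String> serverRows;            // stands in for the Hive server
    private final int fetchSize;
    private final List<String> buffer = new ArrayList<String>();
    private int serverPos = 0;
    private int bufferPos = 0;
    String row;                                       // current row, like the ResultSet cursor
    int fetchCalls = 0;                               // simulated round trips

    BatchFetchDemo(List<String> serverRows, int fetchSize) {
        this.serverRows = serverRows;
        this.fetchSize = fetchSize;
    }

    // One simulated round trip returning up to fetchSize rows.
    private List<String> fetchN(int n) {
        fetchCalls++;
        int end = Math.min(serverPos + n, serverRows.size());
        List<String> batch = serverRows.subList(serverPos, end);
        serverPos = end;
        return batch;
    }

    boolean next() {
        if (bufferPos >= buffer.size()) {             // buffer exhausted: refill it
            List<String> batch = fetchN(fetchSize);
            buffer.clear();
            buffer.addAll(batch);
            bufferPos = 0;
            if (buffer.isEmpty()) {
                return false;                         // server has no more rows
            }
        }
        row = buffer.get(bufferPos++);
        return true;
    }

    public static void main(String[] args) {
        BatchFetchDemo rs = new BatchFetchDemo(Arrays.asList("a", "b", "c", "d", "e"), 2);
        int rows = 0;
        while (rs.next()) {
            rows++;
        }
        System.out.println(rows + " rows in " + rs.fetchCalls + " fetches");
    }
}
```

With five rows and a fetch size of two this makes 4 round trips (the last one empty, signalling the end) where row-at-a-time fetching would make 6.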

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.5.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
 Attachments: HIVE-1815.1.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)



[jira] Updated: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-16 Thread Bennie Schut (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bennie Schut updated HIVE-1815:
---

Fix Version/s: 0.8.0
Affects Version/s: (was: 0.5.0)
   0.8.0
 Release Note: Use batch fetching on the hive jdbc driver to increase 
performance.
   Status: Patch Available  (was: Reopened)

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
 Fix For: 0.8.0

 Attachments: HIVE-1815.1.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)



[jira] Commented: (HIVE-1815) The class HiveResultSet should implement batch fetching.

2011-03-16 Thread Bennie Schut (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007386#comment-13007386
 ] 

Bennie Schut commented on HIVE-1815:


https://reviews.apache.org/r/514/

 The class HiveResultSet should implement batch fetching.
 

 Key: HIVE-1815
 URL: https://issues.apache.org/jira/browse/HIVE-1815
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.8.0
 Environment: Custom Java application using the Hive JDBC driver to 
 connect to a Hive server, execute a Hive query and process the results.
Reporter: Guy le Mar
 Fix For: 0.8.0

 Attachments: HIVE-1815.1.patch.txt


 When using the Hive JDBC driver, you can execute a Hive query and obtain a 
 HiveResultSet instance that contains the results of the query.
 Unfortunately, HiveResultSet can then only fetch a single row of these 
 results from the Hive server at a time. As a consequence, it's extremely slow 
 to fetch a resultset of anything other than a trivial size.
 It would be nice for the HiveResultSet to be able to fetch N rows from the 
 server at a time, so that performance is suitable to support applications 
 that provide human interaction. 
 (From memory, I think it took me around 20 minutes to fetch 4000 rows.)



[jira] Updated: (HIVE-1095) Hive in Maven

2011-03-16 Thread Gerrit Jansen van Vuuren (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerrit Jansen van Vuuren updated HIVE-1095:
---

Attachment: HIVE-1095.v4.PATCH

fixed,

 Hive in Maven
 -

 Key: HIVE-1095
 URL: https://issues.apache.org/jira/browse/HIVE-1095
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: Gerrit Jansen van Vuuren
Priority: Minor
 Attachments: HIVE-1095-trunk.patch, HIVE-1095.v2.PATCH, 
 HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, hiveReleasedToMaven.tar.gz


 Getting hive into maven main repositories
 Documentation on how to do this is on:
 http://maven.apache.org/guides/mini/guide-central-repository-upload.html



Re: status of 0.7.0

2011-03-16 Thread Bill Au
Thanks for that info.

Bill

On Tue, Mar 15, 2011 at 6:28 PM, Carl Steinbach c...@cloudera.com wrote:

 Hi Bill,

 There are two open blocker tickets related to bugs in the metastore upgrade
 scripts (which are present in rc0). Once these are resolved we'll be ready
 to vote on a new release candidate.

 Thanks.

 Carl


 On Tue, Mar 15, 2011 at 7:08 AM, Bill Au bill.w...@gmail.com wrote:

 What's the status of 0.7.0?  I noticed that rc0 was made available back on
 2/18.  But then there has been no vote on it at all.  Is that safe to use?

 Bill





Review Request: HIVE-1815: The class HiveResultSet should implement batch fetching.

2011-03-16 Thread Bennie Schut

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/514/
---

Review request for hive.


Summary
---

HIVE-1815: The class HiveResultSet should implement batch fetching.


This addresses bug HIVE-1815.
https://issues.apache.org/jira/browse/HIVE-1815


Diffs
-

  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveQueryResultSet.java 
1081785 
  trunk/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveStatement.java 1081785 
  trunk/jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java 1081785 

Diff: https://reviews.apache.org/r/514/diff


Testing
---


Thanks,

Bennie



[jira] Updated: (HIVE-2058) MySQL Upgrade scripts missing new defaults for two table's columns

2011-03-16 Thread Stephen Tunney (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Tunney updated HIVE-2058:
-

Labels:   (was: derby_triage10_5_2)

 MySQL Upgrade scripts missing new defaults for two table's columns
 --

 Key: HIVE-2058
 URL: https://issues.apache.org/jira/browse/HIVE-2058
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Stephen Tunney
Priority: Blocker

 Upgraded from 0.5.0 to 0.7.0, and the upgrade scripts to 0.6.0 and 0.7.0 did 
 not have two defaults that are necessary for being able to create a hive 
 table.  The columns missing default values are:
 COLUMNS.INTEGER_IDX
 SDS.IS_COMPRESSED
 I set them both to zero(0) (false for IS_COMPRESSED, obviously)
 The absence of these two defaults prevents the creation of a table in 
 Hive.



[jira] Created: (HIVE-2058) MySQL Upgrade scripts missing new defaults for two table's columns

2011-03-16 Thread Stephen Tunney (JIRA)
MySQL Upgrade scripts missing new defaults for two table's columns
--

 Key: HIVE-2058
 URL: https://issues.apache.org/jira/browse/HIVE-2058
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Stephen Tunney
Priority: Blocker


Upgraded from 0.5.0 to 0.7.0, and the upgrade scripts to 0.6.0 and 0.7.0 did 
not have two defaults that are necessary for being able to create a hive table. 
 The columns missing default values are:

COLUMNS.INTEGER_IDX
SDS.IS_COMPRESSED

I set them both to zero(0) (false for IS_COMPRESSED, obviously)

The absence of these two defaults prevents the creation of a table in Hive.



[jira] Updated: (HIVE-2028) Performance instruments for client side execution

2011-03-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2028:
-

Attachment: HIVE-2028.2.patch

Updated to the latest trunk.

 Performance instruments for client side execution
 -

 Key: HIVE-2028
 URL: https://issues.apache.org/jira/browse/HIVE-2028
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2028.2.patch, HIVE-2028.patch


 Hive client side execution can sometimes take a long time. This task is to 
 instrument the client side code to measure the time spent in the most likely 
 expensive components. 



[jira] Updated: (HIVE-2011) upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000

2011-03-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2011:
-

Attachment: HIVE-2011.2.patch.txt

 upgrade-0.6.0.mysql.sql script attempts to increase size of PK 
 COLUMNS.TYPE_NAME to 4000
 

 Key: HIVE-2011
 URL: https://issues.apache.org/jira/browse/HIVE-2011
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Priority: Blocker
 Fix For: 0.7.0

 Attachments: HIVE-2011.1.patch.txt, HIVE-2011.2.patch.txt


 {code}
 # mysql flumenewresearch < upgrade-0.6.0.mysql.sql 
 ERROR 1071 (42000) at line 16: Specified key was too long; max key length is 
 767 bytes
 {code}
 Here's the cause of the problem from upgrade-0.6.0.mysql.sql:
 {code}
 ...
 ALTER TABLE `COLUMNS` MODIFY `TYPE_NAME` VARCHAR(4000);
 ...
 ALTER TABLE `COLUMNS` DROP PRIMARY KEY;
 ALTER TABLE `COLUMNS` ADD PRIMARY KEY (`SD_ID`, `COLUMN_NAME`);
 ...
 {code}
 We need to make sure that the PK on COLUMNS.TYPE_NAME is dropped before the 
 size of the column is bumped to 4000.



[jira] Updated: (HIVE-2011) upgrade-0.6.0.mysql.sql script attempts to increase size of PK COLUMNS.TYPE_NAME to 4000

2011-03-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2011:
-

Status: Patch Available  (was: Open)

Updated the patch with more extensive instructions and official schemas for 
Hive 0.3.0 through 0.7.0


 upgrade-0.6.0.mysql.sql script attempts to increase size of PK 
 COLUMNS.TYPE_NAME to 4000
 

 Key: HIVE-2011
 URL: https://issues.apache.org/jira/browse/HIVE-2011
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Priority: Blocker
 Fix For: 0.7.0

 Attachments: HIVE-2011.1.patch.txt, HIVE-2011.2.patch.txt


 {code}
 # mysql flumenewresearch < upgrade-0.6.0.mysql.sql 
 ERROR 1071 (42000) at line 16: Specified key was too long; max key length is 
 767 bytes
 {code}
 Here's the cause of the problem from upgrade-0.6.0.mysql.sql:
 {code}
 ...
 ALTER TABLE `COLUMNS` MODIFY `TYPE_NAME` VARCHAR(4000);
 ...
 ALTER TABLE `COLUMNS` DROP PRIMARY KEY;
 ALTER TABLE `COLUMNS` ADD PRIMARY KEY (`SD_ID`, `COLUMN_NAME`);
 ...
 {code}
 We need to make sure that the PK on COLUMNS.TYPE_NAME is dropped before the 
 size of the column is bumped to 4000.



[jira] Commented: (HIVE-2028) Performance instruments for client side execution

2011-03-16 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007625#comment-13007625
 ] 

Paul Yang commented on HIVE-2028:
-

In PerfLogEnd():
{code}
sb.append("/");
{code}

Shouldn't this be a "</" since this is a close tag?

 Performance instruments for client side execution
 -

 Key: HIVE-2028
 URL: https://issues.apache.org/jira/browse/HIVE-2028
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2028.2.patch, HIVE-2028.patch


 Hive client side execution can sometimes take a long time. This task is to 
 instrument the client side code to measure the time spent in the most likely 
 expensive components. 



[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption

2011-03-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2059:
-

Priority: Blocker  (was: Major)

 Add datanucleus.identifierFactory property to HiveConf to avoid unintentional 
 MetaStore Schema corruption
 --

 Key: HIVE-2059
 URL: https://issues.apache.org/jira/browse/HIVE-2059
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Priority: Blocker
 Fix For: 0.7.0

 Attachments: HIVE-2059.1.patch.txt


 In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which 
 changed some of the defaults for how field names get mapped to datastore 
 identifiers. This problem was resolved in HIVE-1435 by setting 
 datanucleus.identifierFactory=datanucleus in hive-default.xml.
 However, this property definition was not added to HiveConf. This can result 
 in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 
 and retains the Hive 0.5.0 version of hive-default.xml on their classpath.



[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption

2011-03-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2059:
-

Attachment: HIVE-2059.1.patch.txt

 Add datanucleus.identifierFactory property to HiveConf to avoid unintentional 
 MetaStore Schema corruption
 --

 Key: HIVE-2059
 URL: https://issues.apache.org/jira/browse/HIVE-2059
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: HIVE-2059.1.patch.txt


 In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which 
 changed some of the defaults for how field names get mapped to datastore 
 identifiers. This problem was resolved in HIVE-1435 by setting 
 datanucleus.identifierFactory=datanucleus in hive-default.xml.
 However, this property definition was not added to HiveConf. This can result 
 in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 
 and retains the Hive 0.5.0 version of hive-default.xml on their classpath.



[jira] Created: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption

2011-03-16 Thread Carl Steinbach (JIRA)
Add datanucleus.identifierFactory property to HiveConf to avoid unintentional 
MetaStore Schema corruption
--

 Key: HIVE-2059
 URL: https://issues.apache.org/jira/browse/HIVE-2059
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0
 Attachments: HIVE-2059.1.patch.txt

In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which 
changed some of the defaults for how field names get mapped to datastore 
identifiers. This problem was resolved in HIVE-1435 by setting 
datanucleus.identifierFactory=datanucleus in hive-default.xml.

However, this property definition was not added to HiveConf. This can result in 
schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 and 
retains the Hive 0.5.0 version of hive-default.xml on their classpath.
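The mechanism is easy to see with plain java.util.Properties standing in for the conf machinery (a toy illustration, not the HiveConf API): if the default exists only in hive-default.xml, whichever copy of that file is on the classpath decides the value; a compiled-in default makes a stale file harmless.

```java
import java.util.Properties;

public class ConfDefaultDemo {

    // Resolve a property the way a conf object would: the file's value wins,
    // otherwise fall back to the default compiled into the code.
    static String resolve(Properties fileConf, String key, String codeDefault) {
        return fileConf.getProperty(key, codeDefault);
    }

    public static void main(String[] args) {
        // A Hive 0.5.0-era hive-default.xml: the property is simply absent.
        Properties staleFile = new Properties();

        // No compiled-in default: the stale file silently yields null and
        // DataNucleus falls back to its own 2.0 identifier factory.
        System.out.println(resolve(staleFile, "datanucleus.identifierFactory", null));

        // With the default in code, the stale file can no longer change the
        // identifier mapping out from under the metastore schema.
        System.out.println(resolve(staleFile, "datanucleus.identifierFactory", "datanucleus"));
    }
}
```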



[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption

2011-03-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-2059:
-

Status: Patch Available  (was: Open)

 Add datanucleus.identifierFactory property to HiveConf to avoid unintentional 
 MetaStore Schema corruption
 --

 Key: HIVE-2059
 URL: https://issues.apache.org/jira/browse/HIVE-2059
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.7.0

 Attachments: HIVE-2059.1.patch.txt


 In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which 
 changed some of the defaults for how field names get mapped to datastore 
 identifiers. This problem was resolved in HIVE-1435 by setting 
 datanucleus.identifierFactory=datanucleus in hive-default.xml.
 However, this property definition was not added to HiveConf. This can result 
 in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 
 and retains the Hive 0.5.0 version of hive-default.xml on their classpath.



Review Request: HIVE-2059: Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption

2011-03-16 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/515/
---

Review request for hive.


Summary
---

Review request for HIVE-2059.


This addresses bug HIVE-2059.
https://issues.apache.org/jira/browse/HIVE-2059


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 8325870 

Diff: https://reviews.apache.org/r/515/diff


Testing
---


Thanks,

Carl



Jenkins build is back to normal : Hive-0.7.0-h0.20 #40

2011-03-16 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-0.7.0-h0.20/40/




[jira] Updated: (HIVE-2028) Performance instruments for client side execution

2011-03-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2028:
-

Attachment: HIVE-2028.3.patch

Good catch Paul. I'm uploading a new one correcting this. 

 Performance instruments for client side execution
 -

 Key: HIVE-2028
 URL: https://issues.apache.org/jira/browse/HIVE-2028
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2028.2.patch, HIVE-2028.3.patch, HIVE-2028.patch


 Hive client side execution can sometimes take a long time. This task is to 
 instrument the client side code to measure the time spent in the most likely 
 expensive components. 



Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-16 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/489/#review333
---



trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
https://reviews.apache.org/r/489/#comment674

will do.



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
https://reviews.apache.org/r/489/#comment673

I think explicitly catching HiveException and rethrowing it right away will save 
the creation of a new HiveException in the catch(Exception) block, right?



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
https://reviews.apache.org/r/489/#comment675

I agree in general we should use a generic interface in the declaration, 
but I'd prefer leave the LinkedHashMap here. The reason is that what we need 
for the partSpec is an ordered map where the order of iterator.getNext() should 
be the same order of elements being inserted. Unfortunately Java collections 
doesn't have this interface but just an implementation. We could declare an 
interface just for that, but that should be a different issue.


- Ning


On 2011-03-11 14:59:46, Ning Zhang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/489/
 ---
 
 (Updated 2011-03-11 14:59:46)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 *  expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so 
 that PartitionPruner can use that for certain partition predicates.
 * only allows {=, AND, OR} in the partition predicates that can be pushed 
 down to JDO filtering.
 
 
 Diffs
 -
 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  1080788 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 1080788 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1080788 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1080788 
   
 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
  1080788 
 
 Diff: https://reviews.apache.org/r/489/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ning
 




Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-16 Thread Ning Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/489/
---

(Updated 2011-03-16 16:50:15.347839)


Review request for hive.


Changes
---

Add more changes to suit the current implementation of JDO filtering. Some 
major changes are:

  - ExpressionTree.makeFilterForEquals(): previously '=' was translated to the 
JDO method matches(). This introduces false positives if the partition value 
contains regex special characters (e.g., dot). I changed this function to use 
startsWith(), endsWith(), or indexOf() depending on whether the partition 
column is at the beginning, end or middle of the partition spec string. Two 
unit test files (ppr_pushdown*.q) are added to test these cases. 
  - ObjectStore.listMPartitions(): added a query.setOrder() to return 
partitions ordered by their partition names. This is to be backward compatible 
with the old partition pruning behavior. 
  - PartitionPruner.prune(): check if the partition pruning expression contains 
non-partition columns. If so, add the resulting partitions to unkn_parts, 
otherwise to true_parts. This is required by downstream optimizations. 
  - Utilities.checkJDOPushDown(): return true only if the partition column type 
is string and the constant type is string. This is required by the current 
implementation of the JDO filter (see ExpressionTree.java and Filter.g). 

Passed all unit tests. 
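The false positive behind the makeFilterForEquals() change is easy to reproduce in isolation (illustrative code, not the ExpressionTree implementation): with a regex-style match, the dots in a partition value such as 2011.01.01 act as wildcards, while a literal substring test does not misfire.

```java
public class PartitionFilterDemo {

    // Regex-style equality, as a matches()-based translation would do it:
    // regex metacharacters in the value are interpreted, not matched literally.
    static boolean regexStyleEquals(String partName, String value) {
        return partName.matches(".*ds=" + value + ".*");
    }

    // Literal substring test, in the spirit of startsWith/endsWith/indexOf.
    static boolean substringStyleEquals(String partName, String value) {
        return partName.indexOf("ds=" + value) >= 0;
    }

    public static void main(String[] args) {
        String good = "ds=2011.01.01/hr=12";
        String bad = "ds=2011x01y01/hr=12";    // dots match 'x' and 'y' as wildcards
        System.out.println(regexStyleEquals(bad, "2011.01.01"));      // true: false positive
        System.out.println(substringStyleEquals(bad, "2011.01.01"));  // false: rejected
        System.out.println(substringStyleEquals(good, "2011.01.01")); // true: accepted
    }
}
```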


Summary
---

*  expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that 
PartitionPruner can use that for certain partition predicates.
* only allows {=, AND, OR} in the partition predicates that can be pushed 
down to JDO filtering.


Diffs (updated)
-

  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
1081948 
  trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
1081948 
  
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
 1081948 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1081948 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1081948 
  
trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
1081948 
  trunk/ql/src/test/queries/clientpositive/ppr_pushdown.q PRE-CREATION 
  trunk/ql/src/test/queries/clientpositive/ppr_pushdown2.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/ppr_pushdown.q.out PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/ppr_pushdown2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/489/diff


Testing
---


Thanks,

Ning



[jira] Updated: (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2049:
-

Attachment: HIVE-2049.3.patch

Uploading HIVE-2049.3.patch. This one passed all unit tests. I've also updated 
the review board. 

 Push down partition pruning to JDO filtering for a subset of partition 
 predicates
 -

 Key: HIVE-2049
 URL: https://issues.apache.org/jira/browse/HIVE-2049
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.patch


 Several tasks:
   - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that 
 PartitionPruner can use that for certain partition predicates. 
   - figure out a safe subset of partition predicates that can be pushed down 
 to JDO filtering. 
 From my initial testing of the 2nd part, equality queries with AND/OR can be 
 pushed down and return correct results. However, range queries on partition 
 columns gave an NPE from the JDO execute() function. This might be a bug in 
 the JDO query string itself, but we need to figure it out and heavily test all 
 cases. 



[jira] Updated: (HIVE-2049) Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-2049:
-

Status: Patch Available  (was: Open)

 Push down partition pruning to JDO filtering for a subset of partition 
 predicates
 -

 Key: HIVE-2049
 URL: https://issues.apache.org/jira/browse/HIVE-2049
 Project: Hive
  Issue Type: Sub-task
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2049.2.patch, HIVE-2049.3.patch, HIVE-2049.patch


 Several tasks:
   - expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so that 
 PartitionPruner can use that for certain partition predicates. 
   - figure out a safe subset of partition predicates that can be pushed down 
 to JDO filtering. 
 From my initial testing of the 2nd part, equality queries with AND/OR can be 
 pushed down and return correct results. However, range queries on partition 
 columns gave an NPE from the JDO execute() function. This might be a bug in 
 the JDO query string itself, but we need to figure it out and heavily test all 
 cases. 



[jira] Commented: (HIVE-2028) Performance instruments for client side execution

2011-03-16 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007759#comment-13007759
 ] 

Paul Yang commented on HIVE-2028:
-

+1 Will test and commit.

 Performance instruments for client side execution
 -

 Key: HIVE-2028
 URL: https://issues.apache.org/jira/browse/HIVE-2028
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-2028.2.patch, HIVE-2028.3.patch, HIVE-2028.patch


 Hive client side execution can sometimes take a long time. This task is to 
 instrument the client side code to measure the time spent in the most likely 
 expensive components. 



Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-16 Thread M IS


 On 2011-03-16 16:37:44, Ning Zhang wrote:
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java,
   line 244
  https://reviews.apache.org/r/489/diff/1/?file=13887#file13887line244
 
  I agree in general we should use a generic interface in the 
  declaration, but I'd prefer to leave the LinkedHashMap here. The reason is 
  that what we need for the partSpec is an ordered map, where the order of 
  iterator.getNext() is the same as the order in which elements were inserted. 
  Unfortunately the Java collections library doesn't have this interface, just 
  an implementation. We could declare an interface just for that, but that 
  should be a different issue.

Using the generic interface (java.util.Map in this context) in the declaration 
will not affect the expected FIFO iteration order.
Further, when this map instance is passed around to get the partition in the 
getPartition(...) method, the reference type is the generic Map, not the 
implementation type.
Here is a sample program illustrating that using Map as the reference type for 
a LinkedHashMap maintains insertion order:

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class Sample {
    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<String, String>();

        map.put("A", "A");
        map.put("B", "B");
        map.put("C", "C");
        map.put("D", "D");
        map.put("E", "E");
        map.put("F", "F");
        map.put("G", "G");

        Iterator<String> iter = map.keySet().iterator();

        // Prints values in insertion order even though the reference
        // type is the Map interface.
        while (iter.hasNext()) {
            System.out.println(map.get(iter.next()));
        }

        for (Map.Entry<String, String> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}


 On 2011-03-16 16:37:44, Ning Zhang wrote:
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java,
   line 216
  https://reviews.apache.org/r/489/diff/1/?file=13887#file13887line216
 
  I think explicitly catching HiveException and rethrowing it right away will 
  save the creation of a new HiveException in the catch(Exception) block, right?

Fine.


- M


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/489/#review333
---


On 2011-03-16 16:50:15, Ning Zhang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/489/
 ---
 
 (Updated 2011-03-16 16:50:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 *  expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so 
 that PartitionPruner can use that for certain partition predicates.
 * only allows {=, AND, OR} in the partition predicates that can be pushed 
 down to JDO filtering.
 
 
 Diffs
 -
 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  1081948 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 1081948 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
  1081948 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1081948 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1081948 
   
 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
  1081948 
   trunk/ql/src/test/queries/clientpositive/ppr_pushdown.q PRE-CREATION 
   trunk/ql/src/test/queries/clientpositive/ppr_pushdown2.q PRE-CREATION 
   trunk/ql/src/test/results/clientpositive/ppr_pushdown.q.out PRE-CREATION 
   trunk/ql/src/test/results/clientpositive/ppr_pushdown2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/489/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ning
 




Re: Review Request: HIVE-2049. Push down partition pruning to JDO filtering for a subset of partition predicates

2011-03-16 Thread M IS

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/489/#review336
---



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/489/#comment679

Why can't we just do:

    return (value instanceof String);



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/489/#comment680

Why can't we just do:

    return (fs.getType().equals(Constants.STRING_TYPE_NAME));


- M


On 2011-03-16 16:50:15, Ning Zhang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/489/
 ---
 
 (Updated 2011-03-16 16:50:15)
 
 
 Review request for hive.
 
 
 Summary
 ---
 
 *  expose HiveMetaStoreClient.listPartitionsByFilter() to Hive.java so 
 that PartitionPruner can use that for certain partition predicates.
 * only allows {=, AND, OR} in the partition predicates that can be pushed 
 down to JDO filtering.
 
 
 Diffs
 -
 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
  1081948 
   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
 1081948 
   
 trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
  1081948 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1081948 
   trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1081948 
   
 trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
  1081948 
   trunk/ql/src/test/queries/clientpositive/ppr_pushdown.q PRE-CREATION 
   trunk/ql/src/test/queries/clientpositive/ppr_pushdown2.q PRE-CREATION 
   trunk/ql/src/test/results/clientpositive/ppr_pushdown.q.out PRE-CREATION 
   trunk/ql/src/test/results/clientpositive/ppr_pushdown2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/489/diff
 
 
 Testing
 ---
 
 
 Thanks,
 
 Ning
 




[jira] Commented: (HIVE-2054) Exception on windows when using the jdbc driver. IOException: The system cannot find the path specified

2011-03-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007803#comment-13007803
 ] 

Ning Zhang commented on HIVE-2054:
--

I think the problem is that the tmpDir by default is /tmp/user.name/, 
specified by hive.querylog.location. Bennie, I think if you set 
hive.querylog.location to be something like 'c:\' on Windows, it should work. 
Can you try that?
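
For reference, Ning's suggested override would look roughly like the following in hive-site.xml. The Windows path below is only an example value, not a recommendation:

```xml
<!-- hive-site.xml: point the query log / session tmp dir at a path
     that exists on Windows (the value below is just an example). -->
<property>
  <name>hive.querylog.location</name>
  <value>C:\hivetmp</value>
</property>
```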

The JdbcSessionState was introduced in the very first version of the JDBC 
driver (HIVE-48). It is not used, but I guess the reason for it was to be able 
to set session-level state for the JDBC connection. I'm OK with removing it, 
but not sure about others' opinions.

 Exception on windows when using the jdbc driver. IOException: The system 
 cannot find the path specified
 -

 Key: HIVE-2054
 URL: https://issues.apache.org/jira/browse/HIVE-2054
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.8.0
Reporter: Bennie Schut
Assignee: Bennie Schut
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2054.1.patch.txt


 It seems something recently changed on the jdbc driver which causes this 
 IOException on windows.
 java.lang.RuntimeException: java.io.IOException: The system cannot find the 
 path specified
   at 
 org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:237)
   at 
 org.apache.hadoop.hive.jdbc.HiveConnection.init(HiveConnection.java:73)
   at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:110)

--


[jira] Updated: (HIVE-2059) Add datanucleus.identifierFactory property to HiveConf to avoid unintentional MetaStore Schema corruption

2011-03-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2059:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to branch and trunk.  Thanks Carl!


 Add datanucleus.identifierFactory property to HiveConf to avoid unintentional 
 MetaStore Schema corruption
 -

 Key: HIVE-2059
 URL: https://issues.apache.org/jira/browse/HIVE-2059
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.7.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Priority: Blocker
 Fix For: 0.7.0

 Attachments: HIVE-2059.1.patch.txt


 In Hive 0.6.0 we upgraded the version of DataNucleus from 1.0 to 2.0, which 
 changed some of the defaults for how field names get mapped to datastore 
 identifiers. This problem was resolved in HIVE-1435 by setting 
 datanucleus.identifierFactory=datanucleus in hive-default.xml.
 However, this property definition was not added to HiveConf. This can result 
 in schema corruption if the user upgrades from Hive 0.5.0 to 0.6.0 or 0.7.0 
 and retains the Hive 0.5.0 version of hive-default.xml on their classpath.
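
The hive-default.xml entry from HIVE-1435 that this issue mirrors into HiveConf looks roughly like this (property name and value are taken from the description above):

```xml
<!-- hive-default.xml / hive-site.xml: pin the DataNucleus identifier
     factory so the DataNucleus 1.0 -> 2.0 upgrade does not change how
     field names map to datastore identifiers. -->
<property>
  <name>datanucleus.identifierFactory</name>
  <value>datanucleus</value>
</property>
```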

--


[jira] Commented: (HIVE-1959) Potential memory leak when same connection used for long time. TaskInfo and QueryInfo objects are getting accumulated on executing more queries on the same connection.

2011-03-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13007828#comment-13007828
 ] 

Ning Zhang commented on HIVE-1959:
--

Sorry Chinna, I must have missed your last comment. 

Yeah, I think it will work with the current patch. I will start testing and 
commit if it passes. 

 Potential memory leak when same connection used for long time. TaskInfo and 
 QueryInfo objects are getting accumulated on executing more queries on the 
 same connection.
 ---

 Key: HIVE-1959
 URL: https://issues.apache.org/jira/browse/HIVE-1959
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-1959.patch


 *org.apache.hadoop.hive.ql.history.HiveHistory$TaskInfo* and 
 *org.apache.hadoop.hive.ql.history.HiveHistory$QueryInfo* objects accumulate 
 as more queries are executed on the same connection. These objects are only 
 released when the connection is closed.

--