[jira] [Updated] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-09-23 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2181:
-

   Resolution: Fixed
Fix Version/s: 0.9.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Chinna!


  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
 Fix For: 0.9.0

 Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, 
 HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.6.patch, HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution.
 If the Hive server is stopped, there is no need to keep the stopped server's
 map outputs, so the scratch.dir can be cleared while starting the server.
 This can help reduce disk usage.
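
 The startup cleanup described above could look roughly like the sketch below.
 It is not the code from the attached patches: the class and method names are
 hypothetical, and it assumes the scratch directory lives on the local file
 system (a scratch directory on HDFS would need the Hadoop FileSystem API
 instead). It removes everything under the directory but keeps the directory
 itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper; not taken from the HIVE-2181 patch. */
final class ScratchDirCleaner {

    private ScratchDirCleaner() {}

    /**
     * Deletes everything below scratchDir but keeps the directory itself.
     * Returns the number of files and directories removed.
     */
    static int clearContents(Path scratchDir) {
        if (!Files.isDirectory(scratchDir)) {
            return 0; // nothing to clean, e.g. on a fresh install
        }
        try (Stream<Path> walk = Files.walk(scratchDir)) {
            // Reverse order lists children before their parent directories,
            // so each directory is already empty when it is deleted.
            List<Path> entries = walk
                    .filter(p -> !p.equals(scratchDir))
                    .sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
            for (Path entry : entries) {
                Files.delete(entry);
            }
            return entries.size();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not clear " + scratchDir, e);
        }
    }
}
```

 A server would call ScratchDirCleaner.clearContents(...) once before it starts
 accepting queries. If several servers share one scratch directory, clearing it
 on startup would also remove the outputs of queries still running elsewhere,
 so this sketch only fits a directory owned by a single server.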

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-2380) Add ByteArray Datatype

2011-09-23 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113770#comment-13113770
 ] 

John Sichi commented on HIVE-2380:
--

I'm planning to review this one next week.


 Add ByteArray Datatype
 --

 Key: HIVE-2380
 URL: https://issues.apache.org/jira/browse/HIVE-2380
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: hive-2380.patch, hive-2380_1.patch


 Add bytearray as a primitive data type.





[jira] [Updated] (HIVE-2442) Metastore upgrade script and schema DDL for Hive 0.8.0

2011-09-23 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2442:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1, committed to trunk, thanks Carl!

I did not do testing; we can do that with the release candidate, and then if 
there are problems, submit a corrective patch.

I noticed that you omitted PostgreSQL?


 Metastore upgrade script and schema DDL for Hive 0.8.0
 --

 Key: HIVE-2442
 URL: https://issues.apache.org/jira/browse/HIVE-2442
 Project: Hive
  Issue Type: Task
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Carl Steinbach
Priority: Blocker
 Fix For: 0.8.0

 Attachments: HIVE-2442-branch-08.1.patch.txt, 
 HIVE-2442-trunk.1.patch.txt








[jira] [Created] (HIVE-2462) make INNER a non-reserved keyword

2011-09-22 Thread John Sichi (JIRA)
make INNER a non-reserved keyword
-

 Key: HIVE-2462
 URL: https://issues.apache.org/jira/browse/HIVE-2462
 Project: Hive
  Issue Type: Improvement
Reporter: John Sichi
Assignee: John Sichi


HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards 
compatibility for queries which were using it as an identifier.  This patch 
addresses that.






[jira] [Commented] (HIVE-2462) make INNER a non-reserved keyword

2011-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112861#comment-13112861
 ] 

John Sichi commented on HIVE-2462:
--

Not sure whether we want/need this, but if so, here's the patch.


 make INNER a non-reserved keyword
 -

 Key: HIVE-2462
 URL: https://issues.apache.org/jira/browse/HIVE-2462
 Project: Hive
  Issue Type: Improvement
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.9.0

 Attachments: HIVE-2462.1.patch


 HIVE-2191 introduced the INNER keyword as reserved, which breaks backwards 
 compatibility for queries which were using it as an identifier.  This patch 
 addresses that.





[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-09-22 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112881#comment-13112881
 ] 

John Sichi commented on HIVE-2181:
--

+1.  Will commit when tests pass.

  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
 Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, 
 HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.6.patch, HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution.
 If the Hive server is stopped, there is no need to keep the stopped server's
 map outputs, so the scratch.dir can be cleared while starting the server.
 This can help reduce disk usage.





[jira] [Assigned] (HIVE-1558) introducing the dual table

2011-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1558:


Assignee: Marcin Kurczych

 introducing the dual table
 

 Key: HIVE-1558
 URL: https://issues.apache.org/jira/browse/HIVE-1558
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Ning Zhang
Assignee: Marcin Kurczych

 The dual table in MySQL and Oracle is very convenient in testing UDFs or 
 constructing rows without reading any other tables. 
 If dual is the only data source we could leverage the local mode execution. 





[jira] [Assigned] (HIVE-2244) Add a Plugin Developer Kit to Hive

2011-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-2244:


Assignee: John Sichi

 Add a Plugin Developer Kit to Hive
 --

 Key: HIVE-2244
 URL: https://issues.apache.org/jira/browse/HIVE-2244
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: John Sichi
Assignee: John Sichi
 Attachments: HIVE-2244.patch


 See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit





[jira] [Created] (HIVE-2463) fix Eclipse for javaewah upgrade

2011-09-22 Thread John Sichi (JIRA)
fix Eclipse for javaewah upgrade


 Key: HIVE-2463
 URL: https://issues.apache.org/jira/browse/HIVE-2463
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.9.0
Reporter: John Sichi
Assignee: John Sichi


I always forget this.





[jira] [Updated] (HIVE-2463) fix Eclipse for javaewah upgrade

2011-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2463:
-

Attachment: HIVE-2463.1.patch

 fix Eclipse for javaewah upgrade
 

 Key: HIVE-2463
 URL: https://issues.apache.org/jira/browse/HIVE-2463
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.9.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.9.0

 Attachments: HIVE-2463.1.patch


 I always forget this.





[jira] [Updated] (HIVE-2463) fix Eclipse for javaewah upgrade

2011-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2463:
-

Fix Version/s: 0.9.0
   Status: Patch Available  (was: Open)

 fix Eclipse for javaewah upgrade
 

 Key: HIVE-2463
 URL: https://issues.apache.org/jira/browse/HIVE-2463
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.9.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.9.0

 Attachments: HIVE-2463.1.patch


 I always forget this.





[jira] [Updated] (HIVE-2244) Add a Plugin Developer Kit to Hive

2011-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2244:
-

Attachment: HIVE-2244.1.patch

HIVE-2244.1.patch has the optimization.

 Add a Plugin Developer Kit to Hive
 --

 Key: HIVE-2244
 URL: https://issues.apache.org/jira/browse/HIVE-2244
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: John Sichi
Assignee: John Sichi
 Attachments: HIVE-2244.1.patch, HIVE-2244.patch


 See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit





[jira] [Updated] (HIVE-2244) Add a Plugin Developer Kit to Hive

2011-09-22 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2244:
-

Status: Patch Available  (was: Open)

Review Board at

https://reviews.apache.org/r/2030


 Add a Plugin Developer Kit to Hive
 --

 Key: HIVE-2244
 URL: https://issues.apache.org/jira/browse/HIVE-2244
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: John Sichi
Assignee: John Sichi
 Attachments: HIVE-2244.1.patch, HIVE-2244.patch


 See https://cwiki.apache.org/confluence/display/Hive/PluginDeveloperKit





[jira] [Updated] (HIVE-2458) Group-by query optimization Followup: add flag in conf/hive-default.xml

2011-09-21 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2458:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Prajakta!


 Group-by query optimization Followup: add flag in conf/hive-default.xml
 ---

 Key: HIVE-2458
 URL: https://issues.apache.org/jira/browse/HIVE-2458
 Project: Hive
  Issue Type: Improvement
  Components: Indexing, Query Processor
Affects Versions: 0.7.1
Reporter: Prajakta Kalmegh
Assignee: Prajakta Kalmegh
 Fix For: 0.9.0

 Attachments: HIVE-2458.1.patch, HIVE-2458.2.patch


 Followup patch to HIVE-1694.





[jira] [Updated] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-09-21 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2181:
-

Status: Open  (was: Patch Available)

I ran TestHiveServer, and even though it passed, I saw the exception below in 
the test output.  That's because one of the test cases leaves the socket in 
use, so the second one fails to open it.

Rather than actually starting the server, maybe just unit-test the cleanup 
method in isolation?

{noformat}
[junit] org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:1.
[junit] at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:93)
[junit] at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:75)
[junit] at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:68)
[junit] at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:688)
[junit] at org.apache.hadoop.hive.service.TestHiveServer$2.run(TestHiveServer.java:423)
[junit] -  ---
[junit] 
{noformat}
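
The failure mode in the trace above (a second listener cannot open a port that an earlier test case still holds) can be reproduced without Thrift or HiveServer. The sketch below uses only java.net.ServerSocket; the class and method names are made up for illustration and do not appear in TestHiveServer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical demo class; not part of the Hive test suite. */
final class PortInUseDemo {

    private PortInUseDemo() {}

    /** Returns true if binding a second listener to an already-bound port fails. */
    static boolean secondBindFails() {
        try (ServerSocket first = new ServerSocket(0)) { // 0 = pick any free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // unexpectedly bound the same port twice
            } catch (BindException e) {
                return true; // the port is still held by "first"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A unit test that calls the cleanup method directly never opens a server port, so it cannot run into this conflict.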


  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
 Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, 
 HIVE-2181.4.patch, HIVE-2181.5.patch, HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution.
 If the Hive server is stopped, there is no need to keep the stopped server's
 map outputs, so the scratch.dir can be cleared while starting the server.
 This can help reduce disk usage.





[jira] [Resolved] (HIVE-2459) remove all @author tags from source

2011-09-21 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-2459.
--

   Resolution: Fixed
Fix Version/s: 0.9.0
 Hadoop Flags: [Reviewed]

Committed to trunk.  Thanks Ashutosh!


 remove all @author tags from source
 ---

 Key: HIVE-2459
 URL: https://issues.apache.org/jira/browse/HIVE-2459
 Project: Hive
  Issue Type: Bug
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.9.0

 Attachments: hive-2459.patch


 $  grep --exclude-dir=build --exclude-dir=.svn -r @author  .
 ./ql/src/java/org/apache/hadoop/hive/ql/parse/ASTNode.java: * @author athusoo
 ./ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java: * 
 @author John Sichi





[jira] [Commented] (HIVE-1496) enhance CREATE INDEX to support immediate index build

2011-09-20 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108953#comment-13108953
 ] 

John Sichi commented on HIVE-1496:
--

Ashutosh, the DEFERRED REBUILD refers to the data portion (not the metadata for 
the index definition).

 enhance CREATE INDEX to support immediate index build
 -

 Key: HIVE-1496
 URL: https://issues.apache.org/jira/browse/HIVE-1496
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Affects Versions: 0.7.0, 0.8.0
Reporter: John Sichi
Assignee: Ashutosh Chauhan
 Attachments: hive-1496.patch


 Currently we only support WITH DEFERRED REBUILD.





[jira] [Updated] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-09-20 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2181:
-

Status: Open  (was: Patch Available)

  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
 Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.3.patch, 
 HIVE-2181.4.patch, HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution.
 If the Hive server is stopped, there is no need to keep the stopped server's
 map outputs, so the scratch.dir can be cleared while starting the server.
 This can help reduce disk usage.





[jira] [Commented] (HIVE-2458) Group-by query optimization Followup: add flag in conf/hive-default.xml

2011-09-20 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109091#comment-13109091
 ] 

John Sichi commented on HIVE-2458:
--

For the _Of_ casing, I searched the code base and found a few more instances:

* RewriteCanApplyProcFactory.java
* RewriteQueryUsingAggregateIndex.java

Do these need to be changed too?


 Group-by query optimization Followup: add flag in conf/hive-default.xml
 ---

 Key: HIVE-2458
 URL: https://issues.apache.org/jira/browse/HIVE-2458
 Project: Hive
  Issue Type: Improvement
  Components: Indexing, Query Processor
Affects Versions: 0.7.1
Reporter: Prajakta Kalmegh
Assignee: Prajakta Kalmegh
 Fix For: 0.9.0

 Attachments: HIVE-2458.1.patch


 Followup patch to HIVE-1694.





[jira] [Commented] (HIVE-2458) Group-by query optimization Followup: add flag in conf/hive-default.xml

2011-09-20 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109132#comment-13109132
 ] 

John Sichi commented on HIVE-2458:
--

+1.  Will commit when tests pass.


 Group-by query optimization Followup: add flag in conf/hive-default.xml
 ---

 Key: HIVE-2458
 URL: https://issues.apache.org/jira/browse/HIVE-2458
 Project: Hive
  Issue Type: Improvement
  Components: Indexing, Query Processor
Affects Versions: 0.7.1
Reporter: Prajakta Kalmegh
Assignee: Prajakta Kalmegh
 Fix For: 0.9.0

 Attachments: HIVE-2458.1.patch, HIVE-2458.2.patch


 Followup patch to HIVE-1694.





[jira] [Assigned] (HIVE-1079) CREATE VIEW followup: derive dependencies on underlying base table partitions from view definition

2011-09-19 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-1079:


Assignee: Prajakta Kalmegh  (was: John Sichi)

 CREATE VIEW followup:  derive dependencies on underlying base table 
 partitions from view definition
 ---

 Key: HIVE-1079
 URL: https://issues.apache.org/jira/browse/HIVE-1079
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Prajakta Kalmegh

 When querying a view, it would be useful to know which underlying base table 
 partitions it depends on in order to know how fresh the result is (or to be 
 able to wait until all of those partitions have been loaded consistently).  
 The task is to come up with a way to perform this analysis automatically 
 (possibly overconservatively), or alternately to let the view creator 
 annotate the view definition with this dependency information, or some 
 combination of the two.
 Note that this would be useful for any complex query which directly accesses 
 base tables (not just view definitions).





[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3

2011-09-19 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2448:
-

Attachment: javaewah-0.3.jar

 Upgrade JavaEWAH to 0.3
 ---

 Key: HIVE-2448
 URL: https://issues.apache.org/jira/browse/HIVE-2448
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Reporter: John Sichi
Assignee: John Sichi
 Attachments: javaewah-0.3.jar


 It contains performance improvements and should be a drop-in replacement.





[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3

2011-09-19 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2448:
-

Attachment: HIVE-2448.1.patch

 Upgrade JavaEWAH to 0.3
 ---

 Key: HIVE-2448
 URL: https://issues.apache.org/jira/browse/HIVE-2448
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Reporter: John Sichi
Assignee: John Sichi
 Attachments: HIVE-2448.1.patch, javaewah-0.3.jar


 It contains performance improvements and should be a drop-in replacement.





[jira] [Commented] (HIVE-2448) Upgrade JavaEWAH to 0.3

2011-09-19 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108138#comment-13108138
 ] 

John Sichi commented on HIVE-2448:
--

Did you look in the src .zip?  There's a file unit.java there.

 Upgrade JavaEWAH to 0.3
 ---

 Key: HIVE-2448
 URL: https://issues.apache.org/jira/browse/HIVE-2448
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Reporter: John Sichi
Assignee: John Sichi
 Attachments: HIVE-2448.1.patch, javaewah-0.3.jar


 It contains performance improvements and should be a drop-in replacement.





[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3

2011-09-19 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2448:
-

Fix Version/s: 0.9.0
   Status: Patch Available  (was: Open)

 Upgrade JavaEWAH to 0.3
 ---

 Key: HIVE-2448
 URL: https://issues.apache.org/jira/browse/HIVE-2448
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.9.0

 Attachments: HIVE-2448.1.patch, javaewah-0.3.jar


 It contains performance improvements and should be a drop-in replacement.





[jira] [Updated] (HIVE-2448) Upgrade JavaEWAH to 0.3

2011-09-19 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2448:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk since Ed already +1'd it.

 Upgrade JavaEWAH to 0.3
 ---

 Key: HIVE-2448
 URL: https://issues.apache.org/jira/browse/HIVE-2448
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.9.0

 Attachments: HIVE-2448.1.patch, javaewah-0.3.jar


 It contains performance improvements and should be a drop-in replacement.





[jira] [Commented] (HIVE-198) Parse errors report incorrectly.

2011-09-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105680#comment-13105680
 ] 

John Sichi commented on HIVE-198:
-

The updated patch does not apply cleanly.

Also, I tried the original test query from the description.

Before your patch, the message is "cannot recognize input near ',' 'bigint' ')'
in column type".

After your patch, the message is "unexpected input token 'userid' near ','
'bigint' ')' in column type".

It's not clear that this is an improvement, since userid is fine; the syntax 
error is in what follows it.

The problem as originally reported (referring to KW_TEMPORARY) seems to have 
been fixed long ago.


 Parse errors report incorrectly.
 

 Key: HIVE-198
 URL: https://issues.apache.org/jira/browse/HIVE-198
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Assignee: Aviv Eyal
  Labels: parse
 Attachments: HIVE-198.2.patch.txt, PraseErrorMessage.patch


 The following two queries fail:
   CREATE TABLE output_table(userid, bigint);
   CREATE TABLE output_table(userid bigint, age int, sex string, location string);
 each giving the error message "FAILED: Parse Error: line 1:16 mismatched
 input 'TABLE' expecting KW_TEMPORARY".
 Although one might not catch it from the error message, the problem with the
 first is that there is a comma between userid and bigint, and the problem
 with the second is that location is a reserved keyword.  Reported errors
 should more accurately describe the nature of the error, such as "no type
 given for column 'userid'" or "'location' is not a valid column name".





[jira] [Created] (HIVE-2448) Upgrade JavaEWAH to 0.3

2011-09-15 Thread John Sichi (JIRA)
Upgrade JavaEWAH to 0.3
---

 Key: HIVE-2448
 URL: https://issues.apache.org/jira/browse/HIVE-2448
 Project: Hive
  Issue Type: Improvement
  Components: Indexing
Reporter: John Sichi
Assignee: John Sichi


It contains performance improvements and should be a drop-in replacement.





[jira] [Commented] (HIVE-1040) use sed rather than diff for masking out noise in diff-based tests

2011-09-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105732#comment-13105732
 ] 

John Sichi commented on HIVE-1040:
--

Another benefit is to show us exactly what is being masked out just by 
examining the .q.out files (something that currently makes some tests give less 
coverage than they should).


 use sed rather than diff for masking out noise in diff-based tests
 --

 Key: HIVE-1040
 URL: https://issues.apache.org/jira/browse/HIVE-1040
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Affects Versions: 0.4.1
Reporter: John Sichi
Priority: Minor

 The current diff -I approach has two problems:  (1) it does not allow 
 resolution finer than line-level, so it's impossible to mask out pattern 
 occurrences within a line, and (2) it produces unmasked files, so if you run 
 diff on the command line to compare the result .q.out with the checked-in 
 file, you see the noise.
 My suggestion is to first run sed to replace noise patterns with an 
 unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files 
 without using any -I.
 This would require a one-time hit to update all existing .q.out files so that 
 they would contain the pre-masked results.





[jira] [Created] (HIVE-2449) streamline .q.out format

2011-09-15 Thread John Sichi (JIRA)
streamline .q.out format


 Key: HIVE-2449
 URL: https://issues.apache.org/jira/browse/HIVE-2449
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: John Sichi


Currently, we enable all available testing hooks (e.g. lineage, input/output) 
for all tests.  This creates a huge amount of noise in the .q.out files, making 
it very difficult to read them and to review diffs in them.

To fix this, we should only selectively enable specific hooks for specific 
tests where the coverage is needed.

Undertaking this will necessitate a one-time hit for updating all existing 
.q.out files.  Probably best to do together with HIVE-1040.






[jira] [Commented] (HIVE-1040) use sed rather than diff for masking out noise in diff-based tests

2011-09-15 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105759#comment-13105759
 ] 

John Sichi commented on HIVE-1040:
--

Good point.  We can probably figure out how to do it completely within Java by 
filtering the CLI output stream via java.util.regex.  That's what I did for 
Eigenbase, and it worked fine.
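
A rough sketch of what filtering via java.util.regex might look like follows. The two noise patterns (timestamps and local file paths) and the class name are assumptions chosen for illustration; the actual list of patterns would have to come from what the tests mask today. The replacement string is the one suggested in the issue description.

```java
import java.util.regex.Pattern;

/** Hypothetical masker; the patterns below are illustrative assumptions. */
final class OutputMasker {

    /** Unlikely-to-occur replacement string from the issue description. */
    static final String MASK = "ZYZZYZVA";

    private static final Pattern[] NOISE = {
        Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"), // timestamps
        Pattern.compile("file:/\\S*"),                                // local paths
    };

    private OutputMasker() {}

    /** Replaces every noise occurrence inside the line, not the whole line. */
    static String mask(String line) {
        String masked = line;
        for (Pattern p : NOISE) {
            masked = p.matcher(masked).replaceAll(MASK);
        }
        return masked;
    }
}
```

Because the masking happens within the line, the rest of each line still takes part in the comparison, and the stored .q.out file shows exactly which parts were masked.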


 use sed rather than diff for masking out noise in diff-based tests
 --

 Key: HIVE-1040
 URL: https://issues.apache.org/jira/browse/HIVE-1040
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Affects Versions: 0.4.1
Reporter: John Sichi
Priority: Minor

 The current diff -I approach has two problems:  (1) it does not allow 
 resolution finer than line-level, so it's impossible to mask out pattern 
 occurrences within a line, and (2) it produces unmasked files, so if you run 
 diff on the command line to compare the result .q.out with the checked-in 
 file, you see the noise.
 My suggestion is to first run sed to replace noise patterns with an 
 unlikely-to-occur string like ZYZZYZVA, and then diff the pre-masked files 
 without using any -I.
 This would require a one-time hit to update all existing .q.out files so that 
 they would contain the pre-masked results.





[jira] [Assigned] (HIVE-2446) Introduction of client statistics publishers possibility

2011-09-15 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-2446:


Assignee: Robert Surówka

 Introduction of client statistics publishers possibility
 

 Key: HIVE-2446
 URL: https://issues.apache.org/jira/browse/HIVE-2446
 Project: Hive
  Issue Type: Improvement
  Components: Clients, Statistics
Reporter: Robert Surówka
Assignee: Robert Surówka
Priority: Minor
 Attachments: HIVE-2446.1.patch, HIVE-2446.1.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The purpose of this change is to allow publication or storage of counters 
 while the job is running.
 Introduced two new variables in hive-default.xml and HiveConf.java: 
 hive.client.stats.publishers and hive.client.stats.counters. The first 
 specifies the names of classes whose instances will be executed by 
 HadoopJobExecHelper.java (in the same way hooks are) in its method 
 progress(ExecDriverTaskHandle): MapRedStats. The second specifies the list 
 of counters that client stats publishers should publish or store. The 
 format of this list is up to a specific deployment (it is up to the client 
 stats publishers to parse it), but it must use the display names of counter 
 groups and counters.
 Added the interface ClientStatsPublishers in the 
 org.apache.hadoop.hive.ql.stats package, which must be implemented by all 
 stats publishers.
 Added code to progress(ExecDriverTaskHandle): MapRedStats in 
 HadoopJobExecHelper.java that puts the counters' values into a Java map and 
 then executes the registered client stats publishers, passing them that map 
 and the running job id. Added two new methods to HadoopJobExecHelper, 
 extractAllCounterValues(Counters) and getClientStatsPublishers(), which are 
 used by that code.
 Made cosmetic changes in two other classes.
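
 The flow described above (select the configured counters, put their values 
 in a map, hand the map and the job id to each publisher) might look roughly 
 like the sketch below. Every name in it is hypothetical: the interface 
 signature, the "Group::Counter" naming and the publish helper are 
 assumptions for illustration and are not taken from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical publisher interface; the real signature lives in the patch. */
interface ClientStatsPublisherSketch {
    void run(Map<String, Double> counterValues, String jobId);
}

final class ProgressLoopSketch {
    /** Picks the wanted counters and passes them to every registered publisher. */
    static void publish(Map<String, Double> allCounters,
                        List<String> wanted,
                        String jobId,
                        List<ClientStatsPublisherSketch> publishers) {
        Map<String, Double> selected = new LinkedHashMap<>();
        for (String name : wanted) {
            Double value = allCounters.get(name);
            if (value != null) {
                selected.put(name, value);
            }
        }
        for (ClientStatsPublisherSketch p : publishers) {
            // Publishers get a read-only view so they cannot disturb each other.
            p.run(Collections.unmodifiableMap(selected), jobId);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> counters = new LinkedHashMap<>();
        counters.put("Job Counters::Launched map tasks", 4.0);
        counters.put("FileSystemCounters::HDFS_BYTES_READ", 1024.0);
        ClientStatsPublisherSketch printer =
            (values, jobId) -> System.out.println(jobId + " " + values);
        publish(counters, Arrays.asList("Job Counters::Launched map tasks"),
                "job_example", Collections.singletonList(printer));
    }
}
```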
  





[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-14 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2182:
-

   Resolution: Fixed
Fix Version/s: 0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Chinna!


 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.9.0

 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, 
 HIVE-2182.4.patch, HIVE-2182.5.patch, HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if we skip the first step (add jar) and execute only the remaining 
 steps:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the tasktracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of a null pointer exception, it should throw a meaningful exception.
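
 One way to picture the requested behavior is a loader that reports which 
 class is missing instead of letting a null class reach the reflection call. 
 This is only a sketch in plain Java: UdfLoaderSketch and its method are 
 hypothetical names, and it does not reproduce the committed patch or the 
 Hadoop ReflectionUtils code path.

```java
/** Sketch: fail with a descriptive message when a UDF class cannot be loaded. */
final class UdfLoaderSketch {
    /** Loads and instantiates className, naming the function in any failure. */
    static Object newUdfInstance(String functionName, String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // The situation from the report: the jar was never added.
            throw new IllegalStateException("UDF class '" + className
                + "' for function '" + functionName
                + "' was not found on the classpath; was its jar added?", e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("UDF class '" + className
                + "' for function '" + functionName
                + "' could not be instantiated", e);
        }
    }

    public static void main(String[] args) {
        try {
            newUdfInstance("grade", "udf.Grade");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```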





[jira] [Commented] (HIVE-198) Parse errors report incorrectly.

2011-09-14 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104948#comment-13104948
 ] 

John Sichi commented on HIVE-198:
-

Did you miss create_or_replace_view6.q.out?  Perhaps it was committed after you 
started.


jsichi-mac:clientnegative jsichi$ grep cannot recognize *.q.out
column_rename3.q.out:FAILED: Parse Error: line 1:27 cannot recognize input near 
'EOF' 'EOF' 'EOF' in column type
create_or_replace_view6.q.out:FAILED: Parse Error: line 2:52 cannot recognize 
input near 'blah' 'EOF' 'EOF' in select clause
invalid_select_expression.q.out:FAILED: Parse Error: line 1:32 cannot recognize 
input near '.' 'foo' 'EOF' in expression specification
invalid_tbl_name.q.out:FAILED: Parse Error: line 1:20 cannot recognize input 
near '-' 'name' '(' in create table statement


 Parse errors report incorrectly.
 

 Key: HIVE-198
 URL: https://issues.apache.org/jira/browse/HIVE-198
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Assignee: Aviv Eyal
  Labels: parse
 Attachments: HIVE-198.2.patch.txt, PraseErrorMessage.patch


 The following two queries fail:
 CREATE TABLE output_table(userid, bigint);
 CREATE TABLE output_table(userid bigint, age int, sex string, location 
 string);
 each giving the error message FAILED: Parse Error: line 1:16 mismatched 
 input 'TABLE' expecting KW_TEMPORARY
 Although one might not catch it from the error message, the problem with the 
 first is that there is a comma between userid and bigint, and the problem 
 with the second is that location is a reserved keyword.  Reported errors 
 should more accurately describe the nature of the error, such as no type 
 given for column 'userid' or 'location' is not a valid column name.





[jira] [Commented] (HIVE-2380) Add ByteArray Datatype

2011-09-14 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104980#comment-13104980
 ] 

John Sichi commented on HIVE-2380:
--

I don't see any references to it, so I think you're free to use it.

 Add ByteArray Datatype
 --

 Key: HIVE-2380
 URL: https://issues.apache.org/jira/browse/HIVE-2380
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: hive-2380.patch


 Add bytearray as a primitive data type.





[jira] [Commented] (HIVE-2181) Clean up the scratch.dir (tmp/hive-root) while restarting Hive server.

2011-09-14 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105014#comment-13105014
 ] 

John Sichi commented on HIVE-2181:
--

Oops, looks like I typed in the wrong JIRA issue number in the commit message :(


  Clean up the scratch.dir (tmp/hive-root) while restarting Hive server. 
 

 Key: HIVE-2181
 URL: https://issues.apache.org/jira/browse/HIVE-2181
 Project: Hive
  Issue Type: Bug
  Components: Server Infrastructure
Affects Versions: 0.8.0
 Environment: Suse linux, Hadoop 20.1, Hive 0.8
Reporter: sanoj mathew
Assignee: Chinna Rao Lalam
Priority: Minor
 Attachments: HIVE-2181.1.patch, HIVE-2181.2.patch, HIVE-2181.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Currently, queries leave their map outputs under scratch.dir after execution. 
 If the Hive server is stopped, there is no need to keep the stopped server's 
 map outputs, so the server can clear scratch.dir when it starts. This would 
 improve disk usage.
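
 A startup cleanup of this kind can be sketched with java.nio.file on a local 
 directory. The class name ScratchDirCleaner is hypothetical, and the sketch 
 assumes that no other live process shares the directory; the actual patch 
 may use a different file system API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Sketch: empty a scratch directory at server start, keeping the directory. */
final class ScratchDirCleaner {
    static void clear(final Path scratchDir) throws IOException {
        if (!Files.isDirectory(scratchDir)) {
            return; // nothing to clean
        }
        Files.walkFileTree(scratchDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                // Children are gone by now; remove every directory except the root.
                if (!dir.equals(scratchDir)) {
                    Files.delete(dir);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hive-scratch-demo");
        Files.createFile(dir.resolve("map-output-0"));
        clear(dir);
        System.out.println("entries left: " + dir.toFile().list().length);
        Files.delete(dir);
    }
}
```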





[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-14 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105025#comment-13105025
 ] 

John Sichi commented on HIVE-2182:
--

Oops, looks like I typed in the wrong JIRA issue number in the commit message 
(I typed in HIVE-2181 instead of HIVE-2182), so the Hudson commit message went 
there instead.  I've fixed it in the svn log though.

 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.9.0

 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, 
 HIVE-2182.4.patch, HIVE-2182.5.patch, HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if we skip the first step (add jar) and execute only the remaining 
 steps:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the tasktracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of a null pointer exception, it should throw a meaningful exception.





[jira] [Commented] (HIVE-2380) Add ByteArray Datatype

2011-09-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103894#comment-13103894
 ] 

John Sichi commented on HIVE-2380:
--

For accessor functions:

* length
* substring
* concat

We can follow up later with search capabilities.

For conversions:

* to/from hex string
* to/from string using a specific encoding (or default JVM encoding if not 
specified)
* to/from base64 string

We can follow up later with more interesting conversions for non-string types.
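
To make the list above concrete, here is a rough sketch of those accessors 
and conversions on a plain Java byte[]. The class and method names are 
hypothetical and say nothing about how the patch exposes them as Hive 
functions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Sketch of the proposed accessors and conversions on a raw byte[]. */
final class ByteArrayFunctionsSketch {
    static int length(byte[] b) {
        return b.length;
    }

    /** Returns bytes from index 'from' (inclusive) to 'to' (exclusive). */
    static byte[] substring(byte[] b, int from, int to) {
        if (from < 0 || to > b.length || from > to) {
            throw new IndexOutOfBoundsException("range " + from + ".." + to);
        }
        return Arrays.copyOfRange(b, from, to);
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    static String toHex(byte[] b) {
        StringBuilder sb = new StringBuilder(b.length * 2);
        for (byte x : b) {
            sb.append(Character.forDigit((x >> 4) & 0xf, 16));
            sb.append(Character.forDigit(x & 0xf, 16));
        }
        return sb.toString();
    }

    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character in input");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** A null charset means "use the default JVM encoding". */
    static String decode(byte[] b, Charset cs) {
        return new String(b, cs != null ? cs : Charset.defaultCharset());
    }

    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs != null ? cs : Charset.defaultCharset());
    }

    static String toBase64(byte[] b) {
        return Base64.getEncoder().encodeToString(b);
    }

    static byte[] fromBase64(String s) {
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        byte[] hi = encode("hi", StandardCharsets.UTF_8);
        System.out.println(toHex(hi) + " " + toBase64(hi));
    }
}
```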


 Add ByteArray Datatype
 --

 Key: HIVE-2380
 URL: https://issues.apache.org/jira/browse/HIVE-2380
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: hive-2380.patch


 Add bytearray as a primitive data type.





[jira] [Commented] (HIVE-2223) support grouping on complex types in Hive

2011-09-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103900#comment-13103900
 ] 

John Sichi commented on HIVE-2223:
--

I can't seem to view the diff on Review Board?


 support grouping on complex types in Hive
 -

 Key: HIVE-2223
 URL: https://issues.apache.org/jira/browse/HIVE-2223
 Project: Hive
  Issue Type: New Feature
Reporter: Kate Ting
Assignee: Jonathan Chang
Priority: Minor
 Attachments: HIVE-2223.patch


 Creating a query with a GROUP BY statement when an array type column is part 
 of the column list is not yet supported:
 CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>);
 SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms;
 ...
 Hash code on complex types not supported yet.
 java.lang.RuntimeException: Error while closing operators
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet.
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
   ... 4 more
 Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet.
   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348)
   at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780)
   ... 9 more
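
 The failure comes from the hash code that routes group keys, which has no 
 case for complex values. A recursive hash over nested values could look 
 roughly like the sketch below. It uses plain java.util collections as a 
 stand-in for values described by ObjectInspectors, so ComplexHashSketch is 
 a hypothetical illustration and not the ObjectInspectorUtils change.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: hash nested lists and maps so equal group keys hash equally. */
final class ComplexHashSketch {
    static int hash(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof List) {
            // Order matters for arrays, so fold the elements left to right.
            int h = 1;
            for (Object element : (List<?>) value) {
                h = 31 * h + hash(element);
            }
            return h;
        }
        if (value instanceof Map) {
            // Entry order must not matter, so combine entries with addition.
            int h = 0;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                h += hash(e.getKey()) ^ hash(e.getValue());
            }
            return h;
        }
        return value.hashCode(); // primitives and strings
    }

    public static void main(String[] args) {
        System.out.println(hash(Arrays.asList("a", "b")) == hash(Arrays.asList("a", "b")));
    }
}
```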





[jira] [Commented] (HIVE-2223) support grouping on complex types in Hive

2011-09-13 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104096#comment-13104096
 ] 

John Sichi commented on HIVE-2223:
--

It applies cleanly for me, but I was also able to upload it to Review Board 
successfully. Did you try choosing hive-git for the repository?


 support grouping on complex types in Hive
 -

 Key: HIVE-2223
 URL: https://issues.apache.org/jira/browse/HIVE-2223
 Project: Hive
  Issue Type: New Feature
Reporter: Kate Ting
Assignee: Jonathan Chang
Priority: Minor
 Attachments: HIVE-2223.patch


 Creating a query with a GROUP BY statement when an array type column is part 
 of the column list is not yet supported:
 CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>);
 SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms;
 ...
 Hash code on complex types not supported yet.
 java.lang.RuntimeException: Error while closing operators
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet.
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
   ... 4 more
 Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet.
   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348)
   at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746)
   at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780)
   ... 9 more





[jira] [Updated] (HIVE-2405) get_privilege does not get user level privilege

2011-09-13 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2405:
-

   Resolution: Fixed
Fix Version/s: 0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Yongqiang!


 get_privilege does not get user level privilege
 ---

 Key: HIVE-2405
 URL: https://issues.apache.org/jira/browse/HIVE-2405
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.9.0

 Attachments: HIVE-2405.1.patch, HIVE-2405.2.patch


 hive> set hive.security.authorization.enabled=true;
 hive> grant all to user heyongqiang;
 hive> show grant user heyongqiang;
 principalName heyongqiang
 principalType USER
 privilege All
 grantTime Wed Aug 24 11:51:54 PDT 2011
 grantor   heyongqiang
 Time taken: 0.032 seconds
 hive> CREATE TABLE src (foo INT, bar STRING);
 Authorization failed:No privilege 'Create' found for outputs { 
 database:default}. Use show grant to get more details.





[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102927#comment-13102927
 ] 

John Sichi commented on HIVE-2182:
--

Yeah, I hit those failures too while testing. I'll rerun with the latest patch.

 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, 
 HIVE-2182.4.patch, HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if we skip the first step (add jar) and execute only the remaining 
 steps:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the tasktracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of a null pointer exception, it should throw a meaningful exception.





[jira] [Commented] (HIVE-2441) Metastore upgrade scripts for schema change introduced in HIVE-2215

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102944#comment-13102944
 ] 

John Sichi commented on HIVE-2441:
--

@Ashutosh:  we provide the create scripts since DBAs may choose to control 
schema modification (rather than letting Hive do it automatically).

 Metastore upgrade scripts for schema change introduced in HIVE-2215
 ---

 Key: HIVE-2441
 URL: https://issues.apache.org/jira/browse/HIVE-2441
 Project: Hive
  Issue Type: Task
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Ashutosh Chauhan
Priority: Blocker
 Fix For: 0.8.0








[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102988#comment-13102988
 ] 

John Sichi commented on HIVE-1694:
--

Prajakta, can you re-attach your latest patch granting rights to ASF (so the 
feather shows up next to the attachment), and then click the Submit Patch 
button?

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, 
 HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, 
 demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk. This JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above-mentioned cases.
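
 As a toy illustration of the Group By case, an index that maps each key to 
 the positions of its rows already contains what a per-key count needs, so 
 the base table does not have to be scanned again. The sketch below is an 
 assumption-laden example in plain Java: the index layout and all names are 
 hypothetical and do not describe the operators in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: answer a per-key COUNT from an index instead of the base rows. */
final class IndexGroupBySketch {
    /** Builds a toy index: key value -> positions of the rows holding it. */
    static Map<String, List<Integer>> buildIndex(List<String> keyColumn) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int row = 0; row < keyColumn.size(); row++) {
            index.computeIfAbsent(keyColumn.get(row), k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    /** Computes SELECT key, COUNT(*) ... GROUP BY key from the index alone. */
    static Map<String, Integer> countByKey(Map<String, List<Integer>> index) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : index.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index =
            buildIndex(Arrays.asList("a", "b", "a", "c", "a"));
        System.out.println(countByKey(index));
    }
}
```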





[jira] [Commented] (HIVE-2441) Metastore upgrade scripts for schema change introduced in HIVE-2215

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102995#comment-13102995
 ] 

John Sichi commented on HIVE-2441:
--

Since we defined the feature generically, the tables should always be created; 
their presence will not cause any problem, and having that unconditional 
actually seems less confusing to me (we don't currently have any 
feature-specific portion of the metastore).


 Metastore upgrade scripts for schema change introduced in HIVE-2215
 ---

 Key: HIVE-2441
 URL: https://issues.apache.org/jira/browse/HIVE-2441
 Project: Hive
  Issue Type: Task
  Components: Metastore
Reporter: Carl Steinbach
Assignee: Ashutosh Chauhan
Priority: Blocker
 Fix For: 0.8.0








[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103013#comment-13103013
 ] 

John Sichi commented on HIVE-1694:
--

+1.  Will commit when tests pass.

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Fix For: 0.8.0

 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, 
 HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, 
 HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk. This JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in optimizer & 
 query execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 certain classes of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above-mentioned cases.





[jira] [Commented] (HIVE-730) Allow Hive UDF/UDAF to use scala

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103112#comment-13103112
 ] 

John Sichi commented on HIVE-730:
-

Apparently we just need to document it better:

http://mail-archives.apache.org/mod_mbox/hive-user/201109.mbox/%3CCAKi8Xk3XQHJu1y++BM=oOS6M=astg3mbaojs+9zszugjjf1...@mail.gmail.com%3E

 Allow Hive UDF/UDAF to use scala
 

 Key: HIVE-730
 URL: https://issues.apache.org/jira/browse/HIVE-730
 Project: Hive
  Issue Type: New Feature
Reporter: Zheng Shao

 Scala is a programming language that is concise and can run on top of the JVM. 
 http://www.scala-lang.org/
 We should have some examples of Hive UDF/UDAF in Scala, and make it easy for 
 people to write Hive UDF/UDAF in Scala.
 Thanks Venky for information and idea on scala and hive integration.





[jira] [Resolved] (HIVE-2327) UDFs should be made aware when their arguments are constants.

2011-09-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-2327.
--

Resolution: Duplicate

 UDFs should be made aware when their arguments are constants.
 -

 Key: HIVE-2327
 URL: https://issues.apache.org/jira/browse/HIVE-2327
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer

 There are a lot of UDFs which would show major performance differences if 
 they could assume that some of their arguments are constant.
 Consider, for example, any UDF that takes a regular expression as input: this 
 can be compiled once (fast) if it's a constant, or once per row (wicked slow) 
 if it's not a constant.
 Or, consider any UDF that reads from a file and/or takes a filename as input; 
 it would have to re-read the whole file if the filename changes.
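
 The cost difference in the regular expression example can be seen in a 
 small sketch. Without knowing that the argument is constant, a UDF can only 
 approximate the fast path by remembering the last pattern it compiled. The 
 class RegexUdfSketch is hypothetical and is not a Hive UDF; the compile 
 counter exists only to make the effect visible.

```java
import java.util.regex.Pattern;

/** Sketch: recompile the pattern only when the regex argument changes. */
final class RegexUdfSketch {
    private String lastRegex;
    private Pattern lastPattern;
    int compileCount; // exposed only to show how often compilation happens

    /** Returns true if input contains a match for regex. */
    boolean evaluate(String input, String regex) {
        if (!regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
            compileCount++;
        }
        return lastPattern.matcher(input).find();
    }

    public static void main(String[] args) {
        RegexUdfSketch udf = new RegexUdfSketch();
        String[] rows = {"hive-0.8", "hadoop", "hive-0.9"};
        for (String row : rows) {
            udf.evaluate(row, "hive-\\d");
        }
        System.out.println("rows: " + rows.length + ", compiles: " + udf.compileCount);
    }
}
```

 If the framework told the UDF up front that the argument is constant, the 
 per-row equality check could be dropped as well and the pattern compiled 
 once during initialization.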





[jira] [Resolved] (HIVE-200) ant test will fail with apache-ant-1.7.1

2011-09-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-200.
-

Resolution: Won't Fix

We're on 1.8.x these days.

 ant test will fail with apache-ant-1.7.1
 

 Key: HIVE-200
 URL: https://issues.apache.org/jira/browse/HIVE-200
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Zheng Shao

 ant test succeeded with Apache Ant version 1.6.5 compiled on June 2 2005, 
 but fails with apache-ant-1.7.1.





[jira] [Updated] (HIVE-2380) Add ByteArray Datatype

2011-09-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2380:
-

Status: Open  (was: Patch Available)

 Add ByteArray Datatype
 --

 Key: HIVE-2380
 URL: https://issues.apache.org/jira/browse/HIVE-2380
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: hive-2380.patch


 Add bytearray as a primitive data type.





[jira] [Commented] (HIVE-2327) UDFs should be made aware when their arguments are constants.

2011-09-12 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103150#comment-13103150
 ] 

John Sichi commented on HIVE-2327:
--

OK, then I guess this issue should be renamed?


 UDFs should be made aware when their arguments are constants.
 -

 Key: HIVE-2327
 URL: https://issues.apache.org/jira/browse/HIVE-2327
 Project: Hive
  Issue Type: Improvement
Reporter: Adam Kramer

 There are a lot of UDFs that would show major performance differences if they 
 could assume that some of their arguments are constant.
 Consider, for example, any UDF that takes a regular expression as input: it 
 can be compiled once (fast) if it's a constant, or once per row (wicked slow) 
 if it's not a constant.
 Or, consider any UDF that reads from a file and/or takes a filename as input; 
 it would have to re-read the whole file if the filename changes.





[jira] [Updated] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1694:
-

   Resolution: Fixed
Fix Version/s: (was: 0.8.0)
   0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Prajakta!


 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Fix For: 0.9.0

 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, 
 HIVE-1694.6.patch, HIVE-1694.7.patch, HIVE-1694.7.patch, 
 HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 a certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above-mentioned cases. 
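
The Group By case can be illustrated with a toy Java sketch. It assumes an in-memory index that maps each key to the positions of its rows; the class IndexGroupByDemo and its methods are hypothetical and do not reflect Hive's index format or the operators in the patch. The point is only that a per-key count can be read from the index without rescanning the base rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration (not Hive code): answer COUNT ... GROUP BY key from an index. */
final class IndexGroupByDemo {

    /** Baseline: scan every row and count occurrences per key. */
    static Map<String, Long> countByScan(List<String> rows) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : rows) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    /** Builds a toy index: key -> positions of the rows holding that key. */
    static Map<String, List<Integer>> buildIndex(List<String> rows) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < rows.size(); i++) {
            index.computeIfAbsent(rows.get(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    /** Uses only the index: the count for a key is the number of positions stored for it. */
    static Map<String, Long> countByIndex(Map<String, List<Integer>> index) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : index.entrySet()) {
            counts.put(entry.getKey(), (long) entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(countByScan(rows));
        System.out.println(countByIndex(buildIndex(rows)));
    }
}
```

Both methods return the same counts; the index-based one touches one entry per distinct key rather than one entry per row, which is where a speedup would come from when keys repeat often.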





[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-12 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2182:
-

Status: Open  (was: Patch Available)

I got merge conflicts trying to apply the latest patch.


At revision 1170007.
(Stripping trailing CRs from patch.)
patching file 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java
(Stripping trailing CRs from patch.)
patching file ql/src/test/queries/clientnegative/udfnull.q
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/clientnegative/udfnull.q.out
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/cast1.q.xml
Hunk #2 FAILED at 62.
Hunk #3 FAILED at 124.
Hunk #4 FAILED at 160.
Hunk #5 succeeded at 371 (offset 4 lines).
Hunk #7 succeeded at 455 (offset 4 lines).
Hunk #9 succeeded at 526 (offset 4 lines).
Hunk #11 succeeded at 622 (offset 4 lines).
Hunk #13 succeeded at 1066 (offset 4 lines).
Hunk #15 FAILED at 1131.
Hunk #16 FAILED at 1193.
5 out of 16 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/cast1.q.xml.rej
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/groupby1.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/groupby2.q.xml
Hunk #13 succeeded at 1408 (offset 4 lines).
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/groupby3.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/groupby4.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/groupby5.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/groupby6.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/input20.q.xml
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 62.
Hunk #3 FAILED at 124.
Hunk #6 FAILED at 850.
Hunk #7 FAILED at 862.
Hunk #8 FAILED at 919.
Hunk #9 FAILED at 981.
Hunk #10 FAILED at 1015.
8 out of 10 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input20.q.xml.rej
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/input8.q.xml
Hunk #1 FAILED at 1.
Hunk #2 FAILED at 62.
Hunk #3 FAILED at 124.
Hunk #4 FAILED at 156.
Hunk #5 succeeded at 314 (offset 4 lines).
Hunk #7 succeeded at 403 (offset 4 lines).
Hunk #8 FAILED at 641.
Hunk #9 FAILED at 653.
Hunk #10 FAILED at 710.
Hunk #11 FAILED at 772.
8 out of 11 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input8.q.xml.rej
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/join2.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample1.q.xml
Hunk #5 succeeded at 555 (offset 4 lines).
Hunk #7 succeeded at 639 (offset 4 lines).
Hunk #9 succeeded at 885 (offset 4 lines).
Hunk #11 succeeded at 1021 (offset 4 lines).
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample2.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample3.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample4.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample5.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample6.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/sample7.q.xml
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/udf1.q.xml
Hunk #5 succeeded at 510 (offset 4 lines).
Hunk #7 succeeded at 606 (offset 4 lines).
Hunk #9 succeeded at 702 (offset 4 lines).
Hunk #11 succeeded at 798 (offset 4 lines).
Hunk #13 succeeded at 894 (offset 4 lines).
Hunk #15 succeeded at 997 (offset 4 lines).
Hunk #17 succeeded at 1093 (offset 4 lines).
Hunk #19 succeeded at 1203 (offset 4 lines).
Hunk #21 succeeded at 1306 (offset 4 lines).
Hunk #23 succeeded at 1904 (offset 4 lines).
Hunk #25 succeeded at 2023 (offset 4 lines).
(Stripping trailing CRs from patch.)
patching file ql/src/test/results/compiler/plan/udf4.q.xml
Hunk #5 succeeded at 523 (offset 4 lines).
Hunk #7 succeeded at 585 (offset 4 lines).
Hunk #9 succeeded at 662 (offset 4 lines).
Hunk #11 succeeded at 717 (offset 4 lines).
Hunk #13 succeeded at 794 (offset 4 lines).
Hunk #15 succeeded at 849 (offset 4 lines).
Hunk #17 succeeded at 919 (offset 4 lines).
Hunk #19 succeeded at 996 (offset 4 lines).
Hunk #21 succeeded at 1051 (offset 4 lines).
Hunk #23 succeeded at 1126 (offset 4 lines).
Hunk #25 succeeded at 1212 (offset 4 lines).
Hunk #27 succeeded at 1296 (offset 4 lines).
Hunk #29 succeeded at 1846 (offset 4 lines).
Hunk #31 succeeded at 1965 (offset 4 lines).
(Stripping trailing CRs from patch.)
patching file 

[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-09 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101499#comment-13101499
 ] 

John Sichi commented on HIVE-2182:
--

It's still failing for me with the latest patch.  Did you use -Doverwrite=true 
to regenerate the log?

{noformat}
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I LOCATION ' -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I job_[0-9]*_[0-9]* 
-I USING 'java -cp 
/data/users/jsichi/open/test-trunk/build/ql/test/logs/clientnegative/udfnull.q.out
 
/data/users/jsichi/open/test-trunk/ql/src/test/results/clientnegative/udfnull.q.out
[junit] 18c18,27
[junit]  /data/users/jsichi/open/test-trunk/build/ql/tmp//hive.log
[junit] ---
[junit]  /home/opensrc/9thsep/build/ql/tmp//hive.log
[junit]  FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
[junit]  PREHOOK: query: CREATE TEMPORARY FUNCTION example_arraysum AS 
'org.apache.hadoop.hive.contrib.udf.example.UDFExampleArraySum'
[junit]  PREHOOK: type: CREATEFUNCTION
[junit]  POSTHOOK: query: CREATE TEMPORARY FUNCTION example_arraysum AS 
'org.apache.hadoop.hive.contrib.udf.example.UDFExampleArraySum'
[junit]  POSTHOOK: type: CREATEFUNCTION
[junit]  PREHOOK: query: SELECT example_arraysum(lint)FROM src_thrift
[junit]  PREHOOK: type: QUERY
[junit]  PREHOOK: Input: default@src_thrift
[junit]  PREHOOK: Output: 
file:/tmp/root/hive_2011-05-25_10-05-57_126_4632621650656424226/-mr-1
{noformat}


 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, 
 HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if the first step (add jar) is skipped and only the remaining steps are 
 executed:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the TaskTracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of null 

[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-09 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101501#comment-13101501
 ] 

John Sichi commented on HIVE-2182:
--

Oops, sorry, ignore comment above...I misapplied the latest patch.

 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, 
 HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if the first step (add jar) is skipped and only the remaining steps are 
 executed:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the TaskTracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of a null pointer exception, it should throw a meaningful exception.
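
The requested behaviour can be sketched in plain Java. This assumes, as the bottom of the trace suggests, that the failure comes from passing a null class to a reflective instantiation call. The class UdfInstantiator, its method, and the message text are hypothetical and are not the attached patch or Hive's API.

```java
/** Hypothetical helper (not Hive code): fail with a descriptive message instead of an NPE. */
final class UdfInstantiator {

    /**
     * Instantiates the UDF class through its no-argument constructor.
     * udfClass is assumed to be null when the class could not be loaded,
     * for example because its jar was never added.
     */
    static Object newInstance(Class<?> udfClass, String udfClassName) {
        if (udfClass == null) {
            // Report the missing class by name rather than letting a null reach reflection code.
            throw new IllegalStateException("UDF implementation class '" + udfClassName
                    + "' could not be loaded; was its jar registered with ADD JAR?");
        }
        try {
            return udfClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Could not instantiate UDF class '" + udfClassName + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newInstance(StringBuilder.class, "java.lang.StringBuilder").getClass());
        try {
            newInstance(null, "udf.Grade");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The guard costs one null check per initialization, and the resulting message names the class the user has to make available.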





[jira] [Commented] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-09 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101508#comment-13101508
 ] 

John Sichi commented on HIVE-2182:
--

+1.  Will commit when tests pass.

 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.3.patch, 
 HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if the first step (add jar) is skipped and only the remaining steps are 
 executed:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the TaskTracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of a null pointer exception, it should throw a meaningful exception.





[jira] [Commented] (HIVE-1694) Accelerate GROUP BY execution using indexes

2011-09-09 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101606#comment-13101606
 ] 

John Sichi commented on HIVE-1694:
--

Looks great.  One last change:  for all the SELECT queries in the .q file, can 
you add an ORDER BY on a full key for test determinism?

 Accelerate GROUP BY execution using indexes
 ---

 Key: HIVE-1694
 URL: https://issues.apache.org/jira/browse/HIVE-1694
 Project: Hive
  Issue Type: New Feature
  Components: Indexing, Query Processor
Affects Versions: 0.7.0
Reporter: Nikhil Deshpande
Assignee: Prajakta Kalmegh
 Attachments: HIVE-1694.1.patch.txt, HIVE-1694.2.patch.txt, 
 HIVE-1694.3.patch.txt, HIVE-1694.4.patch, HIVE-1694.5.patch, 
 HIVE-1694.6.patch, HIVE-1694_2010-10-28.diff, demo_q1.hql, demo_q2.hql


 The index building patch (HIVE-417) is checked into trunk; this JIRA issue 
 tracks supporting indexes in the Hive compiler & execution engine for SELECT 
 queries.
 This is in ref. to John's comment at
 https://issues.apache.org/jira/browse/HIVE-417?focusedCommentId=12884869&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12884869
 on creating a separate JIRA issue for tracking index usage in optimizer & query 
 execution.
 The aim of this effort is to use indexes to accelerate query execution (for 
 a certain class of queries). E.g.
 - Filters and range scans (already being worked on by He Yongqiang as part of 
 HIVE-417?)
 - Joins (index based joins)
 - Group By, Order By and other misc cases
 The proposal is multi-step:
 1. Building index based operators, compiler and execution engine changes
 2. Optimizer enhancements (e.g. cost-based optimizer to compare and choose 
 between index scans, full table scans etc.)
 This JIRA initially focuses on the first step. This JIRA is expected to hold 
 the information about index based plans & operator implementations for the 
 above-mentioned cases. 





[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2182:
-

Status: Open  (was: Patch Available)

Can you add the test case back in?  Also create a review board request?

 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2182.1.patch, HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if the first step (add jar) is skipped and only the remaining steps are 
 executed:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the TaskTracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of a null pointer exception, it should throw a meaningful exception.





[jira] [Commented] (HIVE-2402) Function like with empty string is throwing null pointer exception

2011-09-08 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100569#comment-13100569
 ] 

John Sichi commented on HIVE-2402:
--

+1.  Will commit when tests pass.


 Function like with empty string is throwing null pointer exception
 --

 Key: HIVE-2402
 URL: https://issues.apache.org/jira/browse/HIVE-2402
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2402.1.patch, HIVE-2402.patch


 select emp.ename from emp where ename like ''
 This query throws a null pointer exception.
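
For reference, a minimal Java sketch of LIKE matching in which an empty pattern is an ordinary case. It assumes the usual SQL reading (% matches any run of characters, _ matches one character, no escape character). The class SimpleLike is hypothetical; it is neither Hive's LIKE implementation nor the attached patch.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not Hive code): SQL LIKE via a regex translation. */
final class SimpleLike {

    /** Translates a LIKE pattern to a regex: % becomes .*, _ becomes ., the rest is quoted. */
    static String likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // An empty LIKE pattern yields an empty regex, which matches only the empty string.
        return regex.toString();
    }

    /** Returns null when either argument is null, mirroring SQL NULL semantics. */
    static Boolean like(String value, String likePattern) {
        if (value == null || likePattern == null) {
            return null;
        }
        return Pattern.compile(likeToRegex(likePattern), Pattern.DOTALL)
                .matcher(value)
                .matches();
    }

    public static void main(String[] args) {
        System.out.println(like("", ""));
        System.out.println(like("abc", ""));
        System.out.println(like("abc", "a%"));
    }
}
```

With this translation the query above would return only rows whose ename is the empty string, with no special handling needed for the empty pattern.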





[jira] [Commented] (HIVE-2223) support grouping on complex types in Hive

2011-09-08 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100570#comment-13100570
 ] 

John Sichi commented on HIVE-2223:
--

Jonathan, fill in the bug field in Review Board with HIVE-2223 so that the 
comments from there will automatically get propagated here.


 support grouping on complex types in Hive
 -

 Key: HIVE-2223
 URL: https://issues.apache.org/jira/browse/HIVE-2223
 Project: Hive
  Issue Type: New Feature
Reporter: Kate Ting
Assignee: Jonathan Chang
Priority: Minor
 Attachments: HIVE-2223.patch


 Creating a query with a GROUP BY statement when an array type column is part 
 of the column list is not yet supported:
 CREATE TABLE test_group_by ( key INT, group INT, terms ARRAY<STRING>);
 SELECT key, terms, count(group) FROM test_group_by GROUP BY key, terms;
 ...
 Hash code on complex types not supported yet.
 java.lang.RuntimeException: Error while closing operators
 at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Hash code on complex types not supported yet.
 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
 ... 4 more
 Caused by: java.lang.RuntimeException: Hash code on complex types not supported yet.
 at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hashCode(ObjectInspectorUtils.java:348)
 at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:187)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
 at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:746)
 at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:780)
 ... 9 more
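
What a hash code over complex values could look like is sketched below in plain Java. It assumes arrays are represented as java.util.List and maps as java.util.Map. The class ComplexHash is hypothetical; it does not use Hive's ObjectInspector API and is not the attached patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not Hive code): recursive hash code for nested values. */
final class ComplexHash {

    static int hash(Object value) {
        if (value == null) {
            return 0;
        }
        if (value instanceof List) {
            // Element order matters for arrays, so combine position-sensitively.
            int h = 0;
            for (Object element : (List<?>) value) {
                h = 31 * h + hash(element);
            }
            return h;
        }
        if (value instanceof Map) {
            // Entry order must not matter for maps, so sum the per-entry hashes.
            int h = 0;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                h += hash(entry.getKey()) ^ hash(entry.getValue());
            }
            return h;
        }
        // Primitive-like values fall back to their own hashCode.
        return value.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hash(Arrays.asList("hive", "hadoop")));
        System.out.println(hash(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3))));
    }
}
```

Equal nested values produce equal hashes, which is the property grouping on such a column needs in order to send equal keys to the same group.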





[jira] [Assigned] (HIVE-198) Parse errors report incorrectly.

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-198:
---

Assignee: Aviv Eyal

 Parse errors report incorrectly.
 

 Key: HIVE-198
 URL: https://issues.apache.org/jira/browse/HIVE-198
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Assignee: Aviv Eyal
  Labels: parse
 Attachments: PraseErrorMessage.patch


 The following two queries fail:
 CREATE TABLE output_table(userid, bigint);
 CREATE TABLE output_table(userid bigint, age int, sex string, location string);
 each giving the error message:
 FAILED: Parse Error: line 1:16 mismatched input 'TABLE' expecting KW_TEMPORARY
 Although one might not catch it from the error message, the problem with the 
 first is that there is a comma between userid and bigint, and the problem 
 with the second is that location is a reserved keyword.  Reported errors 
 should more accurately describe the nature of the error, such as "no type 
 given for column 'userid'" or "'location' is not a valid column name".
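
The kind of message being asked for can be sketched with a toy Java checker over a column list. The class ColumnDefChecker and its three-word reserved list are hypothetical and purely illustrative; this is not Hive's parser, its grammar, or the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy checker (not Hive code): descriptive errors for a comma-separated column list. */
final class ColumnDefChecker {

    // Illustrative subset only; a real reserved-word list is much longer.
    private static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("location", "table", "select"));

    /** Checks input such as "userid bigint, age int" and returns one message per problem. */
    static List<String> check(String columnList) {
        List<String> errors = new ArrayList<>();
        for (String definition : columnList.split(",")) {
            String[] parts = definition.trim().split("\\s+");
            String name = parts[0];
            if (name.isEmpty()) {
                errors.add("empty column definition");
            } else if (RESERVED.contains(name.toLowerCase(Locale.ROOT))) {
                errors.add("'" + name + "' is not a valid column name");
            } else if (parts.length < 2) {
                errors.add("no type given for column '" + name + "'");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(check("userid, bigint"));
        System.out.println(check("userid bigint, age int, sex string, location string"));
    }
}
```

Each message names the offending column, which is the difference from the generic "mismatched input" error quoted above.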





[jira] [Updated] (HIVE-198) Parse errors report incorrectly.

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-198:


Status: Open  (was: Patch Available)

Could you add a test case, and also submit a review board request?

https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess


 Parse errors report incorrectly.
 

 Key: HIVE-198
 URL: https://issues.apache.org/jira/browse/HIVE-198
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: S. Alex Smith
Assignee: Aviv Eyal
  Labels: parse
 Attachments: PraseErrorMessage.patch


 The following two queries fail:
 CREATE TABLE output_table(userid, bigint);
 CREATE TABLE output_table(userid bigint, age int, sex string, location 
 string);
 each giving the error message "FAILED: Parse Error: line 1:16 mismatched 
 input 'TABLE' expecting KW_TEMPORARY".
 Although one might not catch it from the error message, the problem with the 
 first is that there is a comma between userid and bigint, and the problem 
 with the second is that location is a reserved keyword.  Reported errors 
 should more accurately describe the nature of the error, such as "no type 
 given for column 'userid'" or "'location' is not a valid column name".





[jira] [Assigned] (HIVE-2250) DESCRIBE EXTENDED table_name shows inconsistent compression information.

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-2250:


Assignee: subramanian raghunathan

 DESCRIBE EXTENDED table_name shows inconsistent compression information.
 --

 Key: HIVE-2250
 URL: https://issues.apache.org/jira/browse/HIVE-2250
 Project: Hive
  Issue Type: Bug
  Components: CLI, Diagnosability
Affects Versions: 0.7.0
 Environment: RHEL, Full Cloudera stack
Reporter: Travis Powell
Assignee: subramanian raghunathan
Priority: Critical
 Attachments: HIVE-2250.patch


 Commands executed in this order:
 user@node # hive
 hive> SET hive.exec.compress.output=true;
 hive> SET io.seqfile.compression.type=BLOCK;
 hive> CREATE TABLE table_name ( [...] ) ROW FORMAT DELIMITED FIELDS 
 TERMINATED BY '\t' STORED AS SEQUENCEFILE;
 hive> CREATE TABLE staging_table ( [...] ) ROW FORMAT DELIMITED FIELDS 
 TERMINATED BY '\t';
 hive> LOAD DATA LOCAL INPATH 'file:///root/input/' OVERWRITE INTO TABLE 
 staging_table;
 hive> INSERT OVERWRITE TABLE table_name SELECT * FROM staging_table;
 (Map reduce job to change to sequence file...)
 hive> DESCRIBE EXTENDED table_name;
 Detailed Table Information  Table(tableName:table_name, 
 dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, 
 retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, 
 type:string, comment:null), FieldSchema(name:remote_address, type:string, 
 comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), 
 FieldSchema(name:canister_session_id, type:bigint, comment:null), 
 FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, 
 type:string, comment:null), FieldSchema(name:tltvid, type:string, 
 comment:null), FieldSchema(name:canister_server, type:string, comment:null), 
 FieldSchema(name:session_timestamp, type:string, comment:null), 
 FieldSchema(name:session_duration, type:string, comment:null), 
 FieldSchema(name:hit_count, type:bigint, comment:null), 
 FieldSchema(name:http_user_agent, type:string, comment:null), 
 FieldSchema(name:extractid, type:bigint, comment:null), 
 FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, 
 type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], 
 location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, 
 inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, 
 outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, 
 compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
 serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
 parameters:{serialization.format=   , field.delim=
 *** SEE ABOVE: Compression is set to FALSE, even though the contents of the 
 table are compressed.





[jira] [Assigned] (HIVE-2217) add Query text for debugging in lock data

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-2217:


Assignee: Jiayan Jiang

 add Query text for debugging in lock data
 -

 Key: HIVE-2217
 URL: https://issues.apache.org/jira/browse/HIVE-2217
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.1
Reporter: Namit Jain
Assignee: Jiayan Jiang
 Attachments: hive_diff2


 Currently, only the queryId is stored in the lock data. 
 Storing the query text as well would improve debuggability.





[jira] [Created] (HIVE-2432) Bring project into compliance with Apache Software Foundation Branding Requirements

2011-09-08 Thread John Sichi (JIRA)
Bring project into compliance with Apache Software Foundation Branding 
Requirements
---

 Key: HIVE-2432
 URL: https://issues.apache.org/jira/browse/HIVE-2432
 Project: Hive
  Issue Type: Improvement
Reporter: John Sichi
Assignee: John Sichi


http://www.apache.org/foundation/marks/pmcs.html

I will be creating sub-tasks for the various work items needed.






[jira] [Created] (HIVE-2433) add DOAP file for Hive

2011-09-08 Thread John Sichi (JIRA)
add DOAP file for Hive
--

 Key: HIVE-2433
 URL: https://issues.apache.org/jira/browse/HIVE-2433
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi


http://www.apache.org/foundation/marks/pmcs.html#metadata





[jira] [Created] (HIVE-2434) add a TM to Hive logo image

2011-09-08 Thread John Sichi (JIRA)
add a TM to Hive logo image
---

 Key: HIVE-2434
 URL: https://issues.apache.org/jira/browse/HIVE-2434
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi


http://www.apache.org/foundation/marks/pmcs.html#graphics

And maybe the feather?






[jira] [Updated] (HIVE-2435) Update project naming and description in Hive wiki

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2435:
-

Description: http://www.apache.org/foundation/marks/pmcs.html#naming

 Update project naming and description in Hive wiki
 --

 Key: HIVE-2435
 URL: https://issues.apache.org/jira/browse/HIVE-2435
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi
Assignee: John Sichi

 http://www.apache.org/foundation/marks/pmcs.html#naming





[jira] [Created] (HIVE-2436) Update project naming and description in Hive website

2011-09-08 Thread John Sichi (JIRA)
Update project naming and description in Hive website
-

 Key: HIVE-2436
 URL: https://issues.apache.org/jira/browse/HIVE-2436
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi


http://www.apache.org/foundation/marks/pmcs.html#naming





[jira] [Created] (HIVE-2435) Update project naming and description in Hive wiki

2011-09-08 Thread John Sichi (JIRA)
Update project naming and description in Hive wiki
--

 Key: HIVE-2435
 URL: https://issues.apache.org/jira/browse/HIVE-2435
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi
Assignee: John Sichi








[jira] [Created] (HIVE-2437) update project website navigation links

2011-09-08 Thread John Sichi (JIRA)
update project website navigation links
---

 Key: HIVE-2437
 URL: https://issues.apache.org/jira/browse/HIVE-2437
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi


http://www.apache.org/foundation/marks/pmcs.html#navigation





[jira] [Created] (HIVE-2438) add trademark attributions to Hive homepage

2011-09-08 Thread John Sichi (JIRA)
add trademark attributions to Hive homepage
---

 Key: HIVE-2438
 URL: https://issues.apache.org/jira/browse/HIVE-2438
 Project: Hive
  Issue Type: Sub-task
Reporter: John Sichi


http://www.apache.org/foundation/marks/pmcs.html#attributions





[jira] [Updated] (HIVE-2402) Function like with empty string is throwing null pointer exception

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2402:
-

   Resolution: Fixed
Fix Version/s: 0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Chinna!


 Function like with empty string is throwing null pointer exception
 --

 Key: HIVE-2402
 URL: https://issues.apache.org/jira/browse/HIVE-2402
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.9.0

 Attachments: HIVE-2402.1.patch, HIVE-2402.patch


 select emp.ename from emp where ename like ''
 This query throws a NullPointerException.





[jira] [Updated] (HIVE-2182) Avoid null pointer exception when executing UDF

2011-09-08 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2182:
-

Status: Open  (was: Patch Available)

I am getting the failure below when running the new test with latest trunk.  
Did you update the .q.out?

{noformat}
[junit] diff -a -I file: -I pfile: -I hdfs: -I /tmp/ -I invalidscheme: -I 
lastUpdateTime -I lastAccessTime -I [Oo]wner -I CreateTime -I LastAccessTime -I 
Location -I LOCATION ' -I transient_lastDdlTime -I last_modified_ -I 
java.lang.RuntimeException -I at org -I at sun -I at java -I at junit -I Caused 
by: -I LOCK_QUERYID: -I grantTime -I [.][.][.] [0-9]* more -I job_[0-9]*_[0-9]* 
-I USING 'java -cp 
/data/users/jsichi/open/test-trunk/build/ql/test/logs/clientnegative/udfnull.q.out
 
/data/users/jsichi/open/test-trunk/ql/src/test/results/clientnegative/udfnull.q.out
[junit] 8,18c8
[junit]  PREHOOK: Output: 
file:/tmp/jsichi/hive_2011-09-08_16-48-29_269_6749666372366482183/-mr-1
[junit]  Execution failed with exit status: 2
[junit]  Obtaining error information
[junit]  
[junit]  Task failed!
[junit]  Task ID:
[junit]Stage-1
[junit]  
[junit]  Logs:
[junit]  
[junit]  /data/users/jsichi/open/test-trunk/build/ql/tmp//hive.log
[junit] ---
[junit]  PREHOOK: Output: 
file:/tmp/root/hive_2011-05-25_10-05-57_126_4632621650656424226/-mr-1
[junit] Exception: Client execution results failed with error code = 1
[junit] See build/ql/tmp/hive.log, or try ant test ... 
-Dtest.silent=false to get more logs.
[junit] Cleaning up TestNegativeCliDriver
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 5.496 sec
[junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED
{noformat}


 Avoid null pointer exception when executing UDF
 ---

 Key: HIVE-2182
 URL: https://issues.apache.org/jira/browse/HIVE-2182
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2182.1.patch, HIVE-2182.2.patch, HIVE-2182.patch


 To use a UDF, the following steps were executed:
 {noformat}
 add jar /home/udf/udf.jar;
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 But if we skip the first step (add jar) and execute only the remaining 
 steps:
 {noformat}
 create temporary function grade as 'udf.Grade';
 select m.userid,m.name,grade(m.maths,m.physics,m.chemistry) from marks m;
 {noformat}
 the tasktracker throws this exception:
 {noformat}
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 18 more
 Caused by: java.lang.RuntimeException: java.lang.NullPointerException
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
   at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:126)
   at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:878)
   at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:904)
   at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
   at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
   at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:133)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:444)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 18 more
 Caused by: java.lang.NullPointerException
   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:107)
   ... 31 more
 {noformat}
 Instead of null pointer exception 

[jira] [Updated] (HIVE-2426) Test that views with joins work properly

2011-09-07 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2426:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1, passed tests, committed to trunk.  Thanks Charles!


 Test that views with joins work properly
 

 Key: HIVE-2426
 URL: https://issues.apache.org/jira/browse/HIVE-2426
 Project: Hive
  Issue Type: Test
Reporter: Charles Chen
Assignee: Charles Chen
 Fix For: 0.9.0

 Attachments: HIVE-2426.3.patch, HIVE-2426v2.patch


 With the testcase
 {noformat}
 drop table invites;
 drop table invites2;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 set hive.mapred.mode=strict;
 -- test join views: see HIVE-1989
 create view v as select invites.bar, invites2.foo, invites2.ds from invites 
 join invites2 on invites.ds=invites2.ds;
 explain select * from v where ds='2011-09-01';
 drop view v;
 drop table invites;
 drop table invites2;
 {noformat}
 We should not have the partition pruner complain about invites.ds not having 
 a predicate because the predicate invites2.ds='2011-09-01' will be inferred 
 with the ppd transitivity optimization





[jira] [Commented] (HIVE-2420) partition pruner expr is not populated due to some bug in ppd

2011-09-07 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099939#comment-13099939
 ] 

John Sichi commented on HIVE-2420:
--

Yongqiang, didn't we already temporarily set that to false in our own config 
due to HIVE-2344?  That has since been fixed, but if there are other problems, 
we can keep it disabled until all are resolved.


 partition pruner expr is not populated due to some bug in ppd
 -

 Key: HIVE-2420
 URL: https://issues.apache.org/jira/browse/HIVE-2420
 Project: Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-2420.reproduce.diff








[jira] [Commented] (HIVE-2380) Add ByteArray Datatype

2011-09-06 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098322#comment-13098322
 ] 

John Sichi commented on HIVE-2380:
--

You can find design doc examples here:

https://cwiki.apache.org/confluence/display/Hive/DesignDocs


 Add ByteArray Datatype
 --

 Key: HIVE-2380
 URL: https://issues.apache.org/jira/browse/HIVE-2380
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: hive-2380.patch


 Add bytearray as a primitive data type.





[jira] [Commented] (HIVE-1143) CREATE VIEW followup: updatable views

2011-09-06 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098413#comment-13098413
 ] 

John Sichi commented on HIVE-1143:
--

Charles, if you had a patch in progress for this one, can you post it here as a 
checkpoint in case someone else has time to pick it up later?

 CREATE VIEW followup:  updatable views
 --

 Key: HIVE-1143
 URL: https://issues.apache.org/jira/browse/HIVE-1143
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen

 For HIVE-972, we only implemented read-only views.  Updatable views are 
 difficult in general, but for simple cases where views are being used to 
 impose a rename layer on existing tables/columns, update support would be 
 high value (for consistent read/write access) and not a lot of work.





[jira] [Updated] (HIVE-2426) Test that views with joins work properly

2011-09-06 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2426:
-

Attachment: HIVE-2426.3.patch

Made a few minor improvements to the overlap handling code and comments.

 Test that views with joins work properly
 

 Key: HIVE-2426
 URL: https://issues.apache.org/jira/browse/HIVE-2426
 Project: Hive
  Issue Type: Test
Reporter: Charles Chen
Assignee: Charles Chen
 Fix For: 0.9.0

 Attachments: HIVE-2426.3.patch, HIVE-2426v2.patch


 With the testcase
 {noformat}
 drop table invites;
 drop table invites2;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 set hive.mapred.mode=strict;
 -- test join views: see HIVE-1989
 create view v as select invites.bar, invites2.foo, invites2.ds from invites 
 join invites2 on invites.ds=invites2.ds;
 explain select * from v where ds='2011-09-01';
 drop view v;
 drop table invites;
 drop table invites2;
 {noformat}
 We should not have the partition pruner complain about invites.ds not having 
 a predicate because the predicate invites2.ds='2011-09-01' will be inferred 
 with the ppd transitivity optimization





[jira] [Updated] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)

2011-09-06 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2369:
-

   Resolution: Fixed
Fix Version/s: 0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Clément!


 Minor typo in error message in HiveConnection.java (JDBC)
 -

 Key: HIVE-2369
 URL: https://issues.apache.org/jira/browse/HIVE-2369
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.7.1, 0.8.0
 Environment: Linux
Reporter: Clément Notin
Assignee: Clément Notin
Priority: Trivial
 Fix For: 0.9.0

 Attachments: HIVE-2369.patch

   Original Estimate: 2m
  Remaining Estimate: 2m

 There is a minor typo issue in HiveConnection.java (jdbc) :
 {code}throw new SQLException("Could not establish connecton to "
 + uri + ": " + e.getMessage(), "08S01");{code}
 It seems like there's an "i" missing.
 I know it's a very minor typo but I report it anyway. I won't attach a patch 
 because it would be too long for me to SVN checkout just for 1 letter.





[jira] [Updated] (HIVE-2408) Perpetually degrading performance in checkPaths

2011-09-02 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2408:
-

Component/s: (was: HBase Handler)
 Query Processor

 Perpetually degrading performance in checkPaths
 ---

 Key: HIVE-2408
 URL: https://issues.apache.org/jira/browse/HIVE-2408
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.7.1, 0.8.0
Reporter: Grisha Trubetskoy

 In ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, checkPaths() 
 tacks on a copy_N if a file exists, working its way up until an available 
 file name is found. The problem is that the exists() check is quite expensive 
 in HDFS, and if you have hundreds of files to go through this becomes a 
 serious bottleneck.
 A better solution would be to use a timestamp in the file name, then followed 
 by the copy_N scheme. 
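
 The cost difference between the two naming schemes can be sketched as 
 follows. This is only an illustration, not the code in Hive.java: CountingFs 
 is a hypothetical in-memory stand-in for HDFS that counts exists() calls, 
 and the method names and the "_copy_" / timestamp name layout are 
 assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

final class CopyNameSketch {

    /** Hypothetical stand-in for a file system whose exists() check is expensive. */
    static final class CountingFs {
        private final Set<String> names = new HashSet<>();
        int existsCalls = 0;

        boolean exists(String name) {
            existsCalls++; // each call models one round trip to the file system
            return names.contains(name);
        }

        void create(String name) {
            names.add(name);
        }
    }

    /** Scheme described in the report: probe name, name_copy_1, name_copy_2, ... */
    static String probeCopyN(CountingFs fs, String name) {
        String candidate = name;
        for (int n = 1; fs.exists(candidate); n++) {
            candidate = name + "_copy_" + n;
        }
        return candidate;
    }

    /** Suggested scheme: put a timestamp in the name first, then fall back to copy_N. */
    static String probeWithTimestamp(CountingFs fs, String name, long timestampMillis) {
        return probeCopyN(fs, name + "_" + timestampMillis);
    }

    public static void main(String[] args) {
        // Write the same logical file name 100 times with each scheme.
        CountingFs plain = new CountingFs();
        for (int i = 0; i < 100; i++) {
            plain.create(probeCopyN(plain, "000000_0"));
        }
        System.out.println("copy_N only:     " + plain.existsCalls + " exists() calls");

        CountingFs stamped = new CountingFs();
        for (int i = 0; i < 100; i++) {
            stamped.create(probeWithTimestamp(stamped, "000000_0", 1315000000000L + i));
        }
        System.out.println("timestamp first: " + stamped.existsCalls + " exists() calls");
    }
}
```

 With copy_N alone, the k-th file with the same name needs k probes, so the 
 total grows quadratically; with distinct timestamps each file usually needs 
 a single probe, and copy_N only kicks in on a timestamp collision.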





[jira] [Updated] (HIVE-1989) recognize transitivity of predicates on join keys

2011-09-02 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1989:
-

Status: Open  (was: Patch Available)

I got failures in the following tests:

index_auto_mult_tables
index_auto_mult_tables_compact
outer_join_ppr
ppd_gby_join
ppd_join
ppd_join2
ppd_join3
ppd_outer_join3
ppd_outer_join5
ppd_union


 recognize transitivity of predicates on join keys
 -

 Key: HIVE-1989
 URL: https://issues.apache.org/jira/browse/HIVE-1989
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen
 Fix For: 0.8.0

 Attachments: HIVE-1989v1.patch, HIVE-1989v10.patch, 
 HIVE-1989v11.patch, HIVE-1989v4.patch, HIVE-1989v5-WITH-HIVE-2382v1.patch, 
 HIVE-1989v6-WITH-HIVE-2383v1.patch, HIVE-1989v8.patch, HIVE-1989v9.patch


 Given
 {noformat}
 set hive.mapred.mode=strict;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 select count(*) from invites join invites2 on invites.ds=invites2.ds where 
 invites.ds='2011-01-01';
 {noformat}
 currently an error occurs:
 {noformat}
 Error in semantic analysis: No Partition Predicate Found for Alias invites2 
 Table invites2
 {noformat}
 The optimizer should be able to infer a predicate on invites2 via 
 transitivity.  The current lack places a burden on the user to add a 
 redundant predicate, and makes impossible (at least in strict mode) join 
 views where both underlying tables are partitioned (the join select list has 
 to pick one of the tables arbitrarily).
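
 The inference being asked for can be sketched outside of Hive as follows. 
 This is a minimal illustration and not the optimizer code from the patch: 
 the class and method names are hypothetical, columns are plain strings, and 
 it assumes inner equi-joins only (outer joins would need extra care before 
 a filter may be copied across).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class TransitivePredicateSketch {
    // Union-find parent pointers over qualified column names such as "invites.ds".
    private final Map<String, String> parent = new HashMap<>();

    private String find(String col) {
        parent.putIfAbsent(col, col);
        String p = parent.get(col);
        if (!p.equals(col)) {
            p = find(p);
            parent.put(col, p); // path compression
        }
        return p;
    }

    /** Records an equi-join condition such as invites.ds = invites2.ds. */
    void addJoinEquality(String left, String right) {
        parent.put(find(left), find(right));
    }

    /** Copies each "column = constant" filter to every column known to be equal to it. */
    Map<String, String> inferFilters(Map<String, String> explicitFilters) {
        Map<String, String> inferred = new TreeMap<>(explicitFilters);
        for (Map.Entry<String, String> filter : explicitFilters.entrySet()) {
            String root = find(filter.getKey());
            // Iterate over a copy because find() may update the parent map.
            for (String col : new ArrayList<>(parent.keySet())) {
                if (find(col).equals(root)) {
                    inferred.putIfAbsent(col, filter.getValue());
                }
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        TransitivePredicateSketch sketch = new TransitivePredicateSketch();
        sketch.addJoinEquality("invites.ds", "invites2.ds");

        Map<String, String> explicit = new HashMap<>();
        explicit.put("invites.ds", "'2011-01-01'");

        // The result also contains a filter on invites2.ds, which is what
        // strict mode needs in order to prune partitions of invites2.
        System.out.println(sketch.inferFilters(explicit));
    }
}
```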





[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys

2011-09-02 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13096433#comment-13096433
 ] 

John Sichi commented on HIVE-1989:
--

+1.  Will commit when tests pass.


 recognize transitivity of predicates on join keys
 -

 Key: HIVE-1989
 URL: https://issues.apache.org/jira/browse/HIVE-1989
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen
 Fix For: 0.8.0

 Attachments: HIVE-1989v1.patch, HIVE-1989v10.patch, 
 HIVE-1989v11.patch, HIVE-1989v12.patch, HIVE-1989v4.patch, 
 HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, 
 HIVE-1989v8.patch, HIVE-1989v9.patch


 Given
 {noformat}
 set hive.mapred.mode=strict;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 select count(*) from invites join invites2 on invites.ds=invites2.ds where 
 invites.ds='2011-01-01';
 {noformat}
 currently an error occurs:
 {noformat}
 Error in semantic analysis: No Partition Predicate Found for Alias invites2 
 Table invites2
 {noformat}
 The optimizer should be able to infer a predicate on invites2 via 
 transitivity.  The current lack places a burden on the user to add a 
 redundant predicate, and makes impossible (at least in strict mode) join 
 views where both underlying tables are partitioned (the join select list has 
 to pick one of the tables arbitrarily).





[jira] [Updated] (HIVE-1989) recognize transitivity of predicates on join keys

2011-09-02 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1989:
-

   Resolution: Fixed
Fix Version/s: (was: 0.8.0)
   0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Charles!


 recognize transitivity of predicates on join keys
 -

 Key: HIVE-1989
 URL: https://issues.apache.org/jira/browse/HIVE-1989
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen
 Fix For: 0.9.0

 Attachments: HIVE-1989v1.patch, HIVE-1989v10.patch, 
 HIVE-1989v11.patch, HIVE-1989v12.patch, HIVE-1989v4.patch, 
 HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, 
 HIVE-1989v8.patch, HIVE-1989v9.patch


 Given
 {noformat}
 set hive.mapred.mode=strict;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 select count(*) from invites join invites2 on invites.ds=invites2.ds where 
 invites.ds='2011-01-01';
 {noformat}
 currently an error occurs:
 {noformat}
 Error in semantic analysis: No Partition Predicate Found for Alias invites2 
 Table invites2
 {noformat}
 The optimizer should be able to infer a predicate on invites2 via 
 transitivity.  The current lack places a burden on the user to add a 
 redundant predicate, and makes impossible (at least in strict mode) join 
 views where both underlying tables are partitioned (the join select list has 
 to pick one of the tables arbitrarily).





[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins

2011-09-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095607#comment-13095607
 ] 

John Sichi commented on HIVE-2337:
--

+1.  Will commit when tests pass.


 Predicate pushdown erroneously conservative with outer joins
 

 Key: HIVE-2337
 URL: https://issues.apache.org/jira/browse/HIVE-2337
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
 Fix For: 0.9.0

 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, 
 HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch


 The predicate pushdown filter is not applying left associativity of joins 
 correctly in determining possible aliases for pushing predicates.
 In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for 
 pushing aliases is specified as:
 {noformat}
 /**
  * Figures out the aliases for whom it is safe to push predicates based on
  * ANSI SQL semantics For inner join, all predicates for all aliases can be
  * pushed For full outer join, none of the predicates can be pushed as that
  * would limit the number of rows for join For left outer join, all the
  * predicates on the left side aliases can be pushed up For right outer
  * join, all the predicates on the right side aliases can be pushed up Joins
  * chain containing both left and right outer joins are treated as full
  * outer join. [...]
  *
  * @param op
  *          Join Operator
  * @param rr
  *          Row resolver
  * @return set of qualified aliases
  */
 {noformat}
 Since hive joins are left associative, something like a RIGHT OUTER JOIN b 
 LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER 
 JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so there would be cases where joins 
 with both left and right outer joins can have aliases that can be pushed.  
 Here, aliases b and d are eligible to be pushed up while the current criteria 
 provide that none are eligible.
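
 One way to read the left-associative rule is as a single left-to-right pass 
 over the join chain. The sketch below is an illustration of that reading 
 only, not the getQualifiedAliases code or the attached patch: the class, 
 enum and method names are hypothetical, and it assumes a simple chain with 
 one new alias per join.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class QualifiedAliasSketch {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /**
     * Walks the chain aliases[0] joins[0] aliases[1] joins[1] aliases[2] ...
     * as ((aliases[0] J0 aliases[1]) J1 aliases[2]) ... and returns the aliases
     * whose predicates could still be pushed.
     */
    static Set<String> pushableAliases(String[] aliases, JoinType[] joins) {
        if (joins.length != aliases.length - 1) {
            throw new IllegalArgumentException("need exactly one join between each pair of aliases");
        }
        Set<String> pushable = new LinkedHashSet<>();
        pushable.add(aliases[0]);
        for (int i = 0; i < joins.length; i++) {
            String right = aliases[i + 1];
            switch (joins[i]) {
                case INNER:
                    pushable.add(right);   // both sides stay eligible
                    break;
                case LEFT_OUTER:
                    break;                 // left side preserved, right side is null-supplying
                case RIGHT_OUTER:
                    pushable.clear();      // everything joined so far becomes null-supplying
                    pushable.add(right);
                    break;
                case FULL_OUTER:
                    pushable.clear();      // neither side is eligible
                    break;
            }
        }
        return pushable;
    }

    public static void main(String[] args) {
        // a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d
        System.out.println(pushableAliases(
            new String[] {"a", "b", "c", "d"},
            new JoinType[] {JoinType.RIGHT_OUTER, JoinType.LEFT_OUTER, JoinType.INNER}));
    }
}
```

 For the chain above this yields b and d, in line with the description, 
 whereas treating any mix of left and right outer joins as a full outer join 
 yields nothing.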
 Using:
 {noformat}
 create table t1 (id int, key string, value string);
 create table t2 (id int, key string, value string);
 create table t3 (id int, key string, value string);
 create table t4 (id int, key string, value string);
 {noformat}
 For example, the query
 {noformat}
 explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on 
 t2.id=t3.id where t3.id=20; 
 {noformat}
 currently gives
 {noformat}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 t1 
   TableScan
 alias: t1
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 0
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
 t2 
   TableScan
 alias: t2
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 1
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
 t3 
   TableScan
 alias: t3
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 2
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
   Reduce Operator Tree:
 Join Operator
   condition map:
Outer Join 0 to 1
Inner Join 1 to 2
   condition expressions:
 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
   handleSkewJoin: false
   

[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs

2011-09-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095621#comment-13095621
 ] 

John Sichi commented on HIVE-1545:
--

I'm way behind on the PDK (probably not gonna make it for 0.8), but I'm 
planning to rework the UDFUtils into annotations as part of it.

Cyril, I think they are mostly used for validation purposes, in which case you 
can just comment out the calls for now if you want to use the UDF without 
validation.


 Add a bunch of UDFs and UDAFs
 -

 Key: HIVE-1545
 URL: https://issues.apache.org/jira/browse/HIVE-1545
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Jonathan Chang
Assignee: Jonathan Chang
Priority: Minor
 Attachments: core.tar.gz, ext.tar.gz, udfs.tar.gz, udfs.tar.gz


 Here are some UD(A)Fs which can be incorporated into the Hive distribution:
 UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 
 5, 3) returns 1.
 UDFBucket - Find the bucket in which the first argument belongs. e.g., 
 BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} 
 but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
 UDFFindInArray - Finds the 1-based index of the first argument in the array 
 given as the second argument. Returns 0 if not found. Returns NULL if either 
 argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. 
 FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
 UDFGreatCircleDist - Finds the great circle distance (in km) between two 
 lat/long coordinates (in degrees).
 UDFLDA - Performs LDA inference on a vector given fixed topics.
 UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 
 whenever any of its parameters changes.
 UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 
 5.
 UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches 
 in an array.
 UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
 UDFWhich - Given a boolean array, return the indices which are TRUE.
 UDFJaccard
 UDAFCollect - Takes all the values associated with a row and converts it into 
 a list. Make sure to have: set hive.map.aggr = false;
 UDAFCollectMap - Like collect except that it takes tuples and generates a map.
 UDAFEntropy - Compute the entropy of a column.
 UDAFPearson (BROKEN!!!) - Computes the pearson correlation between two 
 columns.
 UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value 
 of VAL.
 UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated 
 with the N (passed as the third parameter) largest values of VAL.
 UDAFHistogram
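
The behavior described for a few of these functions can be sketched in plain Java. This is only an illustration of the descriptions above, not the code in the attached tarballs: the class name UdfSketches and its method names are hypothetical, and tie-breaking in ARGMAX and the 0-based indices returned by WHICH are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class mirroring the descriptions in the list above;
// it is not taken from the attached UDF sources.
final class UdfSketches {

    // ARGMAX: 0-indexed position of the largest argument.
    // Assumption: the first occurrence wins on ties.
    static int argMax(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one argument is required");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    // PMAX: the maximum of its arguments.
    static double pMax(double... values) {
        return values[argMax(values)];
    }

    // FIND_IN_ARRAY: 1-based index of needle, 0 if absent, null if either argument is null.
    static Integer findInArray(Integer needle, List<Integer> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        return haystack.indexOf(needle) + 1; // indexOf returns -1 when absent
    }

    // WHICH: indices of the TRUE entries (0-based here, which is an assumption).
    static List<Integer> which(boolean[] flags) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                indices.add(i);
            }
        }
        return indices;
    }
}
```

A real Hive UDF would wrap this logic in the UDF/GenericUDF classes and handle Hive's writable types; that part is omitted here.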

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HIVE-1989) recognize transitivity of predicates on join keys

2011-09-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095643#comment-13095643
 ] 

John Sichi commented on HIVE-1989:
--

Charles, can you add a test case for the original partitioned join view use 
case?  Separate JIRA is fine.

 recognize transitivity of predicates on join keys
 -

 Key: HIVE-1989
 URL: https://issues.apache.org/jira/browse/HIVE-1989
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: Charles Chen
 Fix For: 0.8.0

 Attachments: HIVE-1989v1.patch, HIVE-1989v4.patch, 
 HIVE-1989v5-WITH-HIVE-2382v1.patch, HIVE-1989v6-WITH-HIVE-2383v1.patch, 
 HIVE-1989v8.patch, HIVE-1989v9.patch


 Given
 {noformat}
 set hive.mapred.mode=strict;
 create table invites (foo int, bar string) partitioned by (ds string);
 create table invites2 (foo int, bar string) partitioned by (ds string);
 select count(*) from invites join invites2 on invites.ds=invites2.ds where 
 invites.ds='2011-01-01';
 {noformat}
 currently an error occurs:
 {noformat}
 Error in semantic analysis: No Partition Predicate Found for Alias invites2 
 Table invites2
 {noformat}
 The optimizer should be able to infer a predicate on invites2 via 
 transitivity.  The current lack places a burden on the user to add a 
 redundant predicate, and makes impossible (at least in strict mode) join 
 views where both underlying tables are partitioned (the join select list has 
 to pick one of the tables arbitrarily).
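
The inference asked for here can be sketched as a small fixpoint over the join equalities. This is a simplified illustration, not the attached patch: the class TransitivePredicates is hypothetical, predicates are reduced to "column = literal" strings, and the propagation is only valid for inner equi-joins.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: propagate "column = literal" predicates across
// inner-join equalities until nothing new can be inferred.
final class TransitivePredicates {

    // joinEqualities: each element is {leftColumn, rightColumn} from an ON clause.
    // constants: known "column -> literal" predicates from the WHERE clause.
    static Map<String, String> infer(List<String[]> joinEqualities,
                                     Map<String, String> constants) {
        Map<String, String> result = new HashMap<>(constants);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] eq : joinEqualities) {
                changed |= copy(result, eq[0], eq[1]);
                changed |= copy(result, eq[1], eq[0]);
            }
        }
        return result;
    }

    // Copies the predicate on 'from' to 'to' if 'to' has none yet.
    private static boolean copy(Map<String, String> preds, String from, String to) {
        if (preds.containsKey(from) && !preds.containsKey(to)) {
            preds.put(to, preds.get(from));
            return true;
        }
        return false;
    }
}
```

For the query above, the equality invites.ds=invites2.ds together with invites.ds='2011-01-01' yields the inferred predicate invites2.ds='2011-01-01', which is the partition predicate strict mode is asking for.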





[jira] [Commented] (HIVE-2380) Add ByteArray Datatype

2011-09-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095669#comment-13095669
 ] 

John Sichi commented on HIVE-2380:
--

Ashutosh, maybe we can discuss this one at the contributor meetup next week 
(and then record the conclusions here).

A few questions that I've heard so far:

* Is there a design doc somewhere?
* Since Hive already has an array type, but this feature is independent, we 
probably want a different type name than bytearray.
* For conversions, is going through string for all types a good default 
behavior?  An alternative would be to prevent implicit conversions altogether, 
and force users to pick the UDF with the desired behavior.  E.g. for 
string/binary conversion, it's a good idea to be able to specify an encoding 
rather than always using the JVM default.
* How does the new type work with TRANSFORM scripts, UDF's, saving to textfile, 
etc?
* Don't we need more accessor functions (e.g. making the existing string 
functions such as LENGTH work)?
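
To illustrate the encoding point, here is a minimal Java sketch of string/binary conversion with an explicit charset. The class and method names are hypothetical and are not part of the patch; the sketch only shows why relying on the JVM default is fragile.

```java
import java.nio.charset.Charset;

// Hypothetical helpers: the caller always names the encoding explicitly,
// so the result does not depend on the JVM's default charset.
final class BinaryStringConversion {

    static byte[] toBinary(String text, Charset encoding) {
        return text.getBytes(encoding);
    }

    static String toText(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }
}
```

The same string can produce different bytes under different encodings (for example, "\u00e9" is two bytes in UTF-8 and one byte in ISO-8859-1), so an implicit conversion through the default charset could give different results on different clusters.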



 Add ByteArray Datatype
 --

 Key: HIVE-2380
 URL: https://issues.apache.org/jira/browse/HIVE-2380
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: hive-2380.patch


 Add bytearray as a primitive data type.





[jira] [Updated] (HIVE-2401) Show functions with regex not working

2011-09-01 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2401:
-

Status: Open  (was: Patch Available)

The wiki already explains how to do this.  I don't think we need any behavior 
change here.

hive> show functions 'm.*';
OK
map
map_keys
map_values
max
min
minute
month
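
For readers unfamiliar with the pattern syntax, the filtering shown above behaves like a whole-name Java regular expression match. The sketch below assumes that behavior; FunctionNameFilter is a hypothetical class, not the CLI's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: keep only the names that match the regex in full.
final class FunctionNameFilter {

    static List<String> matching(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Under this assumption a bare 'm' would match only a function named exactly "m", which is why the pattern needs the trailing ".*" to mean "starts with m".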


 Show functions with regex not working
 -

 Key: HIVE-2401
 URL: https://issues.apache.org/jira/browse/HIVE-2401
 Project: Hive
  Issue Type: Improvement
  Components: CLI
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2401.patch


 show functions a;
 It would be easier to search if this returned all the function names starting with 'a'.





[jira] [Updated] (HIVE-2402) Function like with empty string is throwing null pointer exception

2011-09-01 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2402:
-

Status: Open  (was: Patch Available)

 Function like with empty string is throwing null pointer exception
 --

 Key: HIVE-2402
 URL: https://issues.apache.org/jira/browse/HIVE-2402
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2402.patch


 select emp.ename from emp where ename like ''
 This query throws a NullPointerException.
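
As a point of reference for the expected behavior, here is a minimal sketch of LIKE matching in which an empty pattern is simply a pattern that matches only the empty string. It is not Hive's UDFLike implementation: the class LikeSketch is hypothetical and escape characters are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of SQL LIKE: '%' matches any run of characters,
// '_' matches exactly one character, everything else is literal.
final class LikeSketch {

    // Returns null when either input is null, mirroring SQL NULL semantics.
    static Boolean like(String value, String pattern) {
        if (value == null || pattern == null) {
            return null;
        }
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // An empty pattern produces an empty regex, which matches only "".
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }
}
```

With this reading, ename like '' returns no rows unless ename is the empty string, and no exception is involved.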





[jira] [Assigned] (HIVE-2369) Minor typo in error message in HiveConnection.java (JDBC)

2011-09-01 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi reassigned HIVE-2369:


Assignee: Clément Notin

 Minor typo in error message in HiveConnection.java (JDBC)
 -

 Key: HIVE-2369
 URL: https://issues.apache.org/jira/browse/HIVE-2369
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.7.1, 0.8.0
 Environment: Linux
Reporter: Clément Notin
Assignee: Clément Notin
Priority: Trivial
 Attachments: HIVE-2369.patch

   Original Estimate: 2m
  Remaining Estimate: 2m

 There is a minor typo in HiveConnection.java (JDBC):
 {code}throw new SQLException("Could not establish connecton to "
 + uri + ": " + e.getMessage(), "08S01");{code}
 It seems like there's an "i" missing.
 I know it's a very minor typo, but I'm reporting it anyway. I won't attach a patch 
 because it would take me too long to do an SVN checkout just for 1 letter.
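
For illustration, the corrected message could be built as below. This is a standalone sketch, not the HiveConnection.java source: the class and method names are hypothetical, and only the two-argument SQLException(reason, SQLState) constructor from java.sql is relied on.

```java
import java.sql.SQLException;

// Hypothetical helper showing the message with "connection" spelled correctly.
final class ConnectionErrorSketch {

    static SQLException connectionFailure(String uri, Exception cause) {
        // "08S01" is the SQLState used in the snippet quoted above.
        return new SQLException(
            "Could not establish connection to " + uri + ": " + cause.getMessage(),
            "08S01");
    }
}
```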





[jira] [Issue Comment Edited] (HIVE-2401) Show functions with regex not working

2011-09-01 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095671#comment-13095671
 ] 

John Sichi edited comment on HIVE-2401 at 9/1/11 11:31 PM:
---

The wiki already explains how to do this.  I don't think we need any behavior 
change here.

hive> show functions 'm.*';
OK
map
map_keys
map_values
max
min
minute
month


  was (Author: jvs):
The wiki already explains how to do this.  I don't hink we need any 
behavior change here.

hive> show functions 'm.*';
OK
map
map_keys
map_values
max
min
minute
month

  
 Show functions with regex not working
 -

 Key: HIVE-2401
 URL: https://issues.apache.org/jira/browse/HIVE-2401
 Project: Hive
  Issue Type: Improvement
  Components: CLI
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-2401.patch


 show functions a;
 It would be easier to search if this returned all the function names starting with 'a'.





[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins

2011-09-01 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2337:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Charles!

 Predicate pushdown erroneously conservative with outer joins
 

 Key: HIVE-2337
 URL: https://issues.apache.org/jira/browse/HIVE-2337
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
 Fix For: 0.9.0

 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, 
 HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch


 The predicate pushdown filter is not applying left associativity of joins 
 correctly in determining possible aliases for pushing predicates.
 In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for 
 pushing aliases is specified as:
 {noformat}
 /**
  * Figures out the aliases for whom it is safe to push predicates based on
  * ANSI SQL semantics For inner join, all predicates for all aliases can be
  * pushed For full outer join, none of the predicates can be pushed as that
  * would limit the number of rows for join For left outer join, all the
  * predicates on the left side aliases can be pushed up For right outer
  * join, all the predicates on the right side aliases can be pushed up Joins
  * chain containing both left and right outer joins are treated as full
  * outer join. [...]
  *
  * @param op
  *  Join Operator
  * @param rr
  *  Row resolver
  * @return set of qualified aliases
  */
 {noformat}
 Since Hive joins are left associative, something like "a RIGHT OUTER JOIN b 
 LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER 
 JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there are cases where a join chain 
 with both left and right outer joins has aliases that can be pushed.  
 Here, aliases b and d are eligible to be pushed up, while the current criteria 
 treat none as eligible.
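
The left-associative reading can be sketched as a single left-to-right pass over the join chain. This is a simplified model of the rule described above, not the committed getQualifiedAliases code; the class JoinChainAliases and its enum are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: walk the chain ((t0 J1 t1) J2 t2) ... left to right,
// tracking which aliases may still have predicates pushed.
final class JoinChainAliases {

    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    static Set<String> pushable(String firstAlias,
                                List<JoinType> joinTypes,
                                List<String> rightAliases) {
        if (joinTypes.size() != rightAliases.size()) {
            throw new IllegalArgumentException("one right-hand alias per join is required");
        }
        Set<String> ok = new LinkedHashSet<>();
        ok.add(firstAlias);
        for (int i = 0; i < joinTypes.size(); i++) {
            String right = rightAliases.get(i);
            switch (joinTypes.get(i)) {
                case INNER:
                    ok.add(right);      // both sides stay eligible
                    break;
                case LEFT_OUTER:
                    break;              // left side stays eligible, right side is not added
                case RIGHT_OUTER:
                    ok.clear();         // everything on the left becomes ineligible
                    ok.add(right);
                    break;
                case FULL_OUTER:
                    ok.clear();         // neither side is eligible
                    break;
            }
        }
        return ok;
    }
}
```

For the chain a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d this yields {b, d}, matching the claim above. For t1 FULL OUTER JOIN t2 JOIN t3 it yields {t3}, so under this model a predicate such as t3.id=20 could be pushed.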
 Using:
 {noformat}
 create table t1 (id int, key string, value string);
 create table t2 (id int, key string, value string);
 create table t3 (id int, key string, value string);
 create table t4 (id int, key string, value string);
 {noformat}
 For example, the query
 {noformat}
 explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on 
 t2.id=t3.id where t3.id=20; 
 {noformat}
 currently gives
 {noformat}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 t1 
   TableScan
 alias: t1
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 0
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
 t2 
   TableScan
 alias: t2
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 1
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
 t3 
   TableScan
 alias: t3
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 2
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
   Reduce Operator Tree:
 Join Operator
   condition map:
Outer Join 0 to 1
Inner Join 1 to 2
   condition expressions:
 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
   

[jira] [Updated] (HIVE-2184) Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()

2011-08-31 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2184:
-

   Resolution: Fixed
Fix Version/s: 0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Chinna!


 Few improvements in org.apache.hadoop.hive.ql.metadata.Hive.close()
 ---

 Key: HIVE-2184
 URL: https://issues.apache.org/jira/browse/HIVE-2184
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.5.0, 0.8.0
 Environment: Hadoop 0.20.1, Hive0.8.0 and SUSE Linux Enterprise 
 Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5)
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Fix For: 0.9.0

 Attachments: HIVE-2184.1.patch, HIVE-2184.1.patch, HIVE-2184.2.patch, 
 HIVE-2184.3.patch, HIVE-2184.patch


 1) Hive.close() calls HiveMetaStoreClient.close(); in that method the 
 variable standAloneClient never becomes true, so client.shutdown() is never 
 called.
 2) After calling metaStoreClient.close(), Hive.close() needs to set 
 metaStoreClient=null.
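
The second point can be sketched with a small stand-in class. The names ClientHolder and Client are hypothetical and do not correspond to the real Hive or HiveMetaStoreClient classes; the sketch only shows the close-then-null pattern being requested.

```java
// Hypothetical stand-in for the pattern: close the client once, then drop
// the reference so later calls cannot reuse a closed client.
final class ClientHolder {

    interface Client {
        void close();
    }

    private Client metaStoreClient;

    ClientHolder(Client client) {
        this.metaStoreClient = client;
    }

    boolean hasClient() {
        return metaStoreClient != null;
    }

    void close() {
        if (metaStoreClient != null) {
            metaStoreClient.close();
            metaStoreClient = null; // makes a second close() a no-op
        }
    }
}
```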





[jira] [Updated] (HIVE-2383) Incorrect alias filtering for predicate pushdown

2011-08-31 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2383:
-

   Resolution: Fixed
Fix Version/s: (was: 0.8.0)
   0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Passed tests and committed to trunk.  Thanks Charles!


 Incorrect alias filtering for predicate pushdown
 

 Key: HIVE-2383
 URL: https://issues.apache.org/jira/browse/HIVE-2383
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Charles Chen
Assignee: Charles Chen
Priority: Critical
 Fix For: 0.9.0

 Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch


 The predicate pushdown optimizer starts at the topmost operators and traverses 
 the operator tree, at each stage collecting predicates to be pushed down.  At 
 each operator, hive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is 
 called, which merges the predicates of the children nodes into the current 
 node.  The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds 
 as a map from the alias a predicate refers to (a predicate may only refer to 
 one alias at a time, as only such predicates can be pushed) to a list of such 
 predicates.  Since the alias a predicate refers to may change at each stage 
 (subqueries may change aliases), this is updated for each operator 
 (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called, which walks 
 the ExprNodeDesc for each predicate).  When a JoinOperator is encountered, 
 mergeWithChildrenPred is passed an optional parameter "aliases", which 
 contains the set of aliases that can be pushed per ANSI semantics (see 
 hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases).  The part that is 
 incorrect is that aliases are filtered in mergeWithChildrenPred before 
 extractPushdownPreds is called.  extractPushdownPreds is what associates the 
 predicates with the correct alias in the current operator's context, so the 
 filtering should happen after it.
 In test case Q2 below, when the predicate a.bar=3 comes into the 
 JoinOperator, the incoming alias is "a", so it is accepted for pushdown.  
 When brought into the JoinOperator's context, however, the predicate 
 refers to b.bar in the inner scope, so we should not actually accept it for 
 pushdown.
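
The ordering problem can be shown with a toy model in which a predicate is reduced to the single column it refers to. This is a sketch of the idea only, not the code in the patch: AliasFilterOrder and its methods are hypothetical, and the rewrite map stands in for what extractPushdownPreds does.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: a predicate is identified by one "alias.column" string.
final class AliasFilterOrder {

    static String aliasOf(String column) {
        return column.substring(0, column.indexOf('.'));
    }

    // Problematic order: the alias is checked as it arrives from the outer scope.
    static boolean acceptFilterFirst(String column, Set<String> qualifiedAliases) {
        return qualifiedAliases.contains(aliasOf(column));
    }

    // Intended order: rewrite the column into the join's own scope, then check.
    static boolean acceptRewriteFirst(String column,
                                      Map<String, String> outerToInner,
                                      Set<String> qualifiedAliases) {
        String inner = outerToInner.getOrDefault(column, column);
        return qualifiedAliases.contains(aliasOf(inner));
    }
}
```

For Q2, the outer column a.bar maps to the inner column b.bar, and the left outer join qualifies only the alias a. Checking first accepts the predicate because the incoming alias is a; rewriting first rejects it because the inner alias is b.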
 With the test cases
 {noformat}
 -- Q1: predicate should not be pushed on the right side of a left outer join 
 (this is correct in trunk)
 explain
 SELECT a.foo as foo1, b.foo as foo2, b.bar
 FROM pokes a LEFT OUTER JOIN pokes2 b
 ON a.foo=b.foo
 WHERE b.bar=3;
 -- Q2: predicate should not be pushed on the right side of a left outer join 
 (this is broken in trunk)
 explain
 SELECT * FROM
 (SELECT a.foo as foo1, b.foo as foo2, b.bar
 FROM pokes a LEFT OUTER JOIN pokes2 b
 ON a.foo=b.foo) a
 WHERE a.bar=3;
 -- Q3: predicate should be pushed (this is correct in trunk)
 explain
 SELECT * FROM
 (SELECT a.foo as foo1, b.foo as foo2, a.bar
 FROM pokes a JOIN pokes2 b
 ON a.foo=b.foo) a
 WHERE a.bar=3;
 {noformat}
 The current output is
 {noformat}
 hive> 
     > -- Q1: predicate should not be pushed on the right side of a left outer join
     > explain
     > SELECT a.foo as foo1, b.foo as foo2, b.bar
     > FROM pokes a LEFT OUTER JOIN pokes2 b
     > ON a.foo=b.foo
     > WHERE b.bar=3;
 OK
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) 
 (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. 
 (TOK_TABLE_OR_COL b) foo (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) 
 (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. 
 (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. (TOK_TABLE_OR_COL b) bar) 3
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 a 
   TableScan
 alias: a
 Reduce Output Operator
   key expressions:
 expr: foo
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: foo
 type: int
   tag: 0
   value expressions:
 expr: foo
 type: int
 b 
   TableScan
 alias: b
 Reduce Output Operator
   key expressions:
 expr: foo
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: foo
 type: int
   tag: 1
  

[jira] [Resolved] (HIVE-1395) Table aliases are ambiguous

2011-08-31 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-1395.
--

Resolution: Won't Fix

We're fixing the bugs and sticking with the normal SQL rules, which allow 
duplicate aliases, for the reasons mentioned above.


 Table aliases are ambiguous
 ---

 Key: HIVE-1395
 URL: https://issues.apache.org/jira/browse/HIVE-1395
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Adam Kramer

 Consider this query:
 SELECT a.num FROM (
   SELECT a.num AS num, b.num AS num2
   FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
 ) a
 WHERE a.num2 IS NULL;
 ...in this case, the table alias 'a' is ambiguous. It could be the outer 
 table (i.e., the subquery result), or it could be the inner table (foo).
 In the above case, Hive silently parses the outer reference to a as the inner 
 reference. The result, then, is akin to:
 SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.
 The bigger problem, however, is that Hive even lets people use the same table 
 alias at multiple points in the query. We should simply throw an exception 
 during the parse stage if there is any ambiguity in which table is which, 
 just like we do if the column names are ambiguous.
 Or, if for some reason we need people to be able to use 'a' to refer to 
 multiple tables or subqueries, it would be excellent if the exact parsing 
 structure were made clear and added to the wiki. In that case, I will file a 
 separate bug JIRA to complain about how it should be different. :)





[jira] [Resolved] (HIVE-1342) Predicate push down get error result when sub-queries have the same alias name

2011-08-31 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi resolved HIVE-1342.
--

   Resolution: Fixed
Fix Version/s: 0.9.0

Fixed by committing sub-issues (not the patches attached to this issue).

 Predicate push down get error result when sub-queries have the same alias 
 name 
 ---

 Key: HIVE-1342
 URL: https://issues.apache.org/jira/browse/HIVE-1342
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Ted Xu
Assignee: Charles Chen
Priority: Critical
 Fix For: 0.9.0

 Attachments: HIVE-1342v1.patch, HIVE-1342v2.patch, HIVE-1342v3.patch, 
 HIVE-1342v4.patch, cmd.hql, explain, ppd_same_alias_1.patch, 
 ppd_same_alias_2.patch


 Query is over-optimized by PPD when sub-queries have the same alias name, see 
 the query:
 ---
 create table if not exists dm_fact_buyer_prd_info_d (
   category_id string
   ,gmv_trade_num  int
   ,user_id int
   )
 PARTITIONED BY (ds int);
 set hive.optimize.ppd=true;
 set hive.map.aggr=true;
 explain select category_id1,category_id2,assoc_idx
 from (
   select 
   category_id1
   , category_id2
   , count(distinct user_id) as assoc_idx
   from (
   select 
   t1.category_id as category_id1
   , t2.category_id as category_id2
   , t1.user_id
   from (
   select category_id, user_id
   from dm_fact_buyer_prd_info_d
   group by category_id, user_id ) t1
   join (
   select category_id, user_id
   from dm_fact_buyer_prd_info_d
   group by category_id, user_id ) t2 on 
 t1.user_id=t2.user_id 
   ) t1
   group by category_id1, category_id2 ) t_o
   where category_id1 <> category_id2
   and assoc_idx > 2;
 -
 The query above fails when executed, throwing the exception: can not cast 
 UDFOpNotEqual(Text, IntWritable) to UDFOpNotEqual(Text, Text). 
 I ran EXPLAIN on the query and the execution plan looks really weird (only Stage-1 
 is shown; see the highlighted predicate):
 ---
 Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 t_o:t1:t1:dm_fact_buyer_prd_info_d 
   TableScan
 alias: dm_fact_buyer_prd_info_d
 Filter Operator
   predicate:
   expr: *(category_id <> user_id)*
   type: boolean
   Select Operator
 expressions:
   expr: category_id
   type: string
   expr: user_id
   type: bigint
 outputColumnNames: category_id, user_id
 Group By Operator
   keys:
 expr: category_id
 type: string
 expr: user_id
 type: bigint
   mode: hash
   outputColumnNames: _col0, _col1
   Reduce Output Operator
 key expressions:
   expr: _col0
   type: string
   expr: _col1
   type: bigint
 sort order: ++
 Map-reduce partition columns:
   expr: _col0
   type: string
   expr: _col1
   type: bigint
 tag: -1
   Reduce Operator Tree:
 Group By Operator
   keys:
 expr: KEY._col0
 type: string
 expr: KEY._col1
 type: bigint
   mode: mergepartial
   outputColumnNames: _col0, _col1
   Select Operator
 expressions:
   expr: _col0
   type: string
   expr: _col1
   type: bigint
 outputColumnNames: _col0, _col1
 File Output Operator
   compressed: true
   GlobalTableId: 0
   table:
   input format: 
 org.apache.hadoop.mapred.SequenceFileInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  

[jira] [Commented] (HIVE-2383) Incorrect alias filtering for predicate pushdown

2011-08-31 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095020#comment-13095020
 ] 

John Sichi commented on HIVE-2383:
--

Oh, um, also:  +1.


 Incorrect alias filtering for predicate pushdown
 

 Key: HIVE-2383
 URL: https://issues.apache.org/jira/browse/HIVE-2383
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Charles Chen
Assignee: Charles Chen
Priority: Critical
 Fix For: 0.9.0

 Attachments: HIVE-2383v1.patch, HIVE-2383v2.patch, HIVE-2383v5.patch


 The predicate pushdown optimizer starts at the topmost operators and traverses 
 the operator tree, at each stage collecting predicates to be pushed down.  At 
 each operator, hive.ql.ppd.OpProcFactory.DefaultPPD.mergeWithChildrenPred is 
 called, which merges the predicates of the children nodes into the current 
 node.  The predicates are stored in hive.ql.ppd.ExprWalkerInfo.pushdownPreds 
 as a map from the alias a predicate refers to (a predicate may only refer to 
 one alias at a time, as only such predicates can be pushed) to a list of such 
 predicates.  Since the alias a predicate refers to may change at each stage 
 (subqueries may change aliases), this is updated for each operator 
 (hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds is called, which walks 
 the ExprNodeDesc for each predicate).  When a JoinOperator is encountered, 
 mergeWithChildrenPred is passed an optional parameter "aliases", which 
 contains the set of aliases that can be pushed per ANSI semantics (see 
 hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases).  The part that is 
 incorrect is that aliases are filtered in mergeWithChildrenPred before 
 extractPushdownPreds is called.  extractPushdownPreds is what associates the 
 predicates with the correct alias in the current operator's context, so the 
 filtering should happen after it.
 In test case Q2 below, when the predicate a.bar=3 comes into the 
 JoinOperator, the incoming alias is "a", so it is accepted for pushdown.  
 When brought into the JoinOperator's context, however, the predicate 
 refers to b.bar in the inner scope, so we should not actually accept it for 
 pushdown.
 With the test cases
 {noformat}
 -- Q1: predicate should not be pushed on the right side of a left outer join 
 (this is correct in trunk)
 explain
 SELECT a.foo as foo1, b.foo as foo2, b.bar
 FROM pokes a LEFT OUTER JOIN pokes2 b
 ON a.foo=b.foo
 WHERE b.bar=3;
 -- Q2: predicate should not be pushed on the right side of a left outer join 
 (this is broken in trunk)
 explain
 SELECT * FROM
 (SELECT a.foo as foo1, b.foo as foo2, b.bar
 FROM pokes a LEFT OUTER JOIN pokes2 b
 ON a.foo=b.foo) a
 WHERE a.bar=3;
 -- Q3: predicate should be pushed (this is correct in trunk)
 explain
 SELECT * FROM
 (SELECT a.foo as foo1, b.foo as foo2, a.bar
 FROM pokes a JOIN pokes2 b
 ON a.foo=b.foo) a
 WHERE a.bar=3;
 {noformat}
 The current output is
 {noformat}
 hive> 
     > -- Q1: predicate should not be pushed on the right side of a left outer join
     > explain
     > SELECT a.foo as foo1, b.foo as foo2, b.bar
     > FROM pokes a LEFT OUTER JOIN pokes2 b
     > ON a.foo=b.foo
     > WHERE b.bar=3;
 OK
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME pokes) a) 
 (TOK_TABREF (TOK_TABNAME pokes2) b) (= (. (TOK_TABLE_OR_COL a) foo) (. 
 (TOK_TABLE_OR_COL b) foo (TOK_INSERT (TOK_DESTINATION (TOK_DIR 
 TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) foo) foo1) 
 (TOK_SELEXPR (. (TOK_TABLE_OR_COL b) foo) foo2) (TOK_SELEXPR (. 
 (TOK_TABLE_OR_COL b) bar))) (TOK_WHERE (= (. (TOK_TABLE_OR_COL b) bar) 3
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias -> Map Operator Tree:
 a 
   TableScan
 alias: a
 Reduce Output Operator
   key expressions:
 expr: foo
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: foo
 type: int
   tag: 0
   value expressions:
 expr: foo
 type: int
 b 
   TableScan
 alias: b
 Reduce Output Operator
   key expressions:
 expr: foo
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: foo
 type: int
   tag: 1
   value expressions:
 expr: foo
 type: int
 expr: bar
 type: int

[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously conservative with outer joins

2011-08-31 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095105#comment-13095105
 ] 

John Sichi commented on HIVE-2337:
--

Charles, did you intentionally omit the new ppd_outer_join5.q from the latest 
patch?

Also, there's a weird non-ASCII character in the Javadoc.


 Predicate pushdown erroneously conservative with outer joins
 

 Key: HIVE-2337
 URL: https://issues.apache.org/jira/browse/HIVE-2337
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Charles Chen
Assignee: Charles Chen
 Fix For: 0.9.0

 Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, 
 HIVE-2337v4.patch, HIVE-2337v5.patch


 The predicate pushdown filter is not applying left associativity of joins 
 correctly in determining possible aliases for pushing predicates.
 In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for 
 pushing aliases is specified as:
 {noformat}
 /**
  * Figures out the aliases for whom it is safe to push predicates based on
  * ANSI SQL semantics For inner join, all predicates for all aliases can be
  * pushed For full outer join, none of the predicates can be pushed as that
  * would limit the number of rows for join For left outer join, all the
  * predicates on the left side aliases can be pushed up For right outer
  * join, all the predicates on the right side aliases can be pushed up Joins
  * chain containing both left and right outer joins are treated as full
  * outer join. [...]
  *
  * @param op
  *  Join Operator
  * @param rr
  *  Row resolver
  * @return set of qualified aliases
  */
 {noformat}
 Since Hive joins are left associative, something like a RIGHT OUTER JOIN b 
 LEFT OUTER JOIN c INNER JOIN d should be interpreted as ((a RIGHT OUTER 
 JOIN b) LEFT OUTER JOIN c) INNER JOIN d, so a join chain containing both left 
 and right outer joins can still have aliases whose predicates can be pushed.  
 Here, aliases b and d are eligible to be pushed up, while the current criteria 
 treat none as eligible.
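 The left-associative reading above can be sketched as a single left-to-right 
 pass over the join chain. This is only an illustrative Python model, not the 
 code in the attached patches: the function name qualified_aliases and the 
 (join_type, alias) tuples are hypothetical, and it assumes each join combines 
 everything to its left with exactly one new alias.

```python
def qualified_aliases(first_alias, joins):
    """Return the aliases whose predicates may be pushed below the joins.

    `joins` is a list of (join_type, alias) pairs applied left to right,
    i.e. ((first_alias J1 a1) J2 a2) ... (hypothetical representation).
    """
    pushable = {first_alias}
    for join_type, alias in joins:
        if join_type == "inner":
            # Both sides are filtered symmetrically: keep the left, add the right.
            pushable.add(alias)
        elif join_type == "left":
            # Left side is preserved; the new right alias is null-supplying.
            pass
        elif join_type == "right":
            # Only the new right alias is preserved; everything to its left
            # becomes null-supplying.
            pushable = {alias}
        elif join_type == "full":
            # Neither side is safe to filter early.
            pushable = set()
        else:
            raise ValueError(f"unknown join type: {join_type}")
    return pushable


# ((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d
chain = [("right", "b"), ("left", "c"), ("inner", "d")]
print(sorted(qualified_aliases("a", chain)))  # ['b', 'd']
```

 Under this model a full outer join only disqualifies the aliases seen so 
 far; an inner join that follows it still contributes its own alias.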
 Using:
 {noformat}
 create table t1 (id int, key string, value string);
 create table t2 (id int, key string, value string);
 create table t3 (id int, key string, value string);
 create table t4 (id int, key string, value string);
 {noformat}
 For example, the query
 {noformat}
 explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on 
 t2.id=t3.id where t3.id=20; 
 {noformat}
 currently gives
 {noformat}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 t1 
   TableScan
 alias: t1
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 0
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
 t2 
   TableScan
 alias: t2
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 1
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
 t3 
   TableScan
 alias: t3
 Reduce Output Operator
   key expressions:
 expr: id
 type: int
   sort order: +
   Map-reduce partition columns:
 expr: id
 type: int
   tag: 2
   value expressions:
 expr: id
 type: int
 expr: key
 type: string
 expr: value
 type: string
   Reduce Operator Tree:
 Join Operator
   condition map:
Outer Join 0 to 1
Inner Join 1 to 2
   condition expressions:
 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
 2 {VALUE._col0} 

[jira] [Updated] (HIVE-2382) Invalid predicate pushdown from incorrect column expression map for select operator generated by GROUP BY operation

2011-08-30 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-2382:
-

   Resolution: Fixed
Fix Version/s: (was: 0.8.0)
   0.9.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Charles!


 Invalid predicate pushdown from incorrect column expression map for select 
 operator generated by GROUP BY operation
 ---

 Key: HIVE-2382
 URL: https://issues.apache.org/jira/browse/HIVE-2382
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Charles Chen
Assignee: Charles Chen
Priority: Critical
 Fix For: 0.9.0

 Attachments: HIVE-2382v1.patch, HIVE-2382v2.patch


 When a GROUP BY is specified, a select operator is added before the GROUP BY 
 in SemanticAnalyzer.insertSelectAllPlanForGroupBy.  Currently, the column 
 expression map for this is set to the column expression map for the parent 
 operator.  This behavior is incorrect as, for example, the parent operator 
 could rearrange the order of the columns (_col0 = _col0, _col1 = _col2, 
 _col2 = _col1) and the new operator should not repeat this.
 The predicate pushdown optimization uses the column expression map to track 
 which columns a filter expression refers to at different operators.  With the 
 parent's map copied onto the new select operator, the pushed-down filter ends 
 up referring to the wrong columns.
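 The effect of copying the parent's map can be illustrated with a toy model 
 in which each operator's column expression map is a plain dictionary from 
 output column to input column. This is a sketch under that assumption only; 
 the helper push_column and the dictionaries are hypothetical and do not 
 mirror Hive's actual classes.

```python
def push_column(column, column_maps):
    """Rewrite a column name through each operator's map, top operator first."""
    for col_map in column_maps:
        column = col_map[column]
    return column


# Innermost select: "select bar, foo from invites" -> _col0 = bar, _col1 = foo
scan_select = {"_col0": "bar", "_col1": "foo"}
# Parent select that reorders the columns: _col0 = _col1, _col1 = _col0
swap_select = {"_col0": "_col1", "_col1": "_col0"}
# What the inserted select-all operator should use: it changes nothing
identity_select = {"_col0": "_col0", "_col1": "_col1"}

# The filter is on bar, which is _col1 above the inserted select operator.
# Buggy: the inserted select reuses the parent's (swapping) map.
buggy = push_column("_col1", [swap_select, swap_select, scan_select])
# Fixed: the inserted select maps every column to itself.
fixed = push_column("_col1", [identity_select, swap_select, scan_select])

print(buggy)  # foo
print(fixed)  # bar
```

 Applying the swap twice cancels it out, so in this model the filter lands 
 on foo instead of bar.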
 Here is a simple case of this going wrong: Using
 {noformat}
 create table invites (id int, foo int, bar int);
 {noformat}
 executing the query
 {noformat}
 explain select * from (select foo, bar from (select bar, foo from invites c 
 union all select bar, foo from invites d) b) a group by bar, foo having bar=1;
 {noformat}
 results in
 {noformat}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 a-subquery1:b-subquery1:c 
   TableScan
 alias: c
 Filter Operator
   predicate:
   expr: (foo = 1)
   type: boolean
   Select Operator
 expressions:
   expr: bar
   type: int
   expr: foo
   type: int
 outputColumnNames: _col0, _col1
 Union
   Select Operator
 expressions:
   expr: _col1
   type: int
   expr: _col0
   type: int
 outputColumnNames: _col0, _col1
 Select Operator
   expressions:
 expr: _col0
 type: int
 expr: _col1
 type: int
   outputColumnNames: _col0, _col1
   Group By Operator
 bucketGroup: false
 keys:
   expr: _col1
   type: int
   expr: _col0
   type: int
 mode: hash
 outputColumnNames: _col0, _col1
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: int
 expr: _col1
 type: int
   sort order: ++
   Map-reduce partition columns:
 expr: _col0
 type: int
 expr: _col1
 type: int
   tag: -1
 a-subquery2:b-subquery2:d 
   TableScan
 alias: d
 Filter Operator
   predicate:
   expr: (foo = 1)
   type: boolean
   Select Operator
 expressions:
   expr: bar
   type: int
   expr: foo
   type: int
 outputColumnNames: _col0, _col1
 Union
   Select Operator
 expressions:
   expr: _col1
   type: int
   expr: _col0
   type: int
 outputColumnNames: _col0, _col1
 Select Operator
   expressions:
   
