[jira] [Commented] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723468#comment-13723468
 ] 

Hive QA commented on HIVE-4951:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594802/HIVE-4951.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 2736 tests executed
*Failed tests:*
{noformat}
org.apache.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/229/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/229/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

 combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
 -

 Key: HIVE-4951
 URL: https://issues.apache.org/jira/browse/HIVE-4951
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4951.1.patch


 combine2.q was updated in HIVE-3253, but the corresponding change is missing from
 combine2_win.q, causing it to fail on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4638) Thread local PerfLog can get shared by multiple hiveserver2 sessions

2013-07-30 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723510#comment-13723510
 ] 

Prasad Mujumdar commented on HIVE-4638:
---

[~ashutoshc] my apologies for missing the comment earlier.

We found the issue in one of our internal integration tests with Cloudera 
Manager. The exec hook retrieves the query start time via 
hookContext.getQueryPlan().getQueryStartTime(), which sometimes returned 
bogus timestamp values. The problem didn't reproduce with the patch.
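
To make the failure mode concrete, here is a hedged, self-contained sketch (not the actual Hive code) of how a thread-local PerfLog on a pooled HiveServer2 worker thread can hand one session's start time to the next session; the class and field names are illustrative only.

{code}
// Hedged sketch only -- not the Hive implementation.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerfLogLeakSketch {
    static class PerfLog {
        long queryStartTime;
    }

    // One PerfLog per thread, mirroring the pre-patch behavior described above.
    static final ThreadLocal<PerfLog> PERF_LOG = new ThreadLocal<PerfLog>() {
        @Override protected PerfLog initialValue() { return new PerfLog(); }
    };

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        // "Session 1" records its query start time on the pooled worker thread.
        pool.submit(new Runnable() {
            public void run() {
                PERF_LOG.get().queryStartTime = System.currentTimeMillis();
            }
        }).get();
        Thread.sleep(50);
        // "Session 2" reuses the same pooled thread and sees session 1's value
        // unless the PerfLog is reset or scoped to the session instead.
        pool.submit(new Runnable() {
            public void run() {
                System.out.println("stale start time: " + PERF_LOG.get().queryStartTime);
            }
        }).get();
        pool.shutdown();
    }
}
{code}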


 Thread local PerfLog can get shared by multiple hiveserver2 sessions
 

 Key: HIVE-4638
 URL: https://issues.apache.org/jira/browse/HIVE-4638
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4638-1.patch


 The PerfLog is accessed as a thread local, which can end up shared by multiple
 hiveserver2 sessions, overwriting query runtime information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723511#comment-13723511
 ] 

Hive QA commented on HIVE-4870:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594826/HIVE-4870.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 2736 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join32_lessSize
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/230/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/230/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds partition description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723513#comment-13723513
 ] 

Hive QA commented on HIVE-4954:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594832/HIVE-4954.1.patch.txt

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/231/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/231/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-231/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'ql/src/test/results/clientpositive/join33.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_4.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketcontext_7.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketcontext_2.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin7.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin11.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin_negative.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin2.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_1.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketcontext_4.q.out'
Reverted 'ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin9.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin13.q.out'
Reverted 'ql/src/test/results/clientpositive/union22.q.out'
Reverted 'ql/src/test/results/clientpositive/join32.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_8.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_3.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketcontext_1.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin10.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin1.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketcontext_8.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketcontext_3.q.out'
Reverted 'ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin8.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin12.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin3.q.out'
Reverted 'ql/src/test/results/clientpositive/join32_lessSize.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_7.q.out'
Reverted 'ql/src/test/results/clientpositive/bucketmapjoin_negative2.q.out'
Reverted 'ql/src/test/results/clientpositive/auto_sortmerge_join_2.q.out'
Reverted 'ql/src/test/results/clientpositive/stats11.q.out'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf build hcatalog/build hcatalog/core/build 
hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build 
hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build 
hcatalog/hcatalog-pig-adapter/build common/src/gen 
ql/src/test/results/clientpositive/bucketmapjoin2.q.out.orig 
ql/src/test/results/clientpositive/join32_lessSize.q.out.orig 
ql/src/test/results/clientpositive/bucketmapjoin1.q.out.orig 
ql/src/test/results/clientpositive/bucketmapjoin7.q.out.orig 
ql/src/test/results/clientpositive/union22.q.out.orig 
ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out.orig 
ql/src/test/results/clientpositive/bucketmapjoin3.q.out.orig
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1508328.

At revision 1508328.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

[jira] [Commented] (HIVE-4879) Window functions that imply order can only be registered at compile time

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723515#comment-13723515
 ] 

Hive QA commented on HIVE-4879:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594848/HIVE-4879.3.patch.txt

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/232/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/232/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-232/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1508328.

At revision 1508328.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

 Window functions that imply order can only be registered at compile time
 

 Key: HIVE-4879
 URL: https://issues.apache.org/jira/browse/HIVE-4879
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.11.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.12.0

 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, 
 HIVE-4879.3.patch.txt


 Adding an annotation for impliesOrder
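
 As a rough illustration of the idea, here is a hedged sketch of what annotation-driven registration could look like; the annotation name, attribute, and stub classes below are assumptions for illustration, not the actual API added by this patch.

{code}
// Hypothetical sketch only: names are illustrative, not the patch's API.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@interface WindowFunctionInfo {
    // true for functions (e.g. RANK) that only make sense with an ORDER BY
    boolean impliesOrder() default false;
}

@WindowFunctionInfo(impliesOrder = true)
class RankFunctionStub {
}

class FunctionRegistrySketch {
    // The registry can read the flag at runtime instead of hard-coding the
    // list of order-implying functions at compile time.
    static boolean impliesOrder(Class<?> fn) {
        WindowFunctionInfo info = fn.getAnnotation(WindowFunctionInfo.class);
        return info != null && info.impliesOrder();
    }
}
{code}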

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4638) Thread local PerfLog can get shared by multiple hiveserver2 sessions

2013-07-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723516#comment-13723516
 ] 

Ashutosh Chauhan commented on HIVE-4638:


I see. Can you rebase the patch? Let's get it in.

 Thread local PerfLog can get shared by multiple hiveserver2 sessions
 

 Key: HIVE-4638
 URL: https://issues.apache.org/jira/browse/HIVE-4638
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4638-1.patch


 The PerfLog is accessed as a thread local, which can end up shared by multiple
 hiveserver2 sessions, overwriting query runtime information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4638) Thread local PerfLog can get shared by multiple hiveserver2 sessions

2013-07-30 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4638:
--

Attachment: HIVE-4638-2.patch

Rebased patch

 Thread local PerfLog can get shared by multiple hiveserver2 sessions
 

 Key: HIVE-4638
 URL: https://issues.apache.org/jira/browse/HIVE-4638
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.12.0

 Attachments: HIVE-4638-1.patch, HIVE-4638-2.patch


 The PerfLog is accessed as a thread local, which can end up shared by multiple
 hiveserver2 sessions, overwriting query runtime information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck

2013-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723526#comment-13723526
 ] 

Thejas M Nair commented on HIVE-4574:
-

Regarding the bug report, it has gone into the black hole of the Oracle bug 
reporting system; I haven't heard back from the review process. I wish it were 
really more *open*!
Maybe switching to a different serialization format, as suggested in HIVE-1511, 
is the best bet.


 XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
 --

 Key: HIVE-4574
 URL: https://issues.apache.org/jira/browse/HIVE-4574
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4574.1.patch


 In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to 
 com.sun.beans.finder.MethodFinder.findMethod(). The MethodFinder class is not thread safe 
 because it uses a static WeakHashMap that gets used from multiple 
 threads. See -
 http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46
 Concurrent access to HashMap implementations that are not thread safe can 
 sometimes result in infinite loops and other problems. If JDK 7 is in use, it 
 makes sense to synchronize calls to XMLEncoder.writeObject.
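
 A minimal sketch of the workaround suggested above, assuming a single process-wide lock around the encoder; only XMLEncoder is a real API here, the class and method names are illustrative.

{code}
// Serialize all XMLEncoder.writeObject calls behind one lock on JDK 7,
// since MethodFinder's static WeakHashMap is not thread safe.
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;

public class SynchronizedXmlSerializer {
    private static final Object ENCODER_LOCK = new Object();

    public static byte[] toXml(Object plan) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (ENCODER_LOCK) {
            // Only one thread at a time drives the encoder.
            XMLEncoder encoder = new XMLEncoder(out);
            encoder.writeObject(plan);
            encoder.close();
        }
        return out.toByteArray();
    }
}
{code}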

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-30 Thread Nitin Pawar
The mentioned flow is invoked when the thrift metastore client-server
connection is in unsecure mode, so one way to avoid this is to use a secure
connection.

{code}
public boolean process(final TProtocol in, final TProtocol out)
    throws TException {
  setIpAddress(in);
  ...
  ...
  ...

@Override
protected void setIpAddress(final TProtocol in) {
  TUGIContainingTransport ugiTrans =
      (TUGIContainingTransport) in.getTransport();
  Socket socket = ugiTrans.getSocket();
  if (socket != null) {
    setIpAddress(socket);

{code}


From the above code snippet, it looks like the null pointer exception is
not handled when getSocket() returns null.
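
Below is a hedged, self-contained sketch of that defensive check; the SocketSource interface and class name are stand-ins for illustration, not the actual Hive types or the eventual fix.

{code}
// Illustrative sketch only: skip the address bookkeeping when the transport
// is not backed by a TSocket and getSocket() therefore returns null.
import java.net.Socket;

class IpAddressGuardSketch {
    /** Stand-in for TUGIContainingTransport; getSocket() may return null. */
    interface SocketSource {
        Socket getSocket();
    }

    static String clientAddress(SocketSource transport) {
        Socket socket = transport.getSocket();
        if (socket == null || socket.getInetAddress() == null) {
            return "unknown"; // underlying transport is not a TSocket
        }
        return socket.getInetAddress().getHostAddress();
    }
}
{code}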

Can you check what the ulimit setting is on the server? If it's set to the
default, can you set it to unlimited and restart the hcat server? (This is
just a wild guess.)

Also, the getSocket method's javadoc says: if the underlying TTransport is an
instance of TSocket, it returns the Socket object which it contains;
otherwise it returns null.

So one of the thrift gurus needs to tell us what's happening; I have no
knowledge at this depth.

Maybe Ashutosh or Thejas will be able to help on this.




From the netstat CLOSE_WAIT output, it looks like the hive metastore server
has not closed the connection (I don't know why yet); maybe the hive dev guys
can help. Are there too many connections in CLOSE_WAIT state?



On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

 Looking at the hive metastore server logs, we see errors like these:

 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
 (TThreadPoolServer.java:run(182)) - Error occurred during processing of
 message.
 java.lang.NullPointerException
 at

 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
 at

 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
 at

 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)

 These occur at approximately the same time as we see timeout or connection reset errors.

 Don't know if this is the cause or a side effect of the connection
 timeout/connection reset errors. Does anybody have any pointers or
 suggestions?

 Thanks


 On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:

  Thanks Nitin!
 
  We have a similar setup (identical hcatalog and hive server versions) on
  another production environment and don't see any errors (it's been running ok
  for a few months).
 
  Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive
  0.10 soon.
 
  I did see that the last time we ran into this problem, doing a netstat -ntp
  | grep :1 showed that the server was holding on to one socket connection in
  CLOSE_WAIT state for a long time
  (hive metastore server is running on port 1). Don't know if that's
  relevant here or not.
 
  Can you suggest any hive configuration settings we can tweak, or networking
  tools/tips we can use to narrow this down?
 
  Thanks
  Agateaaa
 
 
 
 
  On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com
 wrote:
 
  Is there any chance you can do an update on a test environment with hcat-0.5
  and hive 0.11 (or 0.10) and see if you can reproduce the issue?
 
  We used to see this error when there was load on the hcat server or some
  network issue connecting to the server (the second was a rare occurrence).
 
 
  On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote:
 
  Hi All:
 
  We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore
  Server 0.9) where we get connection reset or connection timeout errors.
 
  The hive metastore server has been allocated enough (12G) memory.
 
  This is a critical problem for us and we would appreciate it if anyone has
  any pointers.
 
  We did add retry logic in our client, which seems to help, but I am just
  wondering how we can narrow down the root cause of this problem. Could this
  be a hiccup in networking which causes the hive server to get into an
  unresponsive state?
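
  For illustration, here is a hedged sketch of the kind of client-side retry being described; the MetastoreCall wrapper, attempt count, and backoff are assumptions, not the actual client code.

{code}
// Hedged sketch of retrying metastore calls on transport errors.
import org.apache.thrift.transport.TTransportException;

class RetryingMetastoreCaller {
    interface MetastoreCall<T> {
        T call() throws TTransportException;
    }

    static <T> T withRetries(MetastoreCall<T> op, int maxAttempts, long backoffMs)
            throws TTransportException, InterruptedException {
        TTransportException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TTransportException e) {
                // connection reset / timeout: back off and try again
                last = e;
                Thread.sleep(backoffMs * attempt);
            }
        }
        throw last;
    }
}
{code}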
 
  Thanks
 
  Agateaaa
 
 
  Example Connection reset error:
  ===
 
  org.apache.thrift.transport.TTransportException:
  java.net.SocketException:
  Connection reset
  at
 
 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
  at
 
 
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
   at
 
 
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
  at
 
 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
   at
 org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
  at
 
 
 

[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-07-30 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723541#comment-13723541
 ] 

Jason Dere commented on HIVE-3976:
--

Hi Ed/Xuefu, yeah I have similar issues with setting the length parameter for 
char/varchar types for https://issues.apache.org/jira/browse/HIVE-4844. I've 
got some prototype code for this; I'm not entirely sure I have the best 
approach and was going to work through this a bit more, but if you'd like I 
can post a patch for you guys to take a look at and comment on.
Basically I've added parameterized versions of PrimitiveTypeEntry, 
PrimitiveTypeInfo, and ObjectInspector, with additional factory methods for 
these types so that the caller can fetch a TypeEntry/TypeInfo/ObjectInspector 
based on PrimitiveCategory + type parameters. I will definitely need to work out 
how this interacts with the existing system, as currently all of those types 
make liberal use of pointer-based equality, and there seem to be some instances 
where it may not be possible to have access to type params when trying to 
get the TypeEntry/TypeInfo/ObjectInspector.
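
To make the idea concrete, here is a hedged sketch of one way a factory could hand out interned, parameterized type descriptors so that code relying on instance (==) equality keeps working; all class names below (TypeKey, ParamTypeInfo, ParamTypeFactorySketch) are hypothetical stand-ins, not the prototype being discussed.

{code}
// Interning factory: same category + parameters always yields the same instance.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ParamTypeFactorySketch {
    static final class TypeKey {
        final String category;  // e.g. "VARCHAR"
        final int length;       // type parameter, e.g. varchar(50)
        TypeKey(String category, int length) {
            this.category = category;
            this.length = length;
        }
        @Override public boolean equals(Object o) {
            return o instanceof TypeKey
                    && ((TypeKey) o).category.equals(category)
                    && ((TypeKey) o).length == length;
        }
        @Override public int hashCode() {
            return 31 * category.hashCode() + length;
        }
    }

    static final class ParamTypeInfo {
        final TypeKey key;
        ParamTypeInfo(TypeKey key) { this.key = key; }
    }

    private static final Map<TypeKey, ParamTypeInfo> CACHE =
            new ConcurrentHashMap<TypeKey, ParamTypeInfo>();

    // Callers asking for the same category + parameters get the same object,
    // so existing pointer-based equality checks still hold.
    static ParamTypeInfo get(String category, int length) {
        TypeKey key = new TypeKey(category, length);
        ParamTypeInfo existing = CACHE.get(key);
        if (existing != null) {
            return existing;
        }
        ParamTypeInfo created = new ParamTypeInfo(key);
        existing = CACHE.putIfAbsent(key, created);
        return existing != null ? existing : created;
    }
}
{code}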

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, Types
Reporter: Mark Grover
Assignee: Xuefu Zhang

 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723567#comment-13723567
 ] 

Hive QA commented on HIVE-3256:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594735/HIVE-3256.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2736 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/233/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/233/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Zhenxiao Luo
Assignee: Ashutosh Chauhan
 Attachments: HIVE-3256.patch


 Hive trunk is currently using asm version 3.1, while Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4955:
---

 Summary: serde_user_properties.q.out needs to be updated
 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair


The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
has changes that alter the expected results of serde_user_properties.q, 
causing the test to fail now.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)

2013-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723584#comment-13723584
 ] 

Thejas M Nair commented on HIVE-4951:
-

The test failures are unrelated to this q.out file change. 
TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask is known 
to be a flaky test (HIVE-4851). 
The TestCliDriver.testCliDriver_serde_user_properties failure is also unrelated to 
this change; I created HIVE-4955 to track that.

 combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
 -

 Key: HIVE-4951
 URL: https://issues.apache.org/jira/browse/HIVE-4951
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4951.1.patch


 combine2.q was updated in HIVE-3253, but the corresponding change is missing from
 combine2_win.q, causing it to fail on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4955:


Assignee: Thejas M Nair

 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair

 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4955:


Attachment: HIVE-4955.1.patch

 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4955.1.patch


 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4955:


Status: Patch Available  (was: Open)

 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4955.1.patch


 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723612#comment-13723612
 ] 

Hive QA commented on HIVE-2608:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594851/HIVE-2608.8.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 2737 tests executed
*Failed tests:*
{noformat}
org.apache.hcatalog.pig.TestOrcHCatLoader.testReadDataBasic
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_lateral_view_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udtf_not_supported2
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/234/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/234/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

 Do not require AS a,b,c part in LATERAL VIEW
 

 Key: HIVE-2608
 URL: https://issues.apache.org/jira/browse/HIVE-2608
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Reporter: Igor Kabiljo
Assignee: Navis
Priority: Minor
 Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
 HIVE-2608.D4317.6.patch


 Currently, it is required to state column names when LATERAL VIEW is used.
 That shouldn't be necessary, since UDTF returns struct which contains column 
 names - and they should be used by default.
 For example, it would be great if this was possible:
 SELECT t.*, t.key1 + t.key4
 FROM some_table
 LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key3') t;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723673#comment-13723673
 ] 

Hive QA commented on HIVE-4843:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594872/HIVE-4843.4.patch

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 2736 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2_hadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/235/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/235/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, 
 HIVE-4843.4.patch


 Currently, there are static apis in multiple locations in ExecDriver and 
 MapRedTask that can be leveraged if put in the already existing utility class 
 in the exec package. This would help making the code more maintainable, 
 readable and also re-usable by other run-time infra such as tez.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-821) Return better status messages from HWI

2013-07-30 Thread manuel aldana (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723684#comment-13723684
 ] 

manuel aldana commented on HIVE-821:


What's the current status of this? As the dependent HIVE-862 and HIVE-795 are 
resolved, could we start to work on this?

 Return better status messages from HWI
 --

 Key: HIVE-821
 URL: https://issues.apache.org/jira/browse/HIVE-821
 Project: Hive
  Issue Type: New Feature
  Components: Web UI
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 Users of HWI only receive a numeric status code. We should return the message 
 to them as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-821) Return better status messages from HWI

2013-07-30 Thread manuel aldana (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723685#comment-13723685
 ] 

manuel aldana commented on HIVE-821:


Also, progress information (the typical hive 0%-100% output) would be very 
helpful.

 Return better status messages from HWI
 --

 Key: HIVE-821
 URL: https://issues.apache.org/jira/browse/HIVE-821
 Project: Hive
  Issue Type: New Feature
  Components: Web UI
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 Users of HWI only receive a numeric status code. We should return the message 
 to them as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3618) Hive jdbc or thrift server didn't support a method to get job progress when hive client execute a query

2013-07-30 Thread manuel aldana (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723692#comment-13723692
 ] 

manuel aldana commented on HIVE-3618:
-

Is this now possible with HiveServer2?

 Hive jdbc or thrift server didn't support a method to get job progress when 
 hive client execute a query
 ---

 Key: HIVE-3618
 URL: https://issues.apache.org/jira/browse/HIVE-3618
 Project: Hive
  Issue Type: Wish
  Components: JDBC, Thrift API, Web UI
Affects Versions: 0.7.1, 0.8.0, 0.9.0
Reporter: chenyukang

 I am writing a Hive web client to run a Hive query using the Hive JDBC driver or 
 the Hive Thrift Server. Since the data amount is huge, I really would like to see 
 the progress while the query is running. I want to get the job progress; 
 otherwise the user has to wait and is blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4850) Implement vectorized JOIN operators

2013-07-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-4850:
---

Attachment: HIVE-4850.1.patch

This is an initial implementation of Map join. Multiple join aliases and 
multiple values per key work. The small aliases are row mode data (writable 
objects) and get converted to vector values *for each row in the big table* 
(after filtering). Also the map hash has row mode keys (objects) and the vector 
mode keys get converted to object keys for lookup of *each row in the big 
table* (after filtering). 

 Implement vectorized JOIN operators
 ---

 Key: HIVE-4850
 URL: https://issues.apache.org/jira/browse/HIVE-4850
 Project: Hive
  Issue Type: Sub-task
Reporter: Remus Rusanu
Assignee: Remus Rusanu
 Attachments: HIVE-4850.1.patch


 Easysauce

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-07-30 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-4956:
-

 Summary: Allow multiple tables in from clause if all them have the 
same schema, but can be partitioned differently
 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu


We have a use case where the table storage partitioning changes over time.

For example:
 we can have a table T1 which is partitioned by p1. But over time, we want to 
partition the table on p1 and p2 as well. The new table can be T2. So, if we 
have to query the table on partition p1, it will be a union query across two tables, 
T1 and T2. Especially with aggregations like avg, it becomes a costly union query 
because we cannot make use of map-side aggregations and other optimizations.

The proposal is to support queries of the following format :

select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
[groupby-clause] [having-clause] [orderby-clause] and so on.

Here we allow the from clause to be a comma separated list of tables with an alias; the 
alias will be used in the full query, and partition pruning will happen on the 
actual tables to pick up the right paths. This will work because the difference 
is only in picking up the input paths; the whole operator tree does not change. 
If this sounds like a good use case, I can put up the changes required to support the 
same.





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-07-30 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723712#comment-13723712
 ] 

Amareshwari Sriramadasu commented on HIVE-4956:
---

The same use case applies to tables stored at different rollups, like daily 
rollups and hourly rollups.

 Allow multiple tables in from clause if all them have the same schema, but 
 can be partitioned differently
 -

 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We have a use case where the table storage partitioning changes over time.
 For example:
  we can have a table T1 which is partitioned by p1. But over time, we want to 
 partition the table on p1 and p2 as well. The new table can be T2. So, if we 
 have to query the table on partition p1, it will be a union query across two 
 tables, T1 and T2. Especially with aggregations like avg, it becomes a costly 
 union query because we cannot make use of map-side aggregations and other 
 optimizations.
 The proposal is to support queries of the following format :
 select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
 [groupby-clause] [having-clause] [orderby-clause] and so on.
 Here we allow the from clause to be a comma separated list of tables with an alias, 
 and the alias will be used in the full query, and partition pruning will happen 
 on the actual tables to pick up the right paths. This will work because the 
 difference is only in picking up the input paths; the whole operator tree does 
 not change. If this sounds like a good use case, I can put up the changes required 
 to support it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 13059: HIVE-4850 Implement vector mode map join

2013-07-30 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
---

Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description
---

This is not the final iteration, but I thought it is easier to discuss it with a 
review.
This implementation works, and handles multiple aliases and multiple values per 
key. The implementation uses the existing hash tables saved by the local task 
for the map join, which are row mode hash tables (they have row mode keys and store 
row mode writable object values). Going forward we should avoid the 
size-of-big-table conversions of big table keys to row mode and conversion of 
small table values to vector data. This would require either converting 
the hash tables on the fly to vector friendly ones (when loaded) or changing 
the local task hashtable sink to create a vectorization friendly hash. The first 
approach may have memory consumption problems (potentially two hash tables end 
up in memory; we would have to stream the transformation or transform while reading 
from serialized format... nasty).
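
To illustrate the per-row conversion cost described above, here is a hedged, heavily simplified sketch in plain Java (not the actual operator code); the class, method, and parameter names are illustrative assumptions only.

{code}
// For every surviving big-table row, a row-mode key probes the hash table
// and the matched small-table values are copied into output column vectors
// one row at a time.
import java.util.Map;

class VectorMapJoinCostSketch {
    static void probe(long[] bigTableKeyColumn, int numRows,
                      Map<Long, long[]> smallTableByKey,
                      long[][] outputSmallTableColumns) {
        for (int row = 0; row < numRows; row++) {
            // vector -> row-mode key conversion, once per big-table row
            long[] smallValues = smallTableByKey.get(bigTableKeyColumn[row]);
            if (smallValues == null) {
                continue; // no match for this key (inner join semantics)
            }
            // row-mode small-table values -> vectorized output, again per row
            for (int col = 0; col < smallValues.length; col++) {
                outputSmallTableColumns[col][row] = smallValues[col];
            }
        }
    }
}
{code}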


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java 82d4b93 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 31dbf41 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 4da1be8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 29de38d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java e579c00 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinDoubleKeys.java 
d774226 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectKey.java 
791bb3f 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java 
58a9dc0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinSingleKey.java 
4bff936 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorExecMapper.java 
083b9b9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java 
41d2001 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
9c90230 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java 
ff13f89 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java
 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableDummyDesc.java f15ce48 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
---

Manually run some join queries on alltypes_orc table.


Thanks,

Remus Rusanu



[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723726#comment-13723726
 ] 

Hive QA commented on HIVE-4952:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594874/HIVE-4952.D11889.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2737 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/236/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/236/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4952.D11889.1.patch, replay.txt


 If we have a query like this ...
 {code:sql}
 SELECT xx.key, xx.cnt, yy.key
 FROM
 (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
 y.key) group by x.key) xx
 JOIN src yy
 ON xx.key=yy.key;
 {code}
 After Correlation Optimizer, the operator tree in the reducer will be 
 {code}
      JOIN2
        |
        |
       MUX
      /   \
     /     \
   GBY      |
    |       |
  JOIN1     |
     \     /
      \   /
      DEMUX
 {code}
 For JOIN2, the right table will arrive at this operator first. If 
 hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even though 
 it has not got any row from the left table. The logic related to 
 hive.join.emit.interval in JoinOperator assumes that inputs will be ordered 
 by tag. But if a query has been optimized by the Correlation Optimizer, this 
 assumption may not hold for the JoinOperators inside the reducer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723781#comment-13723781
 ] 

Hive QA commented on HIVE-4955:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594901/HIVE-4955.1.patch

{color:green}SUCCESS:{color} +1 2736 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/238/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/238/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4955.1.patch


 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723815#comment-13723815
 ] 

Brock Noland commented on HIVE-4955:


+1


 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4955.1.patch


 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4955:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

I committed this since it was causing the build to fail. Thanks for your 
contribution, Thejas!

 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.12.0

 Attachments: HIVE-4955.1.patch


 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-30 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723833#comment-13723833
 ] 

Brock Noland commented on HIVE-3256:


Test failed due to HIVE-4955.

 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Zhenxiao Luo
Assignee: Ashutosh Chauhan
 Attachments: HIVE-3256.patch


 Hive trunk is currently using asm version 3.1, while Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties

2013-07-30 Thread Jin Adachi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Adachi updated HIVE-2564:
-

Attachment: HIVE-2564.patch

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
 Attachments: hive-2564.patch, HIVE-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a "use DBNAME" statement.
 I think it is better to also allow specifying a database name in the JDBC URL or database 
 properties.
 Therefore, I'll attach the patch.
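
 For context, here is a hedged sketch of the two ways of selecting a database that this issue contrasts; the driver class string is the usual HiveServer1 one, and the host, port, and database name are placeholders, not values from the patch.

{code}
// Today the database segment in the URL is ignored, so clients fall back
// to an explicit USE statement after connecting.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class DbNameFromUrlSketch {
    static Connection connect() throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        // Desired: the trailing path segment ("mydb") selects the database.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://metastore-host:10000/mydb", "", "");
        // Current workaround: switch databases explicitly.
        Statement stmt = conn.createStatement();
        stmt.execute("USE mydb");
        stmt.close();
        return conn;
    }
}
{code}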

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties

2013-07-30 Thread Jin Adachi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Adachi updated HIVE-2564:
-

Attachment: HIVE-2564.1.patch

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
 Attachments: HIVE-2564.1.patch, hive-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a use DBNAME statement.
 I think it is better to also allow specifying the database name in the JDBC URL 
 or the connection properties.
 Therefore, I'll attach the patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties

2013-07-30 Thread Jin Adachi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Adachi updated HIVE-2564:
-

Attachment: (was: HIVE-2564.patch)

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
 Attachments: HIVE-2564.1.patch, hive-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a use DBNAME statement.
 I think it is better to also allow specifying the database name in the JDBC URL 
 or the connection properties.
 Therefore, I'll attach the patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2137) JDBC driver doesn't encode string properly.

2013-07-30 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-2137:
-

Labels: patch  (was: )

 JDBC driver doesn't encode string properly.
 ---

 Key: HIVE-2137
 URL: https://issues.apache.org/jira/browse/HIVE-2137
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Jin Adachi
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-2137.patch, HIVE-2137.patch


 The JDBC driver for HiveServer1 decodes strings using the client-side default 
 encoding, which depends on the operating system unless another encoding is 
 specified explicitly. It ignores the server-side encoding. 
 For example, 
 when the server-side operating system and encoding are Linux (UTF-8) and the 
 client-side operating system and encoding are Windows (Shift-JIS, a Japanese 
 charset), character corruption happens in the client.
 In the current implementation of Hive, UTF-8 appears to be expected on the 
 server side, so the client side should encode/decode strings as UTF-8.
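
 A minimal sketch of that idea (illustrative only, not the attached patch): convert between bytes and strings with an explicit UTF-8 charset instead of the platform default:
{code}
import java.nio.charset.StandardCharsets;

// Minimal sketch: always use an explicit UTF-8 charset rather than the
// client's platform default when converting between bytes and strings.
public final class Utf8Strings {
  public static String decode(byte[] serverBytes) {
    return new String(serverBytes, StandardCharsets.UTF_8);
  }

  public static byte[] encode(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }
}
{code}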

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723883#comment-13723883
 ] 

Hive QA commented on HIVE-4388:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594727/HIVE-4388.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 2717 tests 
executed
*Failed tests:*
{noformat}
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestSnapshots
junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestZNodeSetUp
junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestIDGenerator
junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestRevisionManagerEndpoint
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseHCatStorageHandler
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseDirectOutputFormat
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeII
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseInputFormat
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat
junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestRevisionManager
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeI
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithColumnPrefixes
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithTimestamp
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/239/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/239/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4388:
---

Attachment: HIVE-4388.patch

Updated patch should fix some failures.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck

2013-07-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723972#comment-13723972
 ] 

Edward Capriolo commented on HIVE-4574:
---

I am pretty close to having an xstream patch on HIVE-1511

 XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
 --

 Key: HIVE-4574
 URL: https://issues.apache.org/jira/browse/HIVE-4574
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4574.1.patch


 In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to 
 java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe 
 because it uses a static WeakHashMap that can get used from multiple 
 threads. See -
 http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46
 Concurrent access to HashMap implementations that are not thread safe can 
 sometimes result in infinite loops and other problems. If JDK 7 is in use, it 
 makes sense to synchronize calls to XMLEncoder.writeObject.
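
 A hedged sketch of that approach (illustrative, not the attached patch; the class and lock names are made up): route all plan serialization through one shared lock so concurrent sessions never enter MethodFinder's static map at the same time:
{code}
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;

// Sketch only: guard XMLEncoder.writeObject with one class-level lock so
// concurrent HiveServer2 sessions never reach MethodFinder's static
// WeakHashMap concurrently on JDK 7.
public final class SynchronizedPlanSerializer {
  private static final Object ENCODER_LOCK = new Object();

  public static byte[] toXml(Object plan) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    synchronized (ENCODER_LOCK) {       // one writer at a time
      XMLEncoder encoder = new XMLEncoder(out);
      encoder.writeObject(plan);
      encoder.close();                  // flushes the XML document
    }
    return out.toByteArray();
  }
}
{code}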

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3256) Update asm version in Hive

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-3256:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Thank you for the contribution Ashutosh! I have committed this to trunk.

 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Zhenxiao Luo
Assignee: Ashutosh Chauhan
 Fix For: 0.12.0

 Attachments: HIVE-3256.patch


 Hive trunk is currently using asm version 3.1, while Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync parallelism

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4920:
---

Attachment: HIVE-4920.patch

Minor update to handle the hcat tests (which use TestSuite) correctly.

 PTest2 handle Spot Price increases gracefully and improve rsync parallelism
 

 Key: HIVE-4920
 URL: https://issues.apache.org/jira/browse/HIVE-4920
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Critical
 Attachments: HIVE-4920.patch, HIVE-4920.patch, HIVE-4920.patch, 
 Screen Shot 2013-07-23 at 3.35.00 PM.png


 We should handle spot price increases more gracefully and parallelize rsync 
 to slaves better
 NO PRECOMMIT TESTS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2564) Set dbname at JDBC URL or properties

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724015#comment-13724015
 ] 

Hive QA commented on HIVE-2564:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594952/HIVE-2564.1.patch

{color:red}ERROR:{color} -1 due to 58 failed/errored test(s), 2738 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testSelectAll
org.apache.hive.jdbc.TestJdbcDriver2.testResultSetMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetColumnsMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hive.jdbc.TestJdbcDriver2.testNullType
org.apache.hive.jdbc.TestJdbcDriver2.testDuplicateColumnNameOrder
org.apache.hive.jdbc.TestJdbcDriver2.testProccedures
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hive.jdbc.TestJdbcDriver2.testSelectAllFetchSize
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes2
org.apache.hive.jdbc.TestJdbcDriver2.testOutOfBoundCols
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hive.jdbc.TestJdbcDriver2.testDriverProperties
org.apache.hive.jdbc.TestJdbcDriver2.testErrorDiag
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hive.jdbc.TestJdbcDriver2.testProcCols
org.apache.hive.jdbc.TestJdbcDriver2.testBuiltInUDFCol
org.apache.hive.jdbc.TestJdbcDriver2.testPostClose
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hive.jdbc.TestJdbcDriver2.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetColumns
org.apache.hive.jdbc.TestJdbcDriver2.testPrimaryKeys
org.apache.hive.jdbc.TestJdbcDriver2.testShowTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hive.jdbc.TestJdbcDriver2.testShowDatabases
org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testBadURL
org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetCatalogs
org.apache.hive.jdbc.TestJdbcDriver2.testImportedKeys
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hive.jdbc.TestJdbcDriver2.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hive.jdbc.TestJdbcDriver2.testInvalidURL
org.apache.hive.jdbc.TestJdbcDriver2.testSetCommand
org.apache.hive.jdbc.TestJdbcDriver2.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hive.jdbc.TestJdbcDriver2.testDatabaseMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hive.jdbc.TestJdbcDriver2.testExprCol
org.apache.hive.jdbc.TestJdbcDriver2.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowDatabases
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/240/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/240/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 58 tests failed
{noformat}

This message is automatically generated.

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
 Attachments: HIVE-2564.1.patch, hive-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a use DBNAME statement.
 I think it is better to also allow specifying the database name in the JDBC URL 
 or the connection properties.

[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724019#comment-13724019
 ] 

Xuefu Zhang commented on HIVE-3976:
---

Thanks, Jason.

Please do attach a patch so that we can see how you plan to do it, even if it's 
incomplete. I think our issues belong to the same category, so a generic 
approach works best.

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, Types
Reporter: Mark Grover
Assignee: Xuefu Zhang

 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-4957:
--

 Summary: Restrict number of bit vectors, to prevent out of Java 
heap memory
 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland


Normally, increasing the number of bit vectors will increase calculation accuracy. 
Let's say
{noformat}
select compute_stats(a, 40) from test_hive;
{noformat}
generally gets better accuracy than
{noformat}
select compute_stats(a, 16) from test_hive;
{noformat}
But a larger number of bit vectors also makes the query run slower. Once the number of 
bit vectors is over 50, increasing it won't help accuracy anymore, but it still 
increases memory usage and can crash Hive if the number is huge. Current Hive 
doesn't prevent users from using a ridiculously large number of bit vectors in a 
'compute_stats' query.

One example
{noformat}
select compute_stats(a, 9) from column_eight_types;
{noformat}
crashes Hive.

{noformat}
2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
sec
MapReduce Total cumulative CPU time: 290 msec
Ended Job = job_1354923204155_0777 with errors
Error during job, obtaining debugging information...
Job Tracking URL: 
http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
job_1354923204155_0777

Task with the most failures(4): 
-
Task ID:
  task_1354923204155_0777_m_00

URL:
  
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
-
Diagnostic Messages for this Task:
Error: Java heap space
{noformat}
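
One way to address this, sketched below purely as an illustration (the constant, method, and class names are hypothetical, not the eventual Hive fix), is to validate the requested bit-vector count before any memory is allocated:
{code}
// Hypothetical guard, not the actual Hive code: reject unreasonable
// bit-vector counts up front instead of running out of heap later.
public final class BitVectorLimit {
  static final int MAX_BIT_VECTORS = 1024; // illustrative upper bound

  static int validate(int requested) {
    if (requested < 1 || requested > MAX_BIT_VECTORS) {
      throw new IllegalArgumentException(
          "Number of bit vectors must be between 1 and " + MAX_BIT_VECTORS
          + ", but " + requested + " was requested");
    }
    return requested;
  }
}
{code}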

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4958) AppConfig.init() loads webhcat-*.xml before core-*.xml

2013-07-30 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-4958:


 Summary: AppConfig.init() loads webhcat-*.xml before core-*.xml
 Key: HIVE-4958
 URL: https://issues.apache.org/jira/browse/HIVE-4958
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman


This method first loads webhcat-*.xml and then core-*.xml, mapred-*.xml, etc.
Shouldn't it be in the opposite order?
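
For illustration only (a hedged sketch of the ordering being suggested, not the actual AppConfig code): with Hadoop's Configuration, resources added later override earlier ones, so the WebHCat files would be added last if their values are meant to win:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the proposed order: core/mapred files first, WebHCat files last,
// so WebHCat-specific settings override the Hadoop-level ones.
public final class WebHCatConfLoadOrder {
  public static Configuration load() {
    Configuration conf = new Configuration();
    conf.addResource("core-site.xml");
    conf.addResource("mapred-site.xml");
    conf.addResource("webhcat-default.xml"); // added later, so it takes precedence
    conf.addResource("webhcat-site.xml");
    return conf;
  }
}
{code}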

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3264) Add support for binary dataype to AvroSerde

2013-07-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3264:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Eli for the patch. Thanks, Mark for testcases. 
Thanks, Jakob for the review.

 Add support for binary dataype to AvroSerde
 ---

 Key: HIVE-3264
 URL: https://issues.apache.org/jira/browse/HIVE-3264
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
 HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch


 When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
 byte array type is converted to an array of small ints. Now that HIVE-2380 is 
 in, this step isn't necessary and we can convert both Avro's bytes type and 
 probably its fixed type to Hive's binary type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4525:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Mikhail!

 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.12.0

 Attachments: D10755.1.patch, D10755.2.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.
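
 A quick illustration of that limit (a hedged sketch, independent of the TimestampWritable internals): the largest positive second count that fits in 31 bits lands in January 2038, and pre-1970 values would need negative numbers, which the encoding cannot hold:
{code}
import java.util.Date;

// 2^31 - 1 seconds after the epoch is 2038-01-19T03:14:07 UTC; anything
// earlier than 1970 would require a negative value, which a 31-bit
// unsigned-seconds encoding cannot represent.
public class TimestampLimitDemo {
  public static void main(String[] args) {
    long maxSeconds = Integer.MAX_VALUE; // 2^31 - 1 = 2147483647
    System.out.println(new Date(maxSeconds * 1000L));
  }
}
{code}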

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality

2013-07-30 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-2702.


Resolution: Fixed

Committed to trunk. Thanks, Sergey!

 Enhance listPartitionsByFilter to add support for integral types both for 
 equality and non-equality
 ---

 Key: HIVE-2702
 URL: https://issues.apache.org/jira/browse/HIVE-2702
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Aniket Mokashi
Assignee: Sergey Shelukhin
 Fix For: 0.12.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, 
 HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, 
 HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, 
 HIVE-2702.patch, HIVE-2702-v0.patch


 listPartitionsByFilter supports filtering only on string partition columns. This is 
 because it's explicitly specified in generateJDOFilterOverPartitions in 
 ExpressionTree.java: 
 // Can only support partitions whose types are string
 if (!table.getPartitionKeys().get(partitionColumnIndex).getType()
     .equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME)) {
   throw new MetaException(
       "Filtering is supported only on partition keys of type string");
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724133#comment-13724133
 ] 

Xuefu Zhang commented on HIVE-4956:
---

The syntax select ... from T1, T2 ... without a join might cause semantic 
confusion, as in some databases it really means a cross join (Cartesian product), 
which has a different meaning from yours. From a database point of view, a 
table is a table, and two tables are two tables. Treating two tables as one 
seems to go beyond what SQL defines. It might be conceptually clearer if we 
allowed tables to have heterogeneous partitions. Of course, this may be more 
involved.

 Allow multiple tables in from clause if all them have the same schema, but 
 can be partitioned differently
 -

 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We have a use case where the table storage partitioning changes over time.
 For example:
  we can have a table T1 which is partitioned by p1. But over time, we want to 
 partition the table on p1 and p2 as well. The new table can be T2. So, if we 
 have to query the table on partition p1, it will be a union query across the two 
 tables T1 and T2. Especially with aggregations like avg, it becomes a costly 
 union query because we cannot make use of map-side aggregations and other 
 optimizations.
 The proposal is to support queries of the following format:
 select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
 [groupby-clause] [having-clause] [orderby-clause] and so on.
 Here we allow the from clause to be a comma-separated list of tables with an alias; 
 the alias will be used in the full query, and partition pruning will happen 
 on the actual tables to pick up the right paths. This will work because the 
 only difference is in picking up the input paths and the whole operator tree does 
 not change. If this sounds like a good use case, I can put up the changes required 
 to support it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

2013-07-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724139#comment-13724139
 ] 

Ashutosh Chauhan commented on HIVE-4956:


Completely agreed with [~xuefuz]. Let's not redefine SQL semantics.

 Allow multiple tables in from clause if all them have the same schema, but 
 can be partitioned differently
 -

 Key: HIVE-4956
 URL: https://issues.apache.org/jira/browse/HIVE-4956
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu

 We have a use case where the table storage partitioning changes over time.
 For example:
  we can have a table T1 which is partitioned by p1. But over time, we want to 
 partition the table on p1 and p2 as well. The new table can be T2. So, if we 
 have to query the table on partition p1, it will be a union query across the two 
 tables T1 and T2. Especially with aggregations like avg, it becomes a costly 
 union query because we cannot make use of map-side aggregations and other 
 optimizations.
 The proposal is to support queries of the following format:
 select t.x, t.y,  from T1,T2 t where t.p1='x' OR t.p1='y' ... 
 [groupby-clause] [having-clause] [orderby-clause] and so on.
 Here we allow the from clause to be a comma-separated list of tables with an alias; 
 the alias will be used in the full query, and partition pruning will happen 
 on the actual tables to pick up the right paths. This will work because the 
 only difference is in picking up the input paths and the whole operator tree does 
 not change. If this sounds like a good use case, I can put up the changes required 
 to support it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-2564:
---

Status: Open  (was: Patch Available)

Hey guys, I am excited about this patch! However, since there are test failures, 
I am going to remove the Patch Available status. As soon as you have 
another one, please mark it Patch Available.

 Set dbname at JDBC URL or properties
 

 Key: HIVE-2564
 URL: https://issues.apache.org/jira/browse/HIVE-2564
 Project: Hive
  Issue Type: Improvement
  Components: JDBC
Affects Versions: 0.7.1
Reporter: Shinsuke Sugaya
 Attachments: HIVE-2564.1.patch, hive-2564.patch


 The current Hive implementation ignores the database name in the JDBC URL, 
 though we can set it by executing a use DBNAME statement.
 I think it is better to also allow specifying the database name in the JDBC URL 
 or the connection properties.
 Therefore, I'll attach the patch.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions & is slow

2013-07-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724152#comment-13724152
 ] 

Sergey Shelukhin commented on HIVE-4051:


I've fixed most of the queries. There are a couple of bugs, and sorting is 
undefined (and changed, as we no longer sort by partition name) in some tests; 
a couple of stubborn ones remain. Hopefully I will update today.

 Hive's metastore suffers from 1+N queries when querying partitions & is slow
 

 Key: HIVE-4051
 URL: https://issues.apache.org/jira/browse/HIVE-4051
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
 Environment: RHEL 6.3 / EC2 C1.XL
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch


 Hive's query client takes a long time to initialize & start planning queries 
 because of delays in creating all the MTable/MPartition objects.
 For a Hive db with 1800 partitions, the metastore took 6-7 seconds to 
 initialize - firing approximately 5900 queries to the MySQL database.
 Several of those queries fetch exactly one row to create a single object on 
 the client.
 The following 12 queries were repeated for each partition, generating a storm 
 of SQL queries 
 {code}
 4 Query SELECT 
 `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
 `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
 `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 
 `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
 FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
 `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
 THIS.`INTEGER_IDX`=0
 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
 THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
 `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
 `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
 =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
 =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN 
 `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = 
 `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM 
 `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT 
 (`A0`.`STRING_LIST_ID_KID` IS NULL)
 {code}
 This data is not detached or cached, so this operation is performed during 
 every query plan for the partitions, even in the same Hive client.
 The queries are automatically generated by JDO/DataNucleus, which makes it 
 nearly impossible to rewrite them into a single denormalized join operation & 
 process it locally.
 Attempts to optimize this with JDO fetch-groups did not bear fruit in 
 improving the query count.
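
 Purely as an illustration of the kind of denormalized fetch described above (a hedged sketch, not the HIVE-4051 patch itself; the column names follow the metastore tables shown in the log, while TBL_ID and the helper method are assumptions for the example):
{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: fetch storage-descriptor data for all partitions of one table in a
// single join instead of a dozen single-row queries per partition, then build
// lightweight partition objects locally from the result set.
public final class PartitionBulkFetch {
  public static void fetchAll(Connection conn, long tableId) throws SQLException {
    String sql =
        "SELECT p.PART_ID, s.SD_ID, s.LOCATION, s.INPUT_FORMAT, s.OUTPUT_FORMAT "
      + "FROM PARTITIONS p LEFT OUTER JOIN SDS s ON p.SD_ID = s.SD_ID "
      + "WHERE p.TBL_ID = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setLong(1, tableId);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          // assemble one partition descriptor per row instead of issuing
          // further per-partition queries
        }
      }
    }
  }
}
{code}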

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4888) listPartitionsByFilter doesn't support lt/gt/lte/gte

2013-07-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724158#comment-13724158
 ] 

Sergey Shelukhin commented on HIVE-4888:


There doesn't appear to be any equivalent of Long.parse in DN/JDO; casts are 
not it. 
I am playing with writing a DN SQLMethod plugin, but as far as I can see it would 
only work if DN is backed by a SQL store. I will file a DN jira. HIVE-4051 makes 
pushdown work for SQL, so I might end up doing HIVE-4914 instead and having the 
server decide between pushdown to SQL or no pushdown for JDOQL for those. Depends 
on how seamless the plugin would be.

 listPartitionsByFilter doesn't support lt/gt/lte/gte
 

 Key: HIVE-4888
 URL: https://issues.apache.org/jira/browse/HIVE-4888
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 Filter pushdown could be improved. Based on my experiments there's no 
 reasonable way to do it with DN 2.0, due to a DN bug in substring and 
 Collection.get(int) not being implemented.
 With a version as low as 2.1 we can use values.get on the partition to extract 
 values to compare against. Type compatibility is an issue, but it is easy for strings 
 and integral values.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions & is slow

2013-07-30 Thread Laurent Chouinard (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724160#comment-13724160
 ] 

Laurent Chouinard commented on HIVE-4051:
-

Hi,

I will be on vacation from July 29th to August 6th inclusively. For any 
question or emergency, please contact the group 
mtl-it-production-to...@ubisoft.com

Thanks.

Laurent Chouinard
IT Production - Tools Programmer




 Hive's metastore suffers from 1+N queries when querying partitions & is slow
 

 Key: HIVE-4051
 URL: https://issues.apache.org/jira/browse/HIVE-4051
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
 Environment: RHEL 6.3 / EC2 C1.XL
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch


 Hive's query client takes a long time to initialize & start planning queries 
 because of delays in creating all the MTable/MPartition objects.
 For a Hive db with 1800 partitions, the metastore took 6-7 seconds to 
 initialize - firing approximately 5900 queries to the MySQL database.
 Several of those queries fetch exactly one row to create a single object on 
 the client.
 The following 12 queries were repeated for each partition, generating a storm 
 of SQL queries 
 {code}
 4 Query SELECT 
 `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
 `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
 `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 
 `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
 FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
 `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
 THIS.`INTEGER_IDX`=0
 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
 THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
 `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
 `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
 =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
 =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN 
 `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = 
 `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM 
 `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT 
 (`A0`.`STRING_LIST_ID_KID` IS NULL)
 {code}
 This data is not detached or cached, so this operation is performed during 
 every query plan for the partitions, even in the same Hive client.
 The queries are automatically generated by JDO/DataNucleus, which makes it 
 nearly impossible to rewrite them into a single denormalized join operation & 
 process it locally.
 Attempts to optimize this with JDO fetch-groups did not bear fruit in 
 improving the query count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-07-30 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan reassigned HIVE-4957:


Assignee: Shreepadma Venugopalan

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan

 Normally, increasing the number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally gets better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But a larger number of bit vectors also makes the query run slower. Once the number of 
 bit vectors is over 50, increasing it won't help accuracy anymore, but it still 
 increases memory usage and can crash Hive if the number is huge. Current Hive 
 doesn't prevent users from using a ridiculously large number of bit vectors in a 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4395:
---

Status: Open  (was: Patch Available)

Prasad, I am cancelling this patch since it doesn't apply. When we have a patch 
that applies, please change it to Patch Available!

 Support TFetchOrientation.FIRST for HiveServer2 FetchResults
 

 Key: HIVE-4395
 URL: https://issues.apache.org/jira/browse/HIVE-4395
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch


 Currently HiveServer2 only supports fetching the next row 
 (TFetchOrientation.NEXT). This ticket is to implement support for 
 TFetchOrientation.FIRST, which resets the fetch position to the beginning of the 
 resultset. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HIVE-4957) Restrict number of bit vectors, to prevent out of Java heap memory

2013-07-30 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4957 started by Shreepadma Venugopalan.

 Restrict number of bit vectors, to prevent out of Java heap memory
 --

 Key: HIVE-4957
 URL: https://issues.apache.org/jira/browse/HIVE-4957
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Brock Noland
Assignee: Shreepadma Venugopalan

 Normally, increasing the number of bit vectors will increase calculation accuracy. 
 Let's say
 {noformat}
 select compute_stats(a, 40) from test_hive;
 {noformat}
 generally gets better accuracy than
 {noformat}
 select compute_stats(a, 16) from test_hive;
 {noformat}
 But a larger number of bit vectors also makes the query run slower. Once the number of 
 bit vectors is over 50, increasing it won't help accuracy anymore, but it still 
 increases memory usage and can crash Hive if the number is huge. Current Hive 
 doesn't prevent users from using a ridiculously large number of bit vectors in a 
 'compute_stats' query.
 One example
 {noformat}
 select compute_stats(a, 9) from column_eight_types;
 {noformat}
 crashes Hive.
 {noformat}
 2012-12-20 23:21:52,247 Stage-1 map = 0%,  reduce = 0%
 2012-12-20 23:22:11,315 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.29 
 sec
 MapReduce Total cumulative CPU time: 290 msec
 Ended Job = job_1354923204155_0777 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: 
 http://cs-10-20-81-171.cloud.cloudera.com:8088/proxy/application_1354923204155_0777/
 Examining task ID: task_1354923204155_0777_m_00 (and more) from job 
 job_1354923204155_0777
 Task with the most failures(4): 
 -
 Task ID:
   task_1354923204155_0777_m_00
 URL:
   
 http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1354923204155_0777tipid=task_1354923204155_0777_m_00
 -
 Diagnostic Messages for this Task:
 Error: Java heap space
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724239#comment-13724239
 ] 

Hive QA commented on HIVE-4388:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594972/HIVE-4388.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 2737 tests 
executed
*Failed tests:*
{noformat}
org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.bulkModeHCatOutputFormatTestWithDefaultDB
org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableIgnoreAbortedAndRunningTransactions
org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableIgnoreAbortedTransactions
org.apache.hcatalog.hbase.TestHBaseDirectOutputFormat.directModeAbortTest
org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableProjectionReadMR
org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.hbaseBulkOutputFormatTest
org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.importSequenceFileTest
org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseInputFormatProjectionReadMR
org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.bulkModeHCatOutputFormatTest
org.apache.hcatalog.hbase.TestHBaseInputFormat.TestHBaseTableReadMR
org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat.bulkModeAbortTest
org.apache.hcatalog.hbase.TestHBaseDirectOutputFormat.directHCatOutputFormatTest
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/243/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/243/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-07-30 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-4959:
--

 Summary: Vectorized plan generation should be added as an 
optimization transform.
 Key: HIVE-4959
 URL: https://issues.apache.org/jira/browse/HIVE-4959
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


Currently the query plan is vectorized at query run time in the map task. 
It will be much cleaner to add vectorization as an optimization step.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-07-30 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: HIVE-4844.1.patch.hack

Initial patch showing progress, as other folks may be interested in type parameters 
for HIVE-3976.

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-07-30 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724260#comment-13724260
 ] 

Jason Dere commented on HIVE-3976:
--

I've attached a patch to HIVE-4844, containing the current progress.

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, Types
Reporter: Mark Grover
Assignee: Xuefu Zhang

 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4920) PTest2 handle Spot Price increases gracefully and improve rsync parallelism

2013-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4920:
---

Attachment: HIVE-4920.patch

Trivial update to sort failed tests. 

 PTest2 handle Spot Price increases gracefully and improve rsync parallelism
 

 Key: HIVE-4920
 URL: https://issues.apache.org/jira/browse/HIVE-4920
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Critical
 Attachments: HIVE-4920.patch, HIVE-4920.patch, HIVE-4920.patch, 
 HIVE-4920.patch, Screen Shot 2013-07-23 at 3.35.00 PM.png


 We should handle spot price increases more gracefully and parallelize rsync 
 to slaves better
 NO PRECOMMIT TESTS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2137) JDBC driver doesn't encode string properly.

2013-07-30 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-2137:
-

Attachment: HIVE-2137.patch

I forgot to drop the table I added in tearDown.
Thank you for your advice, tamtam180!

 JDBC driver doesn't encode string properly.
 ---

 Key: HIVE-2137
 URL: https://issues.apache.org/jira/browse/HIVE-2137
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Jin Adachi
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-2137.patch, HIVE-2137.patch, HIVE-2137.patch


 The JDBC driver for HiveServer1 decodes strings using the client-side default 
 encoding, which depends on the operating system unless another encoding is 
 specified explicitly. It ignores the server-side encoding. 
 For example, 
 when the server-side operating system and encoding are Linux (UTF-8) and the 
 client-side operating system and encoding are Windows (Shift-JIS, a Japanese 
 charset), character corruption happens in the client.
 In the current implementation of Hive, UTF-8 appears to be expected on the 
 server side, so the client side should encode/decode strings as UTF-8.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 12795: [HIVE-4827] Merge a Map-only job to its following MapReduce job with multiple inputs

2013-07-30 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12795/
---

(Updated July 30, 2013, 7:35 p.m.)


Review request for hive.


Bugs: HIVE-4827
https://issues.apache.org/jira/browse/HIVE-4827


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-4827


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cb59560 
  conf/hive-default.xml.template e0b7f5c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorUtils.java 66b84ff 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java bf224e0 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
 f704ec1 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
 d532bb1 
  ql/src/test/queries/clientpositive/auto_join33.q 5c85842 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q 2adf855 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q fcbb764 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q 0e84cb7 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q 1900f5d 
  ql/src/test/queries/clientpositive/correlationoptimizer6.q 88d790c 
  ql/src/test/queries/clientpositive/correlationoptimizer7.q 9b18972 
  ql/src/test/queries/clientpositive/multiMapJoin1.q 86b0586 
  ql/src/test/queries/clientpositive/multiMapJoin2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/union34.q a88e395 
  ql/src/test/results/clientpositive/auto_join0.q.out c48181d 
  ql/src/test/results/clientpositive/auto_join10.q.out deb8eb5 
  ql/src/test/results/clientpositive/auto_join11.q.out 82bc3f9 
  ql/src/test/results/clientpositive/auto_join12.q.out 1a170cb 
  ql/src/test/results/clientpositive/auto_join13.q.out 948ca70 
  ql/src/test/results/clientpositive/auto_join15.q.out aa40cff 
  ql/src/test/results/clientpositive/auto_join16.q.out 06d73d8 
  ql/src/test/results/clientpositive/auto_join2.q.out a11f347 
  ql/src/test/results/clientpositive/auto_join20.q.out cae120a 
  ql/src/test/results/clientpositive/auto_join21.q.out 423094d 
  ql/src/test/results/clientpositive/auto_join22.q.out 6f418db 
  ql/src/test/results/clientpositive/auto_join23.q.out 6a6bc6c 
  ql/src/test/results/clientpositive/auto_join24.q.out c7e872e 
  ql/src/test/results/clientpositive/auto_join26.q.out 7268755 
  ql/src/test/results/clientpositive/auto_join28.q.out 89db4aa 
  ql/src/test/results/clientpositive/auto_join29.q.out c3744f3 
  ql/src/test/results/clientpositive/auto_join32.q.out 312664a 
  ql/src/test/results/clientpositive/auto_join33.q.out 8fc0e84 
  ql/src/test/results/clientpositive/auto_sortmerge_join_10.q.out da375f6 
  ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out 9769bd8 
  ql/src/test/results/clientpositive/auto_sortmerge_join_12.q.out 5c4ba5b 
  ql/src/test/results/clientpositive/auto_sortmerge_join_9.q.out 6add99a 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out db3bd78 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out cfa7eff 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out 285a54f 
  ql/src/test/results/clientpositive/correlationoptimizer6.q.out b0438e6 
  ql/src/test/results/clientpositive/correlationoptimizer7.q.out f8db2bf 
  ql/src/test/results/clientpositive/join28.q.out 60165e2 
  ql/src/test/results/clientpositive/join32.q.out 41d183b 
  ql/src/test/results/clientpositive/join33.q.out 41d183b 
  ql/src/test/results/clientpositive/join_star.q.out 797b892 
  ql/src/test/results/clientpositive/mapjoin_filter_on_outerjoin.q.out 0fab62f 
  ql/src/test/results/clientpositive/mapjoin_mapjoin.q.out 2f5f613 
  ql/src/test/results/clientpositive/mapjoin_subquery.q.out 8243c2c 
  ql/src/test/results/clientpositive/mapjoin_subquery2.q.out 292abe4 
  ql/src/test/results/clientpositive/mapjoin_test_outer.q.out 37817d9 
  ql/src/test/results/clientpositive/multiMapJoin1.q.out a3f5c53 
  ql/src/test/results/clientpositive/multiMapJoin2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/multi_join_union.q.out 5182bdf 
  ql/src/test/results/clientpositive/union34.q.out 166062a 

Diff: https://reviews.apache.org/r/12795/diff/


Testing
---

Running tests.


Thanks,

Yin Huai



[jira] [Updated] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs

2013-07-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4827:
---

Attachment: HIVE-4827.6.patch

update

 Merge a Map-only job to its following MapReduce job with multiple inputs
 

 Key: HIVE-4827
 URL: https://issues.apache.org/jira/browse/HIVE-4827
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, 
 HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch


 When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a 
 Map-only job (MapJoin) to its following MapReduce job. But this merge only 
 happens when the MapReduce job has a single input. With Correlation Optimizer 
 (HIVE-2206), it is possible that the MapReduce job can have multiple inputs 
 (for multiple operation paths). It is desired to improve CommonJoinResolver 
 to merge a Map-only job to the corresponding Map task of the MapReduce job.
 Example:
 {code:sql}
 set hive.optimize.correlation=true;
 set hive.auto.convert.join=true;
 set hive.optimize.mapjoin.mapreduce=true;
 SELECT tmp1.key, count(*)
 FROM (SELECT x1.key1 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   GROUP BY x1.key1) tmp1
 JOIN (SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2)
   GROUP BY x2.key2) tmp2
 ON (tmp1.key = tmp2.key)
 GROUP BY tmp1.key;
 {code}
 In this query, join operations inside tmp1 and tmp2 will be converted to two 
 MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of 
 tmp1 and tmp2, and the last aggregation will be executed in the same 
 MapReduce job (Reduce side). Since this MapReduce job has two inputs, right 
 now, CommonJoinResolver cannot attach two MapJoins to the Map side of a 
 MapReduce job.
 Another example:
 {code:sql}
 SELECT tmp1.key
 FROM (SELECT x1.key2 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   UNION ALL
   SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
 {code}
 For this case, we will have three Map-only jobs (two for MapJoins and one for 
 Union). It will be good to use a single Map-only job to execute this query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-30 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724316#comment-13724316
 ] 

Yin Huai commented on HIVE-4952:


It seems the failed test is caused by HIVE-4955.

 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4952.D11889.1.patch, replay.txt


 If we have a query like this ...
 {code:sql}
 SELECT xx.key, xx.cnt, yy.key
 FROM
 (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
 y.key) group by x.key) xx
 JOIN src yy
 ON xx.key=yy.key;
 {code}
 After Correlation Optimizer, the operator tree in the reducer will be 
 {code}
       JOIN2
         |
         |
        MUX
       /   \
      /     \
    GBY      \
     |        |
   JOIN1      |
      \      /
       \    /
       DEMUX
 {code}
 For JOIN2, the right table will arrive at this operator first. If 
 hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even 
 though it has not received any row from the left table yet. The logic related 
 to hive.join.emit.interval in JoinOperator assumes that inputs will be ordered 
 by the tag. But if a query has been optimized by the Correlation Optimizer, 
 this assumption may not hold for the JoinOperators inside the reducer.
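
 A toy illustration (not Hive's JoinOperator) of why the emit-interval logic 
 depends on the tag-order assumption: with an interval of 1, a right-side row 
 (tag 1) arriving first triggers an emit while the left-side buffer is still 
 empty. The row values and the simplified emit check below are hypothetical.
 {code:java}
import java.util.ArrayList;
import java.util.List;

public class EmitIntervalToy {
  static final int JOIN_EMIT_INTERVAL = 1;

  public static void main(String[] args) {
    List<String> left = new ArrayList<String>();   // rows buffered for tag 0
    List<String> right = new ArrayList<String>();  // rows buffered for tag 1

    // After the Correlation Optimizer the right side of JOIN2 can show up
    // before any left-side row, violating the tag-order assumption.
    int[] tags = {1, 1, 0};
    String[] vals = {"r1", "r2", "l1"};

    for (int i = 0; i < tags.length; i++) {
      if (tags[i] == 0) { left.add(vals[i]); } else { right.add(vals[i]); }
      if (right.size() >= JOIN_EMIT_INTERVAL) {
        // Premature emit: on the first iteration, left is still empty.
        System.out.println("emit with left=" + left + " right=" + right);
      }
    }
  }
}
 {code}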

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4955) serde_user_properties.q.out needs to be updated

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724320#comment-13724320
 ] 

Hudson commented on HIVE-4955:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/])
HIVE-4955: serde_user_properties.q.out needs to be updated (Thejas M Nair via 
Brock Noland) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508429)
* /hive/trunk/ql/src/test/results/clientpositive/serde_user_properties.q.out


 serde_user_properties.q.out needs to be updated
 ---

 Key: HIVE-4955
 URL: https://issues.apache.org/jira/browse/HIVE-4955
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.12.0

 Attachments: HIVE-4955.1.patch


 The testcase TestCliDriver.testCliDriver_serde_user_properties was added in 
 HIVE-2906, which was committed just a few minutes before HIVE-4825. HIVE-4825 
 has changes that alter the expected results of serde_user_properties.q, 
 causing the test to fail now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724322#comment-13724322
 ] 

Hudson commented on HIVE-4928:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/])
HIVE-4928 : Date literals do not work properly in partition spec clause (Jason 
Dere via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508534)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/partition_date2.q
* /hive/trunk/ql/src/test/results/clientpositive/partition_date2.q.out


 Date literals do not work properly in partition spec clause
 ---

 Key: HIVE-4928
 URL: https://issues.apache.org/jira/browse/HIVE-4928
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.12.0

 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch


 The partition spec parsing doesn't do any actual evaluation of the values in 
 the partition spec; it just takes the text value of the ASTNode representing 
 the partition value. This works fine for string/numeric literals (expression 
 tree below):
 (TOK_PARTVAL region 99)
 But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression 
 tree below):
 (TOK_DATELITERAL '1999-12-31')
 In this case the parser/analyzer uses TOK_DATELITERAL as the partition 
 column value, when it should really get the value of the child of the 
 DATELITERAL token.
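
 A hedged sketch of the idea above, using a minimal stand-in for the parse 
 tree (the TreeNode class and partitionValue helper are hypothetical, not 
 Hive's ASTNode/TypeCheckProcFactory API): for a plain literal the node's own 
 text is the partition value, while for a date literal the value is the text 
 of the node's single child.
 {code:java}
import java.util.Arrays;
import java.util.List;

public class PartValSketch {
  static class TreeNode {
    final String text;
    final List<TreeNode> children;
    TreeNode(String text, TreeNode... children) {
      this.text = text;
      this.children = Arrays.asList(children);
    }
  }

  // Return the partition value: the node text for ordinary literals, the
  // child's text (with quotes stripped) for a TOK_DATELITERAL node.
  static String partitionValue(TreeNode node) {
    if ("TOK_DATELITERAL".equals(node.text)) {
      return node.children.get(0).text.replace("'", "");
    }
    return node.text;
  }

  public static void main(String[] args) {
    TreeNode plain = new TreeNode("99");
    TreeNode date = new TreeNode("TOK_DATELITERAL", new TreeNode("'1999-12-31'"));
    System.out.println(partitionValue(plain)); // 99
    System.out.println(partitionValue(date));  // 1999-12-31
  }
}
 {code}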

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724325#comment-13724325
 ] 

Hudson commented on HIVE-4525:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java


 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.12.0

 Attachments: D10755.1.patch, D10755.2.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.
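
 A quick back-of-the-envelope check of the limit described above (an 
 illustrative snippet, not TimestampWritable code): with only 31 bits of 
 seconds since the epoch, the representable range runs from 1970-01-01 to 
 early 2038.
 {code:java}
import java.util.Date;

public class ThirtyOneBitLimit {
  public static void main(String[] args) {
    long maxSeconds = (1L << 31) - 1;                  // 2147483647 seconds
    System.out.println(new Date(0L));                  // 1970-01-01 (epoch)
    System.out.println(new Date(maxSeconds * 1000L));  // 2038-01-19 in UTC
  }
}
 {code}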

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724324#comment-13724324
 ] 

Hudson commented on HIVE-2702:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #37 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/37/])
HIVE-2702 : Enhance listPartitionsByFilter to add support for integral types 
both for equality and non-equality (Sergey Shelukhin via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508539)
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out


 Enhance listPartitionsByFilter to add support for integral types both for 
 equality and non-equality
 ---

 Key: HIVE-2702
 URL: https://issues.apache.org/jira/browse/HIVE-2702
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Aniket Mokashi
Assignee: Sergey Shelukhin
 Fix For: 0.12.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, 
 HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, 
 HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, 
 HIVE-2702.patch, HIVE-2702-v0.patch


 listPartitionsByFilter supports filtering only on string partition keys. This 
 is because it is explicitly specified in generateJDOFilterOverPartitions in 
 ExpressionTree.java: 
 //Can only support partitions whose types are string
   if( ! table.getPartitionKeys().get(partitionColumnIndex).
       getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) {
     throw new MetaException(
         "Filtering is supported only on partition keys of type string");
   }
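
 A hedged sketch (not the actual HIVE-2702 patch) of relaxing the check quoted 
 above so that integral partition key types are accepted as well; the 
 FILTERABLE_TYPES list and checkFilterable helper below are hypothetical, with 
 type names mirroring the values of the serde constants.
 {code:java}
import java.util.Arrays;
import java.util.List;

public class FilterTypeCheckSketch {
  // Type names that the filter could support: string plus the integral types.
  static final List<String> FILTERABLE_TYPES = Arrays.asList(
      "string", "tinyint", "smallint", "int", "bigint");

  static void checkFilterable(String partitionKeyType) throws Exception {
    if (!FILTERABLE_TYPES.contains(partitionKeyType)) {
      throw new Exception(
          "Filtering is supported only on partition keys of string and integral types");
    }
  }

  public static void main(String[] args) throws Exception {
    checkFilterable("int");     // accepted
    checkFilterable("string");  // accepted
    checkFilterable("double");  // throws
  }
}
 {code}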

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-30 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4954:
--

Attachment: HIVE-4879.2.patch.txt

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt


   protected static final ArrayList<String> RANKING_FUNCS = new 
       ArrayList<String>();
   static {
     RANKING_FUNCS.add("rank");
     RANKING_FUNCS.add("dense_rank");
     RANKING_FUNCS.add("percent_rank");
     RANKING_FUNCS.add("cume_dist");
   };
 Move this logic to annotations.
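
 A minimal sketch of what moving this logic to annotations could look like; 
 the @Ranking marker and the isRankingFunction check below are hypothetical, 
 not Hive's actual window-function annotation machinery.
 {code:java}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class RankingAnnotationSketch {
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Ranking { }

  @Ranking static class Rank { }
  @Ranking static class DenseRank { }
  static class Lead { }

  // Instead of a hard-coded RANKING_FUNCS list, the translator could ask the
  // registered implementation class whether it carries the marker annotation.
  static boolean isRankingFunction(Class<?> impl) {
    return impl.isAnnotationPresent(Ranking.class);
  }

  public static void main(String[] args) {
    System.out.println(isRankingFunction(Rank.class));      // true
    System.out.println(isRankingFunction(DenseRank.class)); // true
    System.out.println(isRankingFunction(Lead.class));      // false
  }
}
 {code}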

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE

2013-07-30 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4624:
--

Attachment: HIVE-4624.1-vectorization.patch

 Integrate Vectorzied Substr into Vectorized QE
 --

 Key: HIVE-4624
 URL: https://issues.apache.org/jira/browse/HIVE-4624
 Project: Hive
  Issue Type: Sub-task
Reporter: Timothy Chen
Assignee: Eric Hanson
 Attachments: HIVE-4624.1-vectorization.patch


 Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it 
 can be leveraged.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4960) lastAlias in CommonJoinOperator is not used

2013-07-30 Thread Yin Huai (JIRA)
Yin Huai created HIVE-4960:
--

 Summary: lastAlias in CommonJoinOperator is not used
 Key: HIVE-4960
 URL: https://issues.apache.org/jira/browse/HIVE-4960
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Minor


In CommonJoinOperator, there is an object called lastAlias. Its initial value 
is 'null'. After tracing its usage, I found that there is no place where its 
value is changed. Also, it is only used in processOp in JoinOperator and 
MapJoinOperator as
{code}
if ((lastAlias == null) || (!lastAlias.equals(alias))) {
  nextSz = joinEmitInterval;
}
{code}
Since lastAlias will always be null, we will assign joinEmitInterval to nextSz 
every time we get a row. Later in processOp, we have 
{code}
nextSz = getNextSize(nextSz);
{code}
Because we reset the value of nextSz to joinEmitInterval every time we get a 
row, it seems that getNextSize will not be used as expected.
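
A toy sketch (not Hive's CommonJoinOperator) contrasting the behavior described
above: because lastAlias is never assigned, nextSz is reset to joinEmitInterval
on every row and the growth from getNextSize never takes effect; adding the
missing assignment lets nextSz grow while rows for the same alias keep coming.
The doubling getNextSize stand-in and the row/alias data are hypothetical.
{code:java}
public class NextSizeSketch {
  static final int JOIN_EMIT_INTERVAL = 4;

  // Stand-in for CommonJoinOperator.getNextSize's growth behavior.
  static int getNextSize(int sz) {
    return Math.max(sz * 2, JOIN_EMIT_INTERVAL);
  }

  public static void main(String[] args) {
    Byte lastAlias = null;
    int nextSz = JOIN_EMIT_INTERVAL;
    byte[] rowAliases = {0, 0, 0, 1, 1};      // incoming rows grouped by alias

    for (byte alias : rowAliases) {
      if (lastAlias == null || !lastAlias.equals(alias)) {
        nextSz = JOIN_EMIT_INTERVAL;          // reset only when the alias changes
      }
      lastAlias = alias;                      // the assignment that is missing today
      nextSz = getNextSize(nextSz);
      System.out.println("alias=" + alias + " nextSz=" + nextSz);
    }
  }
}
{code}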

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE

2013-07-30 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4624:
--

Affects Version/s: vectorization-branch
   Status: Patch Available  (was: Open)

 Integrate Vectorzied Substr into Vectorized QE
 --

 Key: HIVE-4624
 URL: https://issues.apache.org/jira/browse/HIVE-4624
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Timothy Chen
Assignee: Eric Hanson
 Attachments: HIVE-4624.1-vectorization.patch


 Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it 
 can be leveraged.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4624) Integrate Vectorized Substr into Vectorized QE

2013-07-30 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4624:
--

Summary: Integrate Vectorized Substr into Vectorized QE  (was: Integrate 
Vectorzied Substr into Vectorized QE)

 Integrate Vectorized Substr into Vectorized QE
 --

 Key: HIVE-4624
 URL: https://issues.apache.org/jira/browse/HIVE-4624
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Timothy Chen
Assignee: Eric Hanson
 Attachments: HIVE-4624.1-vectorization.patch


 Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it 
 can be leveraged.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4624) Integrate Vectorzied Substr into Vectorized QE

2013-07-30 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724350#comment-13724350
 ] 

Eric Hanson commented on HIVE-4624:
---

This patch includes changes to VectorizationContext to enable SUBSTR() to 
run end-to-end, as well as bug fixes and unit test fixes related to 
StringSubstrColStart and StringSubstrColStartLen. I did ad hoc tests from the 
console to exercise a large number of variations of SUBSTR() usage in 
vectorized mode.

 Integrate Vectorzied Substr into Vectorized QE
 --

 Key: HIVE-4624
 URL: https://issues.apache.org/jira/browse/HIVE-4624
 Project: Hive
  Issue Type: Sub-task
Reporter: Timothy Chen
Assignee: Eric Hanson
 Attachments: HIVE-4624.1-vectorization.patch


 Need to hook up the Vectorized Substr directly into Hive Vectorized QE so it 
 can be leveraged.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2702) Enhance listPartitionsByFilter to add support for integral types both for equality and non-equality

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724420#comment-13724420
 ] 

Hudson commented on HIVE-2702:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/])
HIVE-2702 : Enhance listPartitionsByFilter to add support for integral types 
both for equality and non-equality (Sergey Shelukhin via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508539)
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
* 
/hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out


 Enhance listPartitionsByFilter to add support for integral types both for 
 equality and non-equality
 ---

 Key: HIVE-2702
 URL: https://issues.apache.org/jira/browse/HIVE-2702
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Aniket Mokashi
Assignee: Sergey Shelukhin
 Fix For: 0.12.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, 
 HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, 
 HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, 
 HIVE-2702.patch, HIVE-2702-v0.patch


 listPartitionsByFilter supports filtering only on string partition keys. This 
 is because it is explicitly specified in generateJDOFilterOverPartitions in 
 ExpressionTree.java: 
 //Can only support partitions whose types are string
   if( ! table.getPartitionKeys().get(partitionColumnIndex).
       getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) {
     throw new MetaException(
         "Filtering is supported only on partition keys of type string");
   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724416#comment-13724416
 ] 

Hudson commented on HIVE-3256:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/])
HIVE-3256: Update asm version in Hive (Ashutosh Chauhan via Brock Noland) 
(brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508506)
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/metastore/ivy.xml


 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Zhenxiao Luo
Assignee: Ashutosh Chauhan
 Fix For: 0.12.0

 Attachments: HIVE-3256.patch


 Hive trunk is currently using asm version 3.1, while Hadoop trunk is on 3.2. 
 Any objections to bumping the Hive version to 3.2 to be in line with Hadoop?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4525) Support timestamps earlier than 1970 and later than 2038

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724421#comment-13724421
 ] 

Hudson commented on HIVE-4525:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/])
HIVE-4525 : Support timestamps earlier than 1970 and later than 2038 (Mikhail 
Bautin via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508537)
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToInteger.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/io/TestTimestampWritable.java


 Support timestamps earlier than 1970 and later than 2038
 

 Key: HIVE-4525
 URL: https://issues.apache.org/jira/browse/HIVE-4525
 Project: Hive
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.12.0

 Attachments: D10755.1.patch, D10755.2.patch


 TimestampWritable currently serializes timestamps using the lower 31 bits of 
 an int. This does not allow storing timestamps earlier than 1970 or later 
 than a certain point in 2038.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3264) Add support for binary datatype to AvroSerde

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724418#comment-13724418
 ] 

Hudson commented on HIVE-3264:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/])
HIVE-3264 : Add support for binary datatype to AvroSerde (Eli Reisman & Mark 
Wagner via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508528)
* /hive/trunk/data/files/csv.txt
* /hive/trunk/ql/src/test/queries/clientpositive/avro_nullable_fields.q
* /hive/trunk/ql/src/test/results/clientpositive/avro_nullable_fields.q.out
* /hive/trunk/ql/src/test/results/clientpositive/avro_schema_literal.q.out
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java
* 
/hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java


 Add support for binary datatype to AvroSerde
 ---

 Key: HIVE-3264
 URL: https://issues.apache.org/jira/browse/HIVE-3264
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
 HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch


 When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
 byte array type is converted to an array of small ints.  Now that HIVE-2380 
 is in, this step isn't necessary and we can convert both Avro's bytes type 
 and probably its fixed type to Hive's binary type.
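
 A hedged sketch of the conversion described above: Avro represents a bytes 
 field as a java.nio.ByteBuffer, while Hive's binary type is a plain byte[]; 
 copying between the two is the essence of the change. The toHiveBinary helper 
 name is illustrative, not the AvroDeserializer API.
 {code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

public class AvroBytesToBinary {
  static byte[] toHiveBinary(ByteBuffer avroBytes) {
    byte[] result = new byte[avroBytes.remaining()];
    avroBytes.duplicate().get(result);  // duplicate() leaves the caller's position untouched
    return result;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.wrap(new byte[]{1, 2, 3});
    System.out.println(Arrays.toString(toHiveBinary(buf))); // [1, 2, 3]
  }
}
 {code}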

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause

2013-07-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724417#comment-13724417
 ] 

Hudson commented on HIVE-4928:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #109 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/109/])
HIVE-4928 : Date literals do not work properly in partition spec clause (Jason 
Dere via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1508534)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/partition_date2.q
* /hive/trunk/ql/src/test/results/clientpositive/partition_date2.q.out


 Date literals do not work properly in partition spec clause
 ---

 Key: HIVE-4928
 URL: https://issues.apache.org/jira/browse/HIVE-4928
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 0.12.0

 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch


 The partition spec parsing doesn't do any actual evaluation of the values in 
 the partition spec; it just takes the text value of the ASTNode representing 
 the partition value. This works fine for string/numeric literals (expression 
 tree below):
 (TOK_PARTVAL region 99)
 But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression 
 tree below):
 (TOK_DATELITERAL '1999-12-31')
 In this case the parser/analyzer uses TOK_DATELITERAL as the partition 
 column value, when it should really get the value of the child of the 
 DATELITERAL token.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4960) lastAlias in CommonJoinOperator is not used

2013-07-30 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4960:
--

Attachment: HIVE-4960.D11895.1.patch

yhuai requested code review of HIVE-4960 [jira] lastAlias in 
CommonJoinOperator is not used.

Reviewers: JIRA

first commit

In CommonJoinOperator, there is an object called lastAlias. Its initial value 
is 'null'. After tracing its usage, I found that there is no place where its 
value is changed. Also, it is only used in processOp in JoinOperator and 
MapJoinOperator as

if ((lastAlias == null) || (!lastAlias.equals(alias))) {
  nextSz = joinEmitInterval;
}

Since lastAlias will always be null, we will assign joinEmitInterval to nextSz 
every time we get a row. Later in processOp, we have

nextSz = getNextSize(nextSz);

Because we reset the value of nextSz to joinEmitInterval every time we get a 
row, it seems that getNextSize will not be used as expected.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11895

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/28341/

To: JIRA, yhuai


 lastAlias in CommonJoinOperator is not used
 ---

 Key: HIVE-4960
 URL: https://issues.apache.org/jira/browse/HIVE-4960
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Minor
 Attachments: HIVE-4960.D11895.1.patch


 In CommonJoinOperator, there is an object called lastAlias. Its initial value 
 is 'null'. After tracing its usage, I found that there is no place where its 
 value is changed. Also, it is only used in processOp in JoinOperator and 
 MapJoinOperator as
 {code}
 if ((lastAlias == null) || (!lastAlias.equals(alias))) {
   nextSz = joinEmitInterval;
 }
 {code}
 Since lastAlias will always be null, we will assign joinEmitInterval to 
 nextSz every time we get a row. Later in processOp, we have 
 {code}
 nextSz = getNextSize(nextSz);
 {code}
 Because we reset the value of nextSz to joinEmitInterval every time we get a 
 row, it seems that getNextSize will not be used as expected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4960) lastAlias in CommonJoinOperator is not used

2013-07-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4960:
---

Status: Patch Available  (was: Open)

 lastAlias in CommonJoinOperator is not used
 ---

 Key: HIVE-4960
 URL: https://issues.apache.org/jira/browse/HIVE-4960
 Project: Hive
  Issue Type: Improvement
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Minor
 Attachments: HIVE-4960.D11895.1.patch


 In CommonJoinOperator, there is an object called lastAlias. Its initial value 
 is 'null'. After tracing its usage, I found that there is no place where its 
 value is changed. Also, it is only used in processOp in JoinOperator and 
 MapJoinOperator as
 {code}
 if ((lastAlias == null) || (!lastAlias.equals(alias))) {
   nextSz = joinEmitInterval;
 }
 {code}
 Since lastAlias will always be null, we will assign joinEmitInterval to 
 nextSz every time we get a row. Later in processOp, we have 
 {code}
 nextSz = getNextSize(nextSz);
 {code}
 Because we reset the value of nextSz to joinEmitInterval every time we get a 
 row, it seems that getNextSize will not be used as expected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-30 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Status: Open  (was: Patch Available)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-30 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Status: Patch Available  (was: Open)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs

2013-07-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724477#comment-13724477
 ] 

Gunther Hagleitner commented on HIVE-4827:
--

This looks really good. Just a few smaller things:

- Can you add tests for cascading mapjoins? Something like 5 joins that will 
produce two groups 3 + 2.
- Can you add a test to show that setting the threshold to 0 effectively turns 
off the optimization?

Finally, I think it makes sense to remove the old flag since it no longer 
applies. But can you fill out the release notes section of this ticket and 
describe how 'optimize.mapjoin.mapreduce' is superseded by threshold = 0 or 
noconditionaltask?

 Merge a Map-only job to its following MapReduce job with multiple inputs
 

 Key: HIVE-4827
 URL: https://issues.apache.org/jira/browse/HIVE-4827
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, 
 HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch


 When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a 
 Map-only job (MapJoin) to its following MapReduce job. But this merge only 
 happens when the MapReduce job has a single input. With Correlation Optimizer 
 (HIVE-2206), it is possible that the MapReduce job can have multiple inputs 
 (for multiple operation paths). It is desired to improve CommonJoinResolver 
 to merge a Map-only job to the corresponding Map task of the MapReduce job.
 Example:
 {code:sql}
 set hive.optimize.correlation=true;
 set hive.auto.convert.join=true;
 set hive.optimize.mapjoin.mapreduce=true;
 SELECT tmp1.key, count(*)
 FROM (SELECT x1.key1 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   GROUP BY x1.key1) tmp1
 JOIN (SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2)
   GROUP BY x2.key2) tmp2
 ON (tmp1.key = tmp2.key)
 GROUP BY tmp1.key;
 {code}
 In this query, join operations inside tmp1 and tmp2 will be converted to two 
 MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of 
 tmp1 and tmp2, and the last aggregation will be executed in the same 
 MapReduce job (Reduce side). Since this MapReduce job has two inputs, right 
 now, CommonJoinResolver cannot attach two MapJoins to the Map side of a 
 MapReduce job.
 Another example:
 {code:sql}
 SELECT tmp1.key
 FROM (SELECT x1.key2 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   UNION ALL
   SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
 {code}
 For this case, we will have three Map-only jobs (two for MapJoins and one for 
 Union). It will be good to use a single Map-only job to execute this query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs

2013-07-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4827:
---

Status: Open  (was: Patch Available)

canceling patch. Will update later

 Merge a Map-only job to its following MapReduce job with multiple inputs
 

 Key: HIVE-4827
 URL: https://issues.apache.org/jira/browse/HIVE-4827
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, 
 HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch


 When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a 
 Map-only job (MapJoin) to its following MapReduce job. But this merge only 
 happens when the MapReduce job has a single input. With Correlation Optimizer 
 (HIVE-2206), it is possible that the MapReduce job can have multiple inputs 
 (for multiple operation paths). It is desired to improve CommonJoinResolver 
 to merge a Map-only job to the corresponding Map task of the MapReduce job.
 Example:
 {code:sql}
 set hive.optimize.correlation=true;
 set hive.auto.convert.join=true;
 set hive.optimize.mapjoin.mapreduce=true;
 SELECT tmp1.key, count(*)
 FROM (SELECT x1.key1 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   GROUP BY x1.key1) tmp1
 JOIN (SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2)
   GROUP BY x2.key2) tmp2
 ON (tmp1.key = tmp2.key)
 GROUP BY tmp1.key;
 {code}
 In this query, join operations inside tmp1 and tmp2 will be converted to two 
 MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of 
 tmp1 and tmp2, and the last aggregation will be executed in the same 
 MapReduce job (Reduce side). Since this MapReduce job has two inputs, right 
 now, CommonJoinResolver cannot attach two MapJoins to the Map side of a 
 MapReduce job.
 Another example:
 {code:sql}
 SELECT tmp1.key
 FROM (SELECT x1.key2 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   UNION ALL
   SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
 {code}
 For this case, we will have three Map-only jobs (two for MapJoins and one for 
 Union). It will be good to use a single Map-only job to execute this query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724484#comment-13724484
 ] 

Hive QA commented on HIVE-2137:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595024/HIVE-2137.patch

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/245/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/245/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 JDBC driver doesn't encode string properly.
 ---

 Key: HIVE-2137
 URL: https://issues.apache.org/jira/browse/HIVE-2137
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Jin Adachi
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-2137.patch, HIVE-2137.patch, HIVE-2137.patch


 The JDBC driver for HiveServer1 decodes strings using the client-side default 
 encoding, which depends on the operating system unless another encoding is 
 specified. It ignores the server-side encoding. 
 For example, when the server-side operating system and encoding are Linux 
 (UTF-8) and the client-side operating system and encoding are Windows 
 (Shift-JIS, a Japanese character set), character corruption happens in the 
 client.
 In the current implementation of Hive, UTF-8 appears to be expected on the 
 server side, so the client side should encode/decode strings as UTF-8.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4055) add Date data type

2013-07-30 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724515#comment-13724515
 ] 

Lars Francke commented on HIVE-4055:


It'd be great if you could document this new data type in the Wiki.

 add Date data type
 --

 Key: HIVE-4055
 URL: https://issues.apache.org/jira/browse/HIVE-4055
 Project: Hive
  Issue Type: Sub-task
  Components: JDBC, Query Processor, Serializers/Deserializers, UDF
Reporter: Sun Rui
Assignee: Jason Dere
 Fix For: 0.12.0

 Attachments: Date.pdf, HIVE-4055.1.patch.txt, HIVE-4055.2.patch.txt, 
 HIVE-4055.3.patch.txt, HIVE-4055.4.patch, HIVE-4055.4.patch.txt, 
 HIVE-4055.D11547.1.patch


 Add Date data type, a new primitive data type which supports the standard SQL 
 date type.
 Basically, the implementation can take HIVE-2272 and HIVE-2957 as references.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-30 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2608:
--

Attachment: HIVE-2608.D4317.7.patch

navis updated the revision HIVE-2608 [jira] Do not require AS a,b,c part in 
LATERAL VIEW.

  Marked the error message which would not be used further as '@Deprecated'

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D4317

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D4317?vs=35643&id=36597#toc

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/FromClauseParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/lateral_view_noalias.q
  ql/src/test/results/clientpositive/lateral_view_noalias.q.out

To: JIRA, ashutoshc, navis
Cc: ikabiljo


 Do not require AS a,b,c part in LATERAL VIEW
 

 Key: HIVE-2608
 URL: https://issues.apache.org/jira/browse/HIVE-2608
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Reporter: Igor Kabiljo
Assignee: Navis
Priority: Minor
 Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
 HIVE-2608.D4317.6.patch, HIVE-2608.D4317.7.patch


 Currently, it is required to state column names when LATERAL VIEW is used.
 That shouldn't be necessary, since the UDTF returns a struct which contains 
 column names - and they should be used by default.
 For example, it would be great if this were possible:
 SELECT t.*, t.key1 + t.key4
 FROM some_table
 LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724561#comment-13724561
 ] 

Hive QA commented on HIVE-4954:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595032/HIVE-4879.2.patch.txt

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/248/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/248/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt


   protected static final ArrayList<String> RANKING_FUNCS = new 
       ArrayList<String>();
   static {
     RANKING_FUNCS.add("rank");
     RANKING_FUNCS.add("dense_rank");
     RANKING_FUNCS.add("percent_rank");
     RANKING_FUNCS.add("cume_dist");
   };
 Move this logic to annotations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-07-30 Thread Guilherme Braccialli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724575#comment-13724575
 ] 

Guilherme Braccialli commented on HIVE-896:
---

Harish,

I noticed that the NPath class is in the Hive 0.11 source and it's also a 
known function in Hive. Is it working? Could you please give us a sample 
query? I tried the query below, but it's not working.

Thanks.

create external table flights_tiny (ORIGIN_CITY_NAME string, DEST_CITY_NAME 
string, YEAR int, MONTH int, DAY_OF_MONTH int, ARR_DELAY float, FL_NUM string)
location '/user/x';

select npath(
'ONTIME.LATE+', 
'LATE', arr_delay > 15, 
'EARLY', arr_delay < 0,
'ONTIME', arr_delay >= 0 and arr_delay <= 15,
'origin_city_name, fl_num, year, month, day_of_month, size(tpath) as sz, tpath 
as tpath'
)
from flights_tiny;

FAILED: NullPointerException null


 Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
 ---

 Key: HIVE-896
 URL: https://issues.apache.org/jira/browse/HIVE-896
 Project: Hive
  Issue Type: New Feature
  Components: OLAP, UDF
Reporter: Amr Awadallah
Assignee: Harish Butani
Priority: Minor
 Fix For: 0.11.0

 Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, 
 Hive-896.2.patch.txt, hive-896.3.patch.txt, HIVE-896.4.patch, 
 HIVE-896.5.patch.txt


 Windowing functions are very useful for click stream processing and similar 
 time-series/sliding-window analytics.
 More details at:
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
 -- amr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3455) ANSI CORR(X,Y) is incorrect

2013-07-30 Thread Jon Hartlaub (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Hartlaub updated HIVE-3455:
---

Attachment: HIVE3455.corrTest.tar.gz

Attached are a data file, a .q file and a .q.out file that exercise the 
corr merge problem.  The patch in this ticket passes this test.  (The test 
correlates a variable with itself using a CLUSTER BY column.)

 ANSI CORR(X,Y) is incorrect
 ---

 Key: HIVE-3455
 URL: https://issues.apache.org/jira/browse/HIVE-3455
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0
Reporter: Maxim Bolotin
  Labels: patch
 Attachments: HIVE3455.corrTest.tar.gz, my.patch


 A simple test with 2 collinear vectors returns a wrong result.
 The problem is in the merge of variances; see file:
 http://svn.apache.org/viewvc/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java?revision=1157222&view=markup
 lines:
 347: myagg.xvar += xvarB + (xavgA - xavgB) * (xavgA - xavgB) * myagg.count;
 348: myagg.yvar += yvarB + (yavgA - yavgB) * (yavgA - yavgB) * myagg.count;
 the correct merge should be like this:
 347: myagg.xvar += xvarB + (xavgA - xavgB) * (xavgA - xavgB) / myagg.count * 
 nA * nB;
 348: myagg.yvar += yvarB + (yavgA - yavgB) * (yavgA - yavgB) / myagg.count * 
 nA * nB;
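
 A small numeric check (an illustrative snippet, not the Hive UDAF itself) of 
 the corrected merge: for two partial aggregates with counts nA and nB, means 
 avgA and avgB, and sums of squared deviations ssA and ssB, the pooled sum of 
 squared deviations is ssA + ssB + (avgA - avgB)^2 * nA * nB / (nA + nB), 
 matching the proposed lines 347-348. The sample vectors below are made up.
 {code:java}
public class CorrMergeCheck {
  // Sum of squared deviations from the mean of xs.
  static double sumSqDev(double[] xs) {
    double mean = 0;
    for (double x : xs) mean += x;
    mean /= xs.length;
    double ss = 0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return ss;
  }

  public static void main(String[] args) {
    double[] a = {1, 2, 3};
    double[] b = {10, 20, 30, 40};
    double[] all = {1, 2, 3, 10, 20, 30, 40};

    double nA = a.length, nB = b.length, count = nA + nB;
    double avgA = 2.0, avgB = 25.0;

    double merged = sumSqDev(a) + sumSqDev(b)
        + (avgA - avgB) * (avgA - avgB) / count * nA * nB;

    System.out.println(merged);        // 1408.857...
    System.out.println(sumSqDev(all)); // identical to the merged value
  }
}
 {code}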

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request (wikidoc): LZO Compression in Hive

2013-07-30 Thread Sanjay Subramanian
Hi

Met with Lefty this afternoon and she was kind enough to spend time adding my 
documentation to the site - since I still don't have editing privileges :-)

Please review the new wikidoc about LZO compression in the Hive language 
manual.  If anything is unclear or needs more information, you can email 
suggestions to this list or edit the wiki yourself (if you have editing 
privileges).  Here are the links:

  1.  Language Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual 
(new bullet under File Formats)
  2.  LZO Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
  3.  CREATE TABLE: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
 (near end of section, pasted in here:)
Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use 
STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more 
about CompressedStorage 
(https://cwiki.apache.org/confluence/display/Hive/CompressedStorage) 
if you are planning to keep data compressed in your Hive tables. Use 
INPUTFORMAT and OUTPUTFORMAT to specify the name of a corresponding InputFormat 
and OutputFormat class as a string literal, e.g., 
'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For 
LZO compression, the values to use are 'INPUTFORMAT 
com.hadoop.mapred.DeprecatedLzoTextInputFormat OUTPUTFORMAT 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' (see LZO 
Compression: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO).

My cwiki id is
https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com
It would be great if I could get edit privileges.

Thanks
sanjay



[jira] [Updated] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow

2013-07-30 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4051:
--

Attachment: HIVE-4051.D11805.3.patch

sershe updated the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N 
queries when querying partitions & is slow.

  Addressed Phabricator comments, fixed minor bugs (e.g. null checks, 
setTableName called instead of setDbName), added column schemas that ended up 
being needed after all, cleaned up the code a bit, added some short circuiting, 
added order to tests that had undefined order and so depended on the order in 
which partitions are returned (ORM code returns them by name, SQL by ID).
  Added some short-circuiting to the queries/getting stuff.
  I compared the reflection-based dump of SQL- and ORM- based objects from some 
tests (code not included) and they are the same.

  The existing tests seem to adequately cover this code. The only concern is 
that if it fails it's impossible to see as it falls back to ORM...

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11805

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11805?vs=36357&id=36615#toc

AFFECTED FILES
  build.xml
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  ql/src/test/queries/clientpositive/alter_partition_coltype.q
  ql/src/test/queries/clientpositive/load_dyn_part3.q
  ql/src/test/queries/clientpositive/load_dyn_part4.q
  ql/src/test/queries/clientpositive/load_dyn_part9.q
  ql/src/test/queries/clientpositive/ppr_pushdown2.q
  ql/src/test/queries/clientpositive/stats4.q
  ql/src/test/results/clientpositive/load_dyn_part3.q.out
  ql/src/test/results/clientpositive/load_dyn_part4.q.out
  ql/src/test/results/clientpositive/load_dyn_part9.q.out
  ql/src/test/results/clientpositive/ppr_pushdown2.q.out
  ql/src/test/results/clientpositive/stats4.q.out

To: JIRA, sershe
Cc: brock


 Hive's metastore suffers from 1+N queries when querying partitions & is slow
 

 Key: HIVE-4051
 URL: https://issues.apache.org/jira/browse/HIVE-4051
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
 Environment: RHEL 6.3 / EC2 C1.XL
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, 
 HIVE-4051.D11805.3.patch


 Hive's query client takes a long time to initialize & start planning queries 
 because of delays in creating all the MTable/MPartition objects.
 For a hive db with 1800 partitions, the metastore took 6-7 seconds to 
 initialize - firing approximately 5900 queries to the mysql database.
 Several of those queries fetch exactly one row to create a single object on 
 the client.
 The following 12 queries were repeated for each partition, generating a storm 
 of SQL queries 
 {code}
 4 Query SELECT 
 `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
 `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
 `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 
 `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
 FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
 `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
 THIS.`INTEGER_IDX`=0
 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
 THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
 `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
 `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
 =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
 =4871 AND 

[jira] [Resolved] (HIVE-4524) Make the Hive HBaseStorageHandler work under HCat

2013-07-30 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan resolved HIVE-4524.


Resolution: Duplicate

I'm going to mark this issue as a duplicate of HIVE-4331, since it attempts to 
correct the same problem, although with a different approach.

 Make the Hive HBaseStorageHandler work under HCat
 -

 Key: HIVE-4524
 URL: https://issues.apache.org/jira/browse/HIVE-4524
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler, HCatalog
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: hbh4.patch


 Currently, HCatalog has its own HCatHBaseStorageHandler that extends from 
 HBaseStorageHandler to allow for StorageHandler support, and does some 
 translations, like org.apache.mapred-org.apache.mapreduce wrapping, etc. 
 However, this compatibility layer is not complete in functionality: it 
 still assumes the underlying OutputFormat is a mapred.OutputFormat 
 implementation as opposed to a HiveOutputFormat implementation, and it makes 
 assumptions about config property copies that implementations of 
 HiveStorageHandler, such as the HBaseStorageHandler, do not make.
 To fix this, we need to improve the ability for HCat to properly load 
 native-hive-style StorageHandlers.
 Also, since HCat has its own HBaseStorageHandler and we'd like to not 
 maintain two separate HBaseStorageHandlers, the idea is to deprecate HCat's 
 storage handler over time, and make sure that hive's HBaseStorageHandler 
 works properly from HCat, and over time, have it reach feature parity with 
 the HCat one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724626#comment-13724626
 ] 

Xuefu Zhang commented on HIVE-4844:
---

[~jdere] Thanks for sharing your work. I went through your patch and had some 
initial questions. I understand that your patch is still in progress, but I'm 
wondering what your thoughts are on how you plan to store the type params. 
Obviously, type params are metadata of a column, which needs to be stored. I 
assume that the hive schema needs to change to accommodate this.

Secondly, SQL VAR or VARCHAR seems to be a special hive string with additional 
restrictions. Your patch seems to treat them independently. Do you think type 
inheritance would work here?

Lastly, introducing param types seems non-trivial. Do you think a design doc 
or wiki page makes sense?

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions is slow

2013-07-30 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4051:
--

Attachment: HIVE-4051.D11805.4.patch

sershe updated the revision HIVE-4051 [jira] Hive's metastore suffers from 1+N 
queries when querying partitions & is slow.

  Followup - forgot to rerun one query after changing.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11805

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11805?vs=36615&id=36621#toc

AFFECTED FILES
  build.xml
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  ql/src/test/queries/clientpositive/alter_partition_coltype.q
  ql/src/test/queries/clientpositive/load_dyn_part3.q
  ql/src/test/queries/clientpositive/load_dyn_part4.q
  ql/src/test/queries/clientpositive/load_dyn_part9.q
  ql/src/test/queries/clientpositive/ppr_pushdown2.q
  ql/src/test/queries/clientpositive/stats4.q
  ql/src/test/results/clientpositive/alter_partition_coltype.q.out
  ql/src/test/results/clientpositive/load_dyn_part3.q.out
  ql/src/test/results/clientpositive/load_dyn_part4.q.out
  ql/src/test/results/clientpositive/load_dyn_part9.q.out
  ql/src/test/results/clientpositive/ppr_pushdown2.q.out
  ql/src/test/results/clientpositive/stats4.q.out

To: JIRA, sershe
Cc: brock


 Hive's metastore suffers from 1+N queries when querying partitions & is slow
 

 Key: HIVE-4051
 URL: https://issues.apache.org/jira/browse/HIVE-4051
 Project: Hive
  Issue Type: Bug
  Components: Clients, Metastore
 Environment: RHEL 6.3 / EC2 C1.XL
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch, 
 HIVE-4051.D11805.3.patch, HIVE-4051.D11805.4.patch


 Hive's query client takes a long time to initialize & start planning queries 
 because of delays in creating all the MTable/MPartition objects.
 For a hive db with 1800 partitions, the metastore took 6-7 seconds to 
 initialize - firing approximately 5900 queries to the mysql database.
 Several of those queries fetch exactly one row to create a single object on 
 the client.
 The following 12 queries were repeated for each partition, generating a storm 
 of SQL queries 
 {code}
 4 Query SELECT 
 `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
  FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = 
 `B0`.`SD_ID` WHERE `A0`.`PART_ID` = 3945
 4 Query SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN 
 `CDS` `B0` ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 
 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 
 `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` 
 FROM `SDS` `A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = 
 `B0`.`SERDE_ID` WHERE `A0`.`SD_ID` =4871
 4 Query SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND 
 THIS.`INTEGER_IDX`=0
 4 Query SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS 
 NUCORDER0 FROM `SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND 
 `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE 
 THIS.`SD_ID_OID`=4871 AND THIS.`INTEGER_IDX`=0
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM 
 `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON 
 `A0`.`STRING_LIST_ID_EID` = `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` 
 =4871 AND `A0`.`INTEGER_IDX` = 0 ORDER BY NUCORDER0
 4 Query SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` 
 =4871 AND `STRING_LIST_ID_KID` IS NOT NULL
 4 Query SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS 
 NUCLEUS_TYPE,`A0`.`STRING_LIST_ID` FROM `SKEWED_STRING_LIST` `A0` INNER JOIN 
 `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID` = 
 `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
 4 Query SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM 
 `SKEWED_COL_VALUE_LOC_MAP` `A0` WHERE `A0`.`SD_ID` =4871 AND NOT 
 (`A0`.`STRING_LIST_ID_KID` IS NULL)
 {code}
 This data is not detached or cached, so this operation is performed during 
 every query plan for the 

Re: [jira] [Created] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-30 Thread Edward Capriolo
Can I get some +1 love? I have 2 or 3 follow-ons.

On Tuesday, July 30, 2013, Hive QA (JIRA) j...@apache.org wrote:

 [
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724561#comment-13724561]

 Hive QA commented on HIVE-4954:
 ---



 {color:green}Overall{color}: +1 all checks pass

 Here are the results of testing the latest attachment:

https://issues.apache.org/jira/secure/attachment/12595032/HIVE-4879.2.patch.txt

 {color:green}SUCCESS:{color} +1 2749 tests passed

 Test results:
https://builds.apache.org/job/PreCommit-HIVE-Build/248/testReport
 Console output:
https://builds.apache.org/job/PreCommit-HIVE-Build/248/console

 Messages:
 {noformat}
 Executing org.apache.hive.ptest.execution.PrepPhase
 Executing org.apache.hive.ptest.execution.ExecutionPhase
 Executing org.apache.hive.ptest.execution.ReportingPhase
 {noformat}

 This message is automatically generated.

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-4879.2.patch.txt, HIVE-4954.1.patch.txt


   protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>();
   static {
     RANKING_FUNCS.add("rank");
     RANKING_FUNCS.add("dense_rank");
     RANKING_FUNCS.add("percent_rank");
     RANKING_FUNCS.add("cume_dist");
   };
 Move this logic to annotations

 --
 This message is automatically generated by JIRA.
 If you think it was sent incorrectly, please contact your JIRA
administrators
 For more information on JIRA, see: http://www.atlassian.com/software/jira



Re: [jira] [Created] (HIVE-4844) Add char/varchar data types

2013-07-30 Thread Edward Capriolo
As for the param types: how do we enforce these? If we have a LazySimpleSerDe
and a varchar(10) column and we are reading a value with 11 chars, what do
we do?

Maybe we say that char is an alias and we do not actually enforce anything.
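
To make the scenario concrete, here is a hedged sketch of the situation being 
asked about (the table name and data are made up, and the read-time behavior is 
exactly the open question rather than anything the patch has settled):

{code:sql}
-- Hypothetical table using the proposed parameterized type
CREATE TABLE people (name VARCHAR(10));

-- Suppose the underlying text file holds an 11-character value such as
-- 'Christopher'. The open question: on read, does Hive
--   (a) truncate it to 'Christophe',
--   (b) raise an error, or
--   (c) treat VARCHAR(10) as an alias for STRING and return it unchanged?
SELECT name, length(name) FROM people;
{code}
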
On Tuesday, July 30, 2013, Xuefu Zhang (JIRA) j...@apache.org wrote:

 [
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724626#comment-13724626]

 Xuefu Zhang commented on HIVE-4844:
 ---

 [~jdere] Thanks for sharing your work. I went through your patch and had
some initial questions. I understand that your patch is still in progress,
but I'm wondering what your thoughts are on how you plan to store the type
params. Obviously, type params are metadata of a column, which needs to be
stored. I assume that the hive schema needs to change to accommodate this.

 Secondly, SQL VAR or VARCHAR seems to be a special hive string with
additional restrictions. Your patch seems to treat them independently. Do
you think type inheritance would work here?

 Lastly, introducing param types seems non-trivial. Do you think a
design doc or wiki page makes sense?

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack


 Add new char/varchar data types which have support for more
SQL-compliant behavior, such as SQL string comparison semantics, max
length, etc.

 --
 This message is automatically generated by JIRA.
 If you think it was sent incorrectly, please contact your JIRA
administrators
 For more information on JIRA, see: http://www.atlassian.com/software/jira



[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck

2013-07-30 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724653#comment-13724653
 ] 

Chris Drome commented on HIVE-4574:
---

[~thejas], I was wondering why the other methods which use XMLEncoder 
are not synchronized as well. Is there something specific about 
serializeExpression that makes it different?

 XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
 --

 Key: HIVE-4574
 URL: https://issues.apache.org/jira/browse/HIVE-4574
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4574.1.patch


 In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to 
 java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe 
 because it uses a static WeakHashMap that can get used from multiple 
 threads. See 
 http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46
 Concurrent access to HashMap implementations that are not thread safe can 
 sometimes result in infinite loops and other problems. If JDK 7 is in use, it 
 makes sense to synchronize calls to XMLEncoder.writeObject.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4827) Merge a Map-only job to its following MapReduce job with multiple inputs

2013-07-30 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4827:
---

Release Note: 
Before applying this jira to trunk, CommonJoinTaskDispatcher has two methods, 
mergeMapJoinTaskWithChildMapJoinTask and mergeMapJoinTaskWithMapReduceTask. The 
first method tries to merge a map-only task (for MapJoin) into its child map-only 
task. The second method tries to merge a map-only task into its child MapReduce 
task (a task that has a reducer). There was a flag called 
hive.optimize.mapjoin.mapreduce to determine if 
mergeMapJoinTaskWithMapReduceTask would be called. 

This work combines mergeMapJoinTaskWithChildMapJoinTask and 
mergeMapJoinTaskWithMapReduceTask. So, a map-only task will be merged into its 
child task no matter whether the child task is a map-only task or a MapReduce 
task, and hive.optimize.mapjoin.mapreduce is not needed any more. 

If a user wants to disable merging a map-only task into its child task, he or she 
can use either 
set hive.auto.convert.join.noconditionaltask=false;
or
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=0;

  was:
Before applying this jira to trunk, CommonJoinTaskDispatcher has two methods, 
mergeMapJoinTaskWithChildMapJoinTask and mergeMapJoinTaskWithMapReduceTask. The 
first method tries to merge a map-only task (for MapJoin) to its child map-only 
task. The second method tries to merge a map-only task to its child MapReduce 
task (a task has a reducer). There was a flag called 
hive.optimize.mapjoin.mapreduce to determine if 
mergeMapJoinTaskWithMapReduceTask will be called. 

This work combines  mergeMapJoinTaskWithChildMapJoinTask and 
mergeMapJoinTaskWithMapReduceTask. So, a map-only task will be merged into its 
child task no matter the child task is a map-only task or a MapReduce task. So 
hive.optimize.mapjoin.mapreduce is not needed any more. 

If a user wants to disable merging a map-only task to its child task, he or she 
can use either 
{code}
set hive.auto.convert.join.noconditionaltask=false;
{\code}

{code}
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=0;
{\code}


 Merge a Map-only job to its following MapReduce job with multiple inputs
 

 Key: HIVE-4827
 URL: https://issues.apache.org/jira/browse/HIVE-4827
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4827.1.patch, HIVE-4827.2.patch, HIVE-4827.3.patch, 
 HIVE-4827.4.patch, HIVE-4827.5.patch, HIVE-4827.6.patch


 When hive.optimize.mapjoin.mapreduce is on, CommonJoinResolver can attach a 
 Map-only job (MapJoin) to its following MapReduce job. But this merge only 
 happens when the MapReduce job has a single input. With Correlation Optimizer 
 (HIVE-2206), it is possible that the MapReduce job can have multiple inputs 
 (for multiple operation paths). It is desired to improve CommonJoinResolver 
 to merge a Map-only job to the corresponding Map task of the MapReduce job.
 Example:
 {code:sql}
 set hive.optimize.correlation=true;
 set hive.auto.convert.join=true;
 set hive.optimize.mapjoin.mapreduce=true;
 SELECT tmp1.key, count(*)
 FROM (SELECT x1.key1 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   GROUP BY x1.key1) tmp1
 JOIN (SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key2 = y2.key2)
   GROUP BY x2.key2) tmp2
 ON (tmp1.key = tmp2.key)
 GROUP BY tmp1.key;
  {code}
 In this query, join operations inside tmp1 and tmp2 will be converted to two 
 MapJoins. With Correlation Optimizer, aggregations in tmp1, tmp2, and join of 
 tmp1 and tmp2, and the last aggregation will be executed in the same 
 MapReduce job (Reduce side). Since this MapReduce job has two inputs, right 
 now, CommonJoinResolver cannot attach two MapJoins to the Map side of a 
 MapReduce job.
 Another example:
 {code:sql}
 SELECT tmp1.key
 FROM (SELECT x1.key2 AS key
   FROM bigTable1 x1 JOIN smallTable1 y1 ON (x1.key1 = y1.key1)
   UNION ALL
   SELECT x2.key2 AS key
   FROM bigTable2 x2 JOIN smallTable2 y2 ON (x2.key1 = y2.key1)) tmp1
  {code}
 For this case, we will have three Map-only jobs (two for MapJoins and one for 
 Union). It will be good to use a single Map-only job to execute this query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

