[jira] [Commented] (HIVE-8126) Standalone hive-jdbc jar is not packaged in the Hive distribution
[ https://issues.apache.org/jira/browse/HIVE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135017#comment-14135017 ] Ashutosh Chauhan commented on HIVE-8126: +1 Standalone hive-jdbc jar is not packaged in the Hive distribution - Key: HIVE-8126 URL: https://issues.apache.org/jira/browse/HIVE-8126 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 0.14.0 Attachments: HIVE-8126.1.patch With HIVE-538 we started creating the hive-jdbc-*-standalone.jar, but the packaging/distribution does not contain the standalone jdbc jar. I would have expected it to be located under the lib folder of the distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25329: HIVE-7932: It may cause NP exception when add accessed columns to ReadEntity
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25329/ --- (Updated Sept. 16, 2014, 6:08 a.m.) Review request for hive, Brock Noland, Prasad Mujumdar, and Szehon Ho. Changes --- Fixed some formatting issues; it seems something went wrong when the patch was applied. Repository: hive-git Description --- When I execute a query with a view join, the view's type is table, but tableToColumnAccessMap does not store the view's name, so it throws a NullPointerException. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 392f7ce ql/src/test/org/apache/hadoop/hive/ql/parse/TestColumnAccess.java PRE-CREATION Diff: https://reviews.apache.org/r/25329/diff/ Testing --- Thanks, Xiaomeng Huang
[jira] [Commented] (HIVE-8107) Bad error message for non-existent table in update and delete
[ https://issues.apache.org/jira/browse/HIVE-8107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135025#comment-14135025 ] Hive QA commented on HIVE-8107: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668838/HIVE-8107.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.parse.TestParse.testParse_union {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/814/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/814/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-814/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12668838 Bad error message for non-existent table in update and delete - Key: HIVE-8107 URL: https://issues.apache.org/jira/browse/HIVE-8107 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8107.patch update no_such_table set x = 3; produces an error message like: {noformat} 2014-09-12 19:45:00,138 ERROR [main]: ql.Driver (SessionState.java:printError(824)) - FAILED: SemanticException [Error 10290]: Encountered parse error while parsing rewritten update or delete query org.apache.hadoop.hive.ql.parse.SemanticException: Encountered parse error while parsing rewritten update or delete query at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:130) at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeDelete(UpdateDeleteSemanticAnalyzer.java:97) at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:66) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:217) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:406) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:302) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1051) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:988) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:978) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:344) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:441) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:457) at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found no_such_table at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1008) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:978) at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:128) ... 24 more {noformat} It should give something much cleaner, or at least push the Table not found message to the top rather than bury it in an exception stack. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
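A minimal illustration of the suggested cleanup, walking the cause chain to surface the innermost message; the helper name and messages here are invented for illustration, not the actual Hive fix:

```java
// Hypothetical sketch (not the Hive patch): report the innermost cause's
// message ("Table not found ...") instead of burying it in a wrapper.
public class ErrorUnwrap {
    static String rootCauseMessage(Throwable t) {
        // walk down the cause chain to the root
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t.getMessage();
    }

    public static void main(String[] args) {
        Exception root = new RuntimeException("Table not found no_such_table");
        Exception wrapped = new RuntimeException(
            "Encountered parse error while parsing rewritten update or delete query", root);
        System.out.println(rootCauseMessage(wrapped)); // Table not found no_such_table
    }
}
```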
[jira] [Assigned] (HIVE-860) Persistent distributed cache
[ https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-860: - Assignee: Ferdinand Xu (was: Brock Noland) Persistent distributed cache Key: HIVE-860 URL: https://issues.apache.org/jira/browse/HIVE-860 Project: Hive Issue Type: Improvement Affects Versions: 0.12.0 Reporter: Zheng Shao Assignee: Ferdinand Xu Fix For: 0.14.0 Attachments: HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch DistributedCache is shared across multiple jobs if the hdfs file name is the same. We need to make sure Hive puts the same file into the same location every time and does not overwrite it if the file content is the same. We can achieve 2 different results: A1. Files added with the same name, timestamp, and md5 in the same session will have a single copy in distributed cache. A2. Files added with the same name, timestamp, and md5 will have a single copy in distributed cache. A2 has a bigger benefit in sharing but may raise a question on when Hive should clean it up in hdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
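A hedged sketch of what option A2's content-addressed placement could look like; the path layout and helper name are invented for illustration and are not Hive's actual implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch of A2: derive a stable cache location from the file's
// md5 so identical content maps to one path and is never overwritten.
public class CachePath {
    static String cachePath(String name, byte[] content) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff)); // unsigned hex byte
        }
        // invented layout: <cache root>/<md5>/<original file name>
        return "/tmp/hive-dist-cache/" + hex + "/" + name;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cachePath("udf.jar", "hello".getBytes(StandardCharsets.UTF_8)));
        // /tmp/hive-dist-cache/5d41402abc4b2a76b9719d911017c592/udf.jar
    }
}
```

Because the path is a pure function of the content, re-adding the same file in any session resolves to the same location, which is the sharing behavior A2 describes.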
[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list
[ https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5744: Assignee: Navis Status: Patch Available (was: Open) Implement support for BETWEEN in SELECT list Key: HIVE-5744 URL: https://issues.apache.org/jira/browse/HIVE-5744 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Navis Attachments: HIVE-4160.1.patch.txt Queries like SELECT col1 BETWEEN 0 and 10 from T; fail in vectorized mode. Support needs to be implemented for a BETWEEN expression in the SELECT list, comparable to how it was added for comparison operators (<, <=, ...). These were done by adding new templates that return a value for a comparison instead of applying a filter. See ColumnCompareScalar.txt under ql/src/gen for an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list
[ https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5744: Attachment: HIVE-4160.1.patch.txt Implement support for BETWEEN in SELECT list Key: HIVE-5744 URL: https://issues.apache.org/jira/browse/HIVE-5744 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Attachments: HIVE-4160.1.patch.txt Queries like SELECT col1 BETWEEN 0 and 10 from T; fail in vectorized mode. Support needs to be implemented for a BETWEEN expression in the SELECT list, comparable to how it was added for comparison operators (<, <=, ...). These were done by adding new templates that return a value for a comparison instead of applying a filter. See ColumnCompareScalar.txt under ql/src/gen for an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list
[ https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5744: Attachment: HIVE-5744.1.patch.txt Implement support for BETWEEN in SELECT list Key: HIVE-5744 URL: https://issues.apache.org/jira/browse/HIVE-5744 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Navis Attachments: HIVE-5744.1.patch.txt Queries like SELECT col1 BETWEEN 0 and 10 from T; fail in vectorized mode. Support needs to be implemented for a BETWEEN expression in the SELECT list, comparable to how it was added for comparison operators (<, <=, ...). These were done by adding new templates that return a value for a comparison instead of applying a filter. See ColumnCompareScalar.txt under ql/src/gen for an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
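For illustration, the difference between the existing filter templates and the value-returning templates the issue asks for can be sketched over a column batch as follows; class and method names are invented, not Hive's generated vectorized classes:

```java
// Illustrative sketch: filter vs. projection semantics for BETWEEN on a
// batch of long values (invented names, not Hive's generated code).
public class BetweenSketch {
    // Filter style (already supported): collect qualifying row indices.
    static int filterLongBetween(long[] col, int n, long lo, long hi, int[] sel) {
        int k = 0;
        for (int i = 0; i < n; i++) {
            if (col[i] >= lo && col[i] <= hi) sel[k++] = i;
        }
        return k; // number of selected rows
    }

    // Projection style (what SELECT-list BETWEEN needs): one value per row.
    static void projectLongBetween(long[] col, int n, long lo, long hi, long[] out) {
        for (int i = 0; i < n; i++) {
            out[i] = (col[i] >= lo && col[i] <= hi) ? 1 : 0;
        }
    }

    public static void main(String[] args) {
        long[] col = {-5, 0, 7, 10, 11};
        long[] out = new long[5];
        projectLongBetween(col, 5, 0, 10, out);
        System.out.println(java.util.Arrays.toString(out)); // [0, 1, 1, 1, 0]
    }
}
```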
[jira] [Commented] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.
[ https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135083#comment-14135083 ] Eli Acherkan commented on HIVE-8102: Thanks [~jdere]! The patch appears to work well for us. (Haven't tested on other timezones.) Partitions of type 'date' behave incorrectly with daylight saving time. --- Key: HIVE-8102 URL: https://issues.apache.org/jira/browse/HIVE-8102 Project: Hive Issue Type: Bug Components: Database/Schema, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Eli Acherkan Attachments: HIVE-8102.1.patch At 2AM on March 28th 2014, Israel went from standard time (GMT+2) to daylight saving time (GMT+3). The server's timezone is Asia/Jerusalem. When creating a partition whose key is 2014-03-28, Hive creates a partition for 2014-03-27 instead: hive (default)> create table test (a int) partitioned by (`b_prt` date); OK Time taken: 0.092 seconds hive (default)> alter table test add partition (b_prt='2014-03-28'); OK Time taken: 0.187 seconds hive (default)> show partitions test; OK partition b_prt=2014-03-27 Time taken: 0.134 seconds, Fetched: 1 row(s) It seems that the root cause is the behavior of DateWritable.daysToMillis/dateToDays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
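The off-by-one-day behavior described above can be reproduced with a naive days-since-epoch conversion; this is an illustrative sketch of the failure mode, not DateWritable's actual code:

```java
import java.sql.Date;
import java.util.TimeZone;

// Sketch: truncating local-midnight millis to whole UTC days loses a day
// when the server timezone is east of UTC, because local midnight falls
// on the previous UTC day.
public class DstDemo {
    static final long MILLIS_PER_DAY = 86_400_000L;

    public static void main(String[] args) {
        // Assumption: server timezone is Asia/Jerusalem, as in the report.
        TimeZone.setDefault(TimeZone.getTimeZone("Asia/Jerusalem"));

        // java.sql.Date.valueOf interprets the string in the local zone:
        // 2014-03-28 00:00 at UTC+2 is 2014-03-27 22:00 UTC.
        long localMidnight = Date.valueOf("2014-03-28").getTime();

        // Integer division truncates to the previous UTC day.
        long days = localMidnight / MILLIS_PER_DAY;
        System.out.println(new Date(days * MILLIS_PER_DAY)); // 2014-03-27
    }
}
```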
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135101#comment-14135101 ] Hive QA commented on HIVE-8038: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668843/HIVE-8038.2.patch {color:green}SUCCESS:{color} +1 6276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/815/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/815/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-815/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668843 Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.patch What is the Current Logic == 1.get the file blocks from FileSystem.getFileBlockLocations() which returns an array of BlockLocation 2.In SplitGenerator.createSplit(), check if split only spans one block or multiple blocks. 3.If split spans just one block, then using the array index (index = offset/blockSize), get the corresponding host having the blockLocation 4.If the split spans multiple blocks, then get all hosts that have at least 80% of the max of total data in split hosted by any host. 
5.add the split to a list of splits Issue with Current Logic = Dependency on FileSystem API's logic for block location calculations. It returns an array, and we need to rely on FileSystem to make all blocks the same size if we want to directly access a block from the array. What is the Fix = 1a.get the file blocks from FileSystem.getFileBlockLocations() which returns an array of BlockLocation 1b.convert the array into a TreeMap<offset, BlockLocation> and return it through getLocationsWithOffSet() 2.In SplitGenerator.createSplit(), check if split only spans one block or multiple blocks. 3.If split spans just one block, then using TreeMap.floorEntry(key), get the highest entry smaller than offset for the split and get the corresponding host. 4a.If the split spans multiple blocks, get a submap, which contains all entries containing blockLocations from the offset to offset + length 4b.get all hosts that have at least 80% of the max of total data in split hosted by any host. 5.add the split to a list of splits What are the major changes in logic == 1. store BlockLocations in a Map instead of an array 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations() 3. one block case is checked by if(offset + length <= start.getOffset() + start.getLength()) instead of if((offset % blockSize) + length <= blockSize) What is the effect on Complexity (Big O) = 1. We add an O(n) loop to build a TreeMap from an array, but it's a one-time cost and would not be called for each split 2. In the one-block case, we can get the block in O(logn) worst case, which was O(1) before 3. Getting the submap is O(logn) 4. In the multiple-block case, building the list of hosts is O(m), which was O(n), m < n, as previously we were iterating over all the block locations but now we are iterating only over blocks that belong to the range of offsets that we need. What are the benefits of the change == 1.
With this fix, we do not depend on the blockLocations returned by FileSystem to figure out the block corresponding to the offset and blockSize 2. Also, it is not necessary that block lengths are the same for all blocks across all FileSystems 3. Previously we were using blockSize for the one-block case and block.length for the multiple-block case, which is not the case now. We figure out the block depending upon the actual length and offset of the block -- This message was sent by Atlassian JIRA (v6.3.4#6332)
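The TreeMap-based lookup described in the fix can be sketched as follows; the offsets and lengths are made-up stand-ins for the BlockLocation data returned by FileSystem.getFileBlockLocations():

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified sketch of the proposed split/block lookup: block offset -> length
// (the real code maps offsets to BlockLocation objects).
public class SplitLookup {
    public static void main(String[] args) {
        TreeMap<Long, Long> blocks = new TreeMap<>();
        blocks.put(0L, 128L);
        blocks.put(128L, 128L);
        blocks.put(256L, 64L); // blocks need not all be the same size

        long offset = 200L, length = 30L;

        // One-block check: highest block starting at or below the split offset.
        Map.Entry<Long, Long> start = blocks.floorEntry(offset);
        boolean oneBlock = offset + length <= start.getKey() + start.getValue();
        System.out.println(oneBlock); // true: [200, 230) fits in block [128, 256)

        // Multi-block case: submap of all blocks overlapping [offset, offset + length).
        SortedMap<Long, Long> overlapping =
            blocks.subMap(blocks.floorKey(offset), true, offset + length, false);
        System.out.println(overlapping.keySet()); // [128]
    }
}
```

Both floorEntry and the submap view are O(log n), which matches the complexity discussion above.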
[jira] [Assigned] (HIVE-6705) hive jdbc can not used by jmeter, because of unsupported auto commit feature
[ https://issues.apache.org/jira/browse/HIVE-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-6705: --- Assignee: Navis hive jdbc can not used by jmeter, because of unsupported auto commit feature Key: HIVE-6705 URL: https://issues.apache.org/jira/browse/HIVE-6705 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Environment: CentOS_X86_64 JMeter 2.11 Reporter: Ben Assignee: Navis Attachments: HIVE-6705.1.patch.txt In Apache JMeter, the autocommit property is required, but in the Hive JDBC driver auto commit is an unsupported method. In /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java: {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub throw new {color:red} SQLException("Method not supported"); {color} } {quote} So, should we make a mock to support the auto commit property == false? {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub {color:red}if (autoCommit) {color} throw new SQLException("Method not supported"); else return; } {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25688: hive jdbc can not used by jmeter, because of unsupported auto commit feature
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25688/ --- Review request for hive. Bugs: HIVE-6705 https://issues.apache.org/jira/browse/HIVE-6705 Repository: hive-git Description --- In Apache JMeter, the autocommit property is required, but in the Hive JDBC driver auto commit is an unsupported method. In /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java: {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub throw new {color:red} SQLException("Method not supported"); {color} } {quote} So, should we make a mock to support the auto commit property == false? {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub {color:red}if (autoCommit) {color} throw new SQLException("Method not supported"); else return; } {quote} Diffs - jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 59ce692 jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 Diff: https://reviews.apache.org/r/25688/diff/ Testing --- Thanks, Navis Ryu
[jira] [Updated] (HIVE-6705) hive jdbc can not used by jmeter, because of unsupported auto commit feature
[ https://issues.apache.org/jira/browse/HIVE-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6705: Attachment: HIVE-6705.2.patch.txt hive jdbc can not used by jmeter, because of unsupported auto commit feature Key: HIVE-6705 URL: https://issues.apache.org/jira/browse/HIVE-6705 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Environment: CentOS_X86_64 JMeter 2.11 Reporter: Ben Assignee: Navis Attachments: HIVE-6705.1.patch.txt, HIVE-6705.2.patch.txt In Apache JMeter, the autocommit property is required, but in the Hive JDBC driver auto commit is an unsupported method. In /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java: {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub throw new {color:red} SQLException("Method not supported"); {color} } {quote} So, should we make a mock to support the auto commit property == false? {quote} public void setAutoCommit(boolean autoCommit) throws SQLException { // TODO Auto-generated method stub {color:red}if (autoCommit) {color} throw new SQLException("Method not supported"); else return; } {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7996) Potential resource leak in HiveBurnInClient
[ https://issues.apache.org/jira/browse/HIVE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] skrho reassigned HIVE-7996: --- Assignee: skrho Potential resource leak in HiveBurnInClient --- Key: HIVE-7996 URL: https://issues.apache.org/jira/browse/HIVE-7996 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor In createTables() and runQueries(), Statement stmt is not closed upon return. In main(), Connection con is not closed upon exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7996) Potential resource leak in HiveBurnInClient
[ https://issues.apache.org/jira/browse/HIVE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135139#comment-14135139 ] skrho commented on HIVE-7996: - Hello Ted Yu~~ What is the name of the class to be fixed? Or where should I look in order to fix it? ^^ Potential resource leak in HiveBurnInClient --- Key: HIVE-7996 URL: https://issues.apache.org/jira/browse/HIVE-7996 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor In createTables() and runQueries(), Statement stmt is not closed upon return. In main(), Connection con is not closed upon exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135151#comment-14135151 ] Chengxiang Li commented on HIVE-8118: - Actually, we could generate a Spark graph with one map RDD followed by multiple reduce RDDs; it should not be related to SparkMapRecordHandler and SparkReduceRecorderHandler, since we could wrap each reduce-side child operator with a separate HiveReduceFunction at the SparkCompiler level. For a map RDD which is followed by two reduce RDDs and then connected to a union RDD, Spark would compute the map RDD twice unless the map RDD is cached. If the two reduces share the same shuffle dependency (which means they have the same map output partitions), the job could theoretically be optimized to compute the map RDD only once, but I think this should be a Spark framework level optimization. When two reduce RDDs don't share the same shuffle dependency, the map RDD would be computed twice anyway. For the multi-insert case, if we wrap all FileSinkOperators into one RDD, the parent of the FileSinkOperators would forward rows to each FileSinkOperator, so the data source for the insert would be generated only once. So I think we do not really need multiple result collectors for SparkMapRecorderHandler and SparkReduceRecordHandler. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch] Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to only one child. It's very common in multi-insert queries for a map/reduce task to have more than one child.
A query like the following has two map tasks as parents: {code} select name, sum(value) from dec group by name union all select name, value from dec order by name {code} It's possible in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should take this as a general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Likely this is a big change and subtasks are possible. With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8104) Insert statements against ACID tables NPE when vectorization is on
[ https://issues.apache.org/jira/browse/HIVE-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135169#comment-14135169 ] Hive QA commented on HIVE-8104: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668847/HIVE-8104.patch {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits org.apache.hive.hcatalog.streaming.TestStreaming.testMultipleTransactionBatchCommits org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbortAndCommit org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyAbort org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/816/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/816/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-816/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668847 Insert statements against ACID tables NPE when vectorization is on -- Key: HIVE-8104 URL: https://issues.apache.org/jira/browse/HIVE-8104 Project: Hive Issue Type: Bug Components: Query Processor, Vectorization Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Attachments: HIVE-8104.patch Doing an insert against a table that is using ACID format with the transaction manager set to DbTxnManager and vectorization turned on results in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by
[ https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135176#comment-14135176 ] Zhichun Wu commented on HIVE-6883: -- @ [~prasanth_j] , this fix causes some problems when combining dynamic partitioning with group by. Consider the following case: {code} CREATE TABLE `t1`( `a` int,`b` string) PARTITIONED BY (`dt` string); create table src1 ( `key` string, `val` string ); explain insert overwrite table t1 partition(dt) select 1, 'hello', '20140901' from src1 group by key; {code} The key expressions of RS in Stage-2 are wrong. The part of the patch which uses the parent RS's keyCols needs more changes. {code} if (parentRSOpOrder != null && !parentRSOpOrder.isEmpty() && sortPositions.isEmpty()) { newKeyCols.addAll(parentRSOp.getConf().getKeyCols()); orderStr += parentRSOpOrder; } {code} Dynamic partitioning optimization does not honor sort order or order by --- Key: HIVE-6883 URL: https://issues.apache.org/jira/browse/HIVE-6883 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Critical Fix For: 0.14.0, 0.13.1 Attachments: HIVE-6883-branch-0.13.3.patch, HIVE-6883.1.patch, HIVE-6883.2.patch, HIVE-6883.3.patch HIVE-6455 patch does not honor the sort order of the output table or the order by of the select statement. The reason for the former is that numDistributionKey in ReduceSinkDesc is set wrongly. It doesn't take into account the sort columns; because of this, RSOp sets the sort columns to null in Key. Since nulls are set in place of sort columns in Key, the sort columns in Value are not sorted. The other issue is that ORDER BY columns are not honored during insertion. For example {code} insert overwrite table over1k_part_orc partition(ds='foo', t) select si,i,b,f,t from over1k_orc where t is null or t=27 order by si; {code} the select query performs order by on column 'si' in the first MR job.
The following MR job (inserted by HIVE-6455) sorts the input data on dynamic partition column 't' without taking into account the already sorted 'si' column. This results in out-of-order insertion for the 'si' column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6090) Audit logs for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HIVE-6090: --- Attachment: HIVE-6090.1.WIP.patch Uploading a WIP patch that should apply cleanly. Will test against a live cluster (kerberos) and submit for precommit tests. Audit logs for HiveServer2 -- Key: HIVE-6090 URL: https://issues.apache.org/jira/browse/HIVE-6090 Project: Hive Issue Type: Improvement Components: Diagnosability, HiveServer2 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Attachments: HIVE-6090.1.WIP.patch, HIVE-6090.patch HiveMetastore has audit logs, and we would like to audit all queries or requests to HiveServer2 as well. This will help in understanding how the APIs were used, the queries submitted, users, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()
[ https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] skrho updated HIVE-7305: Attachment: HIVE-7305_001.patch I added null-check and size-check logic. Please review my patch~~ Return value from in.read() is ignored in SerializationUtils#readLongLE() - Key: HIVE-7305 URL: https://issues.apache.org/jira/browse/HIVE-7305 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7305_001.patch {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()
[ https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] skrho updated HIVE-7305: Assignee: skrho Status: Patch Available (was: Open) Return value from in.read() is ignored in SerializationUtils#readLongLE() - Key: HIVE-7305 URL: https://issues.apache.org/jira/browse/HIVE-7305 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-7305_001.patch {code} long readLongLE(InputStream in) throws IOException { in.read(readBuffer, 0, 8); return (((readBuffer[0] & 0xff) << 0) + ((readBuffer[1] & 0xff) << 8) {code} Return value from read() may indicate fewer than 8 bytes read. The return value should be checked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
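The concern behind the issue, that InputStream.read() may return fewer than the requested bytes, can be illustrated with a read-fully loop; this is a sketch of the general pattern (java.io.DataInputStream.readFully offers the same guarantee), not the attached patch itself:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch: loop until the buffer is full, and fail loudly on a short stream,
// instead of ignoring read()'s return value.
public class ReadLE {
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) throw new EOFException("stream ended with " + len + " bytes missing");
            off += n;
            len -= n;
        }
    }

    static long readLongLE(InputStream in) throws IOException {
        byte[] b = new byte[8];
        readFully(in, b, 0, 8);
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (b[i] & 0xffL); // assemble little-endian
        }
        return v;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 0, 0, 0, 0, 0, 0}; // little-endian 0x0201 = 513
        System.out.println(readLongLE(new ByteArrayInputStream(data))); // 513
    }
}
```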
[jira] [Commented] (HIVE-6148) Support arbitrary structs stored in HBase
[ https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135336#comment-14135336 ] Hive QA commented on HIVE-6148: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668872/HIVE-6148.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-818/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668872 Support arbitrary structs stored in HBase - Key: HIVE-6148 URL: https://issues.apache.org/jira/browse/HIVE-6148 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Attachments: HIVE-6148.1.patch.txt We should add support to be able to query arbitrary structs stored in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7935) Support dynamic service discovery for HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135420#comment-14135420 ] Hive QA commented on HIVE-7935: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668869/HIVE-7935.8.patch {color:green}SUCCESS:{color} +1 6276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/819/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/819/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-819/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668869 Support dynamic service discovery for HiveServer2 - Key: HIVE-7935 URL: https://issues.apache.org/jira/browse/HIVE-7935 Project: Hive Issue Type: New Feature Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch, HIVE-7935.3.patch, HIVE-7935.4.patch, HIVE-7935.5.patch, HIVE-7935.6.patch, HIVE-7935.7.patch, HIVE-7935.8.patch To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client can dynamically resolve an HiveServer2 to connect to. *High Level Design:* Whether, dynamic service discovery is supported or not, can be configured by setting HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to support this. * When an instance of HiveServer2 comes up, it adds itself as a znode to ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE). 
* A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection string, instead of pointing to a specific HiveServer2 instance. The JDBC driver, uses the ZooKeeper ensemble to pick an instance of HiveServer2 to connect for the entire session. * When an instance is removed from ZooKeeper, the existing client sessions continue till completion. When the last client session completes, the instance shuts down. * All new client connection pick one of the available HiveServer2 uris from ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
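As a rough illustration of the client-side selection step described above (the class, method names, and selection policy are assumptions for the sketch, not the actual JDBC driver code): once the driver has fetched the HiveServer2 URIs registered under the ZooKeeper namespace, it picks one and reuses it for the entire session.

```java
import java.util.Arrays;
import java.util.List;

public class ServerUriPicker {
    // Hypothetical helper: 'registeredUris' stands in for the znode children the
    // driver would read from the ZooKeeper ensemble. The chosen URI is kept for
    // the whole client session.
    static String pickInstance(List<String> registeredUris, long sessionSeed) {
        if (registeredUris.isEmpty()) {
            throw new IllegalStateException("no HiveServer2 instance registered");
        }
        // Deterministically spread sessions across the available instances.
        int idx = Math.floorMod((int) sessionSeed, registeredUris.size());
        return registeredUris.get(idx);
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList("hs2-a:10000", "hs2-b:10000");
        System.out.println(pickInstance(uris, 3)); // prints hs2-b:10000
    }
}
```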
[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135415#comment-14135415 ] Xuefu Zhang commented on HIVE-8054: --- Thank you for the catch, [~leftylev]. Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch] -- Key: HIVE-8054 URL: https://issues.apache.org/jira/browse/HIVE-8054 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1, TODOC-SPARK Fix For: spark-branch Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, HIVE-8054.3-spark.patch Option hive.optimize.union.remove, introduced in HIVE-3276, removes union operators from the operator graph in certain cases as an optimization to reduce the number of MR jobs. While it makes sense in MR, this optimization is actually harmful to an execution engine such as Spark, which natively supports union without requiring additional jobs. This is because removing the union operator creates disjointed operator graphs, each generating a job, so this optimization requires more jobs to run the query, not to mention the additional complexity of handling linked FS descriptors. I propose that we disable this optimization when the execution engine is Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135517#comment-14135517 ] Xuefu Zhang commented on HIVE-8118: --- Hi [~chengxiang li], Thank you for your input. I'm not sure if I understand your thought correctly. Let me clarify the problem by giving a SparkWork like this: {code} MapWork1 - ReduceWork1 \- ReduceWork2 {code} It means that MapWork1 will generate different datasets to feed to ReduceWork1 and ReduceWork2. In the case of multi-insert, ReduceWork1 and ReduceWork2 will each have an FS operator. Inside MapWork1, there will be two operator branches consuming the same data and pushing different data sets to two RS operators. (ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.) However, the current implementation only takes the first data set and feeds it to both reduce works. The same problem can also happen if MapWork1 were a reduce work following another ReduceWork or MapWork. With this problem, I'm not sure how we can get around without letting MapWork1 generate two output RDDs, one for each following reduce work. Potentially, we can duplicate MapWork1 and have the following diagram: {code} MapWork11 - ReduceWork1 MapWork12 - ReduceWork2 {code} where MapWork11 and MapWork12 consume the same input table (the input table as an RDD) and feed the first output RDD to ReduceWork1 and the second to ReduceWork2. This has its complexity, but more importantly, there will be wasted READ (unless Spark is smart enough to cache the input table, which is unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely we'll get such optimizations from the Spark framework in the near term. Thus, I think we have to take into consideration that a map work or a reduce work might generate multiple RDDs, one feeding each of its children. 
Since SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data processing on the map and reduce side, they need a way to generate multiple outputs. Please correct me if I understood you wrong. Thanks. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch] Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to a single child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents: {code} select name, sum(value) from dec group by name union all select name, value from dec order by name {code} It's possible that in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should take this as a general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Likely this is a big change and subtasks are possible. With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
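To make the shape of the proposal concrete, here is a minimal, hypothetical sketch of a record handler initialized with one collector per child work. The class and method names are invented for illustration; in this simplified sketch every row is offered to every collector, whereas the real operator branches would route different data sets to different children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MultiCollectorHandler<T> {
    private final List<Consumer<T>> collectors = new ArrayList<>();

    // One collector would be registered per child work (e.g. per ReduceWork).
    void addCollector(Consumer<T> collector) {
        collectors.add(collector);
    }

    // Forward each processed row to every registered collector.
    void processRow(T row) {
        for (Consumer<T> c : collectors) {
            c.accept(row);
        }
    }

    public static void main(String[] args) {
        MultiCollectorHandler<String> handler = new MultiCollectorHandler<>();
        List<String> toReduceWork1 = new ArrayList<>();
        List<String> toReduceWork2 = new ArrayList<>();
        handler.addCollector(toReduceWork1::add);
        handler.addCollector(toReduceWork2::add);
        handler.processRow("row1");
        if (toReduceWork1.size() != 1 || toReduceWork2.size() != 1) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```

The point of the shape is that neither child starves: with a single collector, only the first child would ever see output.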
[jira] [Updated] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7870: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Fixed via HIVE-8017. Insert overwrite table query does not generate correct task plan [Spark Branch] --- Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch, HIVE-7870.4-spark.patch, HIVE-7870.5-spark.patch Insert overwrite table query does not generate correct task plan when hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. {noformat} set hive.optimize.union.remove=true set hive.merge.sparkfiles=true insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} query result {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} expected result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Move work is not working properly and some data are missing during move. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135532#comment-14135532 ] Xuefu Zhang edited comment on HIVE-7870 at 9/16/14 2:36 PM: Fixed via HIVE-8054. was (Author: xuefuz): Fixed via HIVE-8017. Insert overwrite table query does not generate correct task plan [Spark Branch] --- Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch, HIVE-7870.4-spark.patch, HIVE-7870.5-spark.patch Insert overwrite table query does not generate correct task plan when hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. {noformat} set hive.optimize.union.remove=true set hive.merge.sparkfiles=true insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} query result {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} expected result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Move work is not working properly and some data are missing during move. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8061) improve the partition col stats update speed
[ https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135567#comment-14135567 ] Hive QA commented on HIVE-8061: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668871/HIVE-8061.4.patch {color:green}SUCCESS:{color} +1 6276 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/820/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/820/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-820/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668871 improve the partition col stats update speed Key: HIVE-8061 URL: https://issues.apache.org/jira/browse/HIVE-8061 Project: Hive Issue Type: Improvement Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Priority: Minor Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch, HIVE-8061.3.patch, HIVE-8061.4.patch We previously worked towards faster stats updates for the columns of a table partition in HIVE-7736 and HIVE-7876. Although there is some improvement, it is only correct in the first run; duplicate column stats appear later (thanks to Eugene Koifman's comments). We fixed this in HIVE-7944 by reverting the patch. This JIRA ticket is another attempt to improve the speed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8128) Improve Parquet Vectorization
Brock Noland created HIVE-8128: -- Summary: Improve Parquet Vectorization Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland We'll want to finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8121) Create micro-benchmarks for ParquetSerde and evaluate performance
[ https://issues.apache.org/jira/browse/HIVE-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8121: --- Description: These benchmarks should not execute queries but test only the ParquetSerde code to ensure we are as efficient as possible. The output of this JIRA is: 1) Benchmark tool exists 2) We create new tasks under HIVE-8120 to track the improvements required was: These benchmarks should not execute queries but test only the ParquetSerde code to ensure we are as efficient as possible. Likely the first thing we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde) which was partially done in HIVE-5998. The output of this JIRA is: 1) Benchmark tool exists 2) We create new tasks under HIVE-8120 to track the improvements required Create micro-benchmarks for ParquetSerde and evaluate performance - Key: HIVE-8121 URL: https://issues.apache.org/jira/browse/HIVE-8121 Project: Hive Issue Type: Sub-task Reporter: Brock Noland These benchmarks should not execute queries but test only the ParquetSerde code to ensure we are as efficient as possible. The output of this JIRA is: 1) Benchmark tool exists 2) We create new tasks under HIVE-8120 to track the improvements required -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8130) Support Date in Avro
Brock Noland created HIVE-8130: -- Summary: Support Date in Avro Key: HIVE-8130 URL: https://issues.apache.org/jira/browse/HIVE-8130 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8131) Support timestamp in Avro
Brock Noland created HIVE-8131: -- Summary: Support timestamp in Avro Key: HIVE-8131 URL: https://issues.apache.org/jira/browse/HIVE-8131 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8132) Support avro ACID (bulk update)
Brock Noland created HIVE-8132: -- Summary: Support avro ACID (bulk update) Key: HIVE-8132 URL: https://issues.apache.org/jira/browse/HIVE-8132 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8120) Umbrella JIRA tracking Parquet improvements
[ https://issues.apache.org/jira/browse/HIVE-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135614#comment-14135614 ] Brock Noland commented on HIVE-8120: The view from my side is: * Perf (Benchmarks, vectorization) (P1) * Data types (P2) * Refactoring/cleanup (P2) * ACID (bulk update) (P3) Umbrella JIRA tracking Parquet improvements --- Key: HIVE-8120 URL: https://issues.apache.org/jira/browse/HIVE-8120 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8129) Umbrella JIRA to track Avro improvements
[ https://issues.apache.org/jira/browse/HIVE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135617#comment-14135617 ] Brock Noland commented on HIVE-8129: * Data types (P1) * ACID (bulk update) (P2) Umbrella JIRA to track Avro improvements Key: HIVE-8129 URL: https://issues.apache.org/jira/browse/HIVE-8129 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8120) Umbrella JIRA tracking Parquet improvements
[ https://issues.apache.org/jira/browse/HIVE-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8120: --- Summary: Umbrella JIRA tracking Parquet improvements (was: Umbrella JIRA tracking Parquet work) Umbrella JIRA tracking Parquet improvements --- Key: HIVE-8120 URL: https://issues.apache.org/jira/browse/HIVE-8120 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8129) Umbrella JIRA to track Avro improvements
Brock Noland created HIVE-8129: -- Summary: Umbrella JIRA to track Avro improvements Key: HIVE-8129 URL: https://issues.apache.org/jira/browse/HIVE-8129 Project: Hive Issue Type: Improvement Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8133) Support Postgres via DirectSQL
Brock Noland created HIVE-8133: -- Summary: Support Postgres via DirectSQL Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8134) concurrency improvements
Brock Noland created HIVE-8134: -- Summary: concurrency improvements Key: HIVE-8134 URL: https://issues.apache.org/jira/browse/HIVE-8134 Project: Hive Issue Type: Improvement Reporter: Brock Noland The goal of this JIRA is to track supportability issues with concurrent users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8134) Umbrella JIRA to track concurrency improvements
[ https://issues.apache.org/jira/browse/HIVE-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8134: --- Summary: Umbrella JIRA to track concurrency improvements (was: concurrency improvements) Umbrella JIRA to track concurrency improvements --- Key: HIVE-8134 URL: https://issues.apache.org/jira/browse/HIVE-8134 Project: Hive Issue Type: Improvement Reporter: Brock Noland The goal of this JIRA is to track supportability issues with concurrent users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135639#comment-14135639 ] Damien Carol commented on HIVE-8133: [~brocknoland] The first step should be to enable Postgres as the Metastore back end BEFORE trying to do direct SQL. Currently the Metastore can't work on Postgres; see HIVE-7689. I'm trying to fix normal use of the Metastore with Postgres in HIVE-7689. I can take this ticket afterwards if possible. Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135517#comment-14135517 ] Xuefu Zhang edited comment on HIVE-8118 at 9/16/14 4:02 PM: Hi [~chengxiang li], Thank you for your input. I'm not sure if I understand your thought correctly. Let me clarify the problem by giving a SparkWork like this: {code} MapWork1 - ReduceWork1 \- ReduceWork2 {code} It means that MapWork1 will generate different datasets to feed to ReduceWork1 and ReduceWork2. In the case of multi-insert, ReduceWork1 and ReduceWork2 will each have an FS operator. Inside MapWork1, there will be two operator branches consuming the same data and pushing different data sets to two RS operators. (ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.) However, the current implementation only takes the first data set and feeds it to both reduce works. The same problem can also happen if MapWork1 were a reduce work following another ReduceWork or MapWork. With this problem, I'm not sure how we can get around without letting MapWork1 generate two output RDDs, one for each following reduce work. Potentially, we can duplicate MapWork1 and have the following diagram: {code} MapWork11 - ReduceWork1 MapWork12 - ReduceWork2 {code} where MapWork11 and MapWork12 consume the same input table (the input table as an RDD) and feed the first output RDD to ReduceWork1 and the second to ReduceWork2. This has its complexity, but more importantly, there will be wasted READ (unless Spark is smart enough to cache the input table, which is unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely we'll get such optimizations from the Spark framework in the near term. Thus, I think we have to take into consideration that a map work or a reduce work might generate multiple RDDs, one feeding each of its children. 
Since SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data processing on the map and reduce side, they need a way to generate multiple outputs. Please correct me if I understood you wrong. Thanks. was (Author: xuefuz): Hi [~chengxiang li], Thank you for your input. I'm not sure if I understand your thought correctly. Let me clarify the problem by giving a SparkWork like this: {code} MapWork1 - ReduceWork1 \- ReduceWork2 {code} It means that MapWork1 will generate different datasets to feed to ReduceWork1 and ReduceWork2. In the case of multi-insert, ReduceWork1 and ReduceWork2 will each have an FS operator. Inside MapWork1, there will be two operator branches consuming the same data and pushing different data sets to two RS operators. (ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.) However, the current implementation only takes the first data set and feeds it to both reduce works. The same problem can also happen if MapWork1 were a reduce work following another ReduceWork or MapWork. With this problem, I'm not sure how we can get around without letting MapWork1 generate two output RDDs, one for each following reduce work. Potentially, we can duplicate MapWork1 and have the following diagram: {code} MapWork11 - ReduceWork1 MapWork12 - ReduceWork2 {code} where MapWork11 and MapWork12 consume the same input table (the input table as an RDD) and feed the first output RDD to ReduceWork1 and the second to ReduceWork2. This has its complexity, but more importantly, there will be wasted READ (unless Spark is smart enough to cache the input table, which is unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely we'll get such optimizations from the Spark framework in the near term. Thus, I think we have to take into consideration that a map work or a reduce work might generate multiple RDDs, one feeding each of its children. 
Since SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data processing on the map and reduce side, they need a way to generate multiple outputs. Please correct me if I understood you wrong. Thanks. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch] Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to a single child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
[jira] [Assigned] (HIVE-8133) Support Postgres via DirectSQL
[ https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol reassigned HIVE-8133: -- Assignee: Damien Carol Support Postgres via DirectSQL -- Key: HIVE-8133 URL: https://issues.apache.org/jira/browse/HIVE-8133 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Damien Carol -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8136) Finer grained locking
Brock Noland created HIVE-8136: -- Summary: Finer grained locking Key: HIVE-8136 URL: https://issues.apache.org/jira/browse/HIVE-8136 Project: Hive Issue Type: Sub-task Reporter: Brock Noland When using ZK for concurrency control, some statements, such as setting a table's location, require an exclusive table lock to be atomic. This JIRA is to analyze the scope of statements like ALTER TABLE and see if we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8094) add LIKE keyword support for SHOW FUNCTIONS
[ https://issues.apache.org/jira/browse/HIVE-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135674#comment-14135674 ] Hive QA commented on HIVE-8094: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668887/HIVE-8094.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6276 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/821/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/821/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-821/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668887 add LIKE keyword support for SHOW FUNCTIONS --- Key: HIVE-8094 URL: https://issues.apache.org/jira/browse/HIVE-8094 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0, 0.13.1 Reporter: peter liu Assignee: peter liu Fix For: 0.14.0 Attachments: HIVE-8094.1.patch It would be nice to add LIKE keyword support for SHOW FUNCTIONS as below, keeping the patterns consistent with SHOW DATABASES and SHOW TABLES. bq. SHOW FUNCTIONS LIKE 'foo*'; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
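A rough sketch of the matching the proposal implies, assuming SHOW FUNCTIONS would treat '*' as a wildcard the way SHOW TABLES patterns do; the helper and class names are illustrative, not Hive's parser code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FunctionGlob {
    // Translate a SHOW TABLES-style pattern ('*' wildcard) into a regex
    // and keep only the matching function names.
    static List<String> showFunctionsLike(List<String> names, String pattern) {
        String regex = pattern.replace(".", "\\.").replace("*", ".*");
        return names.stream()
                    .filter(n -> n.matches(regex))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fns = Arrays.asList("foo", "foobar", "bar");
        System.out.println(showFunctionsLike(fns, "foo*")); // prints [foo, foobar]
    }
}
```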
[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation
[ https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135707#comment-14135707 ] Sergey Shelukhin commented on HIVE-8080: [~ashutoshc] [~jpullokkaran] ping? CBO: function name may not match UDF name during translation Key: HIVE-8080 URL: https://issues.apache.org/jira/browse/HIVE-8080 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch create_func1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Status: Patch Available (was: Reopened) Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used
[ https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-7812: Attachment: HIVE-7812.patch I fixed the problem that was causing trouble for the new Tez tests. Disable CombineHiveInputFormat when ACID format is used --- Key: HIVE-7812 URL: https://issues.apache.org/jira/browse/HIVE-7812 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch Currently the HiveCombineInputFormat complains when called on an ACID directory. Modify HiveCombineInputFormat so that HiveInputFormat is used instead if the directory is ACID format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8055: --- Issue Type: Sub-task (was: Task) Parent: HIVE-7292 Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation
[ https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135740#comment-14135740 ] Laljo John Pullokkaran commented on HIVE-8080: -- Could you add a RB entry? Will be easier to read the patch. Thanks CBO: function name may not match UDF name during translation Key: HIVE-8080 URL: https://issues.apache.org/jira/browse/HIVE-8080 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch create_func1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation
[ https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135758#comment-14135758 ] Sergey Shelukhin commented on HIVE-8080: https://reviews.apache.org/r/25700/ CBO: function name may not match UDF name during translation Key: HIVE-8080 URL: https://issues.apache.org/jira/browse/HIVE-8080 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch create_func1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8138) Global Init file should allow specifying file name not only directory
Brock Noland created HIVE-8138: -- Summary: Global Init file should allow specifying file name not only directory Key: HIVE-8138 URL: https://issues.apache.org/jira/browse/HIVE-8138 Project: Hive Issue Type: Bug Reporter: Brock Noland HIVE-5160 allows you to specify a directory where a .hiverc file exists. However since .hiverc is a hidden file this can be confusing. The property should allow a path to a file or a directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8138) Global Init file should allow specifying file name not only directory
[ https://issues.apache.org/jira/browse/HIVE-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-8138: -- Assignee: Brock Noland Global Init file should allow specifying file name not only directory -- Key: HIVE-8138 URL: https://issues.apache.org/jira/browse/HIVE-8138 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland HIVE-5160 allows you to specify a directory where a .hiverc file exists. However since .hiverc is a hidden file this can be confusing. The property should allow a path to a file or a directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java https://reviews.apache.org/r/25700/#comment93239 Is the Hive token case-insensitive, or are all function names in lower case? ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java https://reviews.apache.org/r/25700/#comment93240 Are all functions qualified in Hive (w.r.t. the DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? - John Pullokkaran On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135768#comment-14135768 ] Gopal V commented on HIVE-8137: --- The right approach is to skip generating splits for such files. There is no reason to schedule this split or run a task at all. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 Hive 13 does not handle reading a zero-size ORC file properly. An ORC file is supposed to have a PostScript, which the ReaderImpl class tries to read in order to initialize the footer. But if the file is empty or of zero size, it runs into an IndexOutOfBoundsException because ReaderImpl tries to read it in its constructor. Code snippet:
{code}
// get length of PostScript
int psLen = buffer.get(readSize - 1) & 0xff;
{code}
In the above code, readSize for an empty file is zero. I see that the ensureOrcFooter() method performs some sanity checks on the footer, so either we can move the above code snippet into ensureOrcFooter() and throw a malformed-ORC-file exception, or we can create a dummy Reader that does not initialize the footer and whose hasNext() returns false on the first call. Basically, I would like to know the correct way to handle an empty ORC file in a mapred job: should we ignore it, or should we throw an exception saying the ORC file is malformed? Please let me know your thoughts on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
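The zero-size check discussed in the description can be sketched as follows. This is a hypothetical helper, not the actual ReaderImpl code; it only illustrates guarding the last-byte read before it can throw:

```java
import java.nio.ByteBuffer;

public class OrcPostScriptGuard {
    // Hypothetical sketch: return the PostScript length stored in the last
    // byte of the tail buffer, or -1 for an empty/zero-size file instead of
    // letting buffer.get(-1) throw IndexOutOfBoundsException.
    static int postScriptLength(ByteBuffer buffer, int readSize) {
        if (readSize <= 0) {
            return -1; // zero-size file: no PostScript to read
        }
        return buffer.get(readSize - 1) & 0xff;
    }
}
```

A caller could then treat -1 as "empty file, emit no rows" or turn it into a malformed-file exception, matching the two options debated in the ticket.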
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135778#comment-14135778 ] Pankit Thapar commented on HIVE-8137: - The issue is that Hadoop might create a split when a CombineInputFormat is used; Hadoop specifically creates empty splits. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in Hive (w.r.t. the DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? Also, could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q - John --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-8097) Vectorized Reduce-Side [SMB] MapJoin operator fails
[ https://issues.apache.org/jira/browse/HIVE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8097: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~mmccline]! Vectorized Reduce-Side [SMB] MapJoin operator fails --- Key: HIVE-8097 URL: https://issues.apache.org/jira/browse/HIVE-8097 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-8097.01.patch, HIVE-8097.02.patch, HIVE-8097.03.patch Fails attempting to getScratchColumnVectorTypes since mapWork is null on reduce-side. Fix by calling that method using reduceWork object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135790#comment-14135790 ] Gopal V commented on HIVE-8137: --- Hive's CombineInputFormat has pending changes to fix this (HIVE-6554), but obviously that does not apply to MR's combine implementation. The Tez one actually works as expected in this case, because it combines InputSplits instead of combining arbitrary FileSplits. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8074) Merge spark into trunk 9/12/2014
[ https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8074: --- Attachment: (was: HIVE-8074.1-spark.patch) Merge spark into trunk 9/12/2014 Key: HIVE-8074 URL: https://issues.apache.org/jira/browse/HIVE-8074 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8137) Empty ORC file handling
[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135821#comment-14135821 ] Pankit Thapar commented on HIVE-8137: - I ran an insert overwrite query from an empty table into an ORC table. That triggered Hadoop's CombineFileInputFormat, which does not check whether the split is empty. Empty ORC file handling --- Key: HIVE-8137 URL: https://issues.apache.org/jira/browse/HIVE-8137 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7588) Using type variable in UDF
[ https://issues.apache.org/jira/browse/HIVE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135827#comment-14135827 ] Hive QA commented on HIVE-7588: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668941/HIVE-7588.4.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/823/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/823/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-823/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12668941 Using type variable in UDF -- Key: HIVE-7588 URL: https://issues.apache.org/jira/browse/HIVE-7588 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-7588.1.patch.txt, HIVE-7588.2.patch.txt, HIVE-7588.3.patch.txt, HIVE-7588.4.patch.txt From http://www.mail-archive.com/user@hive.apache.org/msg12307.html Support type variables in UDF:
{code}
public <T> T evaluate(final T s, final String column_name, final int bitmap) throws Exception {
  if (s instanceof Double)
    return (T) new Double(-1.0);
  else if (s instanceof Integer)
    return (T) new Integer(-1);
  …
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
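The snippet from the description, completed into compilable Java (simplified to a single parameter; the unchecked cast-based dispatch is as in the original sketch, not a definitive UDF implementation):

```java
public class GenericEval {
    // Generic evaluate from the JIRA description: return a sentinel -1
    // whose boxed type matches the runtime type of the argument. The
    // unchecked casts are safe only because the returned value's class
    // matches the class tested by instanceof.
    @SuppressWarnings("unchecked")
    static <T> T evaluate(final T s) {
        if (s instanceof Double) {
            return (T) Double.valueOf(-1.0);
        } else if (s instanceof Integer) {
            return (T) Integer.valueOf(-1);
        }
        return s; // other types: pass through unchanged
    }
}
```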
[jira] [Updated] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-8038: -- Attachment: HIVE-8038.3.patch +1 - Patch looks good. For commit: .3.patch removed a whitespace change and fixed a javadoc.
{code}
 context.splits.add(new OrcSplit(file.getPath(), offset, length,
-hosts, fileMetaInfo, isOriginal, hasBase, deltas));
+hosts, fileMetaInfo, isOriginal, hasBase, deltas));
 }
...
- * @return TreeMap<Offset, BlockLocation>
+ * @return TreeMap<Long, BlockLocation>
{code}
Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch
What is the Current Logic:
1. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
2. In SplitGenerator.createSplit(), check whether the split spans one block or multiple blocks.
3. If the split spans just one block, then using the array index (index = offset / blockSize), get the corresponding host having the BlockLocation.
4. If the split spans multiple blocks, get all hosts that have at least 80% of the max of total data in the split hosted by any host.
5. Add the split to a list of splits.
Issue with Current Logic:
Dependency on the FileSystem API's logic for block location calculations. It returns an array, and we need to rely on the FileSystem making all blocks the same size if we want to index a block in the array directly.
What is the Fix:
1a. Get the file blocks from FileSystem.getFileBlockLocations(), which returns an array of BlockLocation.
1b. Convert the array into a tree map <offset, BlockLocation> and return it through getLocationsWithOffSet().
2. In SplitGenerator.createSplit(), check whether the split spans one block or multiple blocks.
3. If the split spans just one block, then using TreeMap.floorEntry(key), get the highest entry not greater than the split's offset and get the corresponding host.
4a. If the split spans multiple blocks, get a submap containing all entries with BlockLocations from offset to offset + length.
4b. Get all hosts that have at least 80% of the max of total data in the split hosted by any host.
5. Add the split to a list of splits.
What are the major changes in logic:
1. Store BlockLocations in a map instead of an array.
2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
3. The one-block case is checked by if (offset + length <= start.getOffset() + start.getLength()) instead of if ((offset % blockSize) + length <= blockSize).
What is the effect on Complexity (Big O):
1. We add an O(n) loop to build a TreeMap from the array, but it is a one-time cost and is not incurred for each split.
2. In the one-block case, we get the block in O(log n) worst case, which was O(1) before.
3. Getting the submap is O(log n).
4. In the multiple-block case, building the list of hosts is O(m), which was O(n), with m < n, as previously we were iterating over all the block locations but now we iterate only over blocks that belong to the range of offsets we need.
What are the benefits of the change:
1. With this fix, we do not depend on the BlockLocations returned by the FileSystem to figure out the block corresponding to the offset and blockSize.
2. Also, it is not necessary that block lengths be the same for all blocks on all FileSystems.
3. Previously we were using blockSize for the one-block case and block.length for the multiple-block case, which is no longer the case. We figure out the block based on the actual length and offset of the block.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
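The floorEntry/subMap lookups from steps 3 and 4a above can be sketched with a plain TreeMap. This is a self-contained illustration, not the actual Hive shim code: block lengths stand in for the real BlockLocation value type, and the method names are hypothetical.

```java
import java.util.NavigableMap;

public class SplitLookup {
    // Find the block containing `offset`: the highest key <= offset,
    // an O(log n) TreeMap.floorEntry lookup.
    static long blockStartFor(NavigableMap<Long, Long> blocks, long offset) {
        return blocks.floorEntry(offset).getKey();
    }

    // All blocks overlapping [offset, offset + length): building the
    // submap is O(log n); iterating it is O(m) for m overlapping blocks.
    static NavigableMap<Long, Long> blocksInRange(NavigableMap<Long, Long> blocks,
                                                  long offset, long length) {
        long first = blocks.floorKey(offset); // block containing the start offset
        return blocks.subMap(first, true, offset + length, false);
    }
}
```

Note that this works even when blocks have unequal lengths, which is exactly the benefit claimed over the `offset / blockSize` array indexing.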
[jira] [Assigned] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()
[ https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-8090: - Assignee: Gopal V Potential null pointer reference in WriterImpl#StreamFactory#createStream() --- Key: HIVE-8090 URL: https://issues.apache.org/jira/browse/HIVE-8090 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Reporter: Ted Yu Assignee: Gopal V Priority: Minor Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, HIVE-8090.4.patch
{code}
switch (kind) {
  ...
  default:
    modifiers = null;
    break;
}
BufferedStream result = streams.get(name);
if (result == null) {
  result = new BufferedStream(name.toString(), bufferSize,
      codec == null ? codec : codec.modify(modifiers));
{code}
In case modifiers is null and codec is a ZlibCodec, there would be an NPE in ZlibCodec#modify(EnumSet<Modifier> modifiers):
{code}
for (Modifier m : modifiers) {
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
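A minimal illustration of the failure mode and the obvious guard, using a toy enum rather than the real CompressionCodec.Modifier: the for-each loop over a null EnumSet throws NullPointerException, so the null case must be handled before the loop (or before calling modify at all).

```java
import java.util.EnumSet;

public class ModifierGuard {
    enum Modifier { FAST, TEXT } // toy stand-in for the real Modifier enum

    // Iterating a null EnumSet throws NPE, as in ZlibCodec#modify;
    // a null check before the loop avoids it.
    static int countModifiers(EnumSet<Modifier> modifiers) {
        if (modifiers == null) {
            return 0; // nothing to apply
        }
        int n = 0;
        for (Modifier m : modifiers) {
            n++;
        }
        return n;
    }
}
```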
[jira] [Updated] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()
[ https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-8090: -- Component/s: File Formats Potential null pointer reference in WriterImpl#StreamFactory#createStream() --- Key: HIVE-8090 URL: https://issues.apache.org/jira/browse/HIVE-8090 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Reporter: Ted Yu Assignee: Gopal V Priority: Minor Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, HIVE-8090.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()
[ https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135847#comment-14135847 ] Gopal V commented on HIVE-8090: --- Test failures look unrelated - +1. Assigned to myself till [~rpalamut] gets contributor access. Potential null pointer reference in WriterImpl#StreamFactory#createStream() --- Key: HIVE-8090 URL: https://issues.apache.org/jira/browse/HIVE-8090 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.14.0 Reporter: Ted Yu Assignee: Gopal V Priority: Minor Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, HIVE-8090.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135848#comment-14135848 ] Pankit Thapar commented on HIVE-8038: - Is .3.patch committed to trunk? Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135849#comment-14135849 ] Xuefu Zhang commented on HIVE-8118: --- [~chengxiang li] and I had an offline discussion; there was just a little bit of confusion in understanding the problem, and now we are on the same page. To summarize, the problem arises when a map work or reduce work is connected to multiple reduce works. Currently a map work or reduce work is wired with only one collector, which collects all data regardless of the branch. That data set then feeds all subsequent child reduce works. I also noted that Tez provides a <name, output collector> map to its record handlers. However, we may not be able to do that, due to the limitations of Spark's RDD transformation APIs. SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors [Spark Branch] - Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Venki Korukanti Labels: Spark-M1 In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to a single child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
{code}
select name, sum(value) from dec group by name
union all
select name, value from dec order by name
{code}
It's possible that in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should treat this as the general case. Tez currently provides a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Likely this is a big change, and subtasks are possible.
With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
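The Tez-style <name, collector> map described above can be sketched independently of Hive's classes. Names here are hypothetical, and plain lists stand in for real output collectors; the point is that each child reduce work gets its own collector, so a branch feeds only its own child rather than all of them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamedCollectors<T> {
    // One named collector per child work, as in the Tez design mentioned
    // above, instead of a single collector shared by all branches.
    private final Map<String, List<T>> collectors = new HashMap<>();

    public void register(String childName) {
        collectors.put(childName, new ArrayList<>());
    }

    // A branch emits only to its own child's collector.
    public void collect(String childName, T row) {
        collectors.get(childName).add(row);
    }

    public List<T> outputOf(String childName) {
        return collectors.get(childName);
    }
}
```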
[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation
[ https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135857#comment-14135857 ] Gopal V commented on HIVE-8038: --- No, there is a 24-hour waiting period after the +1. I will resolve the ticket once it is committed. Leave comments if you need to. Decouple ORC files split calculation logic from Filesystem's get file location implementation - Key: HIVE-8038 URL: https://issues.apache.org/jira/browse/HIVE-8038 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.13.1 Reporter: Pankit Thapar Assignee: Pankit Thapar Fix For: 0.14.0 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6554) CombineHiveInputFormat should use the underlying InputSplits
[ https://issues.apache.org/jira/browse/HIVE-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135863#comment-14135863 ] Pankit Thapar commented on HIVE-6554: - Is there any update on this? CombineHiveInputFormat should use the underlying InputSplits Key: HIVE-6554 URL: https://issues.apache.org/jira/browse/HIVE-6554 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Owen O'Malley Assignee: Owen O'Malley Currently CombineHiveInputFormat generates FileSplits without using the underlying InputFormat. This leads to a problem when an InputFormat needs an InputSplit that isn't exactly a FileSplit, because CombineHiveInputSplit always generates FileSplits and then calls the underlying InputFormat's getRecordReader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8106) Enable vectorization for spark [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-8106: --- Status: Open (was: Patch Available) Patch needs to be reworked. Enable vectorization for spark [spark branch] - Key: HIVE-8106 URL: https://issues.apache.org/jira/browse/HIVE-8106 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-8106-spark.patch Enable the vectorization optimization on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching
[ https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135879#comment-14135879 ] Sergey Shelukhin commented on HIVE-7926: Just pushed some early prototype code for the storage layer into the development branch long-lived daemons for query fragment execution, I/O and caching Key: HIVE-7926 URL: https://issues.apache.org/jira/browse/HIVE-7926 Project: Hive Issue Type: New Feature Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: LLAPdesigndocument.pdf We are proposing a new execution model for Hive that is a combination of existing process-based tasks and long-lived daemons running on worker nodes. These nodes can take care of efficient I/O, caching and query fragment execution, while heavy lifting like most joins, ordering, etc. can be handled by tasks. The proposed model is not a 2-system solution for small and large queries; nor is it a separate execution engine like MR or Tez. It can be used by any Hive execution engine if support is added; in the future even external products (e.g. Pig) can use it. The document with the high-level design we are proposing will be attached shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6936) Provide table properties to InputFormats
[ https://issues.apache.org/jira/browse/HIVE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-6936: Attachment: HIVE-6936.patch Resubmitting patch to jenkins. Provide table properties to InputFormats Key: HIVE-6936 URL: https://issues.apache.org/jira/browse/HIVE-6936 Project: Hive Issue Type: Bug Components: File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch Some advanced file formats need the table properties made available to them. Additionally, it would be convenient to provide a unique id for fetch operators and the complete list of directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 644 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line644 Is the hive token case insensitive, or are all function names in lower case? see get... and register... methods in FunctionRegistry; when storing or retrieving, they are all made lower case On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in hive (w.r.t. DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? John Pullokkaran wrote: Also could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q this is covered by the 2nd part of the condition (if the function is located w/o a qualified name, just the name is returned) Will run the tests - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran.
Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
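Sergey's answer above — the get.../register... methods in FunctionRegistry lower-case names on both store and retrieve — can be sketched as follows (a hypothetical stand-in class, not the actual FunctionRegistry API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of case-insensitive function lookup: names are
// lower-cased both when registered and when retrieved, so "toLower",
// "TOLOWER" and "tolower" all resolve to the same entry.
class FunctionNames {
    private final Map<String, String> registry = new HashMap<>();

    void register(String name, String implClass) {
        registry.put(name.toLowerCase(), implClass);
    }

    String lookup(String name) {
        return registry.get(name.toLowerCase());
    }
}
```

Under this convention the case of the token in the query never matters, because every path through the registry normalizes the name first.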
[jira] [Created] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
Prasad Mujumdar created HIVE-8139: - Summary: Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8139: -- Attachment: HIVE-8139.1.patch Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8139: -- Status: Patch Available (was: Open) Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25595: HIVE-8083: Authorization DDLs should not enforce hive identifier syntax for user or group names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25595/ --- (Updated Sept. 16, 2014, 6:29 p.m.) Review request for hive and Brock Noland. Changes --- Rebased with latest Bugs: HIVE-8083 https://issues.apache.org/jira/browse/HIVE-8083 Repository: hive-git Description --- The compiler expects principals (user, group and role) as hive identifiers for authorization DDLs. The user and group are entities that belong to external namespace and we can't expect those to follow hive identifier syntax rules. For example, a userid or group can contain '-' which is not allowed by compiler. The patch is to allow string literal for user and group names. The quoted identifier support perhaps can be made to work with this. However IMO this syntax should be supported regardless of quoted identifier support (which is an optional configuration) Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 25cd3a5 ql/src/test/queries/clientpositive/authorization_non_id.q PRE-CREATION ql/src/test/results/clientpositive/authorization_non_id.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25595/diff/ Testing --- Added test case to verify various auth DDLs with new syntax. Thanks, Prasad Mujumdar
[jira] [Updated] (HIVE-8083) Authorization DDLs should not enforce hive identifier syntax for user or group
[ https://issues.apache.org/jira/browse/HIVE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasad Mujumdar updated HIVE-8083: -- Attachment: HIVE-8083.2.patch Rebased with latest Authorization DDLs should not enforce hive identifier syntax for user or group -- Key: HIVE-8083 URL: https://issues.apache.org/jira/browse/HIVE-8083 Project: Hive Issue Type: Bug Components: SQL, SQLStandardAuthorization Affects Versions: 0.13.0, 0.13.1 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-8083.1.patch, HIVE-8083.2.patch The compiler expects principals (user, group and role) as hive identifiers for authorization DDLs. The user and group are entities that belong to external namespace and we can't expect those to follow hive identifier syntax rules. For example, a userid or group can contain '-' which is not allowed by compiler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Na Yang updated HIVE-8055: -- Attachment: HIVE-8055-spark.patch HIVE-8054 disabled the union remove optimization feature on the spark execution engine, so that the linked FileSink descriptors do not need to be maintained. This patch cleans up the unnecessary code. Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-8055-spark.patch There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135918#comment-14135918 ] Brock Noland commented on HIVE-8139: +1 pending tests Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in hive (w.r.t. DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? John Pullokkaran wrote: Also could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q Sergey Shelukhin wrote: this is covered by the 2nd part of the condition (if the function is located w/o a qualified name, just the name is returned) Will run the tests Ran the tests; there are some out file changes, but they are the same as on the current cbo branch. - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
Review Request 25704: HIVE-8055:Code cleanup after HIVE-8054 [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25704/ --- Review request for hive, Brock Noland and Xuefu Zhang. Bugs: HIVE-8055 https://issues.apache.org/jira/browse/HIVE-8055 Repository: hive-git Description --- HIVE-8054 disabled the union remove optimization feature on the spark execution engine, so that the linked FileSink descriptors do not need to be maintained. This patch cleans up the unnecessary code. Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 3cdfc51 Diff: https://reviews.apache.org/r/25704/diff/ Testing --- Thanks, Na Yang
[jira] [Updated] (HIVE-8115) Hive select query hang when fields contain map
[ https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-8115: Attachment: HIVE-8115.1.patch made a patch to warn on empty keys or empty pairs. Can anyone do a quick review? Thanks! Hive select query hang when fields contain map -- Key: HIVE-8115 URL: https://issues.apache.org/jira/browse/HIVE-8115 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: HIVE-8115.1.patch, createTable.hql, data Attached the repro of the issue. When creating a table and loading the data attached, all hive queries hang, even just select * from the table. repro steps: 1. run createTable.hql 2. hadoop fs -put data /data 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable; 4. SELECT * FROM testtable; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
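The patch is described as warning on empty keys or empty pairs while parsing map-typed fields. A defensive parser in that spirit might look like the following (a hypothetical sketch, not the attached HIVE-8115.1.patch):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: parse "k1:v1,k2:v2" map text defensively,
// skipping (rather than choking on) empty pairs and empty keys.
// The real fix lives in Hive's lazy serde code; this only illustrates
// the skip-and-warn idea.
class MapFieldParser {
    static Map<String, String> parse(String field) {
        Map<String, String> result = new HashMap<>();
        for (String pair : field.split(",", -1)) {
            if (pair.isEmpty()) {
                continue; // empty pair: would log a warning and move on
            }
            int sep = pair.indexOf(':');
            if (sep <= 0) {
                continue; // empty or missing key: warn and skip
            }
            result.put(pair.substring(0, sep), pair.substring(sep + 1));
        }
        return result;
    }
}
```

The key property is that malformed input degrades to a warning instead of stalling the reader.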
[jira] [Commented] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135948#comment-14135948 ] Xuefu Zhang commented on HIVE-8055: --- Patch looks good. +1 pending on test. Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-8055-spark.patch There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8055: -- Status: Patch Available (was: Open) Code cleanup after HIVE-8054 [Spark Branch] --- Key: HIVE-8055 URL: https://issues.apache.org/jira/browse/HIVE-8055 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-8055-spark.patch There is quite some code handling union removal optimization in SparkCompiler and related classes. We need to clean this up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8115) Hive select query hang when fields contain map
[ https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-8115: Status: Patch Available (was: Open) Hive select query hang when fields contain map -- Key: HIVE-8115 URL: https://issues.apache.org/jira/browse/HIVE-8115 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: HIVE-8115.1.patch, createTable.hql, data Attached the repro of the issue. When creating a table and loading the data attached, all hive queries hang, even just select * from the table. repro steps: 1. run createTable.hql 2. hadoop fs -put data /data 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable; 4. SELECT * FROM testtable; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135954#comment-14135954 ] Mithun Radhakrishnan commented on HIVE-7762: Hello, Suhas. Thanks for working on fixing this inconsistency. Generally, this is a good fix. I'd encourage you to add a test-case to TestHCatClient, to create a table with uppercase partition columns, and then querying it with a lowercase partition-spec. (Essentially, what you've included in your description.) Adding one won't be hard; you could just use one of the other tests for reference. Also, could I please bother you to verify the behaviour of {{HCatClient.getPartitions()}}, for case insensitivity? If it's broken too, I'd rather we fixed both here. I expect that this should be alright, since it goes through the {{listPartitionsByFilter()}} API, but it would be good to have confirmation. Mithun Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.2.patch, HIVE-7762.patch Hcatalog creates partitions in lower case, whereas getting partitions from hcatalog via webhcat client doesn't handle this. So the client starts throwing exceptions. 
Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
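Until the fix lands, a client-side workaround matching the behaviour above is to lower-case the partition-spec keys before calling getPartition(); a minimal sketch, with the helper name assumed rather than taken from HCatalog:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: normalize partition-spec keys to lower case
// before handing them to HCatalog, since partition columns are stored
// in lower case on the metastore side.
class PartitionSpecs {
    static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            normalized.put(e.getKey().toLowerCase(), e.getValue());
        }
        return normalized;
    }
}
```

With this, the first snippet's mixed-case spec ("Year", "Month", ...) reduces to the lower-case spec that is shown to work.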
[jira] [Updated] (HIVE-8106) Enable vectorization for spark [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-8106: --- Attachment: HIVE-8106.1-spark.patch Enable vectorization for spark [spark branch] - Key: HIVE-8106 URL: https://issues.apache.org/jira/browse/HIVE-8106 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch Enable the vectorization optimization on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation
On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646 https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646 Are all functions qualified in hive (w.r.t. DB)? How about built-in functions like toLower? Could you say <DB NAME>.toLower()? John Pullokkaran wrote: Also could you run a few of the q tests below and see if your change causes problems: authorization_create_func1.q show_functions.q vectorized_string_funcs.q create_func1.q vector_decimal_math_funcs.q vectorized_timestamp_funcs.q drop_function.q vectorized_date_funcs.q show_describe_func_quotes.q vectorized_math_funcs.q Sergey Shelukhin wrote: this is covered by the 2nd part of the condition (if the function is located w/o a qualified name, just the name is returned) Will run the tests Sergey Shelukhin wrote: Ran the tests; there are some out file changes, but they are the same as on the current cbo branch. It seems like the change would always use the qualified function name. If that's the case, would built-in functions work? For example, in a select statement could you always qualify functions with a db name? What about arithmetic expressions, conjunctive/disjunctive functions (and/or)? It seems like your change would qualify those functions with a DB name. What is it that I am missing? - John --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/#review53546 --- On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25700/ --- (Updated Sept. 16, 2014, 5:23 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran.
Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java PRE-CREATION ql/src/test/queries/clientpositive/create_func1.q ad924d3 ql/src/test/results/clientpositive/create_func1.q.out 798f77f Diff: https://reviews.apache.org/r/25700/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135970#comment-14135970 ] Ashutosh Chauhan commented on HIVE-8139: I think HIVE-7145 is relevant. Consider that one too. Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8106) Enable vectorization for spark [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135995#comment-14135995 ] Xuefu Zhang commented on HIVE-8106: --- Hi [~chinnalalam], If the patch is ready, please click the Submit Patch button above to allow the test run. Thanks. Enable vectorization for spark [spark branch] - Key: HIVE-8106 URL: https://issues.apache.org/jira/browse/HIVE-8106 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch Enable the vectorization optimization on spark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6
[ https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135991#comment-14135991 ] Brock Noland commented on HIVE-8139: Makes sense. I think we can move to 2.6 until we are able to remove commons-lang 2. Upgrade commons-lang from 2.4 to 2.6 Key: HIVE-8139 URL: https://issues.apache.org/jira/browse/HIVE-8139 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Fix For: 0.14.0 Attachments: HIVE-8139.1.patch Upgrade commons-lang version from 2.4 to latest 2.6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5764) Stopping Metastore and HiveServer2 from command line
[ https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136012#comment-14136012 ] Xiaobing Zhou commented on HIVE-5764: - Can anyone do a review to make it go to trunk? Thanks! Stopping Metastore and HiveServer2 from command line Key: HIVE-5764 URL: https://issues.apache.org/jira/browse/HIVE-5764 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Vaibhav Gumashta Assignee: Xiaobing Zhou Labels: patch Fix For: 0.14.0 Attachments: HIVE-5764.patch Currently a user needs to kill the process. Ideally there should be something like: hive --service metastore stop hive --service hiveserver2 stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5764) Stopping Metastore and HiveServer2 from command line
[ https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-5764: Attachment: HIVE-5764.1.patch Stopping Metastore and HiveServer2 from command line Key: HIVE-5764 URL: https://issues.apache.org/jira/browse/HIVE-5764 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Vaibhav Gumashta Assignee: Xiaobing Zhou Labels: patch Fix For: 0.14.0 Attachments: HIVE-5764.1.patch, HIVE-5764.patch Currently a user needs to kill the process. Ideally there should be something like: hive --service metastore stop hive --service hiveserver2 stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5764) Stopping Metastore and HiveServer2 from command line
[ https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-5764: Attachment: (was: HIVE-5764.patch) Stopping Metastore and HiveServer2 from command line Key: HIVE-5764 URL: https://issues.apache.org/jira/browse/HIVE-5764 Project: Hive Issue Type: Bug Components: HiveServer2, Metastore Reporter: Vaibhav Gumashta Assignee: Xiaobing Zhou Labels: patch Fix For: 0.14.0 Attachments: HIVE-5764.1.patch Currently a user needs to kill the process. Ideally there should be something like: hive --service metastore stop hive --service hiveserver2 stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.
[ https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-8102: - Status: Open (was: Patch Available) Just tried with a timezone with half-hour offsets that is ahead of UTC (Asia/Tehran), and this does not work; cancelling patch. Partitions of type 'date' behave incorrectly with daylight saving time. --- Key: HIVE-8102 URL: https://issues.apache.org/jira/browse/HIVE-8102 Project: Hive Issue Type: Bug Components: Database/Schema, Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Eli Acherkan Attachments: HIVE-8102.1.patch At 2AM on March 28th 2014, Israel went from standard time (GMT+2) to daylight saving time (GMT+3). The server's timezone is Asia/Jerusalem. When creating a partition whose key is 2014-03-28, Hive creates a partition for 2014-03-27 instead: hive (default)> create table test (a int) partitioned by (`b_prt` date); OK Time taken: 0.092 seconds hive (default)> alter table test add partition (b_prt='2014-03-28'); OK Time taken: 0.187 seconds hive (default)> show partitions test; OK partition b_prt=2014-03-27 Time taken: 0.134 seconds, Fetched: 1 row(s) It seems that the root cause is the behavior of DateWritable.daysToMillis/dateToDays. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
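The suspected root cause can be illustrated outside Hive: for a zone ahead of UTC, local midnight falls on the previous UTC day, so floor-dividing its epoch millis by 86,400,000 yields the previous day's index. A minimal demo of that mechanism (illustrative names, not the DateWritable code itself):

```java
import java.util.Calendar;
import java.util.TimeZone;

// Illustration of the off-by-one: a date materialized as *local*
// midnight in a zone ahead of UTC sits before UTC midnight, so a
// days-since-epoch value computed by floor division lands one day early.
class DstDateDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    static long localMidnightToDays(int year, int month, int day, String tz) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(tz));
        cal.clear();
        cal.set(year, month - 1, day); // midnight, local time in tz
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }
}
```

With the server in Asia/Jerusalem (ahead of UTC year-round, GMT+2/+3), 2014-03-28 maps to the day index of 2014-03-27, matching the misplaced partition above; the same happens for Asia/Tehran's half-hour offset.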
[jira] [Commented] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136036#comment-14136036 ] Hive QA commented on HIVE-7777: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12668948/HIVE-7777.3.patch {color:green}SUCCESS:{color} +1 6282 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/824/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/824/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-824/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12668948 add CSV support for Serde - Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, HIVE-7777.patch, csv-serde-master.zip There is no official support for csvSerde for hive, while there is an open source project in github (https://github.com/ogrodnek/csv-serde). CSV is a frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8074) Merge spark into trunk 9/12/2014
[ https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136041#comment-14136041 ] Brock Noland commented on HIVE-8074: The merge was really ugly due to the new statistics work for CBO which has been done on trunk. I have done the merge and will update the Spark test file outputs soon. Until then, most spark tests will fail. Sorry for the disruption. Merge spark into trunk 9/12/2014 Key: HIVE-8074 URL: https://issues.apache.org/jira/browse/HIVE-8074 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8140) Remove obsolete code from SparkWork [Spark Branch]
Xuefu Zhang created HIVE-8140: - Summary: Remove obsolete code from SparkWork [Spark Branch] Key: HIVE-8140 URL: https://issues.apache.org/jira/browse/HIVE-8140 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang There is old code in SparkWork about get/set map/reduce work. It's from POC code, which isn't applicable any more. We should remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)