[jira] [Commented] (HIVE-8126) Standalone hive-jdbc jar is not packaged in the Hive distribution

2014-09-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135017#comment-14135017
 ] 

Ashutosh Chauhan commented on HIVE-8126:


+1

 Standalone hive-jdbc jar is not packaged in the Hive distribution
 -

 Key: HIVE-8126
 URL: https://issues.apache.org/jira/browse/HIVE-8126
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Assignee: Deepesh Khandelwal
 Fix For: 0.14.0

 Attachments: HIVE-8126.1.patch


 With HIVE-538 we started creating the hive-jdbc-*-standalone.jar, but the 
 packaging/distribution does not contain the standalone JDBC jar. I would have 
 expected it to be located under the lib folder of the distribution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25329: HIVE-7932: It may cause NP exception when add accessed columns to ReadEntity

2014-09-16 Thread Xiaomeng Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25329/
---

(Updated Sept. 16, 2014, 6:08 a.m.)


Review request for hive, Brock Noland, Prasad Mujumdar, and Szehon Ho.


Changes
---

Fixed some formatting issues; it looks like something went wrong when the patch was applied.


Repository: hive-git


Description
---

When I execute a query with a view join, the view's type is table, but 
tableToColumnAccessMap does not store the view's name, so it throws a null 
pointer exception.
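
A minimal sketch of the failure mode, assuming the map and names from the 
description (this is not the actual SemanticAnalyzer code): views are absent 
from tableToColumnAccessMap, so an unguarded get() dereference throws the NPE.

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewLookupSketch {
  public static void main(String[] args) {
    Map<String, List<String>> tableToColumnAccessMap =
        new HashMap<String, List<String>>();
    tableToColumnAccessMap.put("base_table", Arrays.asList("a", "b"));

    // A joined view is typed as a table, but its name was never added to
    // the map, so get() returns null here.
    List<String> cols = tableToColumnAccessMap.get("my_view");
    if (cols != null) {            // roughly the guard the patch adds
      for (String col : cols) {
        System.out.println(col);
      }
    }
  }
}
{code}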


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 392f7ce 
  ql/src/test/org/apache/hadoop/hive/ql/parse/TestColumnAccess.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/25329/diff/


Testing
---


Thanks,

Xiaomeng Huang



[jira] [Commented] (HIVE-8107) Bad error message for non-existent table in update and delete

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135025#comment-14135025
 ] 

Hive QA commented on HIVE-8107:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668838/HIVE-8107.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/814/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/814/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-814/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668838

 Bad error message for non-existent table in update and delete
 -

 Key: HIVE-8107
 URL: https://issues.apache.org/jira/browse/HIVE-8107
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8107.patch


 update no_such_table set x = 3;
 produces an error message like:
 {noformat}
 2014-09-12 19:45:00,138 ERROR [main]: ql.Driver 
 (SessionState.java:printError(824)) - FAILED: SemanticException [Error 
 10290]: Encountered parse error while parsing rewritten update or delete query
 org.apache.hadoop.hive.ql.parse.SemanticException: Encountered parse error 
 while parsing rewritten update or delete query
   at 
 org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:130)
   at 
 org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeDelete(UpdateDeleteSemanticAnalyzer.java:97)
   at 
 org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:66)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:217)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:406)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:302)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1051)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1121)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:988)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:978)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:344)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:441)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:457)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Caused by: org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table 
 not found no_such_table
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1008)
   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:978)
   at 
 org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:128)
   ... 24 more
 {noformat}
 It should give something much cleaner, or at least push the "Table not found" 
 message to the top rather than burying it in an exception stack.
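
 A minimal sketch of one way to surface the underlying message 
 (rootCauseMessage is a hypothetical helper, not the attached patch): walk the 
 cause chain and report the innermost message first.
 {code}
public class RootCauseSketch {
  // Unwrap the cause chain so "Table not found no_such_table" surfaces
  // instead of the generic rewritten-query parse error.
  static String rootCauseMessage(Throwable t) {
    while (t.getCause() != null) {
      t = t.getCause();
    }
    return t.getMessage();
  }
}
 {code}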



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-860) Persistent distributed cache

2014-09-16 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-860:
-

Assignee: Ferdinand Xu  (was: Brock Noland)

 Persistent distributed cache
 

 Key: HIVE-860
 URL: https://issues.apache.org/jira/browse/HIVE-860
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.12.0
Reporter: Zheng Shao
Assignee: Ferdinand Xu
 Fix For: 0.14.0

 Attachments: HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, 
 HIVE-860.patch, HIVE-860.patch, HIVE-860.patch, HIVE-860.patch


 DistributedCache is shared across multiple jobs if the HDFS file name is the 
 same.
 We need to make sure Hive puts the same file into the same location every time 
 and does not overwrite it if the file content is the same.
 We can achieve 2 different results:
 A1. Files added with the same name, timestamp, and md5 in the same session 
 will have a single copy in distributed cache.
 A2. Files added with the same name, timestamp, and md5 will have a single 
 copy in distributed cache.
 A2 has a bigger benefit in sharing but raises the question of when Hive 
 should clean it up in HDFS.
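
 A minimal sketch of the naming idea behind A1/A2 (the path layout and helper 
 are invented for illustration): derive a stable location from name + timestamp 
 + md5 so identical files map to the same path and are never overwritten.
 {code}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class CachePathSketch {
  static String cachePath(Path file) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) > 0; ) {
        md5.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    long mtime = Files.getLastModifiedTime(file).toMillis();
    // Same name, timestamp, and content always produce the same location.
    return "/cache/" + file.getFileName() + "-" + mtime + "-" + hex;
  }
}
 {code}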



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list

2014-09-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5744:

Assignee: Navis
  Status: Patch Available  (was: Open)

 Implement support for BETWEEN in SELECT list
 

 Key: HIVE-5744
 URL: https://issues.apache.org/jira/browse/HIVE-5744
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Navis
 Attachments: HIVE-4160.1.patch.txt


 Queries like 
 SELECT col1 BETWEEN 0 and 10 from T;
 fail in vectorized mode. Support needs to be implemented for a BETWEEN 
 expression in the SELECT list, comparable to how it was added for comparison 
 operators (<, >, ...). These were done by adding new templates that return a 
 value for a comparison instead of applying a filter. See 
 ColumnCompareScalar.txt under ql/src/gen for an example.
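
 A minimal sketch of the value-producing form (illustrative, not the generated 
 Hive template): unlike the filter variant, each row gets a 0/1 output instead 
 of the batch being compacted.
 {code}
public class LongColBetweenSketch {
  // Emit a boolean-as-long result per row, mirroring what a SELECT-list
  // BETWEEN must produce.
  static void evaluate(long[] input, long lo, long hi, long[] output, int n) {
    for (int i = 0; i < n; i++) {
      output[i] = (lo <= input[i] && input[i] <= hi) ? 1 : 0;
    }
  }
}
 {code}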



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list

2014-09-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5744:

Attachment: HIVE-4160.1.patch.txt

 Implement support for BETWEEN in SELECT list
 

 Key: HIVE-5744
 URL: https://issues.apache.org/jira/browse/HIVE-5744
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
 Attachments: HIVE-4160.1.patch.txt


 Queries like 
 SELECT col1 BETWEEN 0 and 10 from T;
 fail in vectorized mode. Support needs to be implemented for a BETWEEN 
 expression in the SELECT list, comparable to how it was added for comparison 
 operators (<, >, ...). These were done by adding new templates that return a 
 value for a comparison instead of applying a filter. See 
 ColumnCompareScalar.txt under ql/src/gen for an example.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5744) Implement support for BETWEEN in SELECT list

2014-09-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5744:

Attachment: HIVE-5744.1.patch.txt

 Implement support for BETWEEN in SELECT list
 

 Key: HIVE-5744
 URL: https://issues.apache.org/jira/browse/HIVE-5744
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Navis
 Attachments: HIVE-5744.1.patch.txt


 Queries like 
 SELECT col1 BETWEEN 0 and 10 from T;
 fail in vectorized mode. Support needs to be implemented for a BETWEEN 
 expression in the SELECT list, comparable to how it was added for comparison 
 operators (<, >, ...). These were done by adding new templates that return a 
 value for a comparison instead of applying a filter. See 
 ColumnCompareScalar.txt under ql/src/gen for an example.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.

2014-09-16 Thread Eli Acherkan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135083#comment-14135083
 ] 

Eli Acherkan commented on HIVE-8102:


Thanks [~jdere]! The patch appears to work well for us. (Haven't tested on 
other timezones.)

 Partitions of type 'date' behave incorrectly with daylight saving time.
 ---

 Key: HIVE-8102
 URL: https://issues.apache.org/jira/browse/HIVE-8102
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Eli Acherkan
 Attachments: HIVE-8102.1.patch


 At 2 AM on March 28th, 2014, Israel went from standard time (GMT+2) to daylight 
 saving time (GMT+3).
 The server's timezone is Asia/Jerusalem. When creating a partition whose key 
 is 2014-03-28, Hive creates a partition for 2014-03-27 instead:
 hive (default)> create table test (a int) partitioned by (`b_prt` date);
 OK
 Time taken: 0.092 seconds
 hive (default)> alter table test add partition (b_prt='2014-03-28');
 OK
 Time taken: 0.187 seconds
 hive (default)> show partitions test;
 OK
 partition
 b_prt=2014-03-27
 Time taken: 0.134 seconds, Fetched: 1 row(s)
 It seems that the root cause is the behavior of 
 DateWritable.daysToMillis/dateToDays.
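
 A minimal sketch of the suspected mechanism (illustrative, not the actual 
 DateWritable source): converting a local date to days-since-epoch by flooring 
 local-midnight millis against UTC days loses a day whenever the local offset 
 is ahead of UTC, as Asia/Jerusalem is.
 {code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DstDayShiftSketch {
  public static void main(String[] args) throws Exception {
    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Jerusalem"));
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
    long millis = fmt.parse("2014-03-28").getTime(); // local midnight, GMT+2
    long days = millis / 86400000L;                  // floors to the UTC day
    // days * 86400000 is 2014-03-27 00:00 UTC; formatting it back in the
    // local zone prints 2014-03-27, one day off the requested partition.
    System.out.println(fmt.format(new Date(days * 86400000L)));
  }
}
 {code}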



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135101#comment-14135101
 ] 

Hive QA commented on HIVE-8038:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668843/HIVE-8038.2.patch

{color:green}SUCCESS:{color} +1 6276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/815/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/815/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-815/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668843

 Decouple ORC files split calculation logic from Filesystem's get file 
 location implementation
 -

 Key: HIVE-8038
 URL: https://issues.apache.org/jira/browse/HIVE-8038
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
 Fix For: 0.14.0

 Attachments: HIVE-8038.2.patch, HIVE-8038.patch


 What is the Current Logic
 ==
 1. get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation
 2. In SplitGenerator.createSplit(), check if the split spans only one block or 
 multiple blocks.
 3. If the split spans just one block, then using the array index (index = 
 offset/blockSize), get the corresponding host having the blockLocation
 4. If the split spans multiple blocks, then get all hosts that have at least 
 80% of the max of total data in the split hosted by any host.
 5. add the split to a list of splits
 Issue with Current Logic
 =
 Dependency on the FileSystem API's logic for block location calculations: it 
 returns an array, and we need to rely on the FileSystem to 
 make all blocks the same size if we want to directly access a block from the 
 array.
 
 What is the Fix
 =
 1a. get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation
 1b. convert the array into a tree map <offset, BlockLocation> and return it 
 through getLocationsWithOffSet() (see the sketch after this list)
 2. In SplitGenerator.createSplit(), check if the split spans only one block or 
 multiple blocks.
 3. If the split spans just one block, then using TreeMap.floorEntry(key), get the 
 highest entry smaller than the offset of the split and get the corresponding 
 host.
 4a. If the split spans multiple blocks, get a submap which contains all 
 entries containing blockLocations from offset to offset + length
 4b. get all hosts that have at least 80% of the max of total data in the split 
 hosted by any host.
 5. add the split to a list of splits
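
 A minimal sketch of the tree-map lookup described in steps 1b-4a (class and 
 method names are illustrative, not the actual Hive/shim API):
 {code}
import java.util.Map;
import java.util.TreeMap;

class BlockIndex {
  static class BlockLoc {
    final long offset, length;
    BlockLoc(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  private final TreeMap<Long, BlockLoc> byOffset = new TreeMap<Long, BlockLoc>();

  BlockIndex(BlockLoc[] blocks) {
    for (BlockLoc b : blocks) {
      byOffset.put(b.offset, b);          // one-time O(n) build
    }
  }

  // Single-block case: highest entry whose offset <= the split offset,
  // found in O(log n) regardless of block sizes.
  BlockLoc blockAt(long offset) {
    Map.Entry<Long, BlockLoc> e = byOffset.floorEntry(offset);
    return e == null ? null : e.getValue();
  }

  // Multi-block case: all blocks overlapping [offset, offset + length).
  Iterable<BlockLoc> blocksInRange(long offset, long length) {
    Long from = byOffset.floorKey(offset);
    return byOffset.subMap(from == null ? 0L : from, true,
                           offset + length, false).values();
  }
}
 {code}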
 What are the major changes in logic
 ==
 1. store BlockLocations in a Map instead of an array
 2. call SHIMS.getLocationsWithOffSet() instead of getLocations()
 3. the one-block case is checked by if (offset + length <= start.getOffset() + 
 start.getLength()) instead of if ((offset % blockSize) + length <= 
 blockSize)
 What is the effect on Complexity (Big O)
 =
 1. We add an O(n) loop to build a TreeMap from an array, but it's a one-time 
 cost and would not be incurred for each split
 2. In the one-block case, we can get the block in O(log n) worst case, 
 which was O(1) before
 3. Getting the submap is O(log n)
 4. In the multiple-block case, building the list of hosts is O(m), which 
 was O(n), with m <= n, as previously we were iterating 
 over all the block locations but now we iterate only over the blocks 
 that belong to the range of offsets that we need. 
 What are the benefits of the change
 ==
 1. With this fix, we do not depend on the blockLocations returned by the 
 FileSystem to figure out the block corresponding to the offset and blockSize
 2. Also, it is not necessary that block lengths be the same for all blocks for 
 all FileSystems
 3. Previously we were using blockSize for the one-block case and block.length for 
 the multiple-block case, which is not the case now. We figure out the block 
 depending upon the actual length and offset of the block



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-6705) hive jdbc can not used by jmeter, because of unsupported auto commit feature

2014-09-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reassigned HIVE-6705:
---

Assignee: Navis

 hive jdbc can not used by jmeter, because of unsupported auto commit feature
 

 Key: HIVE-6705
 URL: https://issues.apache.org/jira/browse/HIVE-6705
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
 Environment: CentOS_X86_64 
 JMeter 2.11
Reporter: Ben
Assignee: Navis
 Attachments: HIVE-6705.1.patch.txt


 In Apache JMeter, the autocommit property is required,
 but in the Hive JDBC driver auto commit is an unsupported method,
 in 
 /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java:
 {quote}
 public void setAutoCommit(boolean autoCommit) throws SQLException {
   // TODO Auto-generated method stub
   throw new {color:red}SQLException("Method not supported"){color};
 }
 {quote}
 So, should we make a mock to support the autocommit property == false?
 {quote}
 public void setAutoCommit(boolean autoCommit) throws SQLException {
   // TODO Auto-generated method stub
   {color:red}if (autoCommit){color}
     throw new SQLException("Method not supported");
   else
     return;
 }
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 25688: hive jdbc can not used by jmeter, because of unsupported auto commit feature

2014-09-16 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25688/
---

Review request for hive.


Bugs: HIVE-6705
https://issues.apache.org/jira/browse/HIVE-6705


Repository: hive-git


Description
---

In Apache JMeter, the autocommit property is required,
but in the Hive JDBC driver auto commit is an unsupported method,

in 
/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java:

{quote}
public void setAutoCommit(boolean autoCommit) throws SQLException {
  // TODO Auto-generated method stub
  throw new {color:red}SQLException("Method not supported"){color};
}
{quote}

So, should we make a mock to support the autocommit property == false?

{quote}
public void setAutoCommit(boolean autoCommit) throws SQLException {
  // TODO Auto-generated method stub
  {color:red}if (autoCommit){color}
    throw new SQLException("Method not supported");
  else
    return;
}
{quote}


Diffs
-

  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java 59ce692 
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java cbcfec7 

Diff: https://reviews.apache.org/r/25688/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Updated] (HIVE-6705) hive jdbc can not used by jmeter, because of unsupported auto commit feature

2014-09-16 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6705:

Attachment: HIVE-6705.2.patch.txt

 hive jdbc can not used by jmeter, because of unsupported auto commit feature
 

 Key: HIVE-6705
 URL: https://issues.apache.org/jira/browse/HIVE-6705
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
 Environment: CentOS_X86_64 
 JMeter 2.11
Reporter: Ben
Assignee: Navis
 Attachments: HIVE-6705.1.patch.txt, HIVE-6705.2.patch.txt


 In Apache JMeter, the autocommit property is required,
 but in the Hive JDBC driver auto commit is an unsupported method,
 in 
 /jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java:
 {quote}
 public void setAutoCommit(boolean autoCommit) throws SQLException {
   // TODO Auto-generated method stub
   throw new {color:red}SQLException("Method not supported"){color};
 }
 {quote}
 So, should we make a mock to support the autocommit property == false?
 {quote}
 public void setAutoCommit(boolean autoCommit) throws SQLException {
   // TODO Auto-generated method stub
   {color:red}if (autoCommit){color}
     throw new SQLException("Method not supported");
   else
     return;
 }
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-7996) Potential resource leak in HiveBurnInClient

2014-09-16 Thread skrho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skrho reassigned HIVE-7996:
---

Assignee: skrho

 Potential resource leak in HiveBurnInClient
 ---

 Key: HIVE-7996
 URL: https://issues.apache.org/jira/browse/HIVE-7996
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: skrho
Priority: Minor

 In createTables() and runQueries(), Statement stmt is not closed upon return.
 In main(), Connection con is not closed upon exit.
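
 A minimal sketch of the fix shape (the URL and query are placeholders; this is 
 not the actual HiveBurnInClient code): try-with-resources closes the Statement 
 and Connection on every exit path, including exceptions.
 {code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BurnInSketch {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:hive2://localhost:10000/default"; // placeholder
    try (Connection con = DriverManager.getConnection(url);
         Statement stmt = con.createStatement()) {
      stmt.execute("show tables");
    } // both resources are closed here, even if execute() throws
  }
}
 {code}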



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7996) Potential resource leak in HiveBurnInClient

2014-09-16 Thread skrho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135139#comment-14135139
 ] 

skrho commented on HIVE-7996:
-

Hello Ted Yu~~

Which class needs the fix? Or where should I look in order to fix it? ^^

 Potential resource leak in HiveBurnInClient
 ---

 Key: HIVE-7996
 URL: https://issues.apache.org/jira/browse/HIVE-7996
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: skrho
Priority: Minor

 In createTables() and runQueries(), Statement stmt is not closed upon return.
 In main(), Connection con is not closed upon exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]

2014-09-16 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135151#comment-14135151
 ] 

Chengxiang Li commented on HIVE-8118:
-

Actually, we could generate a Spark graph with one map RDD followed by multiple 
reduce RDDs; it should not be related to SparkMapRecordHandler and 
SparkReduceRecorderHandler. We could wrap each reduce-side child operator with 
a separate HiveReduceFunction at the SparkCompiler level. 
For a map RDD which is followed by two reduce RDDs and then connected to a 
union RDD, Spark would compute the map RDD twice unless the map RDD is cached. 
If the two reduces share the same shuffle dependency (which means they have the 
same map output partitions), the job could theoretically be optimized to 
compute the map RDD only once, but I think this should be a Spark 
framework-level optimization. If the two reduce RDDs don't share the same 
shuffle dependency, the map RDD would be computed twice anyway. 
For the multi-insert case, if we wrap all FileSinkOperators into one RDD, the 
parent of the FileSinkOperators would forward rows to each FileSinkOperator, so 
the data source for the insert would be generated only once. 
So I think we do not really need multiple result collectors for 
SparkMapRecorderHandler and SparkReduceRecordHandler.

 SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized 
 with multiple result collectors[Spark Branch]
 

 Key: HIVE-8118
 URL: https://issues.apache.org/jira/browse/HIVE-8118
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Venki Korukanti
  Labels: Spark-M1

 In the current implementation, both SparkMapRecordHandler and 
 SparkReduceRecorderHandler take only one result collector, which limits 
 the corresponding map or reduce task to only one child. It's very 
 common in multi-insert queries for a map/reduce task to have more than one 
 child. A query like the following has two map tasks as parents:
 {code}
 select name, sum(value) from dec group by name union all select name, value 
 from dec order by name
 {code}
 It's possible that in the future an optimization may be implemented so that a 
 map work is followed by two reduce works and then connected to a union work.
 Thus, we should take this as a general case. Tez currently provides a 
 collector for each child operator in the map-side or reduce-side operator 
 tree. We can take Tez as a reference.
 Likely this is a big change and subtasks are possible. 
 With this, we can have a simpler and cleaner multi-insert implementation. This 
 is also the problem observed in HIVE-7731.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8104) Insert statements against ACID tables NPE when vectorization is on

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135169#comment-14135169
 ] 

Hive QA commented on HIVE-8104:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668847/HIVE-8104.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 6277 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testMultipleTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbortAndCommit
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Delimited
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyAbort
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/816/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/816/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-816/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668847

 Insert statements against ACID tables NPE when vectorization is on
 --

 Key: HIVE-8104
 URL: https://issues.apache.org/jira/browse/HIVE-8104
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Vectorization
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Critical
 Attachments: HIVE-8104.patch


 Doing an insert against a table that is using ACID format with the 
 transaction manager set to DbTxnManager and vectorization turned on results 
 in an NPE.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6883) Dynamic partitioning optimization does not honor sort order or order by

2014-09-16 Thread Zhichun Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135176#comment-14135176
 ] 

Zhichun Wu commented on HIVE-6883:
--

[~prasanth_j], this fix causes some problems when combining dynamic 
partitioning with group by. Consider the following case:
{code}
CREATE TABLE `t1`(  `a` int,`b` string) PARTITIONED BY (`dt` string);
create table src1 (
  `key` string,
  `val` string
);
explain insert overwrite table t1 partition(dt) select 1, 'hello', '20140901' 
from src1 group by key;
{code}
The key expressions of the RS in Stage-2 are wrong. The part of the patch that 
uses the parent RS's keyCols needs more changes.
{code}
if (parentRSOpOrder != null && !parentRSOpOrder.isEmpty() && 
    sortPositions.isEmpty()) {
  newKeyCols.addAll(parentRSOp.getConf().getKeyCols());
  orderStr += parentRSOpOrder;
}
{code}



 Dynamic partitioning optimization does not honor sort order or order by
 ---

 Key: HIVE-6883
 URL: https://issues.apache.org/jira/browse/HIVE-6883
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Priority: Critical
 Fix For: 0.14.0, 0.13.1

 Attachments: HIVE-6883-branch-0.13.3.patch, HIVE-6883.1.patch, 
 HIVE-6883.2.patch, HIVE-6883.3.patch


 The HIVE-6455 patch does not honor the sort order of the output table or the 
 order by of the select statement. The reason for the former is that 
 numDistributionKey in ReduceSinkDesc is set wrongly. It doesn't take into 
 account the sort columns; because of this, RSOp sets the sort columns to null 
 in the Key. Since nulls are set in place of the sort columns in the Key, the 
 sort columns in the Value are not sorted. 
 The other issue is that ORDER BY columns are not honored during insertion. For 
 example
 {code}
 insert overwrite table over1k_part_orc partition(ds='foo', t) select 
 si,i,b,f,t from over1k_orc where t is null or t=27 order by si;
 {code}
 the select query performs order by on column 'si' in the first MR job. The 
 following MR job (inserted by HIVE-6455) sorts the input data on the dynamic 
 partition column 't' without taking into account the already sorted 'si' 
 column. This results in out-of-order insertion for the 'si' column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6090) Audit logs for HiveServer2

2014-09-16 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-6090:
---
Attachment: HIVE-6090.1.WIP.patch

Uploading a WIP patch that should apply cleanly. Will test against a 
live cluster (Kerberos) and submit for precommit tests.

 Audit logs for HiveServer2
 --

 Key: HIVE-6090
 URL: https://issues.apache.org/jira/browse/HIVE-6090
 Project: Hive
  Issue Type: Improvement
  Components: Diagnosability, HiveServer2
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-6090.1.WIP.patch, HIVE-6090.patch


 HiveMetastore has audit logs and would like to audit all queries or requests 
 to HiveServer2 also. This will help in understanding how the APIs were used, 
 queries submitted, users etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()

2014-09-16 Thread skrho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skrho updated HIVE-7305:

Attachment: HIVE-7305_001.patch

I added null-check and size-check logic. Please review my patch~~

 Return value from in.read() is ignored in SerializationUtils#readLongLE()
 -

 Key: HIVE-7305
 URL: https://issues.apache.org/jira/browse/HIVE-7305
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-7305_001.patch


 {code}
   long readLongLE(InputStream in) throws IOException {
     in.read(readBuffer, 0, 8);
     return (((readBuffer[0] & 0xff) << 0)
         + ((readBuffer[1] & 0xff) << 8)
 {code}
 Return value from read() may indicate fewer than 8 bytes read.
 The return value should be checked.
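
 A minimal sketch of the kind of check being asked for (readFully is a 
 hypothetical helper, not the attached patch): loop until all requested bytes 
 arrive, since InputStream.read() may return fewer.
 {code}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
  static void readFully(InputStream in, byte[] buf, int len) throws IOException {
    int off = 0;
    while (off < len) {
      int n = in.read(buf, off, len - off);
      if (n < 0) {
        throw new EOFException("stream ended after " + off + " of " + len + " bytes");
      }
      off += n;                 // read() may deliver fewer bytes than asked
    }
  }
}
 {code}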



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7305) Return value from in.read() is ignored in SerializationUtils#readLongLE()

2014-09-16 Thread skrho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

skrho updated HIVE-7305:

Assignee: skrho
  Status: Patch Available  (was: Open)

 Return value from in.read() is ignored in SerializationUtils#readLongLE()
 -

 Key: HIVE-7305
 URL: https://issues.apache.org/jira/browse/HIVE-7305
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
 Attachments: HIVE-7305_001.patch


 {code}
   long readLongLE(InputStream in) throws IOException {
     in.read(readBuffer, 0, 8);
     return (((readBuffer[0] & 0xff) << 0)
         + ((readBuffer[1] & 0xff) << 8)
 {code}
 Return value from read() may indicate fewer than 8 bytes read.
 The return value should be checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6148) Support arbitrary structs stored in HBase

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135336#comment-14135336
 ] 

Hive QA commented on HIVE-6148:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668872/HIVE-6148.1.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed
*Failed tests:*
{noformat}
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-818/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668872

 Support arbitrary structs stored in HBase
 -

 Key: HIVE-6148
 URL: https://issues.apache.org/jira/browse/HIVE-6148
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Affects Versions: 0.12.0
Reporter: Swarnim Kulkarni
 Attachments: HIVE-6148.1.patch.txt


 We should add support to be able to query arbitrary structs stored in HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7935) Support dynamic service discovery for HiveServer2

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135420#comment-14135420
 ] 

Hive QA commented on HIVE-7935:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668869/HIVE-7935.8.patch

{color:green}SUCCESS:{color} +1 6276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/819/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/819/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-819/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668869

 Support dynamic service discovery for HiveServer2
 -

 Key: HIVE-7935
 URL: https://issues.apache.org/jira/browse/HIVE-7935
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7935.1.patch, HIVE-7935.2.patch, HIVE-7935.3.patch, 
 HIVE-7935.4.patch, HIVE-7935.5.patch, HIVE-7935.6.patch, HIVE-7935.7.patch, 
 HIVE-7935.8.patch


 To support Rolling Upgrade / HA, we need a mechanism by which a JDBC client 
 can dynamically resolve a HiveServer2 instance to connect to.
 *High Level Design:* 
 Whether dynamic service discovery is supported can be configured by 
 setting HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY. ZooKeeper is used to 
 support this.
 * When an instance of HiveServer2 comes up, it adds itself as a znode to 
 ZooKeeper under a configurable namespace (HIVE_SERVER2_ZOOKEEPER_NAMESPACE).
 * A JDBC/ODBC client now specifies the ZooKeeper ensemble in its connection 
 string, instead of pointing to a specific HiveServer2 instance. The JDBC 
 driver uses the ZooKeeper ensemble to pick an instance of HiveServer2 to 
 connect to for the entire session.
 * When an instance is removed from ZooKeeper, the existing client sessions 
 continue till completion. When the last client session completes, the 
 instance shuts down.
 * All new client connections pick one of the available HiveServer2 URIs from 
 ZooKeeper (see the connection sketch below).
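
 A hedged sketch of what a client connection could look like under this design 
 (the hosts are placeholders and the URL parameter names are an assumption, 
 since the patch may spell them differently):
 {code}
import java.sql.Connection;
import java.sql.DriverManager;

public class ZkDiscoveryClient {
  public static void main(String[] args) throws Exception {
    // The client lists the ZooKeeper ensemble instead of a specific server.
    String url = "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/default"
        + ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2";
    try (Connection con = DriverManager.getConnection(url)) {
      // The driver picks one registered HiveServer2 for the whole session.
      System.out.println("connected: " + con.getMetaData().getURL());
    }
  }
}
 {code}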



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8054) Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135415#comment-14135415
 ] 

Xuefu Zhang commented on HIVE-8054:
---

Thank you for the catch, [~leftylev].

 Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark 
 Branch]
 --

 Key: HIVE-8054
 URL: https://issues.apache.org/jira/browse/HIVE-8054
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Na Yang
  Labels: Spark-M1, TODOC-SPARK
 Fix For: spark-branch

 Attachments: HIVE-8054-spark.patch, HIVE-8054.2-spark.patch, 
 HIVE-8054.3-spark.patch


 Option hive.optimize.union.remove, introduced in HIVE-3276, removes union 
 operators from the operator graph in certain cases as an optimization to 
 reduce the number of MR jobs. While it makes sense in MR, this optimization is 
 actually harmful to an execution engine such as Spark, which natively supports 
 union without requiring additional jobs. This is because removing the union 
 operator creates disjointed operator graphs, each graph generating a job, so 
 this optimization makes the query require more jobs to run. Not to mention 
 the additional complexity of handling linked FS descriptors.
 I propose that we disable this optimization when the execution engine is 
 Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135517#comment-14135517
 ] 

Xuefu Zhang commented on HIVE-8118:
---

Hi [~chengxiang li],

Thank you for your input. I'm not sure if I understand your thought right. Let 
me clarify the problem by giving a SparkWork like this:
{code}
MapWork1 -> ReduceWork1
        \-> ReduceWork2
{code}
It means that MapWork1 will generate different datasets to feed to ReduceWork1 
and ReduceWork2. In the multi-insert case, ReduceWork1 and ReduceWork2 will 
each have a FS operator. Inside MapWork1, there will be two operator branches 
consuming the same data and pushing different data sets to two RS operators. 
(ReduceWork1 and ReduceWork2 have different HiveReduceFunctions.)

However, the current implementation only takes the first data set and feeds it 
to both reduce works. The same problem can also happen if MapWork1 were a 
reduce work following another ReduceWork or MapWork.

With this problem, I'm not sure how we can get around it without letting 
MapWork1 generate two output RDDs, one for each following reduce work. 
Potentially, we can duplicate MapWork1 and have the following diagram:
{code}
MapWork11 -> ReduceWork1
MapWork12 -> ReduceWork2
{code}
where MapWork11 and MapWork12 consume the same input table (input table as an 
RDD) and feed their output RDDs to ReduceWork1 and ReduceWork2, respectively. 
This has its complexity, but more importantly, there will be wasted READ 
(unless Spark is smart enough to cache the input table, which is unlikely) and 
COMPUTATION (computing data twice). I feel that it's unlikely we'll get such 
optimizations from the Spark framework in the near term.

Thus, I think we have to take into consideration that a map work or a reduce 
work might generate multiple RDDs, one feeding each of its children. Since 
SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data 
processing on the map and reduce side, they need to have a way to generate 
multiple outputs.

Please correct me if I understood you wrong. Thanks.
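
A minimal sketch of the multiple-collector shape argued for above (the 
interface and class names are invented for illustration, not the Spark-branch 
API):

{code}
import java.util.ArrayList;
import java.util.List;

public class MultiCollectorSketch {
  interface RowCollector {
    void collect(Object key, Object value);
  }

  // One collector per child work, mirroring how Tez hands each child
  // operator its own collector.
  private final List<RowCollector> childCollectors =
      new ArrayList<RowCollector>();

  void addChildCollector(RowCollector c) {
    childCollectors.add(c);
  }

  // A record handler initialized this way can route each processed row to
  // the output feeding the corresponding child work.
  void processRow(Object key, Object value) {
    for (RowCollector c : childCollectors) {
      c.collect(key, value);
    }
  }
}
{code}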


 SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized 
 with multiple result collectors[Spark Branch]
 

 Key: HIVE-8118
 URL: https://issues.apache.org/jira/browse/HIVE-8118
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Venki Korukanti
  Labels: Spark-M1

 In the current implementation, both SparkMapRecordHandler and 
 SparkReduceRecorderHandler take only one result collector, which limits 
 the corresponding map or reduce task to only one child. It's very 
 common in multi-insert queries for a map/reduce task to have more than one 
 child. A query like the following has two map tasks as parents:
 {code}
 select name, sum(value) from dec group by name union all select name, value 
 from dec order by name
 {code}
 It's possible that in the future an optimization may be implemented so that a 
 map work is followed by two reduce works and then connected to a union work.
 Thus, we should take this as a general case. Tez currently provides a 
 collector for each child operator in the map-side or reduce-side operator 
 tree. We can take Tez as a reference.
 Likely this is a big change and subtasks are possible. 
 With this, we can have a simpler and cleaner multi-insert implementation. This 
 is also the problem observed in HIVE-7731.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7870:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Fixed via HIVE-8017.

 Insert overwrite table query does not generate correct task plan [Spark 
 Branch]
 ---

 Key: HIVE-7870
 URL: https://issues.apache.org/jira/browse/HIVE-7870
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Na Yang
Assignee: Na Yang
  Labels: Spark-M1
 Fix For: spark-branch

 Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, 
 HIVE-7870.3-spark.patch, HIVE-7870.4-spark.patch, HIVE-7870.5-spark.patch


 Insert overwrite table query does not generate a correct task plan when the 
 hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. 
 {noformat}
 set hive.optimize.union.remove=true
 set hive.merge.sparkfiles=true
 insert overwrite table outputTbl1
 SELECT * FROM
 (
 select key, 1 as values from inputTbl1
 union all
 select * FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, 2 as values from inputTbl1
 ) a
 )b;
 select * from outputTbl1 order by key, values;
 {noformat}
 query result
 {noformat}
 1 1
 1 2
 2 1
 2 2
 3 1
 3 2
 7 1
 7 2
 8 2
 8 2
 8 2
 {noformat}
 expected result:
 {noformat}
 1 1
 1 1
 1 2
 2 1
 2 1
 2 2
 3 1
 3 1
 3 2
 7 1
 7 1
 7 2
 8 1
 8 1
 8 2
 8 2
 8 2
 {noformat}
 The move work is not working properly, and some data goes missing during the move.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135532#comment-14135532
 ] 

Xuefu Zhang edited comment on HIVE-7870 at 9/16/14 2:36 PM:


Fixed via HIVE-8054.


was (Author: xuefuz):
Fixed via HIVE-8017.

 Insert overwrite table query does not generate correct task plan [Spark 
 Branch]
 ---

 Key: HIVE-7870
 URL: https://issues.apache.org/jira/browse/HIVE-7870
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Na Yang
Assignee: Na Yang
  Labels: Spark-M1
 Fix For: spark-branch

 Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, 
 HIVE-7870.3-spark.patch, HIVE-7870.4-spark.patch, HIVE-7870.5-spark.patch


 Insert overwrite table query does not generate a correct task plan when the 
 hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. 
 {noformat}
 set hive.optimize.union.remove=true
 set hive.merge.sparkfiles=true
 insert overwrite table outputTbl1
 SELECT * FROM
 (
 select key, 1 as values from inputTbl1
 union all
 select * FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, 2 as values from inputTbl1
 ) a
 )b;
 select * from outputTbl1 order by key, values;
 {noformat}
 query result
 {noformat}
 1 1
 1 2
 2 1
 2 2
 3 1
 3 2
 7 1
 7 2
 8 2
 8 2
 8 2
 {noformat}
 expected result:
 {noformat}
 1 1
 1 1
 1 2
 2 1
 2 1
 2 2
 3 1
 3 1
 3 2
 7 1
 7 1
 7 2
 8 1
 8 1
 8 2
 8 2
 8 2
 {noformat}
 The move work is not working properly, and some data goes missing during the move.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8061) improve the partition col stats update speed

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135567#comment-14135567
 ] 

Hive QA commented on HIVE-8061:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668871/HIVE-8061.4.patch

{color:green}SUCCESS:{color} +1 6276 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/820/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/820/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-820/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668871

 improve the partition col stats update speed
 

 Key: HIVE-8061
 URL: https://issues.apache.org/jira/browse/HIVE-8061
 Project: Hive
  Issue Type: Improvement
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Minor
 Attachments: HIVE-8061.1.patch, HIVE-8061.2.patch, HIVE-8061.3.patch, 
 HIVE-8061.4.patch


 We previously worked hard toward faster stats updates for the columns of a 
 partition of a table in HIVE-7736 
 and HIVE-7876.
 Although there was some improvement, it was only correct in the first run; 
 there would be duplicate column stats later. Thanks to Eugene Koifman's 
 comments.
 We fixed this in HIVE-7944 by reverting the patch.
 This JIRA ticket is another attempt of mine to improve the speed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8128) Improve Parquet Vectorization

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8128:
--

 Summary: Improve Parquet Vectorization
 Key: HIVE-8128
 URL: https://issues.apache.org/jira/browse/HIVE-8128
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland


We'll want to finish the vectorization work (e.g. VectorizedOrcSerde) which was 
partially done in HIVE-5998.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8121) Create micro-benchmarks for ParquetSerde and evaluate performance

2014-09-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8121:
---
Description: 
These benchmarks should not execute queries but test only the ParquetSerde code 
to ensure we are as efficient as possible. 

The output of this JIRA is:

1) Benchmark tool exists
2) We create new tasks under HIVE-8120 to track the improvements required

  was:
These benchmarks should not execute queries but test only the ParquetSerde code 
to ensure we are as efficient as possible. Likely the first thing we'll want to 
do is finish the vectorization work (e.g. VectorizedOrcSerde) which was 
partially done in HIVE-5998.

The output of this JIRA is:

1) Benchmark tool exists
2) We create new tasks under HIVE-8120 to track the improvements required


 Create micro-benchmarks for ParquetSerde and evaluate performance
 -

 Key: HIVE-8121
 URL: https://issues.apache.org/jira/browse/HIVE-8121
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland

 These benchmarks should not execute queries but test only the ParquetSerde 
 code to ensure we are as efficient as possible. 
 The output of this JIRA is:
 1) Benchmark tool exists
 2) We create new tasks under HIVE-8120 to track the improvements required



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8130) Support Date in Avro

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8130:
--

 Summary: Support Date in Avro
 Key: HIVE-8130
 URL: https://issues.apache.org/jira/browse/HIVE-8130
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8131) Support timestamp in Avro

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8131:
--

 Summary: Support timestamp in Avro
 Key: HIVE-8131
 URL: https://issues.apache.org/jira/browse/HIVE-8131
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8132) Support avro ACID (bulk update)

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8132:
--

 Summary: Support avro ACID (bulk update)
 Key: HIVE-8132
 URL: https://issues.apache.org/jira/browse/HIVE-8132
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8120) Umbrella JIRA tracking Parquet improvements

2014-09-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135614#comment-14135614
 ] 

Brock Noland commented on HIVE-8120:


The view from my side is:

* Perf (Benchmarks, vectorization) (P1)
* Data types (P2)
* Refactoring/cleanup (P2)
* ACID (bulk update) (P3)

 Umbrella JIRA tracking Parquet improvements
 ---

 Key: HIVE-8120
 URL: https://issues.apache.org/jira/browse/HIVE-8120
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8129) Umbrella JIRA to track Avro improvements

2014-09-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135617#comment-14135617
 ] 

Brock Noland commented on HIVE-8129:



* Data types (P1)
* ACID (bulk update) (P2)

 Umbrella JIRA to track Avro improvements
 

 Key: HIVE-8129
 URL: https://issues.apache.org/jira/browse/HIVE-8129
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8120) Umbrella JIRA tracking Parquet improvements

2014-09-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8120:
---
Summary: Umbrella JIRA tracking Parquet improvements  (was: Umbrella JIRA 
tracking Parquet work)

 Umbrella JIRA tracking Parquet improvements
 ---

 Key: HIVE-8120
 URL: https://issues.apache.org/jira/browse/HIVE-8120
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8129) Umbrella JIRA to track Avro improvements

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8129:
--

 Summary: Umbrella JIRA to track Avro improvements
 Key: HIVE-8129
 URL: https://issues.apache.org/jira/browse/HIVE-8129
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8133) Support Postgres via DirectSQL

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8133:
--

 Summary: Support Postgres via DirectSQL
 Key: HIVE-8133
 URL: https://issues.apache.org/jira/browse/HIVE-8133
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8134) concurrency improvements

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8134:
--

 Summary: concurrency improvements
 Key: HIVE-8134
 URL: https://issues.apache.org/jira/browse/HIVE-8134
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland


The goal of this JIRA is to track supportability issues with concurrent users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8134) Umbrella JIRA to track concurrency improvements

2014-09-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8134:
---
Summary: Umbrella JIRA to track concurrency improvements  (was: concurrency 
improvements)

 Umbrella JIRA to track concurrency improvements
 ---

 Key: HIVE-8134
 URL: https://issues.apache.org/jira/browse/HIVE-8134
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland

 The goal of this JIRA is to track supportability issues with concurrent users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8133) Support Postgres via DirectSQL

2014-09-16 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135639#comment-14135639
 ] 

Damien Carol commented on HIVE-8133:


[~brocknoland] The first step should be to enable Postgres as the Metastore back 
end BEFORE trying to do direct SQL.
Currently the metastore can't work on Postgres; see HIVE-7689.
I'm trying to fix normal use of the Metastore with Postgres in HIVE-7689.
I can take this ticket afterwards if possible. 

 Support Postgres via DirectSQL
 --

 Key: HIVE-8133
 URL: https://issues.apache.org/jira/browse/HIVE-8133
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors[Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135517#comment-14135517
 ] 

Xuefu Zhang edited comment on HIVE-8118 at 9/16/14 4:02 PM:


Hi [~chengxiang li],

Thank you for your input. I'm not sure if I understand your thought correctly. Let 
me clarify the problem by giving a SparkWork like this:
{code}
MapWork1 -> ReduceWork1
        \-> ReduceWork2
{code}
it means that MapWork1 will generate different datasets to feed to ReduceWork1 
and ReduceWork2. In case of multi-insert, ReduceWork1 and ReduceWork2 will have 
a FS operator. Inside MapWork1, there will be two operator branches consuming 
the same data, and push different data sets to two RS operators. (ReduceWork1 
and ReduceWork2 have different HiveReduceFunctions.)

However, the current implementation only takes the first data set and feeds it to 
both reduce works. The same problem can also happen if MapWork1 were a reduce 
work following another ReduceWork or MapWork.

With this problem, I'm not sure how we can get around without letting MapWork1 
generate two output RDDs, one for each following reduce work. Potentially, we 
can duplicate MapWork1 and have the following diagram:
{code}
MapWork11 -> ReduceWork1
MapWork12 -> ReduceWork2
{code}
where MapWork11 and MapWork12 consume the same input table (input table as 
RDD), and feed the first output RDD to ReduceWork1 and the second to 
ReduceWork2. This has its complexity, but more importantly, there will be 
wasted READ (unless Spark is smart enough to cache the input table, which is 
unlikely) and COMPUTATION (computing data twice). I feel that it's unlikely to 
get such optimizations from the Spark framework in the near term.

Thus, I think we have to take into consideration that a map work or a reduce 
work might generate multiple RDDs, one feeding each of its children. Since 
SparkMapRecorderHandler and SparkReduceRecordHandler are doing the data 
processing on map and reduce side, they need to have a way to generate multiple 
outputs.

Please correct me if I understood you wrong. Thanks.
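
To make the multiple-collector idea concrete, here is a minimal sketch of one 
collector per child work (RowCollector and MultiCollectorHandler are hypothetical 
names, not classes in the Spark branch):
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-child output collector.
interface RowCollector {
  void collect(Object key, Object value);
}

class MultiCollectorHandler {
  // One collector per child work, keyed by the child work's name.
  private final Map<String, RowCollector> collectors =
      new HashMap<String, RowCollector>();

  void register(String childWorkName, RowCollector collector) {
    collectors.put(childWorkName, collector);
  }

  // Each RS operator branch emits to the collector of the child work it feeds,
  // so ReduceWork1 and ReduceWork2 each receive only their own dataset.
  void emit(String childWorkName, Object key, Object value) {
    collectors.get(childWorkName).collect(key, value);
  }
}
{code}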





 SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized 
 with multiple result collectors[Spark Branch]
 

 Key: HIVE-8118
 URL: https://issues.apache.org/jira/browse/HIVE-8118
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Venki Korukanti
  Labels: Spark-M1

 In the current implementation, both SparkMapRecordHandler and 
 SparkReduceRecorderHandler take only one result collector, which limits the 
 corresponding map or reduce task to having only one child. It's very 
 common in multi-insert queries that a map/reduce task has more than one 
 child. A query like the following has two map tasks as parents:
 

[jira] [Assigned] (HIVE-8133) Support Postgres via DirectSQL

2014-09-16 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol reassigned HIVE-8133:
--

Assignee: Damien Carol

 Support Postgres via DirectSQL
 --

 Key: HIVE-8133
 URL: https://issues.apache.org/jira/browse/HIVE-8133
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Damien Carol





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8136) Finer grained locking

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8136:
--

 Summary: Finer grained locking
 Key: HIVE-8136
 URL: https://issues.apache.org/jira/browse/HIVE-8136
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland


When using ZK for concurrency control, some statements require an exclusive 
table lock so that they are atomic, such as setting a table's location.

This JIRA is to analyze the scope of statements like ALTER TABLE and see if we 
can reduce the locking required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8094) add LIKE keyword support for SHOW FUNCTIONS

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135674#comment-14135674
 ] 

Hive QA commented on HIVE-8094:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668887/HIVE-8094.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6276 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/821/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/821/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-821/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668887

 add LIKE keyword support for SHOW FUNCTIONS
 ---

 Key: HIVE-8094
 URL: https://issues.apache.org/jira/browse/HIVE-8094
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0, 0.13.1
Reporter: peter liu
Assignee: peter liu
 Fix For: 0.14.0

 Attachments: HIVE-8094.1.patch


 It would be nice to add LIKE keyword support for SHOW FUNCTIONS as below, 
 keeping the pattern syntax consistent with SHOW DATABASES and SHOW TABLES.
 bq. SHOW FUNCTIONS LIKE 'foo*';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation

2014-09-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135707#comment-14135707
 ] 

Sergey Shelukhin commented on HIVE-8080:


[~ashutoshc] [~jpullokkaran] ping?

 CBO: function name may not match UDF name during translation
 

 Key: HIVE-8080
 URL: https://issues.apache.org/jira/browse/HIVE-8080
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch


 create_func1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used

2014-09-16 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-7812:

Status: Patch Available  (was: Reopened)

 Disable CombineHiveInputFormat when ACID format is used
 ---

 Key: HIVE-7812
 URL: https://issues.apache.org/jira/browse/HIVE-7812
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.14.0

 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch


 Currently the HiveCombineInputFormat complains when called on an ACID 
 directory. Modify HiveCombineInputFormat so that HiveInputFormat is used 
 instead if the directory is ACID format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7812) Disable CombineHiveInputFormat when ACID format is used

2014-09-16 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-7812:

Attachment: HIVE-7812.patch

I fixed the problem that was causing trouble for the new Tez tests.

 Disable CombineHiveInputFormat when ACID format is used
 ---

 Key: HIVE-7812
 URL: https://issues.apache.org/jira/browse/HIVE-7812
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.14.0

 Attachments: HIVE-7812.patch, HIVE-7812.patch, HIVE-7812.patch, 
 HIVE-7812.patch


 Currently the HiveCombineInputFormat complains when called on an ACID 
 directory. Modify HiveCombineInputFormat so that HiveInputFormat is used 
 instead if the directory is ACID format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]

2014-09-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8055:
---
Issue Type: Sub-task  (was: Task)
Parent: HIVE-7292

 Code cleanup after HIVE-8054 [Spark Branch]
 ---

 Key: HIVE-8055
 URL: https://issues.apache.org/jira/browse/HIVE-8055
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
  Labels: Spark-M1

 There is quite some code handling union removal optimization in SparkCompiler 
 and related classes. We need to clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation

2014-09-16 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135740#comment-14135740
 ] 

Laljo John Pullokkaran commented on HIVE-8080:
--

Could you add an RB entry?
It will be easier to read the patch.

Thanks

 CBO: function name may not match UDF name during translation
 

 Key: HIVE-8080
 URL: https://issues.apache.org/jira/browse/HIVE-8080
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch


 create_func1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation

2014-09-16 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25700/
---

Review request for hive, Ashutosh Chauhan and John Pullokkaran.


Repository: hive-git


Description
---

see jira


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
 PRE-CREATION 
  ql/src/test/queries/clientpositive/create_func1.q ad924d3 
  ql/src/test/results/clientpositive/create_func1.q.out 798f77f 

Diff: https://reviews.apache.org/r/25700/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Commented] (HIVE-8080) CBO: function name may not match UDF name during translation

2014-09-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135758#comment-14135758
 ] 

Sergey Shelukhin commented on HIVE-8080:


https://reviews.apache.org/r/25700/

 CBO: function name may not match UDF name during translation
 

 Key: HIVE-8080
 URL: https://issues.apache.org/jira/browse/HIVE-8080
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8080.01.patch, HIVE-8080.02.patch, HIVE-8080.patch


 create_func1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8138) Global Init file should allow specifying file name not only directory

2014-09-16 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8138:
--

 Summary: Global Init file should allow specifying file name  not 
only directory
 Key: HIVE-8138
 URL: https://issues.apache.org/jira/browse/HIVE-8138
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland


HIVE-5160 allows you to specify a directory where a .hiverc file exists. 
However, since .hiverc is a hidden file, this can be confusing. The property 
should allow a path to either a file or a directory.
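
A minimal sketch of the resolution logic this would imply (illustrative only, 
not the HiveServer2 code):
{code}
import java.io.File;

class GlobalInitFileResolver {
  // The configured value may name a .hiverc file directly, or a directory
  // that contains one (the existing HIVE-5160 behavior).
  static File resolve(String configured) {
    File f = new File(configured);
    if (f.isDirectory()) {
      f = new File(f, ".hiverc");
    }
    return f.isFile() ? f : null; // null: no init file to run
  }
}
{code}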



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8138) Global Init file should allow specifying file name not only directory

2014-09-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland reassigned HIVE-8138:
--

Assignee: Brock Noland

 Global Init file should allow specifying file name  not only directory
 --

 Key: HIVE-8138
 URL: https://issues.apache.org/jira/browse/HIVE-8138
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland

 HIVE-5160 allows you to specify a directory where a .hiverc file exists. 
 However, since .hiverc is a hidden file, this can be confusing. The property 
 should allow a path to either a file or a directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation

2014-09-16 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25700/#review53546
---



ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
https://reviews.apache.org/r/25700/#comment93239

Is the Hive token case-insensitive, or are all function names in lower case?



ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
https://reviews.apache.org/r/25700/#comment93240

are all functions qualified in Hive (w.r.t. the DB)?
How about built-in functions like toLower?
Could you say <DB NAME>.toLower()?


- John Pullokkaran


On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25700/
 ---
 
 (Updated Sept. 16, 2014, 5:23 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/create_func1.q ad924d3 
   ql/src/test/results/clientpositive/create_func1.q.out 798f77f 
 
 Diff: https://reviews.apache.org/r/25700/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




[jira] [Commented] (HIVE-8137) Empty ORC file handling

2014-09-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135768#comment-14135768
 ] 

Gopal V commented on HIVE-8137:
---

The right approach is to skip generating splits for such files.

There is no reason to schedule such a split or run a task at all.

 Empty ORC file handling
 ---

 Key: HIVE-8137
 URL: https://issues.apache.org/jira/browse/HIVE-8137
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.14.0


 Hive 13 does not handle reading a zero-size ORC file properly. An ORC file 
 is supposed to have a postscript, 
 which the ReaderImpl class tries to read to initialize the footer. 
 But in case the file is empty 
 or of zero size, it runs into an IndexOutOfBounds exception because 
 ReaderImpl tries to read in its constructor.
 Code snippet: 
 // get length of PostScript
 int psLen = buffer.get(readSize - 1) & 0xff; 
 In the above code, readSize for an empty file is zero.
 I see that the ensureOrcFooter() method performs some sanity checks for the footer, 
 so either we can move the above code snippet to ensureOrcFooter() and throw 
 a Malformed ORC file exception, or we can create a dummy Reader that does 
 not initialize the footer and has hasNext() return false 
 on the first call.
 Basically, I would like to know the correct way to handle an 
 empty ORC file in a mapred job: 
 should we ignore it and not throw an exception, or throw an exception 
 saying that the ORC file is malformed?
 Please let me know your thoughts on this.
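
 One possible guard under the dummy-Reader option, sketched below: check the 
 file length before parsing the postscript, so a zero-byte file is reported as 
 having no footer instead of indexing past the end of the buffer 
 (OrcFooterGuard is an illustrative name, not actual ReaderImpl code):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class OrcFooterGuard {
  // A zero-length file has no postscript, so buffer.get(readSize - 1)
  // would fail; callers can return an empty (no-rows) reader instead.
  static boolean hasFooter(FileSystem fs, Path path) throws IOException {
    return fs.getFileStatus(path).getLen() > 0;
  }
}
{code}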



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8137) Empty ORC file handling

2014-09-16 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135778#comment-14135778
 ] 

Pankit Thapar commented on HIVE-8137:
-

The issue is that Hadoop might create a split in the CombineInputFormat case; 
Hadoop specifically creates empty splits.

 Empty ORC file handling
 ---

 Key: HIVE-8137
 URL: https://issues.apache.org/jira/browse/HIVE-8137
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.14.0


 Hive 13 does not handle reading a zero-size ORC file properly. An ORC file 
 is supposed to have a postscript, 
 which the ReaderImpl class tries to read to initialize the footer. 
 But in case the file is empty 
 or of zero size, it runs into an IndexOutOfBounds exception because 
 ReaderImpl tries to read in its constructor.
 Code snippet: 
 // get length of PostScript
 int psLen = buffer.get(readSize - 1) & 0xff; 
 In the above code, readSize for an empty file is zero.
 I see that the ensureOrcFooter() method performs some sanity checks for the footer, 
 so either we can move the above code snippet to ensureOrcFooter() and throw 
 a Malformed ORC file exception, or we can create a dummy Reader that does 
 not initialize the footer and has hasNext() return false 
 on the first call.
 Basically, I would like to know the correct way to handle an 
 empty ORC file in a mapred job: 
 should we ignore it and not throw an exception, or throw an exception 
 saying that the ORC file is malformed?
 Please let me know your thoughts on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation

2014-09-16 Thread John Pullokkaran


 On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646
  https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646
 
  are all functions qualified in Hive (w.r.t. the DB)?
  How about built-in functions like toLower?
  Could you say <DB NAME>.toLower()?

Also could you run a few of the q tests below and see if your change causes 
problems:
authorization_create_func1.q, show_functions.q, vectorized_string_funcs.q,
create_func1.q, vector_decimal_math_funcs.q, vectorized_timestamp_funcs.q,
drop_function.q, vectorized_date_funcs.q,
show_describe_func_quotes.q, vectorized_math_funcs.q


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25700/#review53546
---


On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25700/
 ---
 
 (Updated Sept. 16, 2014, 5:23 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/create_func1.q ad924d3 
   ql/src/test/results/clientpositive/create_func1.q.out 798f77f 
 
 Diff: https://reviews.apache.org/r/25700/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




[jira] [Updated] (HIVE-8097) Vectorized Reduce-Side [SMB] MapJoin operator fails

2014-09-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-8097:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~mmccline]!

 Vectorized Reduce-Side [SMB] MapJoin operator fails
 ---

 Key: HIVE-8097
 URL: https://issues.apache.org/jira/browse/HIVE-8097
 Project: Hive
  Issue Type: Bug
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-8097.01.patch, HIVE-8097.02.patch, 
 HIVE-8097.03.patch


 Fails attempting to getScratchColumnVectorTypes since mapWork is null on the 
 reduce side.
 Fix by calling that method using the reduceWork object.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8137) Empty ORC file handling

2014-09-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135790#comment-14135790
 ] 

Gopal V commented on HIVE-8137:
---

Hive's CombineInputFormat has pending changes to fix this - HIVE-6554

But obviously that does not apply to MR's combine implementation. 

The Tez one actually works as expected in this case, because it combines 
InputSplits instead of combining arbitrary FileSplits.

 Empty ORC file handling
 ---

 Key: HIVE-8137
 URL: https://issues.apache.org/jira/browse/HIVE-8137
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.14.0


 Hive 13 does not handle reading a zero-size ORC file properly. An ORC file 
 is supposed to have a postscript, 
 which the ReaderImpl class tries to read to initialize the footer. 
 But in case the file is empty 
 or of zero size, it runs into an IndexOutOfBounds exception because 
 ReaderImpl tries to read in its constructor.
 Code snippet: 
 // get length of PostScript
 int psLen = buffer.get(readSize - 1) & 0xff; 
 In the above code, readSize for an empty file is zero.
 I see that the ensureOrcFooter() method performs some sanity checks for the footer, 
 so either we can move the above code snippet to ensureOrcFooter() and throw 
 a Malformed ORC file exception, or we can create a dummy Reader that does 
 not initialize the footer and has hasNext() return false 
 on the first call.
 Basically, I would like to know the correct way to handle an 
 empty ORC file in a mapred job: 
 should we ignore it and not throw an exception, or throw an exception 
 saying that the ORC file is malformed?
 Please let me know your thoughts on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8074) Merge spark into trunk 9/12/2014

2014-09-16 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8074:
---
Attachment: (was: HIVE-8074.1-spark.patch)

 Merge spark into trunk 9/12/2014
 

 Key: HIVE-8074
 URL: https://issues.apache.org/jira/browse/HIVE-8074
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8137) Empty ORC file handling

2014-09-16 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135821#comment-14135821
 ] 

Pankit Thapar commented on HIVE-8137:
-

I ran an insert overwrite query from an empty table into an ORC table. That 
triggered Hadoop's CombineFileInputFormat, which does not check whether the 
split is empty or not.


 Empty ORC file handling
 ---

 Key: HIVE-8137
 URL: https://issues.apache.org/jira/browse/HIVE-8137
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
 Fix For: 0.14.0


 Hive 13 does not handle reading a zero-size ORC file properly. An ORC file 
 is supposed to have a postscript, 
 which the ReaderImpl class tries to read to initialize the footer. 
 But in case the file is empty 
 or of zero size, it runs into an IndexOutOfBounds exception because 
 ReaderImpl tries to read in its constructor.
 Code snippet: 
 // get length of PostScript
 int psLen = buffer.get(readSize - 1) & 0xff; 
 In the above code, readSize for an empty file is zero.
 I see that the ensureOrcFooter() method performs some sanity checks for the footer, 
 so either we can move the above code snippet to ensureOrcFooter() and throw 
 a Malformed ORC file exception, or we can create a dummy Reader that does 
 not initialize the footer and has hasNext() return false 
 on the first call.
 Basically, I would like to know the correct way to handle an 
 empty ORC file in a mapred job: 
 should we ignore it and not throw an exception, or throw an exception 
 saying that the ORC file is malformed?
 Please let me know your thoughts on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7588) Using type variable in UDF

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135827#comment-14135827
 ] 

Hive QA commented on HIVE-7588:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668941/HIVE-7588.4.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6277 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/823/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/823/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-823/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668941

 Using type variable in UDF
 --

 Key: HIVE-7588
 URL: https://issues.apache.org/jira/browse/HIVE-7588
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-7588.1.patch.txt, HIVE-7588.2.patch.txt, 
 HIVE-7588.3.patch.txt, HIVE-7588.4.patch.txt


 From http://www.mail-archive.com/user@hive.apache.org/msg12307.html
 Support type variables in UDF
 {code}
 public <T> T evaluate(final T s, final String column_name, final int bitmap)
     throws Exception {
   if (s instanceof Double) {
     return (T) new Double(-1.0);
   } else if (s instanceof Integer) {
     return (T) new Integer(-1);
   }
   // ... other types elided in the original post
   return s;
 }
 {code}
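
 For comparison, the GenericUDF API already supports this kind of type-preserving 
 behavior by resolving the return type from the argument's ObjectInspector at 
 initialize time; a minimal sketch (the class name and pass-through logic are 
 illustrative only):
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

public class GenericUDFSameType extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    if (arguments.length < 1) {
      throw new UDFArgumentException("at least one argument expected");
    }
    // The output type is whatever the first argument's type is.
    return arguments[0];
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    return arguments[0].get(); // type-preserving pass-through
  }

  @Override
  public String getDisplayString(String[] children) {
    return "same_type(" + children[0] + ")";
  }
}
{code}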



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation

2014-09-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-8038:
--
Attachment: HIVE-8038.3.patch

+1 - Patch looks good.

For commit - .3.patch, removed a white space change & fixed a javadoc:

{code}
   context.splits.add(new OrcSplit(file.getPath(), offset, length,
-      hosts, fileMetaInfo, isOriginal, hasBase, deltas));
+        hosts, fileMetaInfo, isOriginal, hasBase, deltas));
 }
...
-   * @return TreeMap<Offset, BlockLocation>
+   * @return TreeMap<Long, BlockLocation>
{code}

 Decouple ORC files split calculation logic from Filesystem's get file 
 location implementation
 -

 Key: HIVE-8038
 URL: https://issues.apache.org/jira/browse/HIVE-8038
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
 Fix For: 0.14.0

 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch


 What is the Current Logic
 ==
 1.get the file blocks from FileSystem.getFileBlockLocations() which returns 
 an array of BlockLocation
 2.In SplitGenerator.createSplit(), check if split only spans one block or 
 multiple blocks.
 3.If split spans just one block, then using the array index (index = 
 offset/blockSize), get the corresponding host having the blockLocation
 4.If the split spans multiple blocks, then get all hosts that have at least 
 80% of the max of total data in split hosted by any host.
 5.add the split to a list of splits
 Issue with Current Logic
 =
 Dependency on the FileSystem API’s logic for block location calculations. It 
 returns an array, and we need to rely on the FileSystem to 
 make all blocks the same size if we want to directly access a block in the 
 array.
  
 What is the Fix
 =
 1a. get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation
 1b. convert the array into a TreeMap<offset, BlockLocation> and return it 
 through getLocationsWithOffSet()
 2.In SplitGenerator.createSplit(), check if split only spans one block or 
 multiple blocks.
 3.If split spans just one block, then using Tree.floorEntry(key), get the 
 highest entry smaller than offset for the split and get the corresponding 
 host.
 4a.If the split spans multiple blocks, get a submap, which contains all 
 entries containing blockLocations from the offset to offset + length
 4b.get all hosts that have at least 80% of the max of total data in split 
 hosted by any host.
 5.add the split to a list of splits
 What are the major changes in logic
 ==
 1. store BlockLocations in a Map instead of an array
 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations()
 3. the one-block case is checked by if (offset + length <= start.getOffset() + 
 start.getLength()) instead of if ((offset % blockSize) + length <= 
 blockSize)
 What is the effect on Complexity (Big O)
 =
 1. We add a O(n) loop to build a TreeMap from an array but its a one time 
 cost and would not be called for each split
 2. In case of one block case, we can get the block in O(logn) worst case 
 which was O(1) before
 3. Getting the submap is O(logn)
 4. In the multiple-block case, building the list of hosts is O(m), which 
 was O(n), where m < n, as previously we were iterating 
 over all the block locations but now we iterate only over the blocks 
 that belong to the range of offsets that we need. 
 What are the benefits of the change
 ==
 1. With this fix, we do not depend on the blockLocations returned by 
 FileSystem to figure out the block corresponding to the offset and blockSize
 2. Also, it is not necessary that block lengths are the same for all blocks on 
 all FileSystems
 3. Previously we were using blockSize for the one-block case and block.length for 
 the multiple-block case, which is not the case now. We figure out the block 
 depending upon the actual length and offset of the block
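
 A sketch of the lookup this describes, using only the public FileSystem API 
 (the class and method names here are illustrative, not the patch itself):
{code}
import java.io.IOException;
import java.util.TreeMap;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;

class BlockLocationIndex {
  // One-time O(n log n) build of the offset -> BlockLocation map.
  static TreeMap<Long, BlockLocation> index(FileSystem fs, FileStatus file)
      throws IOException {
    TreeMap<Long, BlockLocation> map = new TreeMap<Long, BlockLocation>();
    for (BlockLocation b : fs.getFileBlockLocations(file, 0, file.getLen())) {
      map.put(b.getOffset(), b);
    }
    return map;
  }

  // One-block case: the highest block starting at or below the split offset,
  // found in O(log n) via floorEntry instead of offset/blockSize indexing.
  // Assumes offset >= 0, so floorEntry never returns null (first block is at 0).
  static BlockLocation blockFor(TreeMap<Long, BlockLocation> map, long offset) {
    return map.floorEntry(offset).getValue();
  }
}
{code}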



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()

2014-09-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-8090:
-

Assignee: Gopal V

 Potential null pointer reference in WriterImpl#StreamFactory#createStream()
 ---

 Key: HIVE-8090
 URL: https://issues.apache.org/jira/browse/HIVE-8090
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Ted Yu
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, 
 HIVE-8090.4.patch


 {code}
 switch (kind) {
   ...
   default:
     modifiers = null;
     break;
 }
 BufferedStream result = streams.get(name);
 if (result == null) {
   result = new BufferedStream(name.toString(), bufferSize,
       codec == null ? codec : codec.modify(modifiers));
 {code}
 In case modifiers is null and codec is a ZlibCodec, there would be an NPE in 
 ZlibCodec#modify(EnumSet<Modifier> modifiers):
 {code}
 for (Modifier m : modifiers) {
 {code}
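
 A minimal sketch of the kind of guard that avoids the NPE (condensed; not the 
 literal patch):
{code}
import java.util.EnumSet;

class CodecGuardSketch {
  enum Modifier { TEXT, BINARY, FASTEST }   // stand-in for the real enum

  interface CompressionCodec {
    CompressionCodec modify(EnumSet<Modifier> modifiers);
  }

  // Never pass a null modifier set into modify(), where it would be iterated.
  static CompressionCodec modified(CompressionCodec codec,
      EnumSet<Modifier> modifiers) {
    if (codec == null || modifiers == null) {
      return codec; // nothing to modify, or no modifiers requested
    }
    return codec.modify(modifiers);
  }
}
{code}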



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()

2014-09-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-8090:
--
Component/s: File Formats

 Potential null pointer reference in WriterImpl#StreamFactory#createStream()
 ---

 Key: HIVE-8090
 URL: https://issues.apache.org/jira/browse/HIVE-8090
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Ted Yu
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, 
 HIVE-8090.4.patch


 {code}
 switch (kind) {
   ...
   default:
     modifiers = null;
     break;
 }
 BufferedStream result = streams.get(name);
 if (result == null) {
   result = new BufferedStream(name.toString(), bufferSize,
       codec == null ? codec : codec.modify(modifiers));
 {code}
 In case modifiers is null and codec is a ZlibCodec, there would be an NPE in 
 ZlibCodec#modify(EnumSet<Modifier> modifiers):
 {code}
 for (Modifier m : modifiers) {
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8090) Potential null pointer reference in WriterImpl#StreamFactory#createStream()

2014-09-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135847#comment-14135847
 ] 

Gopal V commented on HIVE-8090:
---

Test failures look unrelated - +1.

Assigned to myself till [~rpalamut] gets contributor access.

 Potential null pointer reference in WriterImpl#StreamFactory#createStream()
 ---

 Key: HIVE-8090
 URL: https://issues.apache.org/jira/browse/HIVE-8090
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Ted Yu
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-8090.1.patch, HIVE-8090.2.patch, HIVE-8090.3.patch, 
 HIVE-8090.4.patch


 {code}
 switch (kind) {
   ...
   default:
     modifiers = null;
     break;
 }
 BufferedStream result = streams.get(name);
 if (result == null) {
   result = new BufferedStream(name.toString(), bufferSize,
       codec == null ? codec : codec.modify(modifiers));
 {code}
 In case modifiers is null and codec is a ZlibCodec, there would be an NPE in 
 ZlibCodec#modify(EnumSet<Modifier> modifiers):
 {code}
 for (Modifier m : modifiers) {
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation

2014-09-16 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135848#comment-14135848
 ] 

Pankit Thapar commented on HIVE-8038:
-

Is .3.patch committed to trunk?


 Decouple ORC files split calculation logic from Filesystem's get file 
 location implementation
 -

 Key: HIVE-8038
 URL: https://issues.apache.org/jira/browse/HIVE-8038
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
 Fix For: 0.14.0

 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch


 What is the Current Logic
 ==
 1.get the file blocks from FileSystem.getFileBlockLocations() which returns 
 an array of BlockLocation
 2.In SplitGenerator.createSplit(), check if split only spans one block or 
 multiple blocks.
 3.If split spans just one block, then using the array index (index = 
 offset/blockSize), get the corresponding host having the blockLocation
 4.If the split spans multiple blocks, then get all hosts that have at least 
 80% of the max of total data in split hosted by any host.
 5.add the split to a list of splits
 Issue with Current Logic
 =
 Dependency on the FileSystem API’s logic for block location calculations. It 
 returns an array, and we need to rely on the FileSystem to 
 make all blocks the same size if we want to directly access a block in the 
 array.
  
 What is the Fix
 =
 1a. get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation
 1b. convert the array into a TreeMap<offset, BlockLocation> and return it 
 through getLocationsWithOffSet()
 2.In SplitGenerator.createSplit(), check if split only spans one block or 
 multiple blocks.
 3.If split spans just one block, then using Tree.floorEntry(key), get the 
 highest entry smaller than offset for the split and get the corresponding 
 host.
 4a.If the split spans multiple blocks, get a submap, which contains all 
 entries containing blockLocations from the offset to offset + length
 4b.get all hosts that have at least 80% of the max of total data in split 
 hosted by any host.
 5.add the split to a list of splits
 What are the major changes in logic
 ==
 1. store BlockLocations in a Map instead of an array
 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations()
 3. the one-block case is checked by if (offset + length <= start.getOffset() + 
 start.getLength()) instead of if ((offset % blockSize) + length <= 
 blockSize)
 What is the effect on Complexity (Big O)
 =
 1. We add a O(n) loop to build a TreeMap from an array but its a one time 
 cost and would not be called for each split
 2. In case of one block case, we can get the block in O(logn) worst case 
 which was O(1) before
 3. Getting the submap is O(logn)
 4. In the multiple-block case, building the list of hosts is O(m), which 
 was O(n), where m < n, as previously we were iterating 
 over all the block locations but now we iterate only over the blocks 
 that belong to the range of offsets that we need. 
 What are the benefits of the change
 ==
 1. With this fix, we do not depend on the blockLocations returned by 
 FileSystem to figure out the block corresponding to the offset and blockSize
 2. Also, it is not necessary that block lengths are the same for all blocks on 
 all FileSystems
 3. Previously we were using blockSize for the one-block case and block.length for 
 the multiple-block case, which is not the case now. We figure out the block 
 depending upon the actual length and offset of the block



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8118) SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized with multiple result collectors [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135849#comment-14135849
 ] 

Xuefu Zhang commented on HIVE-8118:
---

[~chengxiang li] and I had an offline discussion; there was just a little 
bit of confusion in understanding the problem, and now we are on the same page. To 
summarize, the problem arises when a map work or reduce work is connected to 
multiple reduce works. Currently a map work or reduce work is wired 
with only one collector, which collects all data regardless of the branch. That data 
set feeds all subsequent child reduce works.
 
I also noted that Tez provides a <name, output collector> map to its record 
handlers. However, for us, we may not be able to do that, due to the 
limitations of Spark's RDD transformation APIs.


 SparkMapRecorderHandler and SparkReduceRecordHandler should be initialized 
 with multiple result collectors [Spark Branch]
 -

 Key: HIVE-8118
 URL: https://issues.apache.org/jira/browse/HIVE-8118
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Venki Korukanti
  Labels: Spark-M1

 In the current implementation, both SparkMapRecordHandler and 
 SparkReduceRecorderHandler take only one result collector, which limits the 
 corresponding map or reduce task to having only one child. It's very 
 common in multi-insert queries that a map/reduce task has more than one 
 child. A query like the following has two map tasks as parents:
 {code}
 select name, sum(value) from dec group by name union all select name, value 
 from dec order by name
 {code}
 It's possible that in the future an optimization may be implemented so that a map 
 work is followed by two reduce works and then connected to a union work.
 Thus, we should take this as a general case. Tez is currently providing a 
 collector for each child operator in the map-side or reduce side operator 
 tree. We can take Tez as a reference.
 Likely this is a big change and subtasks are possible. 
 With this, we can have a simpler and cleaner multi-insert implementation. This 
 is also the problem observed in HIVE-7731.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation

2014-09-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135857#comment-14135857
 ] 

Gopal V commented on HIVE-8038:
---

No, there is a 24-hour waiting period after the +1.

I will resolve the ticket once it is committed. Leave comments if you need to.

 Decouple ORC files split calculation logic from Filesystem's get file 
 location implementation
 -

 Key: HIVE-8038
 URL: https://issues.apache.org/jira/browse/HIVE-8038
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.13.1
Reporter: Pankit Thapar
Assignee: Pankit Thapar
 Fix For: 0.14.0

 Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch


 What is the Current Logic
 ==
 1.get the file blocks from FileSystem.getFileBlockLocations() which returns 
 an array of BlockLocation
 2.In SplitGenerator.createSplit(), check if split only spans one block or 
 multiple blocks.
 3.If split spans just one block, then using the array index (index = 
 offset/blockSize), get the corresponding host having the blockLocation
 4.If the split spans multiple blocks, then get all hosts that have at least 
 80% of the max of total data in split hosted by any host.
 5.add the split to a list of splits
 Issue with Current Logic
 =
 Dependency on the FileSystem API’s logic for block location calculations. It 
 returns an array, and we need to rely on the FileSystem to 
 make all blocks the same size if we want to directly access a block in the 
 array.
  
 What is the Fix
 =
 1a. get the file blocks from FileSystem.getFileBlockLocations(), which returns 
 an array of BlockLocation
 1b. convert the array into a TreeMap<offset, BlockLocation> and return it 
 through getLocationsWithOffSet()
 2.In SplitGenerator.createSplit(), check if split only spans one block or 
 multiple blocks.
 3.If split spans just one block, then using Tree.floorEntry(key), get the 
 highest entry smaller than offset for the split and get the corresponding 
 host.
 4a.If the split spans multiple blocks, get a submap, which contains all 
 entries containing blockLocations from the offset to offset + length
 4b.get all hosts that have at least 80% of the max of total data in split 
 hosted by any host.
 5.add the split to a list of splits
 What are the major changes in logic
 ==
 1. store BlockLocations in a Map instead of an array
 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations()
 3. the one-block case is checked by if (offset + length <= start.getOffset() + 
 start.getLength()) instead of if ((offset % blockSize) + length <= 
 blockSize)
 What is the effect on Complexity (Big O)
 =
 1. We add a O(n) loop to build a TreeMap from an array but its a one time 
 cost and would not be called for each split
 2. In case of one block case, we can get the block in O(logn) worst case 
 which was O(1) before
 3. Getting the submap is O(logn)
 4. In the multiple-block case, building the list of hosts is O(m), which 
 was O(n), where m < n, as previously we were iterating 
 over all the block locations but now we iterate only over the blocks 
 that belong to the range of offsets that we need. 
 What are the benefits of the change
 ==
 1. With this fix, we do not depend on the blockLocations returned by 
 FileSystem to figure out the block corresponding to the offset and blockSize
 2. Also, it is not necessary that block lengths are the same for all blocks on 
 all FileSystems
 3. Previously we were using blockSize for the one-block case and block.length for 
 the multiple-block case, which is not the case now. We figure out the block 
 depending upon the actual length and offset of the block



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6554) CombineHiveInputFormat should use the underlying InputSplits

2014-09-16 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135863#comment-14135863
 ] 

Pankit Thapar commented on HIVE-6554:
-

Is there any update on this?


 CombineHiveInputFormat should use the underlying InputSplits
 

 Key: HIVE-6554
 URL: https://issues.apache.org/jira/browse/HIVE-6554
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Currently CombineHiveInputFormat generates FileSplits without using the 
 underlying InputFormat. This leads to a problem when an InputFormat needs a 
 InputSplit that isn't exactly a FileSplit, because CombineHiveInputSplit 
 always generates FileSplits and then calls the underlying InputFormats 
 getRecordReader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8106) Enable vectorization for spark [spark branch]

2014-09-16 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-8106:
---
Status: Open  (was: Patch Available)

The patch needs to be reworked.

 Enable vectorization for spark [spark branch]
 -

 Key: HIVE-8106
 URL: https://issues.apache.org/jira/browse/HIVE-8106
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-8106-spark.patch


 Enable the vectorization optimization on spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2014-09-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135879#comment-14135879
 ] 

Sergey Shelukhin commented on HIVE-7926:


Just pushed some early prototype code for the storage layer into the development branch.

 long-lived daemons for query fragment execution, I/O and caching
 

 Key: HIVE-7926
 URL: https://issues.apache.org/jira/browse/HIVE-7926
 Project: Hive
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: LLAPdesigndocument.pdf


 We are proposing a new execution model for Hive that is a combination of 
 existing process-based tasks and long-lived daemons running on worker nodes. 
 These nodes can take care of efficient I/O, caching and query fragment 
 execution, while heavy lifting like most joins, ordering, etc. can be handled 
 by tasks.
 The proposed model is not a 2-system solution for small and large queries; 
 nor is it a separate execution engine like MR or Tez. It can be used by 
 any Hive execution engine if support is added; in the future even external 
 products (e.g. Pig) can use it.
 The document with the high-level design we are proposing will be attached shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6936) Provide table properties to InputFormats

2014-09-16 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-6936:

Attachment: HIVE-6936.patch

Resubmitting the patch to Jenkins.

 Provide table properties to InputFormats
 

 Key: HIVE-6936
 URL: https://issues.apache.org/jira/browse/HIVE-6936
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.14.0

 Attachments: HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, 
 HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, 
 HIVE-6936.patch, HIVE-6936.patch


 Some advanced file formats need the table properties made available to them. 
 Additionally, it would be convenient to provide a unique id for fetch 
 operators and the complete list of directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation

2014-09-16 Thread Sergey Shelukhin


 On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 644
  https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line644
 
  Is the Hive token case-insensitive, or are all function names in lower case?

see get... and register... methods in FunctionRegistry; when storing or 
retrieving, they are all made lower case
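
In other words, the registry behaves roughly like the following toy sketch (not 
the actual FunctionRegistry code):
{code}
import java.util.HashMap;
import java.util.Map;

class CaseInsensitiveRegistry<T> {
  private final Map<String, T> byName = new HashMap<String, T>();

  void register(String name, T fn) {
    byName.put(name.toLowerCase(), fn);    // stored lower-cased
  }

  T lookup(String name) {
    return byName.get(name.toLowerCase()); // retrieved lower-cased
  }
}
{code}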


 On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646
  https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646
 
  are all functions qualified in Hive (w.r.t. the DB)?
  How about built-in functions like toLower?
  Could you say <DB NAME>.toLower()?
 
 John Pullokkaran wrote:
 Also could you run a few of the q tests below and see if your change causes 
 problems:
 authorization_create_func1.q, show_functions.q, vectorized_string_funcs.q,
 create_func1.q, vector_decimal_math_funcs.q, vectorized_timestamp_funcs.q,
 drop_function.q, vectorized_date_funcs.q,
 show_describe_func_quotes.q, vectorized_math_funcs.q

this is covered by the 2nd part of the condition (if function is located w/o 
qualified name, just the name is returned)

Will run the tests


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25700/#review53546
---


On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25700/
 ---
 
 (Updated Sept. 16, 2014, 5:23 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/create_func1.q ad924d3 
   ql/src/test/results/clientpositive/create_func1.q.out 798f77f 
 
 Diff: https://reviews.apache.org/r/25700/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




[jira] [Created] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-16 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-8139:
-

 Summary: Upgrade commons-lang from 2.4 to 2.6
 Key: HIVE-8139
 URL: https://issues.apache.org/jira/browse/HIVE-8139
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.14.0


Upgrade commons-lang version from 2.4 to latest 2.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-16 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8139:
--
Attachment: HIVE-8139.1.patch

 Upgrade commons-lang from 2.4 to 2.6
 

 Key: HIVE-8139
 URL: https://issues.apache.org/jira/browse/HIVE-8139
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.14.0

 Attachments: HIVE-8139.1.patch


 Upgrade commons-lang version from 2.4 to latest 2.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-16 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8139:
--
Status: Patch Available  (was: Open)

 Upgrade commons-lang from 2.4 to 2.6
 

 Key: HIVE-8139
 URL: https://issues.apache.org/jira/browse/HIVE-8139
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.14.0

 Attachments: HIVE-8139.1.patch


 Upgrade commons-lang version from 2.4 to latest 2.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25595: HIVE-8083: Authorization DDLs should not enforce hive identifier syntax for user or group names

2014-09-16 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25595/
---

(Updated Sept. 16, 2014, 6:29 p.m.)


Review request for hive and Brock Noland.


Changes
---

Rebased with latest


Bugs: HIVE-8083
https://issues.apache.org/jira/browse/HIVE-8083


Repository: hive-git


Description
---

The compiler expects principals (user, group and role) as hive identifiers for 
authorization DDLs. The user and group are entities that belong to an external 
namespace, and we can't expect those to follow hive identifier syntax rules. For 
example, a userid or group can contain '-', which is not allowed by the compiler.
The patch allows string literals for user and group names.
The quoted identifier support can perhaps be made to work with this. However, 
IMO this syntax should be supported regardless of quoted identifier support 
(which is an optional configuration).


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 25cd3a5 
  ql/src/test/queries/clientpositive/authorization_non_id.q PRE-CREATION 
  ql/src/test/results/clientpositive/authorization_non_id.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/25595/diff/


Testing
---

Added a test case to verify various auth DDLs with the new syntax.


Thanks,

Prasad Mujumdar



[jira] [Updated] (HIVE-8083) Authorization DDLs should not enforce hive identifier syntax for user or group

2014-09-16 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8083:
--
Attachment: HIVE-8083.2.patch

Rebased with latest

 Authorization DDLs should not enforce hive identifier syntax for user or group
 --

 Key: HIVE-8083
 URL: https://issues.apache.org/jira/browse/HIVE-8083
 Project: Hive
  Issue Type: Bug
  Components: SQL, SQLStandardAuthorization
Affects Versions: 0.13.0, 0.13.1
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Attachments: HIVE-8083.1.patch, HIVE-8083.2.patch


 The compiler expects principals (user, group and role) to be hive identifiers 
 in authorization DDLs. Users and groups are entities that belong to an 
 external namespace, and we can't expect them to follow hive identifier syntax 
 rules. For example, a userid or group name can contain '-', which the 
 compiler does not allow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]

2014-09-16 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8055:
--
Attachment: HIVE-8055-spark.patch

HIVE-8054 disabled the union remove optimization on the spark execution 
engine, so the linked FileSink descriptors no longer need to be maintained. 
This patch cleans up the now-unnecessary code.

 Code cleanup after HIVE-8054 [Spark Branch]
 ---

 Key: HIVE-8055
 URL: https://issues.apache.org/jira/browse/HIVE-8055
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Na Yang
  Labels: Spark-M1
 Attachments: HIVE-8055-spark.patch


 There is quite a bit of code handling union-removal optimization in 
 SparkCompiler and related classes. We need to clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135918#comment-14135918
 ] 

Brock Noland commented on HIVE-8139:


+1 pending tests

 Upgrade commons-lang from 2.4 to 2.6
 

 Key: HIVE-8139
 URL: https://issues.apache.org/jira/browse/HIVE-8139
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.14.0

 Attachments: HIVE-8139.1.patch


 Upgrade commons-lang version from 2.4 to latest 2.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation

2014-09-16 Thread Sergey Shelukhin


 On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646
  https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646
 
  Are all functions qualified in hive (w.r.t. DB)?
  How about built-in functions like toLower?
  Could you say <DB NAME>.toLower()?
 
 John Pullokkaran wrote:
 Also, could you run a few of the q tests below and see if your change causes 
 problems:
 authorization_create_func1.q  show_functions.q
 vectorized_string_funcs.q
 create_func1.q    vector_decimal_math_funcs.q 
 vectorized_timestamp_funcs.q
 drop_function.q   vectorized_date_funcs.q
 show_describe_func_quotes.q   vectorized_math_funcs.q
 
 Sergey Shelukhin wrote:
 this is covered by the 2nd part of the condition (if function is located 
 w/o qualified name, just the name is returned)
 
 Will run the tests

Ran the tests; there are some out file changes, but they are the same as on 
current cbo branch.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25700/#review53546
---


On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25700/
 ---
 
 (Updated Sept. 16, 2014, 5:23 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/create_func1.q ad924d3 
   ql/src/test/results/clientpositive/create_func1.q.out 798f77f 
 
 Diff: https://reviews.apache.org/r/25700/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




Review Request 25704: HIVE-8055:Code cleanup after HIVE-8054 [Spark Branch]

2014-09-16 Thread Na Yang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25704/
---

Review request for hive, Brock Noland and Xuefu Zhang.


Bugs: HIVE-8055
https://issues.apache.org/jira/browse/HIVE-8055


Repository: hive-git


Description
---

HIVE-8054 disabled the union remove optimization on the spark execution 
engine, so the linked FileSink descriptors no longer need to be maintained. 
This patch cleans up the now-unnecessary code.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 
5ddc16d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 3cdfc51 

Diff: https://reviews.apache.org/r/25704/diff/


Testing
---


Thanks,

Na Yang



[jira] [Updated] (HIVE-8115) Hive select query hang when fields contain map

2014-09-16 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-8115:

Attachment: HIVE-8115.1.patch

Made a patch to warn on an empty key or an empty pair. Can anyone do a quick 
review? Thanks!

 Hive select query hang when fields contain map
 --

 Key: HIVE-8115
 URL: https://issues.apache.org/jira/browse/HIVE-8115
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Fix For: 0.14.0

 Attachments: HIVE-8115.1.patch, createTable.hql, data


 Attached is a repro of the issue. After creating the table and loading the 
 attached data, every hive query hangs, even just select * from the table.
 Repro steps:
 1. run createTable.hql
 2. hadoop fs -put data /data
 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable;
 4. SELECT * FROM testtable;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
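
As an aside on the hang described above: a minimal, illustrative Java sketch 
(not Hive's actual LazyMap/SerDe code) of how an empty map key or pair can 
stall a delimited-field scanner, and how warning-and-skipping, as the patch 
apparently does, keeps the scan moving. All names here are hypothetical:

{noformat}
import java.util.HashMap;
import java.util.Map;

public class MapFieldParseSketch {
    // A delimited-map scanner that never advances past an empty entry would
    // spin forever; warning and skipping keeps it making progress.
    static Map<String, String> parseMap(String field, char pairSep, char kvSep) {
        Map<String, String> out = new HashMap<String, String>();
        int pos = 0;
        while (pos < field.length()) {
            int end = field.indexOf(pairSep, pos);
            if (end < 0) {
                end = field.length();
            }
            String pair = field.substring(pos, end);
            int kv = pair.indexOf(kvSep);
            if (pair.isEmpty() || kv <= 0) {
                // Empty pair, empty key, or missing key/value separator:
                // warn and skip instead of stalling on the entry.
                System.err.println("Skipping malformed map entry at offset " + pos);
            } else {
                out.put(pair.substring(0, kv), pair.substring(kv + 1));
            }
            pos = end + 1; // always advance, even on malformed input
        }
        return out;
    }

    public static void main(String[] args) {
        // "k1^Cv1^B^Bk2^Cv2" -- the doubled ^B yields an empty pair.
        System.out.println(parseMap("k1\u0003v1\u0002\u0002k2\u0003v2",
                '\u0002', '\u0003'));
    }
}
{noformat}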


[jira] [Commented] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135948#comment-14135948
 ] 

Xuefu Zhang commented on HIVE-8055:
---

Patch looks good. +1 pending on test.

 Code cleanup after HIVE-8054 [Spark Branch]
 ---

 Key: HIVE-8055
 URL: https://issues.apache.org/jira/browse/HIVE-8055
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Na Yang
  Labels: Spark-M1
 Attachments: HIVE-8055-spark.patch


 There is quite a bit of code handling union-removal optimization in 
 SparkCompiler and related classes. We need to clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8055) Code cleanup after HIVE-8054 [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8055:
--
Status: Patch Available  (was: Open)

 Code cleanup after HIVE-8054 [Spark Branch]
 ---

 Key: HIVE-8055
 URL: https://issues.apache.org/jira/browse/HIVE-8055
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Na Yang
  Labels: Spark-M1
 Attachments: HIVE-8055-spark.patch


 There is quite a bit of code handling union-removal optimization in 
 SparkCompiler and related classes. We need to clean this up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8115) Hive select query hang when fields contain map

2014-09-16 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-8115:

Status: Patch Available  (was: Open)

 Hive select query hang when fields contain map
 --

 Key: HIVE-8115
 URL: https://issues.apache.org/jira/browse/HIVE-8115
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Fix For: 0.14.0

 Attachments: HIVE-8115.1.patch, createTable.hql, data


 Attached is a repro of the issue. After creating the table and loading the 
 attached data, every hive query hangs, even just select * from the table.
 Repro steps:
 1. run createTable.hql
 2. hadoop fs -put data /data
 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable;
 4. SELECT * FROM testtable;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-09-16 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135954#comment-14135954
 ] 

Mithun Radhakrishnan commented on HIVE-7762:


Hello, Suhas. Thanks for working on fixing this inconsistency. Generally, this 
is a good fix.

I'd encourage you to add a test-case to TestHCatClient that creates a table with 
uppercase partition columns and then queries it with a lowercase 
partition-spec. (Essentially, what you've included in your description.) Adding 
one won't be hard; you could just use one of the other tests for reference.
Also, could I please bother you to verify the behaviour of 
{{HCatClient.getPartitions()}} for case insensitivity? If it's broken too, I'd 
rather we fix both here. I expect this should be alright, since it goes 
through the {{listPartitionsByFilter()}} API, but it would be good to have 
confirmation.

Mithun
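
A minimal sketch of the kind of test being suggested, reusing the table and 
partition-spec names from the issue description below. The table setup and the 
way the test obtains its HCatClient are elided; this is an outline under those 
assumptions, not the actual TestHCatClient code:

{noformat}
import java.util.HashMap;
import java.util.Map;
import org.apache.hcatalog.api.HCatClient;
import org.apache.hcatalog.api.HCatPartition;

public class CaseInsensitivePartitionSpecSketch {
    // Assumes a table "in_table" already created with an uppercase "Year"
    // partition column, and a client obtained elsewhere in the test.
    static HCatPartition lookup(HCatClient client) throws Exception {
        Map<String, String> spec = new HashMap<String, String>();
        spec.put("Year", "2014"); // uppercase, as declared in the DDL
        // With the fix, this should resolve to the lowercase partition key
        // that HCatalog actually stored, instead of throwing
        // "Invalid partition-key specified".
        return client.getPartition("default", "in_table", spec);
    }
}
{noformat}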

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.2.patch, HIVE-7762.patch


 HCatalog creates partitions in lower case, whereas getting partitions from 
 HCatalog via the webhcat client doesn't handle this, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions by:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec);
 {noformat}
 This throws up saying:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do:
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8106) Enable vectorization for spark [spark branch]

2014-09-16 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-8106:
---
Attachment: HIVE-8106.1-spark.patch

 Enable vectorization for spark [spark branch]
 -

 Key: HIVE-8106
 URL: https://issues.apache.org/jira/browse/HIVE-8106
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch


 Enable the vectorization optimization on spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25700: HIVE-8080 CBO: function name may not match UDF name during translation

2014-09-16 Thread John Pullokkaran


 On Sept. 16, 2014, 5:30 p.m., John Pullokkaran wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 646
  https://reviews.apache.org/r/25700/diff/1/?file=690720#file690720line646
 
  Are all functions qualified in hive (w.r.t. DB)?
  How about built-in functions like toLower?
  Could you say <DB NAME>.toLower()?
 
 John Pullokkaran wrote:
 Also, could you run a few of the q tests below and see if your change causes 
 problems:
 authorization_create_func1.q  show_functions.q
 vectorized_string_funcs.q
 create_func1.q    vector_decimal_math_funcs.q 
 vectorized_timestamp_funcs.q
 drop_function.q   vectorized_date_funcs.q
 show_describe_func_quotes.q   vectorized_math_funcs.q
 
 Sergey Shelukhin wrote:
 this is covered by the 2nd part of the condition (if function is located 
 w/o qualified name, just the name is returned)
 
 Will run the tests
 
 Sergey Shelukhin wrote:
 Ran the tests; there are some out file changes, but they are the same as 
 on current cbo branch.

It seems like the change would always use the qualified function name.
If that's the case, would built-in functions work?
For example, in a select statement could you always qualify functions with a db 
name? What about arithmetic expressions and conjunctive/disjunctive functions 
(and/or)?
It seems like your change would qualify those functions with the DB name.

What is it that I am missing?


- John
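
For readers following the thread: a sketch of the resolution rule Sergey 
describes, with assumed names (this is not the actual FunctionRegistry code). 
The bare name wins whenever it already resolves, so built-ins and operators 
never pick up a DB qualifier:

{noformat}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FunctionNameSketch {
    // If the bare name resolves (built-ins, operators like and/or), keep it;
    // otherwise fall back to the db-qualified form. Assumed logic, per the
    // review discussion above.
    static String displayName(String db, String name, Set<String> registry) {
        if (registry.contains(name.toLowerCase())) {
            return name;
        }
        return db + "." + name;
    }

    public static void main(String[] args) {
        Set<String> registry =
                new HashSet<String>(Arrays.asList("lower", "and", "or"));
        System.out.println(displayName("mydb", "lower", registry)); // lower
        System.out.println(displayName("mydb", "myudf", registry)); // mydb.myudf
    }
}
{noformat}

Under this rule, only functions that fail bare-name lookup (for example, 
permanent UDFs registered under a database) would ever be qualified.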


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25700/#review53546
---


On Sept. 16, 2014, 5:23 p.m., Sergey Shelukhin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/25700/
 ---
 
 (Updated Sept. 16, 2014, 5:23 p.m.)
 
 
 Review request for hive, Ashutosh Chauhan and John Pullokkaran.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 see jira
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java c503bbb 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java
  PRE-CREATION 
   ql/src/test/queries/clientpositive/create_func1.q ad924d3 
   ql/src/test/results/clientpositive/create_func1.q.out 798f77f 
 
 Diff: https://reviews.apache.org/r/25700/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Sergey Shelukhin
 




[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135970#comment-14135970
 ] 

Ashutosh Chauhan commented on HIVE-8139:


I think HIVE-7145 is relevant. Consider that one too.

 Upgrade commons-lang from 2.4 to 2.6
 

 Key: HIVE-8139
 URL: https://issues.apache.org/jira/browse/HIVE-8139
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.14.0

 Attachments: HIVE-8139.1.patch


 Upgrade commons-lang version from 2.4 to latest 2.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8106) Enable vectorization for spark [spark branch]

2014-09-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135995#comment-14135995
 ] 

Xuefu Zhang commented on HIVE-8106:
---

Hi [~chinnalalam], if the patch is ready, please click the Submit Patch 
button above so the tests can run. Thanks.


 Enable vectorization for spark [spark branch]
 -

 Key: HIVE-8106
 URL: https://issues.apache.org/jira/browse/HIVE-8106
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chinna Rao Lalam
Assignee: Chinna Rao Lalam
 Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch


 Enable the vectorization optimization on spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135991#comment-14135991
 ] 

Brock Noland commented on HIVE-8139:


Makes sense. I think we can move to 2.6 until we are able to remove 
commons-lang 2.

 Upgrade commons-lang from 2.4 to 2.6
 

 Key: HIVE-8139
 URL: https://issues.apache.org/jira/browse/HIVE-8139
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.14.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.14.0

 Attachments: HIVE-8139.1.patch


 Upgrade commons-lang version from 2.4 to latest 2.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5764) Stopping Metastore and HiveServer2 from command line

2014-09-16 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136012#comment-14136012
 ] 

Xiaobing Zhou commented on HIVE-5764:
-

Can anyone review this so it can go into trunk? Thanks!

 Stopping Metastore and HiveServer2 from command line
 

 Key: HIVE-5764
 URL: https://issues.apache.org/jira/browse/HIVE-5764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Reporter: Vaibhav Gumashta
Assignee: Xiaobing Zhou
  Labels: patch
 Fix For: 0.14.0

 Attachments: HIVE-5764.patch


 Currently a user needs to kill the process. Ideally there should be something 
 like:
 hive --service metastore stop
 hive --service hiveserver2 stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5764) Stopping Metastore and HiveServer2 from command line

2014-09-16 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-5764:

Attachment: HIVE-5764.1.patch

 Stopping Metastore and HiveServer2 from command line
 

 Key: HIVE-5764
 URL: https://issues.apache.org/jira/browse/HIVE-5764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Reporter: Vaibhav Gumashta
Assignee: Xiaobing Zhou
  Labels: patch
 Fix For: 0.14.0

 Attachments: HIVE-5764.1.patch, HIVE-5764.patch


 Currently a user needs to kill the process. Ideally there should be something 
 like:
 hive --service metastore stop
 hive --service hiveserver2 stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5764) Stopping Metastore and HiveServer2 from command line

2014-09-16 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-5764:

Attachment: (was: HIVE-5764.patch)

 Stopping Metastore and HiveServer2 from command line
 

 Key: HIVE-5764
 URL: https://issues.apache.org/jira/browse/HIVE-5764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Reporter: Vaibhav Gumashta
Assignee: Xiaobing Zhou
  Labels: patch
 Fix For: 0.14.0

 Attachments: HIVE-5764.1.patch


 Currently a user needs to kill the process. Ideally there should be something 
 like:
 hive --service metastore stop
 hive --service hiveserver2 stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.

2014-09-16 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8102:
-
Status: Open  (was: Patch Available)

Just tried a timezone with a half-hour offset that is ahead of UTC 
(Asia/Tehran), and this does not work; cancelling the patch.

 Partitions of type 'date' behave incorrectly with daylight saving time.
 ---

 Key: HIVE-8102
 URL: https://issues.apache.org/jira/browse/HIVE-8102
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema, Serializers/Deserializers
Affects Versions: 0.13.0
Reporter: Eli Acherkan
 Attachments: HIVE-8102.1.patch


 At 2AM on March 28th 2014, Israel went from standard time (GMT+2) to daylight 
 saving time (GMT+3).
 The server's timezone is Asia/Jerusalem. When creating a partition whose key 
 is 2014-03-28, Hive creates a partition for 2014-03-27 instead:
 hive (default) create table test (a int) partitioned by (`b_prt` date);
 OK
 Time taken: 0.092 seconds
 hive (default) alter table test add partition (b_prt='2014-03-28');
 OK
 Time taken: 0.187 seconds
 hive (default) show partitions test;   
 OK
 partition
 b_prt=2014-03-27
 Time taken: 0.134 seconds, Fetched: 1 row(s)
 It seems that the root cause is the behavior of 
 DateWritable.daysToMillis/dateToDays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
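
A self-contained sketch of the truncation apparently at play (illustrative, 
not the actual DateWritable code): dividing local-midnight milliseconds by 
86,400,000 floors to the previous UTC day whenever the zone is ahead of UTC, 
matching the off-by-one above:

{noformat}
import java.util.Calendar;
import java.util.TimeZone;

public class DateDaysSketch {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    public static void main(String[] args) {
        // Zone ahead of UTC, with DST (UTC+2 standard, UTC+3 summer).
        TimeZone tz = TimeZone.getTimeZone("Asia/Jerusalem");
        TimeZone.setDefault(tz);
        Calendar cal = Calendar.getInstance(tz);
        cal.clear();
        cal.set(2014, Calendar.MARCH, 28); // local midnight = 2014-03-27T22:00Z
        long millis = cal.getTimeInMillis();
        long days = millis / MILLIS_PER_DAY; // floors to UTC day 2014-03-27
        // Round-tripping through days-since-epoch loses a day:
        System.out.println(new java.sql.Date(days * MILLIS_PER_DAY)); // 2014-03-27
    }
}
{noformat}

This also fits the Asia/Tehran observation in the comment above: any zone 
ahead of UTC trips the same floor, DST or not.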


[jira] [Commented] (HIVE-7777) add CSV support for Serde

2014-09-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136036#comment-14136036
 ] 

Hive QA commented on HIVE-:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12668948/HIVE-.3.patch

{color:green}SUCCESS:{color} +1 6282 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/824/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/824/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-824/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12668948

 add CSV support for Serde
 -

 Key: HIVE-
 URL: https://issues.apache.org/jira/browse/HIVE-
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-.1.patch, HIVE-.2.patch, HIVE-.3.patch, 
 HIVE-.patch, csv-serde-master.zip


 There is no official CSV serde support in hive, although there is an open 
 source project on github (https://github.com/ogrodnek/csv-serde). CSV is a 
 very frequently used data format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8074) Merge spark into trunk 9/12/2014

2014-09-16 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136041#comment-14136041
 ] 

Brock Noland commented on HIVE-8074:


The merge was really ugly due to the new statistics work for CBO that has 
been done on trunk. I have done the merge and will update the Spark test file 
outputs soon.

Until then, most spark tests will fail. Sorry for the disruption.

 Merge spark into trunk 9/12/2014
 

 Key: HIVE-8074
 URL: https://issues.apache.org/jira/browse/HIVE-8074
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Brock Noland





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8140) Remove obsolete code from SparkWork [Spark Branch]

2014-09-16 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-8140:
-

 Summary: Remove obsolete code from SparkWork [Spark Branch]
 Key: HIVE-8140
 URL: https://issues.apache.org/jira/browse/HIVE-8140
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Xuefu Zhang


There is old code in SparkWork for getting/setting map/reduce work. It's left 
over from the POC and isn't applicable any more. We should remove it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

