date:20130828


[ 
https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752350#comment-13752350
 ] 

Hudson commented on HIVE-5147:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #386 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/386/])
HIVE-5147 : Newly added test TestSessionHooks is failing on trunk (Navis via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517873)
* 
/hive/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContext.java
* 
/hive/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContextImpl.java
* 
/hive/trunk/service/src/java/org/apache/hive/service/cli/session/SessionManager.java


 Newly added test TestSessionHooks is failing on trunk
 -

 Key: HIVE-5147
 URL: https://issues.apache.org/jira/browse/HIVE-5147
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Navis
 Fix For: 0.12.0

 Attachments: HIVE-5147.D12543.1.patch


 This was recently added via HIVE-4588

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead


[ 
https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752349#comment-13752349
 ] 

Hudson commented on HIVE-5144:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #386 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/386/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays  OOMs - use a 
static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517877)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java


 HashTableSink allocates empty new Object[] arrays  OOMs - use a static 
 emptyRow instead
 

 Key: HIVE-5144
 URL: https://issues.apache.org/jira/browse/HIVE-5144
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC + -Xmx512m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: perfomance
 Fix For: 0.12.0

 Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch


 The map-join hashtable sink in the local-task creates an in-memory hashtable 
 with the following code.
 {code}
  Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
 ...
  MapJoinRowContainer rowContainer = tableContainer.get(key);
 if (rowContainer == null) {
   rowContainer = new MapJoinRowContainer();
   rowContainer.add(value);
 {code}
 But for a query where the joinValues[alias].size() == 0, this results in a 
 large number of unnecessary allocations which would be better served with a 
 copy-on-write default value container  a pre-allocated zero object array 
 which is immutable (the only immutable array there is in java).
 The query tested is roughly the following to scan all of 
 customer_demographics in the hash-sink
 {code}
 select c_salutation, count(1)
  from customer
   JOIN customer_demographics ON customer.c_current_cdemo_sk = 
 customer_demographics.cd_demo_sk
  group by c_salutation
  limit 10
 ;
 {code}
 When running with current trunk, the code results in an OOM with 512Mb ram.
 {code}
 2013-08-23 05:11:26   Processing rows:140 Hashtable size: 139 
 Memory usage:   292418944   percentage: 0.579
 Execution failed with exit status: 3
 Obtaining error information
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Possible to get table metadata in UDTF or UDF?

2013-08-28 Thread Hs

Hi all,

Is it possible to get metadata (e.g. column names, column ids ) for a given
table name inside a User Defined Table Function ?

Best regards,

Shawn

[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage


 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3562:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis for your persistence on this one!

 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing on trunk


[ 
https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752481#comment-13752481
 ] 

Ashutosh Chauhan commented on HIVE-5166:


Stacktrace:
{noformat}
TestCase TestWebHCatE2e

   Name Status Type Time(s)
   getStatus Failure GET 
http://localhost:50111/templeton/v1/status?user.name=johndoe html head 
meta http-equiv=Content-Type
   content=text/html;charset=ISO-8859-1/ titleError 503 
java.lang.RuntimeException: Could not load wadl generators from
   wadlGeneratorDescriptions./title /head body h2HTTP ERROR: 503/h2 
pProblem accessing /templeton/v1/status. Reason: pre
   java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions./pre/p hr /ismallPowered by
   Jetty:///small/i /body /html expected:200 but was:503
   junit.framework.AssertionFailedError: GET 
http://localhost:50111/templeton/v1/status?user.name=johndoe html
   head
   meta http-equiv=Content-Type content=text/html;charset=ISO-8859-1/
   titleError 503 java.lang.RuntimeException: Could not load wadl generators 
from wadlGeneratorDescriptions./title
   /head
   body
   h2HTTP ERROR: 503/h2
   pProblem accessing /templeton/v1/status. Reason:
   pre java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions./pre/p
   hr /ismallPowered by Jetty:///small/i
   /body
   /html
   expected:200 but was:503
   at 
org.apache.hcatalog.templeton.TestWebHCatE2e.getStatus(TestWebHCatE2e.java:85)
   0.125
   invalidPath Failure GET 
http://localhost:50111/templeton/v1/no_such_mapping/database?user.name=johndoe 
html head meta
   http-equiv=Content-Type content=text/html;charset=ISO-8859-1/ 
titleError 503 java.lang.RuntimeException: Could not load wadl
   generators from wadlGeneratorDescriptions./title /head body h2HTTP 
ERROR: 503/h2 pProblem accessing
   /templeton/v1/no_such_mapping/database. Reason: pre 
java.lang.RuntimeException: Could not load wadl generators from
   wadlGeneratorDescriptions./pre/p hr /ismallPowered by 
Jetty:///small/i /body /html expected:500 but was:503
   junit.framework.AssertionFailedError: GET 
http://localhost:50111/templeton/v1/no_such_mapping/database?user.name=johndoe 
html
   head
   meta http-equiv=Content-Type content=text/html;charset=ISO-8859-1/
   titleError 503 java.lang.RuntimeException: Could not load wadl generators 
from wadlGeneratorDescriptions./title
   /head
  body
   h2HTTP ERROR: 503/h2
   pProblem accessing /templeton/v1/no_such_mapping/database. Reason:
   pre java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions./pre/p
   hr /ismallPowered by Jetty:///small/i
   /body
   /html
   expected:500 but was:503
   at 
org.apache.hcatalog.templeton.TestWebHCatE2e.invalidPath(TestWebHCatE2e.java:105)
{noformat}

 TestWebHCatE2e is failing on trunk
 --

 Key: HIVE-5166
 URL: https://issues.apache.org/jira/browse/HIVE-5166
 Project: Hive
  Issue Type: Bug
  Components: Tests, WebHCat
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan

 I observed these while running full test suite last couple of times.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-5166) TestWebHCatE2e is failing on trunk

Ashutosh Chauhan created HIVE-5166:
--

 Summary: TestWebHCatE2e is failing on trunk
 Key: HIVE-5166
 URL: https://issues.apache.org/jira/browse/HIVE-5166
 Project: Hive
  Issue Type: Bug
  Components: Tests, WebHCat
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan


I observed these while running full test suite last couple of times.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing on trunk


[ 
https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752486#comment-13752486
 ] 

Ashutosh Chauhan commented on HIVE-5166:


Also, I should note this is happening inconsistently. On an another box in full 
test run, these tests indeed passed.

 TestWebHCatE2e is failing on trunk
 --

 Key: HIVE-5166
 URL: https://issues.apache.org/jira/browse/HIVE-5166
 Project: Hive
  Issue Type: Bug
  Components: Tests, WebHCat
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan

 I observed these while running full test suite last couple of times.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5128) Direct SQL for view is failing


 [ 
https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5128:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Sergey!

 Direct SQL for view is failing 
 ---

 Key: HIVE-5128
 URL: https://issues.apache.org/jira/browse/HIVE-5128
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch


 I cannot sure of this, but dropping views, (it rolls back to JPA and works 
 fine)
 {noformat}
 etastore.ObjectStore: Direct SQL failed, falling back to ORM
 MetaException(message:Unexpected null for one of the IDs, SD null, column 
 null, serde null)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
 ...
 {noformat}
 Should it be disabled for views or can be fixed?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-28 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752490#comment-13752490
 ] 

Gopal V commented on HIVE-3562:
---

Good work Navis.

Let me mark HIVE-5093 as obsoleted by this - no need for that hack anymore.

 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path


[ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752496#comment-13752496
 ] 

Phabricator commented on HIVE-5158:
---

ashutoshc has requested changes to the revision HIVE-5158 [jira] allow getting 
all partitions for table to also use direct SQL path.

  Question on supporting max.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1387 
Seems like its not hard to support max in this scenario. We can simply do  
query.setRange(0, max) for it. Did you consider supporting it?

REVISION DETAIL
  https://reviews.facebook.net/D12573

BRANCH
  HIVE-5158

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path

2013-08-28 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752526#comment-13752526
 ] 

Sergey Shelukhin commented on HIVE-5158:


Actually there's another path that needs to be changed...

 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5128) Direct SQL for view is failing


[ 
https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752530#comment-13752530
 ] 

Hudson commented on HIVE-5128:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #74 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/74/])
HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518258)
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java


 Direct SQL for view is failing 
 ---

 Key: HIVE-5128
 URL: https://issues.apache.org/jira/browse/HIVE-5128
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch


 I cannot sure of this, but dropping views, (it rolls back to JPA and works 
 fine)
 {noformat}
 etastore.ObjectStore: Direct SQL failed, falling back to ORM
 MetaException(message:Unexpected null for one of the IDs, SD null, column 
 null, serde null)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
 ...
 {noformat}
 Should it be disabled for views or can be fixed?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage


[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752529#comment-13752529
 ] 

Hudson commented on HIVE-3562:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #74 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/74/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HIVE-5093) Use a combiner for LIMIT with GROUP BY and ORDER BY operators


 [ 
https://issues.apache.org/jira/browse/HIVE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-5093.


Resolution: Not A Problem

HIVE-3562 made this redundant.

 Use a combiner for LIMIT with GROUP BY and ORDER BY operators
 -

 Key: HIVE-5093
 URL: https://issues.apache.org/jira/browse/HIVE-5093
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-5093-WIP-01.patch


 Operator trees of the following structure can have a memory friendly combiner 
 put in place after the sort-phase 
 GBY-LIM and OBY-LIM
 This will cut down on I/O when spilling to disk and particularly during the 
 merge phase of the reducer.
 There are two possible combiners - LimitNKeysCombiner and 
 LimitNValuesCombiner.
 The first one would be ideal for the GROUP-BY case, while the latter would 
 more useful for the ORDER-BY case.
 The combiners are still relevant even if there are 1:1 forward operators on 
 the reducer side and for small data items, the MR base layer does not run the 
 combiners at all.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-28 Thread Sivaramakrishnan Narayanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752543#comment-13752543
 ] 

Sivaramakrishnan Narayanan commented on HIVE-3562:
--

Good stuff, Navis!

 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage


[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752552#comment-13752552
 ] 

Hudson commented on HIVE-3562:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #142 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/142/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5128) Direct SQL for view is failing


[ 
https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752553#comment-13752553
 ] 

Hudson commented on HIVE-5128:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #142 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/142/])
HIVE-5128 : Direct SQL for view is failing (Sergey Shelukhin via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518258)
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
* 
/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java


 Direct SQL for view is failing 
 ---

 Key: HIVE-5128
 URL: https://issues.apache.org/jira/browse/HIVE-5128
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch


 I cannot sure of this, but dropping views, (it rolls back to JPA and works 
 fine)
 {noformat}
 etastore.ObjectStore: Direct SQL failed, falling back to ORM
 MetaException(message:Unexpected null for one of the IDs, SD null, column 
 null, serde null)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
 ...
 {noformat}
 Should it be disabled for views or can be fixed?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

hive pull request: Kk wb 1228

2013-08-28 Thread krishna-verticloud

GitHub user krishna-verticloud opened a pull request:

https://github.com/apache/hive/pull/11

Kk wb 1228



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/VertiPub/hive kk-WB-1228

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/11.patch

hive pull request: Kk wb 1228

2013-08-28 Thread krishna-verticloud

Github user krishna-verticloud closed the pull request at:

https://github.com/apache/hive/pull/11

[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path


[ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752565#comment-13752565
 ] 

Phabricator commented on HIVE-5158:
---

sershe has commented on the revision HIVE-5158 [jira] allow getting all 
partitions for table to also use direct SQL path.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1387 1) 
I am not sure this will work for SQL JDO; probably it will just get all of them 
and return limited number of rows.
  2) Due to absence of offset, it's really a semi-useless parameter; I don't 
see it used.

REVISION DETAIL
  https://reviews.facebook.net/D12573

BRANCH
  HIVE-5158

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5158) allow getting all partitions for table to also use direct SQL path

2013-08-28 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5158:
---

Status: Open  (was: Patch Available)

 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well


[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752588#comment-13752588
 ] 

Phabricator commented on HIVE-5029:
---

ashutoshc has requested changes to the revision HIVE-5029 [jira] direct SQL 
perf optimization cannot be tested well.

  Couple of comments.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:76 
Why is this change required?
  
metastore/src/java/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java:19
 Can you place this class in metastore/src/test instead of metastore/src/java ?

REVISION DETAIL
  https://reviews.facebook.net/D12483

BRANCH
  HIVE-sqltest

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.patch, 
 HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solultions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path


[ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752591#comment-13752591
 ] 

Phabricator commented on HIVE-5158:
---

ashutoshc has commented on the revision HIVE-5158 [jira] allow getting all 
partitions for table to also use direct SQL path.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1387 1) 
That should still be as fast as orm path, if not faster.
  2) Metastore thrift api is public. There can be consumer of if apart from 
Hive.

REVISION DETAIL
  https://reviews.facebook.net/D12573

BRANCH
  HIVE-5158

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well


[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752624#comment-13752624
 ] 

Phabricator commented on HIVE-5029:
---

sershe has commented on the revision HIVE-5029 [jira] direct SQL perf 
optimization cannot be tested well.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:76 
This class inherits from ObjectStore, so the interface this is looking for is 
on base class. getInterfaces doesn't give you all the interfaces in the  
hierarchy
  
metastore/src/java/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java:19
 let me try this..

REVISION DETAIL
  https://reviews.facebook.net/D12483

BRANCH
  HIVE-sqltest

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.patch, 
 HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solultions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x


 [ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-4460:
-

Attachment: HIVE-4460.3.patch

HIVE-4460.3.patch incorporates RB comments

 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are only published for Hadoop 1.x version.  As more 
 projects add HCatalog integration, the need for HCatalog artifcats on Hadoop 
 versions supported by the product  is needed so that automated builds that 
 target different Hadoop releases can be built successfully.   For example 
 SQOOP-931 introduces Sqoop/HCatalog integration and Sqoop builds with Hadoop 
 1.x and 2.x releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well


[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752671#comment-13752671
 ] 

Phabricator commented on HIVE-5029:
---

ashutoshc has commented on the revision HIVE-5029 [jira] direct SQL perf 
optimization cannot be tested well.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:76 
Can you add this in a comment here?
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:81 
You can do list.toArray() for this.

REVISION DETAIL
  https://reviews.facebook.net/D12483

BRANCH
  HIVE-sqltest

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.patch, 
 HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solultions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil


 [ 
https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5163:
-

Attachment: HIVE-5163.update
HIVE-5163.patch
HIVE-5163.move

Moved HCatMapRedUtil to org.apache.hcatalog.mapreduce to make above mentioned 
bugs easier.
HIVE-5163.patch - cummulative (for automated build)
HIVE-5163.move - just the rename for SVN rename to preserve history
HIVE-5163.update - changed to apply after SVN move is done

 refactor org.apache.hadoop.mapred.HCatMapRedUtil
 

 Key: HIVE-5163
 URL: https://issues.apache.org/jira/browse/HIVE-5163
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update


 Everything that this class does is delegated to a Shim class.
 To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of 
 HCatMapRedUtil and make the calls directly to the Shim layer.  It will make 
 it easier because all org.apache.hcatalog classes will move to 
 org.apache.hive.hcatalog classes thus making way to provide binary backwards 
 compat.  This class won't change it's name so it's more difficult to provide 
 backwards compat.  The org.apache.hadoop.mapred.TempletonJobTracker is not an 
 issue since it goes away in HIVE-4460.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil


[ 
https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752685#comment-13752685
 ] 

Eugene Koifman commented on HIVE-5163:
--

This must be checked in after HIVE-4460

 refactor org.apache.hadoop.mapred.HCatMapRedUtil
 

 Key: HIVE-5163
 URL: https://issues.apache.org/jira/browse/HIVE-5163
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update


 Everything that this class does is delegated to a Shim class.
 To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of 
 HCatMapRedUtil and make the calls directly to the Shim layer.  It will make 
 it easier because all org.apache.hcatalog classes will move to 
 org.apache.hive.hcatalog classes thus making way to provide binary backwards 
 compat.  This class won't change it's name so it's more difficult to provide 
 backwards compat.  The org.apache.hadoop.mapred.TempletonJobTracker is not an 
 issue since it goes away in HIVE-4460.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil


 [ 
https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-5163:
-

Status: Patch Available  (was: Open)

 refactor org.apache.hadoop.mapred.HCatMapRedUtil
 

 Key: HIVE-5163
 URL: https://issues.apache.org/jira/browse/HIVE-5163
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update


 Everything that this class does is delegated to a Shim class.
 To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of 
 HCatMapRedUtil and make the calls directly to the Shim layer.  It will make 
 it easier because all org.apache.hcatalog classes will move to 
 org.apache.hive.hcatalog classes thus making way to provide binary backwards 
 compat.  This class won't change it's name so it's more difficult to provide 
 backwards compat.  The org.apache.hadoop.mapred.TempletonJobTracker is not an 
 issue since it goes away in HIVE-4460.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog


 [ 
https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-4895:
-

Status: Open  (was: Patch Available)

This patch will need to be redone after HIVE-4460  HIVE-5163

 Move all HCatalog classes to org.apache.hive.hcatalog
 -

 Key: HIVE-4895
 URL: https://issues.apache.org/jira/browse/HIVE-4895
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, 
 HIVE-4895.update.patch

   Original Estimate: 24h
  Time Spent: 12h
  Remaining Estimate: 12h

 make sure to preserve history in SCM

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: Review Request 13862: ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-28 Thread Yin Huai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
---

(Updated Aug. 28, 2013, 7:03 p.m.)


Review request for hive.


Changes
---

update comments


Bugs: HIVE-5149
https://issues.apache.org/jira/browse/HIVE-5149


Repository: hive-git


Description
---

https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
 c380a2d 
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 
  ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb 
  ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb 

Diff: https://reviews.apache.org/r/13862/diff/


Testing
---


Thanks,

Yin Huai

[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-28 Thread Yin Huai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-5149:
---

Attachment: HIVE-5149.2.patch

 ReduceSinkDeDuplication can pick the wrong partitioning columns
 ---

 Key: HIVE-5149
 URL: https://issues.apache.org/jira/browse/HIVE-5149
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch


 https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4460) Publish HCatalog artifacts for Hadoop 2.x


[ 
https://issues.apache.org/jira/browse/HIVE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752783#comment-13752783
 ] 

Thejas M Nair commented on HIVE-4460:
-

+1 . I will kick off the tests on my machine as pre-commit tests are not 
working right now.


 Publish HCatalog artifacts for Hadoop 2.x
 -

 Key: HIVE-4460
 URL: https://issues.apache.org/jira/browse/HIVE-4460
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
 Environment: Hadoop 2.x
Reporter: Venkat Ranganathan
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4460.2.patch, HIVE-4460.3.patch, HIVE-4460.patch

   Original Estimate: 72h
  Time Spent: 40h 40m
  Remaining Estimate: 31h 20m

 HCatalog artifacts are only published for Hadoop 1.x version.  As more 
 projects add HCatalog integration, the need for HCatalog artifcats on Hadoop 
 versions supported by the product  is needed so that automated builds that 
 target different Hadoop releases can be built successfully.   For example 
 SQOOP-931 introduces Sqoop/HCatalog integration and Sqoop builds with Hadoop 
 1.x and 2.x releases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well


[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752805#comment-13752805
 ] 

Phabricator commented on HIVE-5029:
---

sershe has commented on the revision HIVE-5029 [jira] direct SQL perf 
optimization cannot be tested well.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java:81 
ClassUtils.getAllInterfaces returns non-generic list, so it only has 
toArray(Object[])

REVISION DETAIL
  https://reviews.facebook.net/D12483

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch, 
 HIVE-5029.patch, HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solultions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5029) direct SQL perf optimization cannot be tested well


 [ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5029:
--

Attachment: HIVE-5029.D12483.2.patch

sershe updated the revision HIVE-5029 [jira] direct SQL perf optimization 
cannot be tested well.

  Update w/feedback. The moved file didn't change.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12483

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12483?vs=38841id=39177#toc

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java
  metastore/src/test/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch, 
 HIVE-5029.patch, HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solultions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5163) refactor org.apache.hadoop.mapred.HCatMapRedUtil


[ 
https://issues.apache.org/jira/browse/HIVE-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752820#comment-13752820
 ] 

Thejas M Nair commented on HIVE-5163:
-

Looks good +1.
This is hcat only change, so I will make sure it doesn't break hive build and 
run the hcat unit tests before committing.



 refactor org.apache.hadoop.mapred.HCatMapRedUtil
 

 Key: HIVE-5163
 URL: https://issues.apache.org/jira/browse/HIVE-5163
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-5163.move, HIVE-5163.patch, HIVE-5163.update


 Everything that this class does is delegated to a Shim class.
 To make HIVE-4895 and HIVE-4896 smoother, we need to get rid of 
 HCatMapRedUtil and make the calls directly to the Shim layer.  It will make 
 it easier because all org.apache.hcatalog classes will move to 
 org.apache.hive.hcatalog classes thus making way to provide binary backwards 
 compat.  This class won't change it's name so it's more difficult to provide 
 backwards compat.  The org.apache.hadoop.mapred.TempletonJobTracker is not an 
 issue since it goes away in HIVE-4460.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-08-28 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: HIVE-4844.9.patch

attaching HIVE-4844.9.patch, changes per review from hbutani:
- descriptive comment about numericTypes map
- TypeInfoParser fix and tests for invalid TypeInfo parameter syntax
- raise error if Hive tries to instantiate varchar TypeInfo without type 
params.
- fixed typo in constant value in Thrift file


 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, 
 HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, 
 HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4961) Create bridge for custom UDFs to operate in vectorized mode

2013-08-28 Thread Eric Hanson (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-4961:
--

Attachment: vectorUDF.8.patch

Added unit tests, plus support for isRepeating performance optimization for the 
case when all input vectors passed into a function are marked as isRepeating = 
true. Fixed a bug related to setting string output.

 Create bridge for custom UDFs to operate in vectorized mode
 ---

 Key: HIVE-4961
 URL: https://issues.apache.org/jira/browse/HIVE-4961
 Project: Hive
  Issue Type: Sub-task
Reporter: Eric Hanson
Assignee: Eric Hanson
 Attachments: vectorUDF.4.patch, vectorUDF.5.patch, vectorUDF.8.patch


 Suppose you have a custom UDF myUDF() that you've created to extend hive. The 
 goal of this JIRA is to create a facility where if you run a query that uses 
 myUDF() in an expression, the query will run in vectorized mode.
 This would be a general-purpose bridge for custom UDFs that users add to 
 Hive. It would work with existing UDFs.
 I'm considering a separate JIRA for a new kind of custom UDF implementation 
 that is vectorized from the beginning, to optimize performance. That is not 
 covered by this JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5102) ORC getSplits should create splits based the stripes


 [ 
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5102:
--

Attachment: HIVE-5102.D12579.1.patch

omalley requested code review of HIVE-5102 [jira] ORC getSplits should create 
splits based the stripes.

Reviewers: JIRA

working on orcinputformat

Currently ORC inherits getSplits from FileFormat, which basically makes a split 
per an HDFS block. This can create too little parallelism and would be better 
done by having getSplits look at the file footer and create splits based on the 
stripes.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12579

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeInformation.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
  shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
  shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
  shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/30219/

To: JIRA, omalley


 ORC getSplits should create splits based the stripes 
 -

 Key: HIVE-5102
 URL: https://issues.apache.org/jira/browse/HIVE-5102
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5102.D12579.1.patch


 Currently ORC inherits getSplits from FileFormat, which basically makes a 
 split per an HDFS block. This can create too little parallelism and would be 
 better done by having getSplits look at the file footer and create splits 
 based on the stripes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2013-08-28 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-5091:


Component/s: File Formats

 ORC files should have an option to pad stripes to the HDFS block boundaries
 ---

 Key: HIVE-5091
 URL: https://issues.apache.org/jira/browse/HIVE-5091
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch


 With ORC stripes being large, if a stripe straddles an HDFS block, the 
 locality of read is suboptimal. It would be good to add padding to ensure 
 that stripes don't straddle HDFS blocks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5102) ORC getSplits should create splits based the stripes

2013-08-28 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-5102:


Status: Patch Available  (was: Open)

 ORC getSplits should create splits based the stripes 
 -

 Key: HIVE-5102
 URL: https://issues.apache.org/jira/browse/HIVE-5102
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5102.D12579.1.patch


 Currently ORC inherits getSplits from FileFormat, which basically makes a 
 split per an HDFS block. This can create too little parallelism and would be 
 better done by having getSplits look at the file footer and create splits 
 based on the stripes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh

Thejas M Nair created HIVE-5167:
---

 Summary: webhcat_config.sh checks for env variables being set 
before sourcing webhcat-env.sh
 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair


HIVE-4820 introduced checks for env variables, but it does so before sourcing 
webhcat-env.sh. This order needs to be reversed.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


 [ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5167:


Attachment: HIVE-5167.1.patch

 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5167) webhcat_config.sh checks for env variables being set before sourcing webhcat-env.sh


[ 
https://issues.apache.org/jira/browse/HIVE-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752941#comment-13752941
 ] 

Thejas M Nair commented on HIVE-5167:
-

Also, the check for environment variables being set should be changed to from a 
fatal error to a warning. These are necessary only for default configuration of 
webhcat. HIVE_HOME is not used in default config file.


 webhcat_config.sh checks for env variables being set before sourcing 
 webhcat-env.sh
 ---

 Key: HIVE-5167
 URL: https://issues.apache.org/jira/browse/HIVE-5167
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-5167.1.patch


 HIVE-4820 introduced checks for env variables, but it does so before sourcing 
 webhcat-env.sh. This order needs to be reversed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4196) Support for Streaming Partitions in Hive

2013-08-28 Thread Roshan Naik (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752946#comment-13752946
]

Roshan Naik commented on HIVE-4196:
---

{quote} According to the Hive coding conventions lines should be bounded at
100 characters. Many lines in this patch exceed that. {quote}

Will fix the ones which are not in the thrift generated files.

{quote} I'm surprised to see that streamingStatus sets the chunk id for the
table. {quote}

Seems like a bug. Will fix.

{quote} The logic at the end of of these functions doesn't look right. Take
getNextChunkID for example. If commitTransaction fails (line 2132) rollback
will be called but the next chunk id will still be returned. It seems you need
a check on success after commit. I realize many of the calls in the class
follow this, but it doesn't seem right. {quote}

Good catch. At the time I thought commitTxn() will only fail with an exception
does not return false. But on closer inspection there is indeed a corner case
(if rollBack was called) that it returns false also. Its a bizzare thing for a
function to fail with without exceptions. But for now I will fix my code to
live with it.

{quote} In HiveMetaStoreClient.java, is assert what you want? Are you ok with
the validity of the arguments not being checked most of the time?{quote}

Not all checks are in place. There is some checks that will happen at lower
layers. Some at higher. Will be adding more checks.

{quote} I'm trying to figure out whether the chunk files are moved, deleted, or
left alone during the partition rolling. {quote}

That would depend on whether the table is defined to be an external or internal
table. It is essentially an add_partition of the new partition. It calls
HiveMetastore.add_partition_core_notxn() inside a transaction.

Support for Streaming Partitions in Hive

Key: HIVE-4196
URL: https://issues.apache.org/jira/browse/HIVE-4196
Project: Hive
Issue Type: New Feature
Components: Database/Schema, HCatalog
Affects Versions: 0.10.1
Reporter: Roshan Naik
Assignee: Roshan Naik
Attachments: HCatalogStreamingIngestFunctionalSpecificationandDesign-
apr 29- patch1.docx, HCatalogStreamingIngestFunctionalSpecificationandDesign-
apr 29- patch1.pdf, HIVE-4196.v1.patch

Motivation: Allow Hive users to immediately query data streaming in through
clients such as Flume.
Currently Hive partitions must be created after all the data for the
partition is available. Thereafter, data in the partitions is considered
immutable.
This proposal introduces the notion of a streaming partition into which new
files an be committed periodically and made available for queries before the
partition is closed and converted into a standard partition.
The admin enables streaming partition on a table using DDL. He provides the
following pieces of information:
- Name of the partition in the table on which streaming is enabled
- Frequency at which the streaming partition should be closed and converted
into a standard partition.
Tables with streaming partition enabled will be partitioned by one and only
one column. It is assumed that this column will contain a timestamp.
Closing the current streaming partition converts it into a standard
partition. Based on the specified frequency, the current streaming partition
is closed and a new one created for future writes. This is referred to as
'rolling the partition'.
A streaming partition's life cycle is as follows:
- A new streaming partition is instantiated for writes
- Streaming clients request (via webhcat) for a HDFS file name into which
they can write a chunk of records for a specific table.
- Streaming clients write a chunk (via webhdfs) to that file and commit
it(via webhcat). Committing merely indicates that the chunk has been written
completely and ready for serving queries.
- When the partition is rolled, all committed chunks are swept into single
directory and a standard partition pointing to that directory is created. The
streaming partition is closed and new streaming partition is created. Rolling
the partition is atomic. Streaming clients are agnostic of partition rolling.

- Hive queries will be able to query the partition that is currently open
for streaming. only committed chunks will be visible. read consistency will
be ensured so that repeated reads of the same partition will be idempotent
for the lifespan of the query.
Partition rolling requires an active agent/thread running to check when it is
time to roll and trigger the roll. This could be either be achieved by using
an external agent such as Oozie (preferably) or an internal agent.

--
This message is automatically generated by JIRA.
If you think

[jira] [Updated] (HIVE-5158) allow getting all partitions for table to also use direct SQL path


 [ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5158:
--

Attachment: HIVE-5158.D12573.2.patch

sershe updated the revision HIVE-5158 [jira] allow getting all partitions for 
table to also use direct SQL path.

  Change the patch instead in such manner that PartitionPruner calls the method 
I already modified. It seems like it doesn't need auth (get-by-filter and 
get-by-name don't use it).

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12573

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12573?vs=39141id=39201#toc

MANIPHEST TASKS
  https://reviews.facebook.net/T63

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java

To: JIRA, ashutoshc, sershe


 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries


 [ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5091:
--

Attachment: HIVE-5091.D12249.3.patch

omalley updated the revision HIVE-5091 [jira] ORC files should have an option 
to pad stripes to the HDFS block boundaries.

  Updated test file dump output

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12249

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12249?vs=38865id=39207#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java
  ql/src/test/resources/orc-file-dump.out

To: JIRA, omalley
Cc: hagleitn


 ORC files should have an option to pad stripes to the HDFS block boundaries
 ---

 Key: HIVE-5091
 URL: https://issues.apache.org/jira/browse/HIVE-5091
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, 
 HIVE-5091.D12249.3.patch


 With ORC stripes being large, if a stripe straddles an HDFS block, the 
 locality of read is suboptimal. It would be good to add padding to ensure 
 that stripes don't straddle HDFS blocks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HIVE-5159) Change the kind fields in ORC's proto file to optional

2013-08-28 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-5159:


Assignee: Jason Dere

 Change the kind fields in ORC's proto file to optional
 --

 Key: HIVE-5159
 URL: https://issues.apache.org/jira/browse/HIVE-5159
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Jason Dere

 Java's protobuf generated code uses a null value to represent enum values 
 that were added after the reader was compiled. To reflect that reality, the 
 enum values should always be marked as optional.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: pre-commit-build offline untill wed

2013-08-28 Thread Brock Noland

OK I just fixed this and verified the restart script works.  Sorry
about the delay, as Edward said things went to hell, I was out of town
without a laptop, and our restart scripts were untested.

We'll do our best to make sure this doesn't happen again.

On Sat, Aug 24, 2013 at 12:07 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 I do have access the the build box, however I never poked the sudo
 mechanism to restart the service before and that is not correct.

 No pre-commit testing anymore, we have to go back to the old system for a
 while .



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

[jira] [Updated] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced


 [ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4964:
--

Attachment: HIVE-4964.D12585.1.patch

hbutani requested code review of HIVE-4964 [jira] Cleanup PTF code: remove 
code dealing with non standard sql behavior we had original introduced.

Reviewers: JIRA, ashutoshc

merge with trunk

There are still pieces of code that deal with:

supporting select expressions with Windowing
supporting a filter with windowing

Need to do this before introducing  Perf. improvements.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12585

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/30237/

To: JIRA, ashutoshc, hbutani


 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Priority: Minor
 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing  Perf. improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode


 [ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4617:


Attachment: HIVE-4617.D12507Test.1.patch

HIVE-4617.D12507Test.1.patch - Copy of HIVE-4617.D12507.1.patch to kick off 
tests.

 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run a queries asynchronously. Current executeStatement call 
 blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode


 [ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4617:


Status: Open  (was: Patch Available)

 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run a queries asynchronously. Current executeStatement call 
 blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4617) ExecuteStatementAsync call to run a query in non-blocking mode


 [ 
https://issues.apache.org/jira/browse/HIVE-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4617:


Status: Patch Available  (was: Open)

 ExecuteStatementAsync call to run a query in non-blocking mode
 --

 Key: HIVE-4617
 URL: https://issues.apache.org/jira/browse/HIVE-4617
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Jaideep Dhok
Assignee: Vaibhav Gumashta
 Attachments: HIVE-4617.D12417.1.patch, HIVE-4617.D12417.2.patch, 
 HIVE-4617.D12417.3.patch, HIVE-4617.D12417.4.patch, HIVE-4617.D12417.5.patch, 
 HIVE-4617.D12417.6.patch, HIVE-4617.D12507.1.patch, 
 HIVE-4617.D12507Test.1.patch


 Provide a way to run a queries asynchronously. Current executeStatement call 
 blocks until the query run is complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-08-28 Thread Xuefu Zhang (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753058#comment-13753058
]

Xuefu Zhang commented on HIVE-4844:
---

Hi Jason,

Thanks for your response. I understand it's hard to separate your patch into
small patches. On the other hand, I'm wondering if the changes you made dealing
with precision/scale is required for char/varchar support. If not, could you
spare them from you patch?

The problem I have is the difficulty to rebase my changes on your patch because
of the progressive nature. This might makes easier for both of us to proceed.
In the meantime, please feel free to include whatever changes that are needed
for both feature.

Please let me know. Thanks.

Add char/varchar data types
---

Key: HIVE-4844
URL: https://issues.apache.org/jira/browse/HIVE-4844
Project: Hive
Issue Type: New Feature
Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch,
HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch,
HIVE-4844.7.patch, HIVE-4844.8.patch, HIVE-4844.9.patch, screenshot.png

Add new char/varchar data types which have support for more SQL-compliant
behavior, such as SQL string comparison semantics, max length, etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-3183) case expression should allow different types per ISO-SQL 2012

2013-08-28 Thread Xiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiu updated HIVE-3183:
--

Attachment: udf_when_type_wrong3.q.out
udf_when_type_wrong2.q.out
Hive-3183.patch.txt

This patch removes the restriction on 'when' clause. So some negative testcases 
become positive, namely:

udf_when_type_wrong2.q 
udf_when_type_wrong3.q

They should be moved from 'ql/src/test/queries/clientnegative/' to 
'ql/src/test/queries/clientpositive/', and be renamed to reflect its positive 
nature.

Also in ql/src/test/results/clientpositive/

udf_when_type_wrong2.q.out
udf_when_type_wrong3.q.out

need to be added.

 case expression should allow different types per ISO-SQL 2012
 -

 Key: HIVE-3183
 URL: https://issues.apache.org/jira/browse/HIVE-3183
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.8.0
Reporter: N Campbell
 Attachments: Hive-3183.patch.txt, udf_when_type_wrong2.q.out, 
 udf_when_type_wrong3.q.out


 The ISO-SQL standard specification for CASE allows the specification to 
 include different types in the WHEN and ELSE blocks including this example 
 which mixes smallint and integer types
 select case when vsint.csint is not null then vsint.csint else 1 end from 
 cert.vsint vsint 
 The Apache Hive docs do not state how it deviates from the standard or any 
 given restrictions so unsure if this is a bug vs an enhancement. Many SQL 
 applications mix so this seems to be a restrictive implementation if this is 
 by design.
 Argument type mismatch '1': The expression after ELSE should have the same 
 type as those after THEN: smallint is expected but int is found

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

RFC: Major HCatalog refactoring

2013-08-28 Thread Eugene Koifman

Hi,


Here is the plan for refactoring HCatalog as was agreed to when it was
merged into Hive during.  HIVE-4869 is the umbrella bug for this work.  The
changes are complex and touch every single file under hcatalog.  Please
comment.

When HCatalog project was merged into Hive on 0.11 several integration
items did not make the 0.11 deadline.  It was agreed to finish them in 0.12
release.  Specifically:

1. HIVE-4895 - change package name from org.apache.hcatalog to
org.apache.hive.hcatalog

2. HIVE-4896 - create binary backwards compatibility layer for hcat users
upgrading from 0.11 to 0.12

For item 1, we’ll just move every file under org.apache.hcatalog to
org.apache.hive.hcatalog and update all “package” and “import” statement as
well as all hcat/webhcat scripts.  This will include all JUnit tests.

Item 2 will ensure that if a user has a M/R program or Pig script, etc.
that uses HCatalog public API, their programs will continue to work w/o
change with hive 0.12.

The proposal is to make the changes that have as little impact on the build
system, in part to make upcoming ‘mavenization’ of hive easier, in part to
make the changes more manageable.



The list of public interfaces (and their transitive closure) for which
backwards compat will be provided.

   1.

   HCatLoader
   2.

   HCatStorer
   3.

   HCatInputFormat
   4.

   HCatOutputFormat
   5.

   HCatReader
   6.

   HCatWriter
   7.

   HCatRecord
   8.

   HCatSchema


To achieve this, 0.11 version of these classes will be added in
org.apache.hcatalog package (after item 1 is done).  Each of these classes
as well as dependencies will be deprecated to make it clear that any new
development needs to happen in org.apache.hive.hcatalog.  0.11 version of
JUnit tests for hcat will also be brought to trunk and handled the same way
as mainline code.  A sunset clause will be added to the deprecation message.

Thus, the published HCatalog JARs will contain both packages and the unit
tests will cover both versions of the API.

Since these changes are unavoidably disruptive, we’ll need to lock down
hcatalog part of hive, check in all existing patches (which are ready, i.e.
apply/test cleanly and don’t have review comments which need to be
addressed) and them make the refactoring changes.


Thanks,

Eugene

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

[jira] [Commented] (HIVE-4844) Add char/varchar data types

2013-08-28 Thread Jason Dere (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753094#comment-13753094
]

Jason Dere commented on HIVE-4844:
--

Hi Xuefu, sorry about. I did add precision/scale in a few places, let's take a
look:

1. JDBC: The precision/scale is also used for returning varchar length, so
these changes are necessary.
2. BaseTypeParams/TypeQualifiers/TTypeQualifiers: These are objects used to
hold type qualifier information, and I did add precision/scale fields/setters
to these objects. If you'd like them removed I can remove any mention of
precision/scale in these objects.
3. TCLIService.thrift: Add constant string values to represent precision/scale
fields. I can also remove those constant definitions if you like.

Let me know if you want me to remove mention of precision/scale from (2) and
(3).

Add char/varchar data types
---

Add new char/varchar data types which have support for more SQL-compliant
behavior, such as SQL string comparison semantics, max length, etc.

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well

2013-08-28 Thread Hive QA (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753109#comment-13753109
]

Hive QA commented on HIVE-5029:
---

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600451/HIVE-5029.D12483.2.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/552/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/552/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

direct SQL perf optimization cannot be tested well
--

Key: HIVE-5029
URL: https://issues.apache.org/jira/browse/HIVE-5029
Project: Hive
Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch,
HIVE-5029.patch, HIVE-5029.patch

HIVE-4051 introduced perf optimization that involves getting partitions
directly via SQL in metastore. Given that SQL queries might not work on all
datastores (and will not work on non-SQL ones), JDO fallback is in place.
Given that perf improvement is very large for short queries, it's on by
default.
However, there's a problem with tests with regard to that. If SQL code is
broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might
allow tests to pass.
We are going to disable SQL by default before the testing problem is resolved.
There are several possible solultions:
1) Separate build for this setting. Seems like an overkill...
2) Enable by default; disable by default in tests, create a clone of
TestCliDriver with a subset of queries that will exercise the SQL path.
3) Have some sort of test hook inside metastore that will run both ORM and
SQL and compare.
3') Or make a subclass of ObjectStore that will do that. ObjectStore is
already pluggable.
4) Write unit tests for one of the modes (JDO, as non-default?) and declare
that they are sufficient; disable fallback in tests.
3' seems like the easiest. For now we will disable SQL by default.

[jira] [Commented] (HIVE-951) Selectively include EXTERNAL TABLE source files via REGEX

2013-08-28 Thread indrajit (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753163#comment-13753163
 ] 

indrajit commented on HIVE-951:
---

CREATE EXTERNAL TABLE allow users to us the table on the top of HDFS , 
Its good feature and it does not look for the path whether it is created or not 
,
After creation of table you can lazily create the path 

 Selectively include EXTERNAL TABLE source files via REGEX
 -

 Key: HIVE-951
 URL: https://issues.apache.org/jira/browse/HIVE-951
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-951.patch


 CREATE EXTERNAL TABLE should allow users to cherry-pick files via regular 
 expression. 
 CREATE EXTERNAL TABLE was designed to allow users to access data that exists 
 outside of Hive, and
 currently makes the assumption that all of the files located under the 
 supplied path should be included
 in the new table. Users frequently encounter directories containing multiple
 datasets, or directories that contain data in heterogeneous schemas, and it's 
 often
 impractical or impossible to adjust the layout of the directory to meet the 
 requirements of 
 CREATE EXTERNAL TABLE. A good example of this problem is creating an external 
 table based
 on the contents of an S3 bucket. 
 One way to solve this problem is to extend the syntax of CREATE EXTERNAL TABLE
 as follows:
 CREATE EXTERNAL TABLE
 ...
 LOCATION path [file_regex]
 ...
 For example:
 {code:sql}
 CREATE EXTERNAL TABLE mytable1 ( a string, b string, c string )
 STORED AS TEXTFILE
 LOCATION 's3://my.bucket/' 'folder/2009.*\.bz2$';
 {code}
 Creates mytable1 which includes all files in s3:/my.bucket with a filename 
 matching 'folder/2009*.bz2'
 {code:sql}
 CREATE EXTERNAL TABLE mytable2 ( d string, e int, f int, g int )
 STORED AS TEXTFILE 
 LOCATION 'hdfs://data/' 'xyz.*2009.bz2$';
 {code}
 Creates mytable2 including all files matching 'xyz*2009.bz2' located 
 under hdfs://data/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced


 [ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4964:
---

Assignee: Harish Butani

 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing  Perf. improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4964) Cleanup PTF code: remove code dealing with non standard sql behavior we had original introduced


[ 
https://issues.apache.org/jira/browse/HIVE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753169#comment-13753169
 ] 

Phabricator commented on HIVE-4964:
---

ashutoshc has accepted the revision HIVE-4964 [jira] Cleanup PTF code: remove 
code dealing with non standard sql behavior we had original introduced.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D12585

BRANCH
  HIVE-4964-2

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


 Cleanup PTF code: remove code dealing with non standard sql behavior we had 
 original introduced
 ---

 Key: HIVE-4964
 URL: https://issues.apache.org/jira/browse/HIVE-4964
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
Priority: Minor
 Attachments: HIVE-4964.D11985.1.patch, HIVE-4964.D11985.2.patch, 
 HIVE-4964.D12585.1.patch


 There are still pieces of code that deal with:
 - supporting select expressions with Windowing
 - supporting a filter with windowing
 Need to do this before introducing  Perf. improvements. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well

2013-08-28 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753175#comment-13753175
]

Sergey Shelukhin commented on HIVE-5029:

Hive QA passed

direct SQL perf optimization cannot be tested well
--

[jira] [Updated] (HIVE-5158) allow getting all partitions for table to also use direct SQL path


 [ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5158:
--

Attachment: HIVE-5158.D12573.3.patch

sershe updated the revision HIVE-5158 [jira] allow getting all partitions for 
table to also use direct SQL path.

  Adding the limit support to this and other call... tests are running

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12573

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12573?vs=39201id=39237#toc

MANIPHEST TASKS
  https://reviews.facebook.net/T63

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java

To: JIRA, ashutoshc, sershe


 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch, 
 HIVE-5158.D12573.3.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5029) direct SQL perf optimization cannot be tested well


[ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753217#comment-13753217
 ] 

Phabricator commented on HIVE-5029:
---

ashutoshc has commented on the revision HIVE-5029 [jira] direct SQL perf 
optimization cannot be tested well.

INLINE COMMENTS
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1684 
Before throwing exceptions, don't we need to rollbackTransaction() ?
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:1783 
Before throwing exception, don't we need to rollbackTransaction() ?

REVISION DETAIL
  https://reviews.facebook.net/D12483

To: JIRA, ashutoshc, sershe


 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.D12483.2.patch, 
 HIVE-5029.patch, HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default before the testing problem is resolved.
 There are several possible solultions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5107) Change hive's build to maven

2013-08-28 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753233#comment-13753233
 ] 

Roshan Naik commented on HIVE-5107:
---

curious .. is ant's  'makepom' task (to convert a ivy file into a pom file) a 
useful starting point for such an effort ?

 Change hive's build to maven
 

 Key: HIVE-5107
 URL: https://issues.apache.org/jira/browse/HIVE-5107
 Project: Hive
  Issue Type: Task
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 I can not cope with hive's build infrastructure any more. I have started 
 working on porting the project to maven. When I have some solid progess i 
 will github the entire thing for review. Then we can talk about switching the 
 project somehow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5158) allow getting all partitions for table to also use direct SQL path


[ 
https://issues.apache.org/jira/browse/HIVE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753248#comment-13753248
 ] 

Phabricator commented on HIVE-5158:
---

ashutoshc has accepted the revision HIVE-5158 [jira] allow getting all 
partitions for table to also use direct SQL path.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D12573

BRANCH
  HIVE-5158

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, sershe


 allow getting all partitions for table to also use direct SQL path
 --

 Key: HIVE-5158
 URL: https://issues.apache.org/jira/browse/HIVE-5158
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5158.D12573.1.patch, HIVE-5158.D12573.2.patch, 
 HIVE-5158.D12573.3.patch


 While testing some queries I noticed that getPartitions can be very slow 
 (which happens e.g. in non-strict mode with no partition column filter); with 
 a table with many partitions it can take 10-12s easily. SQL perf path can 
 also be used for this path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5102) ORC getSplits should create splits based the stripes

2013-08-28 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753305#comment-13753305
 ] 

Hive QA commented on HIVE-5102:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600472/HIVE-5102.D12579.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2907 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testFileGenerator
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/554/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/554/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 ORC getSplits should create splits based the stripes 
 -

 Key: HIVE-5102
 URL: https://issues.apache.org/jira/browse/HIVE-5102
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HIVE-5102.D12579.1.patch


 Currently ORC inherits getSplits from FileFormat, which basically makes a 
 split per an HDFS block. This can create too little parallelism and would be 
 better done by having getSplits look at the file footer and create splits 
 based on the stripes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage


[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753307#comment-13753307
 ] 

Hudson commented on HIVE-3562:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2295 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2295/])
HIVE-3562 : Some limit can be pushed down to map stage (Navis via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1518234)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/build.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown.q
* /hive/trunk/ql/src/test/queries/clientpositive/limit_pushdown_negative.q
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown.q.out
* /hive/trunk/ql/src/test/results/clientpositive/limit_pushdown_negative.q.out


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.12.0

 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-5128) Direct SQL for view is failing