[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182481#comment-14182481 ] Lars Francke commented on HIVE-8583: As far as I understand, the work is done on the non-replaced original configuration properties:
{code}
void addJobConfToEnvironment(Configuration conf, Map<String, String> env) {
  Iterator<Map.Entry<String, String>> it = conf.iterator();
  while (it.hasNext()) {
    Map.Entry<String, String> en = it.next();
    String name = en.getKey();
    if (!blackListed(name)) {
      String value = conf.get(name); // does variable expansion
      name = safeEnvVarName(name);
{code}
So the replacing happens later. BTW, the replaceAll is wrong too: it takes a regex, so "." means every character, and it'd replace everything with underscores. HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341:
{code}
String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
if (bl != null && bl.length() > 0) {
  String[] bls = bl.split(",");
  for (String b : bls) {
    b.replaceAll(".", "_");
    blackListedConfEntries.add(b);
  }
}
{code}
The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following:
* Minor style modification (missorted modifiers)
* Adds reading of the default value for HIVESCRIPT_ENV_BLACKLIST
* Removes replaceAll
* Lets blackListed take a Configuration object as parameter, which allowed me to add a test for this
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
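To illustrate the regex pitfall Lars points out, here is a minimal, self-contained sketch (the sample string is just an illustration, not taken from the Hive code):

```java
public class ReplaceAllDemo {
    public static void main(String[] args) {
        String name = "hive.script.operator.env.blacklist";
        // replaceAll() interprets its first argument as a regex:
        // an unescaped "." matches ANY character, so every character
        // gets replaced, not just the dots.
        System.out.println(name.replaceAll(".", "_"));
        // What was presumably intended: escape the dot...
        System.out.println(name.replaceAll("\\.", "_"));
        // ...or use the non-regex replace(char, char) overload.
        System.out.println(name.replace('.', '_'));
        // Also note: Java strings are immutable, so the result must be
        // assigned; a bare b.replaceAll(...) is a no-op, as the comment
        // on HIVE-8583 observes.
    }
}
```

Running this prints a string of underscores first, then `hive_script_operator_env_blacklist` twice.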
[jira] [Updated] (HIVE-8532) return code of source xxx clause is missing
[ https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8532: Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, vitthal (Suhas) Gogate, for the contribution. return code of source xxx clause is missing - Key: HIVE-8532 URL: https://issues.apache.org/jira/browse/HIVE-8532 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.12.0, 0.13.1 Reporter: Gordon Wang Fix For: 0.15.0 Attachments: HIVE-8532.patch When executing a source <hql-file> clause, the Hive client driver does not capture the return code of the command. This behaviour causes an issue when running Hive queries in an Oozie workflow: when the source clause is put into an Oozie workflow, Oozie cannot get its return code and therefore considers the source clause successful all the time. So when the source clause fails, the Hive query does not abort, and the Oozie workflow does not abort either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
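The report above implies a fix of this shape: whatever processes the statements of a sourced file must propagate the first failing return code instead of discarding it. A minimal sketch under that assumption (hypothetical names; the actual fix lives in Hive's CLI driver code):

```java
import java.util.List;
import java.util.function.ToIntFunction;

public class SourceReturnCode {
    // Hypothetical sketch: run each statement of a sourced file and
    // propagate the first non-zero return code, rather than swallowing
    // it the way the bug report describes.
    static int processSourcedLines(List<String> lines, ToIntFunction<String> processLine) {
        for (String line : lines) {
            int rc = processLine.applyAsInt(line);
            if (rc != 0) {
                return rc; // abort on the first failing statement
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // A statement that "fails" with return code 3 should make the
        // whole sourced file fail with 3.
        int rc = processSourcedLines(List.of("ok", "fail", "ok"),
                l -> l.equals("fail") ? 3 : 0);
        System.out.println(rc); // prints 3
    }
}
```

With this contract an orchestrator such as Oozie sees the failure, because the non-zero code reaches the process exit status.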
[jira] [Updated] (HIVE-8586) Record counters aren't updated correctly for vectorized queries
[ https://issues.apache.org/jira/browse/HIVE-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8586: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk + branch. Record counters aren't updated correctly for vectorized queries --- Key: HIVE-8586 URL: https://issues.apache.org/jira/browse/HIVE-8586 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: 0.14.0 Attachments: HIVE-8586.1.patch Counts batches not rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8590) With different parameters or column number dense_rank function gets different count distinct results
ericni created HIVE-8590: Summary: With different parameters or column number dense_rank function gets different count distinct results Key: HIVE-8590 URL: https://issues.apache.org/jira/browse/HIVE-8590 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.13.1 Environment: cdh 4.6.0/hive0.13 Reporter: ericni We create a table with SQL which contains the dense_rank function, and then run count distinct on this table. We found that with different dense_rank parameters, or even different columns, we get different count distinct results:
1. Less data may be OK (in our test case, 200 million rows got the same results, but 300 million rows got different results)
2. Different dense_rank parameters may get different results, e.g. dense_rank() over(distribute by a,b sort by c desc) and dense_rank() over(distribute by a sort by c desc)
3. All window functions (rank, row_number, dense_rank) have this problem
4. A smaller column number may be OK
5. Count(1) is OK, but count distinct gets different results
6. It seems that some rows have been lost and some rows repeated
test data (file is too large to upload): http://pan.baidu.com/s/1hqnCzze test sql: http://pan.baidu.com/s/1eQna8q2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8517) When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression
[ https://issues.apache.org/jira/browse/HIVE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8517: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch and trunk. Thanks [~mmokhtar] When joining on partition column NDV gets overridden by StatsUtils.getColStatisticsFromExpression - Key: HIVE-8517 URL: https://issues.apache.org/jira/browse/HIVE-8517 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8517.1.patch, HIVE-8517.2.patch, HIVE-8517.3.patch When joining on a partition column, the number of partitions is used as the NDV, but this gets overridden by StatsUtils.getColStatisticsFromExpression: the number of partitions used as NDV is replaced by the number of rows, which results in the same behavior as explained in https://issues.apache.org/jira/browse/HIVE-8196. Joining on partition columns with fetch column stats enabled results in a very small cardinality estimate, which negatively affects query performance. This is the call stack.
{code}
StatsUtils.getColStatisticsFromExpression(HiveConf, Statistics, ExprNodeDesc) line: 1001
StatsRulesProcFactory$ReduceSinkStatsRule.process(Node, Stack<Node>, NodeProcessorCtx, Object...) line: 1479
DefaultRuleDispatcher.dispatch(Node, Stack<Node>, Object...) line: 90
PreOrderWalker(DefaultGraphWalker).dispatchAndReturn(Node, Stack<Node>) line: 94
PreOrderWalker(DefaultGraphWalker).dispatch(Node, Stack<Node>) line: 78
PreOrderWalker.walk(Node) line: 54
PreOrderWalker.walk(Node) line: 59
PreOrderWalker.walk(Node) line: 59
PreOrderWalker(DefaultGraphWalker).startWalking(Collection<Node>, HashMap<Node, Object>) line: 109
AnnotateWithStatistics.transform(ParseContext) line: 78
TezCompiler.runStatsAnnotation(OptimizeTezProcContext) line: 248
TezCompiler.optimizeOperatorPlan(ParseContext, Set<ReadEntity>, Set<WriteEntity>) line: 120
TezCompiler(TaskCompiler).compile(ParseContext, List<Task<? extends Serializable>>, HashSet<ReadEntity>, HashSet<WriteEntity>) line: 99
SemanticAnalyzer.analyzeInternal(ASTNode) line: 10037
SemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 221
Driver.compile(String, boolean) line: 415
{code}
Query
{code}
select ss_item_sk item_sk, d_date, sum(ss_sales_price),
       sum(sum(ss_sales_price)) over (partition by ss_item_sk order by d_date rows between unbounded preceding and current row) cume_sales
from store_sales, date_dim
where ss_sold_date_sk = d_date_sk
  and d_month_seq between 1193 and 1193+11
  and ss_item_sk is not NULL
group by ss_item_sk, d_date
{code}
Plan: notice in the Map Join operator that the number of rows drops from 82,510,879,939 to 36,524 after the join.
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 1 <- Map 4 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20141019131818_086d663a-5621-456c-bf25-8ccb7112ee3b:6
      Vertices:
        Map 1
          Map Operator Tree:
              TableScan
                alias: store_sales
                filterExpr: ss_item_sk is not null (type: boolean)
                Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: ss_item_sk is not null (type: boolean)
                  Statistics: Num rows: 82510879939 Data size: 652315818272 Basic stats: COMPLETE Column stats: COMPLETE
                  Map Join Operator
                    condition map:
                         Inner Join 0 to 1
                    condition expressions:
                      0 {ss_item_sk} {ss_sales_price} {ss_sold_date_sk}
                      1 {d_date_sk} {d_date} {d_month_seq}
                    keys:
                      0 ss_sold_date_sk (type: int)
                      1 d_date_sk (type: int)
                    outputColumnNames: _col1, _col12, _col22, _col26, _col28, _col29
                    input vertices:
                      1 Map 4
                    Statistics: Num rows: 36524 Data size: 4163736 Basic stats: COMPLETE Column stats: COMPLETE
                    Filter Operator
{code}
[jira] [Updated] (HIVE-8567) Vectorized queries output extra stuff for Binary columns
[ https://issues.apache.org/jira/browse/HIVE-8567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8567: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk and branch. Thanks [~mmccline]! Vectorized queries output extra stuff for Binary columns Key: HIVE-8567 URL: https://issues.apache.org/jira/browse/HIVE-8567 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8567.01.patch See vector_data_types.q query output. Non-vectorized output is shorter than vectorized binary column output which seems to include characters from earlier rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8582) Outer Join Simplification is broken
[ https://issues.apache.org/jira/browse/HIVE-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8582: - Priority: Critical (was: Major) Outer Join Simplification is broken --- Key: HIVE-8582 URL: https://issues.apache.org/jira/browse/HIVE-8582 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8582.patch, HIVE-8582.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8409) SMB joins fail intermittently on tez
[ https://issues.apache.org/jira/browse/HIVE-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-8409: - Labels: TODOC14 (was: ) SMB joins fail intermittently on tez Key: HIVE-8409 URL: https://issues.apache.org/jira/browse/HIVE-8409 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8409.1.patch, HIVE-8409.10.patch, HIVE-8409.11.patch, HIVE-8409.2.patch, HIVE-8409.3.patch, HIVE-8409.7.patch, HIVE-8409.8.patch, HIVE-8409.9.patch Flakiness with regard to SMB joins in tez. TEZ-1647 is required to complete the fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8409) SMB joins fail intermittently on tez
[ https://issues.apache.org/jira/browse/HIVE-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182517#comment-14182517 ] Lefty Leverenz commented on HIVE-8409: -- Doc note: This adds configuration parameter *hive.tez.smb.number.waves* to HiveConf.java, so it needs to be documented in the wiki. * [Configuration Properties -- Tez | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez] SMB joins fail intermittently on tez Key: HIVE-8409 URL: https://issues.apache.org/jira/browse/HIVE-8409 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Priority: Critical Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8409.1.patch, HIVE-8409.10.patch, HIVE-8409.11.patch, HIVE-8409.2.patch, HIVE-8409.3.patch, HIVE-8409.7.patch, HIVE-8409.8.patch, HIVE-8409.9.patch Flakiness with regard to SMB joins in tez. TEZ-1647 is required to complete the fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8582) Outer Join Simplification is broken
[ https://issues.apache.org/jira/browse/HIVE-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182538#comment-14182538 ] Hive QA commented on HIVE-8582: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676820/HIVE-8582.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6578 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1438/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1438/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1438/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12676820 - PreCommit-HIVE-TRUNK-Build Outer Join Simplification is broken --- Key: HIVE-8582 URL: https://issues.apache.org/jira/browse/HIVE-8582 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8582.patch, HIVE-8582.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8534) sql std auth : update configuration whitelist for 0.14
[ https://issues.apache.org/jira/browse/HIVE-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-8534: - Labels: TODOC14 (was: ) sql std auth : update configuration whitelist for 0.14 -- Key: HIVE-8534 URL: https://issues.apache.org/jira/browse/HIVE-8534 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Blocker Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8534.1.patch, HIVE-8534.2.patch, HIVE-8534.3.patch, HIVE-8534.4.patch, HIVE-8534.5.patch New config parameters have been introduced in hive 0.14. SQL standard authorization needs to be updated to allow some new parameters to be set, when the authorization mode is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8534) sql std auth : update configuration whitelist for 0.14
[ https://issues.apache.org/jira/browse/HIVE-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182571#comment-14182571 ] Lefty Leverenz commented on HIVE-8534: -- Doc note: This adds *hive.security.authorization.sqlstd.confwhitelist.append* and changes the description of *hive.security.authorization.sqlstd.confwhitelist* in HiveConf.java, so they need to be documented in the wiki. * [Configuration Properties -- SQL Standard Based Authorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-SQLStandardBasedAuthorization] * [SQL Standard Based Hive Authorization -- Restrictions on Hive Commands and Statements | https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-RestrictionsonHiveCommandsandStatements] * and optionally [SQL Standard Based Hive Authorization -- Configuration | https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Configuration] sql std auth : update configuration whitelist for 0.14 -- Key: HIVE-8534 URL: https://issues.apache.org/jira/browse/HIVE-8534 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Priority: Blocker Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-8534.1.patch, HIVE-8534.2.patch, HIVE-8534.3.patch, HIVE-8534.4.patch, HIVE-8534.5.patch New config parameters have been introduced in hive 0.14. SQL standard authorization needs to be updated to allow some new parameters to be set, when the authorization mode is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8582) Outer Join Simplification is broken
[ https://issues.apache.org/jira/browse/HIVE-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-8582: - Resolution: Fixed Status: Resolved (was: Patch Available) Failures unrelated. Committed to trunk and branch. Outer Join Simplification is broken --- Key: HIVE-8582 URL: https://issues.apache.org/jira/browse/HIVE-8582 Project: Hive Issue Type: Sub-task Components: CBO Affects Versions: 0.14.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8582.patch, HIVE-8582.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6806) CREATE TABLE should support STORED AS AVRO
[ https://issues.apache.org/jira/browse/HIVE-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182579#comment-14182579 ] Navis commented on HIVE-6806: - [~leftylev] Right. I'll book that into a new issue. CREATE TABLE should support STORED AS AVRO -- Key: HIVE-6806 URL: https://issues.apache.org/jira/browse/HIVE-6806 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jeremy Beard Assignee: Ashish Kumar Singh Priority: Minor Labels: Avro Fix For: 0.14.0 Attachments: HIVE-6806.1.patch, HIVE-6806.2.patch, HIVE-6806.3.patch, HIVE-6806.patch Avro is well established and widely used within Hive; however, creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes. Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
Navis created HIVE-8591: --- Summary: hive.default.fileformat should accept all formats described by StorageFormatDescriptor Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
[ https://issues.apache.org/jira/browse/HIVE-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8591: Description: NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. was:FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. hive.default.fileformat should accept all formats described by StorageFormatDescriptor -- Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8543) Compactions fail on metastore using postgres
[ https://issues.apache.org/jira/browse/HIVE-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182584#comment-14182584 ] Damien Carol commented on HIVE-8543: [~alangates] You're welcome. I'm sorry, I was very busy these last few weeks and have not been able to take care of these postgres tickets. You're doing a good job with these ones. Compactions fail on metastore using postgres Key: HIVE-8543 URL: https://issues.apache.org/jira/browse/HIVE-8543 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8543.patch The worker fails to update the stats when the metastore is using Postgres as the RDBMS. {code} org.postgresql.util.PSQLException: ERROR: relation "tab_col_stats" does not exist {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
[ https://issues.apache.org/jira/browse/HIVE-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8591: Status: Patch Available (was: Open) hive.default.fileformat should accept all formats described by StorageFormatDescriptor -- Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8591.1.patch.txt NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8591) hive.default.fileformat should accept all formats described by StorageFormatDescriptor
[ https://issues.apache.org/jira/browse/HIVE-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8591: Attachment: HIVE-8591.1.patch.txt hive.default.fileformat should accept all formats described by StorageFormatDescriptor -- Key: HIVE-8591 URL: https://issues.apache.org/jira/browse/HIVE-8591 Project: Hive Issue Type: Task Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-8591.1.patch.txt NO PRECOMMIT TESTS FileFormats are described by StorageFormatDescriptor, which is added in HIVE-5976. Validator for FileFormats should reflect that also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.
[ https://issues.apache.org/jira/browse/HIVE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8564: Description: NO PRECOMMIT TESTS DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy7.getTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
OK
was: DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at
[jira] [Updated] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.
[ https://issues.apache.org/jira/browse/HIVE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8564: Assignee: Navis Status: Patch Available (was: Open) DROP TABLE IF EXISTS throws exception if the table does not exist. Key: HIVE-8564 URL: https://issues.apache.org/jira/browse/HIVE-8564 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Reporter: Ben Assignee: Navis Priority: Minor Attachments: HIVE-8564.1.patch.txt NO PRECOMMIT TESTS DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy7.getTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
OK
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8564) DROP TABLE IF EXISTS throws exception if the table does not exist.
[ https://issues.apache.org/jira/browse/HIVE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-8564: Attachment: HIVE-8564.1.patch.txt It's just a log message (DDLTask returns 0), but it seemed annoying. DROP TABLE IF EXISTS throws exception if the table does not exist. Key: HIVE-8564 URL: https://issues.apache.org/jira/browse/HIVE-8564 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.1 Reporter: Ben Priority: Minor Attachments: HIVE-8564.1.patch.txt NO PRECOMMIT TESTS DROP TABLE IF EXISTS throws exception if the table does not exist. I tried set hive.exec.drop.ignorenonexistent=true, and it made no difference.
hive> DROP TABLE IF EXISTS testdb.mytable;
14/10/22 15:48:29 ERROR metadata.Hive: NoSuchObjectException(message:testdb.mytable table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29338)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result$get_table_resultStandardScheme.read(ThriftHiveMetastore.java:29306)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_result.read(ThriftHiveMetastore.java:29237)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
	at com.sun.proxy.$Proxy7.getTable(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:975)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:930)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
	at org.apache.hadoop.hive.ql.exec.DDLTask.dropTableOrPartitions(DDLTask.java:3846)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:306)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
OK
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182648#comment-14182648 ] Hive QA commented on HIVE-6165: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676822/HIVE-6165.2.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6563 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_correctness org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_correctness org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1439/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1439/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1439/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12676822 - PreCommit-HIVE-TRUNK-Build Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182669#comment-14182669 ] Hive QA commented on HIVE-8585: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676825/HIVE-8585.patch {color:red}ERROR:{color} -1 due to 71 failed/errored test(s), 6578 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cluster org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_ppd org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join34 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join35 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join38 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_cond_pushdown_unqual3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_vc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_clusterby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_gby_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_join4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_random org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_case org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_udf_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_semijoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_mapjoin_reduce org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_5
[jira] [Created] (HIVE-8592) 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same
Aidan Semple created HIVE-8592: -- Summary: 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same Key: HIVE-8592 URL: https://issues.apache.org/jira/browse/HIVE-8592 Project: Hive Issue Type: Bug Components: Database/Schema, SQL Affects Versions: 0.13.0 Environment: Running Apache Hive version 0.13.0 using HortonWorks 2.1.2.1 with hadoop version 2.4.0.2.1.2.1-471, on Linux operating system centos5 (also occurs on centos6) Reporter: Aidan Semple Fix For: 0.13.0 I am trying to load zero values into Hive DECIMAL fields in a Hive table where the precision and scale are defined to be the same, e.g. DECIMAL(1,1) or DECIMAL(3,3) etc... However, every time I run a Hive QL insert statement containing zero values, or run a LOAD DATA command to load a text file of data containing zero values into these columns / fields, querying the table shows those zero values displayed and treated as NULL. On further investigation, I was able to narrow the problem down to simple selects with casts. See the example and output from Hive below. So attempting to cast 0, 0.0, or '.0' to DECIMAL(1,1) returns NULL instead of 0. The same happens for every precision 1-38 where the scale equals the precision. If there is a workaround for this then please let me know. Thanks!
hive> select cast('.0' as DECIMAL(1,1)), cast('0.0' as DECIMAL(1,1)), cast('0' as DECIMAL(1,1)), cast(0 as DECIMAL(1,1)), cast(0.0 as DECIMAL(1,1)); Query ID = xxx_2014102414_e4dfdcc1-e4ad-4f84-bd48-198e29fd3757 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1413470329106_0052, Tracking URL = http://hdp8:8088/proxy/application_1413470329106_0052/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1413470329106_0052 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2014-10-24 14:01:10,256 Stage-1 map = 0%, reduce = 0% 2014-10-24 14:01:27,644 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.51 sec MapReduce Total cumulative CPU time: 6 seconds 510 msec Ended Job = job_1413470329106_0052 MapReduce Jobs Launched: Job 0: Map: 1 Cumulative CPU: 6.51 sec HDFS Read: 269 HDFS Write: 15 SUCCESS Total MapReduce CPU Time Spent: 6 seconds 510 msec OK NULL NULL NULL NULL NULL Time taken: 36.281 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
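For context on why this is surprising: DECIMAL(p,s) allows p - s digits before the decimal point, and zero needs none, so zero should fit even when p = s. A minimal fits-check sketched with java.math.BigDecimal (an illustration of the expected semantics, not Hive's actual enforcement code):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalFit {
    // A value fits DECIMAL(precision, scale) when, after rescaling,
    // its integer digits do not exceed precision - scale.
    static boolean fits(BigDecimal v, int precision, int scale) {
        BigDecimal rescaled = v.setScale(scale, RoundingMode.HALF_UP);
        return rescaled.precision() - rescaled.scale() <= precision - scale;
    }

    public static void main(String[] args) {
        // Zero has no integer digits, so it fits DECIMAL(1,1).
        System.out.println(fits(new BigDecimal("0.0"), 1, 1)); // true
        System.out.println(fits(new BigDecimal("0.5"), 1, 1)); // true
        System.out.println(fits(new BigDecimal("1.5"), 1, 1)); // false: one integer digit
    }
}
```

By this rule the casts above should all yield 0, not NULL, which is why the behavior is reported as a bug.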
[jira] [Commented] (HIVE-8535) Enable compile time skew join optimization for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182755#comment-14182755 ] Rui Li commented on HIVE-8535: -- The failed test is because I added the SORT_QUERY_RESULTS label to the qfile, which has non-deterministic result order. Maybe we have to merge that change to the trunk. Enable compile time skew join optimization for spark [Spark Branch] --- Key: HIVE-8535 URL: https://issues.apache.org/jira/browse/HIVE-8535 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8535.1-spark.patch, HIVE-8535.2-spark.patch Sub-task of HIVE-8406 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8406) Research on skewed join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8406: - Attachment: Skew join background.pdf Uploading the doc so it may help people get a better understanding of how skew join is done. Comments and suggestions are welcome. The doc may change as I dig deeper into the details and begin implementation. Research on skewed join [Spark Branch] -- Key: HIVE-8406 URL: https://issues.apache.org/jira/browse/HIVE-8406 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: Skew join background.pdf Research on how to handle skewed join for Hive on Spark. Here is the original Hive design doc for skewed join: https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8577) Cannot deserialize Avro schema with a map<string,string> with null values
[ https://issues.apache.org/jira/browse/HIVE-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182764#comment-14182764 ] Hive QA commented on HIVE-8577: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676823/HIVE-8577.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6562 tests executed *Failed tests:* {noformat} org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1441/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1441/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1441/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12676823 - PreCommit-HIVE-TRUNK-Build Cannot deserialize Avro schema with a map<string,string> with null values - Key: HIVE-8577 URL: https://issues.apache.org/jira/browse/HIVE-8577 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergio Peña Assignee: Sergio Peña Labels: regression Attachments: HIVE-8577.1.patch, HIVE-8577.1.patch, map_null_schema.avro, map_null_val.avro An avro table with a map<string,string> column that contains null values cannot be deserialized when running the select statement. 
Create the following table: {noformat} CREATE TABLE avro_table (avreau_col_1 map<string,string>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ('avro.schema.url'='file:///tmp/map_null_schema.avro'); {noformat} Then load the avro data: {noformat} LOAD DATA LOCAL INPATH '/tmp/map_null_val.avro' OVERWRITE INTO TABLE avro_table; {noformat} And do the select (it fails): {noformat} SELECT * FROM avro_table; Error: java.io.IOException: org.apache.avro.AvroRuntimeException: Not a map: null (state=,code=0) {noformat} This is a regression bug (it works correctly on hive 0.13.1 version). This is the output that hive 0.13.1 displays: {noformat} {key3:val3,key4:null} {key3:val3,key4:null} {key1:null,key2:val2} {key3:val3,key4:null} {key3:val3,key4:null} {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
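For reference, Avro represents a map with nullable values by declaring the value type as a union with null. A sketch of a schema shape that produces such data (an assumption for illustration, not necessarily the attached map_null_schema.avro):

```json
{
  "type": "record",
  "name": "AvroTable",
  "fields": [
    {
      "name": "avreau_col_1",
      "type": {"type": "map", "values": ["null", "string"]}
    }
  ]
}
```

The failure suggests the new deserialization path handles the plain {"type": "map", "values": "string"} case but not the union-valued variant that the 0.13.1 output above shows working.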
[jira] [Commented] (HIVE-8535) Enable compile time skew join optimization for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182798#comment-14182798 ] Xuefu Zhang commented on HIVE-8535: --- Hi [~lirui], for those tests that you added SORT_QUERY_RESULTS, please create a JIRA on trunk. We will merge it to Spark branch once it's committed. Thanks. Enable compile time skew join optimization for spark [Spark Branch] --- Key: HIVE-8535 URL: https://issues.apache.org/jira/browse/HIVE-8535 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8535.1-spark.patch, HIVE-8535.2-spark.patch Sub-task of HIVE-8406 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8592) 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same
[ https://issues.apache.org/jira/browse/HIVE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8592. --- Resolution: Duplicate Dupe of HIVE-8559. 0 values convert to null if casting to or inserting to Hive DECIMAL where precision and scale are the same -- Key: HIVE-8592 URL: https://issues.apache.org/jira/browse/HIVE-8592 Project: Hive Issue Type: Bug Components: Database/Schema, SQL Affects Versions: 0.13.0 Reporter: Aidan Semple Fix For: 0.13.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Priority: Critical (was: Major) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Attachments: HIVE-8588.1.patch This is originally discovered by [~deepesh] When running a Sqoop integration test from WebHCat {noformat} curl --show-error -d command=export -libjars hdfs:///tmp/mysql-connector-java.jar --connect jdbc:mysql://deepesh-c6-1.cs1cloud.internal/sqooptest --username sqoop --password passwd --export-dir /tmp/templeton_test_data/sqoop --table person -d statusdir=sqoop.output -X POST http://deepesh-c6-1.cs1cloud.internal:50111/templeton/v1/sqoop?user.name=hrt_qa; {noformat} the job is failing with the following error: {noformat} $ hadoop fs -cat /user/hrt_qa/sqoop.output/stderr 14/10/15 23:52:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.0.0-897 14/10/15 23:52:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 14/10/15 23:52:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
14/10/15 23:52:54 INFO tool.CodeGenTool: Beginning code generation 14/10/15 23:52:54 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848) at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736) at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759) at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269) at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240) at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226) at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295) at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773) at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578) at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96) at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64) at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100) at org.apache.sqoop.Sqoop.run(Sqoop.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227) at org.apache.sqoop.Sqoop.main(Sqoop.java:236) {noformat} Note that the Sqoop tar bundle does not contain the JDBC connector jar. I think the problem here may be that the mysql connector jar added to libjars isn't available to the Sqoop tool, which first connects to the database through the JDBC driver to collect some table information before running the MR job. libjars will only add the connector jar for the MR job and not for the local one. 
NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
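If that theory holds, the driver jar would also need to be on the local client's classpath, since -libjars only populates the distributed cache for the MR tasks. A sketch only; the jar path and the direct sqoop invocation here are assumptions, not the committed fix:

```shell
# -libjars ships the connector jar to the MR tasks through the
# distributed cache; the local Sqoop client process never sees it.
# Putting the driver on the client's own classpath covers the local
# metadata query too (the jar path is an assumption):
export HADOOP_CLASSPATH=/tmp/mysql-connector-java.jar:$HADOOP_CLASSPATH
# ...then run the same sqoop export command as in the curl request above.
```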
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Status: Patch Available (was: Open) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-8588.1.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Attachment: HIVE-8588.1.patch sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-8588.1.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8593) Unintended regex is used in ScriptOperator#blackListed()
Ted Yu created HIVE-8593: Summary: Unintended regex is used in ScriptOperator#blackListed() Key: HIVE-8593 URL: https://issues.apache.org/jira/browse/HIVE-8593 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} for (String b : bls) { b.replaceAll(".", "_"); {code} The dot can match any character. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll(java.lang.String,%20java.lang.String) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
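A short sketch demonstrating both issues with that line: "." is interpreted as a regex that matches any character, and the returned string is discarded because Java strings are immutable:

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String b = "hive.script.operator.env.blacklist";

        // replaceAll takes a regex; "." matches any character,
        // so every character is replaced and the result is all underscores.
        System.out.println(b.replaceAll(".", "_"));

        // replace does a literal substitution, which is what was intended.
        System.out.println(b.replace('.', '_'));
        // prints "hive_script_operator_env_blacklist"

        // Strings are immutable, so the result must be assigned
        // (b = b.replace('.', '_');) or it is silently lost:
        b.replaceAll(".", "_"); // no effect on b
    }
}
```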
[jira] [Commented] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182970#comment-14182970 ] Eugene Koifman commented on HIVE-8588: -- I meant to say in my previous comment that it would be good to get this into 0.14. sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Attachments: HIVE-8588.1.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182969#comment-14182969 ] Eugene Koifman commented on HIVE-8588: -- [~vikram.dixit] w/o this change, for users to submit Sqoop jobs via WebHCat requires them to modify the Sqoop tar file to include the additional JDBC jars in it which is a major usability issue especially when working with multiple DBs and upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8579) Guaranteed NPE in DDLSemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182971#comment-14182971 ] Hive QA commented on HIVE-8579: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676824/HIVE-8579.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6563 tests executed *Failed tests:* {noformat} org.apache.hive.minikdc.TestJdbcWithMiniKdc.testNegativeTokenAuth {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1442/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1442/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1442/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12676824 - PreCommit-HIVE-TRUNK-Build Guaranteed NPE in DDLSemanticAnalyzer - Key: HIVE-8579 URL: https://issues.apache.org/jira/browse/HIVE-8579 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Jason Dere Attachments: HIVE-8579.1.patch, HIVE-8579.1.patch This was added by [~jdere] in HIVE-8411. I don't fully understand the code (i.e. what it means when desc is null) but I'm sure, Jason, you can fix it without much trouble? 
{code} if (desc == null || !AlterTableDesc.doesAlterTableTypeSupportPartialPartitionSpec(desc.getOp())) { throw new SemanticException( ErrorMsg.ALTER_TABLE_TYPE_PARTIAL_PARTITION_SPEC_NO_SUPPORTED, desc.getOp().name()); } else if (!conf.getBoolVar(HiveConf.ConfVars.DYNAMICPARTITIONING)) { throw new SemanticException(ErrorMsg.DYNAMIC_PARTITION_DISABLED); } {code} You check for whether {{desc}} is null but then use it to do {{desc.getOp()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
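The guard-ordering problem can be shown in a standalone sketch (the names below are illustrative stand-ins, not the Hive classes): when the null case and the unsupported-operation case share one branch, the error path itself dereferences the null.

```java
public class NullGuardDemo {
    // Stand-in for desc.getOp().name(); a null op models a missing AlterTableDesc.
    static String check(String op, boolean dynamicPartitioning) {
        // Split the null case out of the combined condition so the
        // error message never dereferences a null op.
        if (op == null) {
            return "error: no ALTER TABLE operation";
        } else if (!op.equals("ADD_PARTITION")) {
            return "error: partial partition spec not supported for " + op;
        } else if (!dynamicPartitioning) {
            return "error: dynamic partitioning disabled";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(check(null, true));            // handled, no NPE
        System.out.println(check("ADD_PARTITION", true)); // ok
    }
}
```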
[jira] [Created] (HIVE-8594) Wrong condition in SettableConfigUpdater#setHiveConfWhiteList()
Ted Yu created HIVE-8594: Summary: Wrong condition in SettableConfigUpdater#setHiveConfWhiteList() Key: HIVE-8594 URL: https://issues.apache.org/jira/browse/HIVE-8594 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} if(whiteListParamsStr == null && whiteListParamsStr.trim().isEmpty()) { {code} If whiteListParamsStr is null, the first operand is true, so the second is evaluated and the call to trim() would result in NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
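A standalone sketch of the condition (the helper name is illustrative, not the Hive method): with `&&` a null string falls through into `trim()` and throws, so the null test must be combined with `||` to guard the `trim()` call.

```java
public class BlankCheckDemo {
    // Buggy shape: when s is null the first operand is true,
    // so s.trim() is evaluated and throws NullPointerException.
    static boolean isBlankBuggy(String s) {
        return s == null && s.trim().isEmpty();
    }

    // Corrected shape: || short-circuits on null before trim() runs.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(null));  // true
        System.out.println(isBlank("  "));  // true
        System.out.println(isBlank("a,b")); // false
        try {
            isBlankBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("NPE from the buggy condition");
        }
    }
}
```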
[jira] [Commented] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182976#comment-14182976 ] Vikram Dixit K commented on HIVE-8588: -- +1 for 0.14 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8595) Slow IO when stdout directed to NAS with large blocksize
Kevin English created HIVE-8595: --- Summary: Slow IO when stdout directed to NAS with large blocksize Key: HIVE-8595 URL: https://issues.apache.org/jira/browse/HIVE-8595 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.13.1 Environment: nfs4 rsize=1048576,wsize=1048576 Reporter: Kevin English Priority: Minor Very slow IO when executing a SQL command file using the following command line, when the target file system is an nfs4-mounted NAS with a large blocksize: hive -f sqlscript.sql 2>results.log >results.tab Workaround (thousands of times faster): hive -f sqlscript.sql 2>results.log | cat >results.tab For instance, I had a command finish 10 hours ago; I forgot to use cat and it is still writing out the output, which after 10 hours is in the 180 GB range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8585: --- Status: Open (was: Patch Available) Constant folding should happen before ppd - Key: HIVE-8585 URL: https://issues.apache.org/jira/browse/HIVE-8585 Project: Hive Issue Type: Improvement Components: Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8585.1.patch, HIVE-8585.patch, HIVE-8585.patch will help {{NullScanOptimizer}} to kick in more places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8585: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8585) Constant folding should happen before ppd
[ https://issues.apache.org/jira/browse/HIVE-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8585: --- Attachment: HIVE-8585.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183002#comment-14183002 ] Alan Gates commented on HIVE-8583: -- Yes, Lars is correct. That is just a piece of earlier code that I neglected to take out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 27117: HIVE-8457 - MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27117/ --- (Updated Oct. 24, 2014, 4:51 p.m.) Review request for hive and Xuefu Zhang. Changes --- Thanks Xuefu for the comments. I've updated my patch. Bugs: HIVE-8457 https://issues.apache.org/jira/browse/HIVE-8457 Repository: hive-git Description --- Currently, on the Spark branch, each thread is bound with a thread-local IOContext, which gets initialized when we generate an input HadoopRDD, and is later used in MapOperator, FilterOperator, etc. Given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input HadoopRDD, and we would like the HadoopRDD to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:

inputRDD   inputRDD
    |          |
  MT_11      MT_12
    |          |
  RT_1       RT_2

Here, MT_11 and MT_12 are MapTrans from a split MapWork, and RT_1 and RT_2 are two ReduceTrans. Note that this example is simplified, as we may also have a ShuffleTran between a MapTran and a ReduceTran. When multiple Spark threads are running, MT_11 may be executed first; asking for an iterator from the HadoopRDD triggers the creation of the iterator, which in turn triggers the initialization of the IOContext associated with that particular thread. Now, the problem is: before MT_12 starts executing, it will also ask for an iterator from the HadoopRDD, and since the RDD is already cached, instead of creating a new iterator it will just fetch it from the cached result. However, this skips the initialization of the IOContext associated with this particular thread. And, when MT_12 starts executing, it will try to initialize the MapOperator, but since the IOContext is not initialized, this will fail miserably.
Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkMapRecordHandler.java 20ea977 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 00a6f3d ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 4de3ad4 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java 58e1ceb ql/src/java/org/apache/hadoop/hive/ql/io/IOContext.java 5fb3b13 Diff: https://reviews.apache.org/r/27117/diff/ Testing --- All multi-insertion related tests are passing on my local machine. Thanks, Chao Sun
[jira] [Updated] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8457: --- Attachment: HIVE-8457.2-spark.patch Addressing RB comments. MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound with a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. Given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following: {noformat}
inputRDD   inputRDD
    |          |
  MT_11      MT_12
    |          |
  RT_1       RT_2
{noformat} Here, {{MT_11}} and {{MT_12}} are {{MapTran}}s from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}s. Note that this example is simplified, as we may also have a {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first; asking for an iterator from the {{HadoopRDD}} triggers the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. 
And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
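The caching-skips-initialization pattern can be sketched standalone (all names below are hypothetical stand-ins, not the Hive/Spark classes): a per-thread flag is set only on the code path that actually computes the value, so a thread served from the cache never runs the initializer.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedInitDemo {
    static final ThreadLocal<Boolean> initialized = ThreadLocal.withInitial(() -> false);
    static final Map<String, String> cache = new HashMap<>();

    // Stand-in for computing an RDD partition: the per-thread context is
    // initialized only when the value is actually computed, not on a cache hit.
    static String load(String key) {
        return cache.computeIfAbsent(key, k -> {
            initialized.set(true); // side effect skipped when served from cache
            return k + "-data";
        });
    }

    public static void main(String[] args) {
        load("t1"); // computes: this thread's context is initialized
        System.out.println(initialized.get()); // true
        // A second thread hitting the cache never runs the initializer,
        // modeling the uninitialized IOContext described above.
        new Thread(() -> {
            load("t1"); // cache hit, no side effect
            System.out.println(initialized.get()); // false
        }).start();
    }
}
```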
[jira] [Updated] (HIVE-8573) Fix some non-deterministic vectorization tests
[ https://issues.apache.org/jira/browse/HIVE-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8573: -- Status: Patch Available (was: Open) Fix some non-deterministic vectorization tests -- Key: HIVE-8573 URL: https://issues.apache.org/jira/browse/HIVE-8573 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-8573.1.patch, HIVE-8573.2.patch I found the following vectorization tests are not deterministic: vectorization_16.q vectorization_short_regress.q vector_distinct_2.q vector_groupby_3.q vector_mapjoin_reduce.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8573) Fix some non-deterministic vectorization tests
[ https://issues.apache.org/jira/browse/HIVE-8573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-8573: -- Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25550: HIVE-8021 CBO: support CTAS and insert ... select
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/#review58290 --- ql/src/test/queries/clientpositive/ctas_colname.q https://reviews.apache.org/r/25550/#comment99237 Why this change - John Pullokkaran On Oct. 23, 2014, 9:11 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/ --- (Updated Oct. 23, 2014, 9:11 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see JIRA Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java dee7d7e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 37cbf7f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 ql/src/test/queries/clientpositive/ctas_colname.q 5322626 ql/src/test/queries/clientpositive/decimal_serde.q cf3a86c ql/src/test/queries/clientpositive/insert0.q PRE-CREATION ql/src/test/results/clientpositive/ctas_colname.q.out 97dacf6 ql/src/test/results/clientpositive/decimal_serde.q.out e461c2e ql/src/test/results/clientpositive/insert0.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25550/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183060#comment-14183060 ] Marcelo Vanzin commented on HIVE-8528: -- Actually, Lefty, that's a good point, this might need some end-user documentation since the recommended setup is to have a full Spark installation available on the HS2 node. I don't know if the plan is to somehow package that with HS2 or leave it as a configuration step. Add remote Spark client to Hive [Spark Branch] -- Key: HIVE-8528 URL: https://issues.apache.org/jira/browse/HIVE-8528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Fix For: spark-branch Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch For the time being, at least, we've decided to build the Spark client (see SPARK-3215) inside Hive. This task tracks merging the ongoing work into the Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8444) update pom to junit 4.11
[ https://issues.apache.org/jira/browse/HIVE-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183067#comment-14183067 ] Jason Dere commented on HIVE-8444: -- [~brocknoland] any issue with bumping up to Junit 4.11? update pom to junit 4.11 Key: HIVE-8444 URL: https://issues.apache.org/jira/browse/HIVE-8444 Project: Hive Issue Type: Bug Components: Testing Infrastructure Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-8444.1.patch, HIVE-8444.2.patch allows deterministic ordering of tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8532) return code of source xxx clause is missing
[ https://issues.apache.org/jira/browse/HIVE-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183072#comment-14183072 ] vitthal (Suhas) Gogate commented on HIVE-8532: -- Thanks [~navis]! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183073#comment-14183073 ] Xuefu Zhang commented on HIVE-8528: --- [~vanzin], I thought spark installation on HS2 host was optional. Let me know if this has changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6165: -- Attachment: HIVE-6165.2.patch.txt Reload the same patch to re-run test. Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183079#comment-14183079 ] Marcelo Vanzin commented on HIVE-8528: -- It is optional, but I don't really think we should encourage that. A full install should be the recommended setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183081#comment-14183081 ] Xuefu Zhang commented on HIVE-8528: --- Got it. Thanks for the clarification. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26854: HIVE-2573 Create per-session function registry
On Oct. 23, 2014, 9:50 p.m., Jason Dere wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java, line 465 https://reviews.apache.org/r/26854/diff/1-3/?file=723909#file723909line465 There is no longer a way to query the metastore for UDFs apart from the static initialization. So if one CLI user creates a permanent UDF, another user on CLI, or HS2, will not be able to use that new UDF if the 2nd CLI or HS2 was initialized before this UDF was created. Navis Ryu wrote: Permanent functions (persistent function seemed better name, imho) are registered to system registry, which is shared to all clients. So if one user creates new permanent function, it's shared to all clients. The time a user accesses the function, the class is loaded with required resources and registered to session registry as a temporary function. So this would work if all clients are using hiveserver2, because all clients in this scenario would share the same system registry. But if one or more clients are using the Hive CLI, any persistent UDFs created/dropped by this CLI client would not be reflected in the other clients (or HS2), since it's a different process/system registry. - Jason --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26854/#review57952 --- On Oct. 23, 2014, 12:20 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26854/ --- (Updated Oct. 23, 2014, 12:20 a.m.) Review request for hive, Navis Ryu and Thejas Nair. Bugs: HIVE-2573 https://issues.apache.org/jira/browse/HIVE-2573 Repository: hive-git Description --- Small updates to Navis' changes: - session registry doesn't lookup metastore for UDFs - my feedback from Navis' original patch - metastore udfs should not be considered native. 
This allows them to be added/removed from registry Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9ac540e ql/src/java/org/apache/hadoop/hive/ql/exec/CommonFunctionInfo.java 93c15c0 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 074255b ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 08e1136 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 569c125 ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java efecb05 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 31f906a ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java e43d39f ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 22e5b47 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java af633cb ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 46f8052 ql/src/test/queries/clientnegative/drop_native_udf.q ae047bb ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out c7405ed ql/src/test/results/clientnegative/create_function_nonudf_class.q.out d0dd50a ql/src/test/results/clientnegative/drop_native_udf.q.out 9f0eaa5 service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java fd38907 Diff: https://reviews.apache.org/r/26854/diff/ Testing --- Thanks, Jason Dere
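A side note on the registry design discussed above: the two-level lookup Navis describes (per-session registries backed by one shared system registry) can be sketched in a few lines. This is a toy model with illustrative names, not Hive's actual Registry/FunctionRegistry API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the lookup order described above: a session first consults its
// own registry, then falls back to the shared system registry and caches the
// result locally as a "temporary" function. Names are illustrative, not Hive's.
public class RegistrySketch {
    // Shared across all sessions in the same process (e.g. one HS2 instance).
    static final Map<String, String> SYSTEM_REGISTRY = new ConcurrentHashMap<>();

    static class SessionRegistry {
        final Map<String, String> local = new ConcurrentHashMap<>();

        String lookup(String fnName) {
            String fn = local.get(fnName);
            if (fn == null) {
                fn = SYSTEM_REGISTRY.get(fnName); // shared, process-wide lookup
                if (fn != null) {
                    local.put(fnName, fn);        // cache in the session registry
                }
            }
            return fn;
        }
    }

    public static void main(String[] args) {
        SYSTEM_REGISTRY.put("my_udf", "com.example.MyUdf");
        SessionRegistry s1 = new SessionRegistry();
        SessionRegistry s2 = new SessionRegistry();
        // Both sessions see the function because they share one system registry;
        // a separate CLI process has its own SYSTEM_REGISTRY, hence Jason's concern.
        System.out.println(s1.lookup("my_udf"));
        System.out.println(s2.lookup("my_udf"));
    }
}
```

This also makes Jason's objection concrete: the sharing only works within one JVM, so a CLI process and an HS2 process each hold an independent `SYSTEM_REGISTRY`.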
[jira] [Updated] (HIVE-8528) Add remote Spark client to Hive [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8528: -- Labels: TODOC-SPARK (was: ) Add remote Spark client to Hive [Spark Branch] -- Key: HIVE-8528 URL: https://issues.apache.org/jira/browse/HIVE-8528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Labels: TODOC-SPARK Fix For: spark-branch Attachments: HIVE-8528.1-spark-client.patch, HIVE-8528.1-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.2-spark.patch, HIVE-8528.3-spark.patch For the time being, at least, we've decided to build the Spark client (see SPARK-3215) inside Hive. This task tracks merging the ongoing work into the Spark branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 26854: HIVE-2573 Create per-session function registry
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26854/ --- (Updated Oct. 24, 2014, 5:34 p.m.) Review request for hive, Navis Ryu and Thejas Nair. Changes --- Updating with HIVE-2573.10.patch.txt from Navis Bugs: HIVE-2573 https://issues.apache.org/jira/browse/HIVE-2573 Repository: hive-git Description --- Small updates to Navis' changes: - session registry doesn't lookup metastore for UDFs - my feedback from Navis' original patch - metastore udfs should not be considered native. This allows them to be added/removed from registry Diffs (updated) - common/src/java/org/apache/hadoop/hive/common/JavaUtils.java 9aa917c metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 88b0791 ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9ac540e ql/src/java/org/apache/hadoop/hive/ql/exec/CommonFunctionInfo.java 93c15c0 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionInfo.java 074255b ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e43a328 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 569c125 ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 7443f8a ql/src/java/org/apache/hadoop/hive/ql/exec/WindowFunctionInfo.java efecb05 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java b900627 ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 13277a9 ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 211ab6c ql/src/java/org/apache/hadoop/hive/ql/optimizer/IndexUtils.java e2768ff ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/SqlFunctionConverter.java 793f117 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 1796b7b ql/src/java/org/apache/hadoop/hive/ql/parse/FunctionSemanticAnalyzer.java 22e5b47 ql/src/java/org/apache/hadoop/hive/ql/parse/IndexUpdater.java 2b239ab ql/src/java/org/apache/hadoop/hive/ql/session/SessionConf.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java af633cb ql/src/test/org/apache/hadoop/hive/ql/parse/TestMacroSemanticAnalyzer.java 46f8052 ql/src/test/queries/clientnegative/drop_native_udf.q ae047bb ql/src/test/results/clientnegative/create_function_nonexistent_class.q.out c7405ed ql/src/test/results/clientnegative/create_function_nonudf_class.q.out d0dd50a ql/src/test/results/clientnegative/drop_native_udf.q.out 9f0eaa5 service/src/test/org/apache/hadoop/hive/service/TestHiveServerSessions.java fd38907 Diff: https://reviews.apache.org/r/26854/diff/ Testing --- Thanks, Jason Dere
[jira] [Commented] (HIVE-8486) TPC-DS Query 96 parallelism is not set correcly
[ https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183107#comment-14183107 ] Xuefu Zhang commented on HIVE-8486: --- Since HIVE-8496 is resolved, parallelism on shuffle is no longer a problem. [~csun], please create a separate JIRA to track the spill issue you described. I'm closing this ticket shortly. TPC-DS Query 96 parallelism is not set correcly --- Key: HIVE-8486 URL: https://issues.apache.org/jira/browse/HIVE-8486 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chao When we run the query on a 20B we only have a parallelism factor of 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8486) TPC-DS Query 96 parallelism is not set correcly
[ https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8486. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8496. TPC-DS Query 96 parallelism is not set correcly --- Key: HIVE-8486 URL: https://issues.apache.org/jira/browse/HIVE-8486 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chao Fix For: spark-branch When we run the query on a 20B we only have a parallelism factor of 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7731) Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7731. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8118. Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch] - Key: HIVE-7731 URL: https://issues.apache.org/jira/browse/HIVE-7731 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Chao Fix For: spark-branch Encountered when running on spark. Suppose we have three tables: {noformat} table1(x int, y int); table2(x int); table3(x int); {noformat} I run the following query: {noformat} from table1 insert overwrite table table2 select x group by x insert overwrite table table3 select y group by y; {noformat} The query generates 1 map and 2 reduces. The map operator has 2 RS, so I suppose it has output for both reduces. The problem is all (incorrect) results go to table2 and table3 is empty. I tried the same query on MR and it gives correct results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8208) Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8208. --- Resolution: Won't Fix Fix Version/s: spark-branch With HIVE-8118, this is no longer needed. Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch] --- Key: HIVE-8208 URL: https://issues.apache.org/jira/browse/HIVE-8208 Project: Hive Issue Type: Improvement Reporter: Chao Fix For: spark-branch Currently, the multi-table insertion patch breaks the plan whenever one TableScanOperator can lead to multiple FileSinkOperators. It identifies the lowest common ancestor (LCA) and breaks the tree there, creating as many child SparkTasks as there are FileSinkOperators. However, it's better not to break the operator tree in the following situation: of all the paths from these FileSinkOperators to the LCA, ReduceSinkOperators appear in at most one of them. In this case, we can do it in one Spark job, with no need to break the operator tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
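The break/no-break rule described in HIVE-8208 above can be modeled in a few lines: represent each path from the LCA down to a FileSinkOperator as a list of operator names, and only break the tree when more than one path contains a ReduceSinkOperator. This is an illustrative simplification, not the actual Spark-branch planner code:

```java
import java.util.Arrays;
import java.util.List;

// Toy model of HIVE-8208's rule: break the operator tree only if
// ReduceSinkOperators ("RS") appear in two or more of the LCA-to-FileSink paths.
public class MultiInsertBreakRule {
    static boolean needsBreak(List<List<String>> pathsFromLcaToFileSinks) {
        long pathsWithRs = pathsFromLcaToFileSinks.stream()
                .filter(path -> path.contains("RS"))
                .count();
        // 0 or 1 path with a shuffle => a single Spark job suffices.
        return pathsWithRs > 1;
    }

    public static void main(String[] args) {
        // One branch shuffles, the other writes directly: no break needed.
        List<List<String>> oneRs = Arrays.asList(
                Arrays.asList("SEL", "RS", "GBY", "FS"),
                Arrays.asList("SEL", "FS"));
        // Both branches shuffle: the tree must be split at the LCA.
        List<List<String>> twoRs = Arrays.asList(
                Arrays.asList("SEL", "RS", "GBY", "FS"),
                Arrays.asList("SEL", "RS", "GBY", "FS"));
        System.out.println(needsBreak(oneRs)); // false
        System.out.println(needsBreak(twoRs)); // true
    }
}
```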
[jira] [Resolved] (HIVE-8215) Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8215. --- Resolution: Won't Fix With HIVE-8118, this is no longer needed. Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch] Key: HIVE-8215 URL: https://issues.apache.org/jira/browse/HIVE-8215 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chao Currently, multi-table insertion generates 1+N tasks - 1 is the task that generates the input, and N are the insert queries that read from the input and write to separate output tables. To make these N tasks run in parallel, we rely on {{hive.exec.parallel}} being set to {{true}}. In this patch, we propose an alternative approach: combine these N tasks into a single task containing N separate operator trees, which in execution lead to N result RDDs. We may then be able to execute these N RDDs in parallel inside Spark, without needing {{hive.exec.parallel}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
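The "1+1" scheduling idea in HIVE-8215 above — one input stage, then N result computations submitted concurrently from within a single task instead of N tasks gated on {{hive.exec.parallel}} — can be sketched with plain Java concurrency. This models the scheduling idea only; it uses none of Hive's or Spark's actual APIs, and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

// Sketch of the 1+1 idea: one shared "input", then N branch computations
// (standing in for the N result RDD actions) submitted concurrently from a
// single task, so parallelism no longer depends on hive.exec.parallel.
public class OnePlusOneSketch {
    static List<Integer> run(int input, List<IntUnaryOperator> inserts) {
        ExecutorService pool = Executors.newFixedThreadPool(inserts.size());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (IntUnaryOperator op : inserts) {
                futures.add(pool.submit(() -> op.applyAsInt(input))); // N concurrent branches
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                try {
                    results.add(f.get()); // wait for all N outputs
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // One shared input (42), two concurrent "insert" branches.
        System.out.println(run(42, List.of(x -> x + 1, x -> x * 2))); // [43, 84]
    }
}
```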
[jira] [Resolved] (HIVE-8209) Multi-table insertion optimization #2: use separate context [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8209. --- Resolution: Won't Fix With HIVE-8118, this is no longer needed. Multi-table insertion optimization #2: use separate context [Spark Branch] -- Key: HIVE-8209 URL: https://issues.apache.org/jira/browse/HIVE-8209 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chao Priority: Minor Currently, the multi-table insertion patch uses {{GenSparkProcContext}} and added some states of its own. It's better to use a separate context only for the purpose of handling multi-table insertion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7387) Guava version conflict between hadoop and spark [Spark-Branch]
[ https://issues.apache.org/jira/browse/HIVE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183150#comment-14183150 ] Xuefu Zhang commented on HIVE-7387: --- With SPARK-2848, shading guava in Spark, this is no longer a problem in Hive. Guava version conflict between hadoop and spark [Spark-Branch] -- Key: HIVE-7387 URL: https://issues.apache.org/jira/browse/HIVE-7387 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7387-spark.patch The guava conflict happens in the hive driver compile stage, as shown in the following exception stacktrace. The conflict occurs while initiating a Spark RDD in SparkClient: the hive driver has both guava 11 (from the hadoop classpath) and the spark assembly jar, which contains guava 14 classes, on its classpath. Spark invoked HashFunction.hashInt, a method that does not exist in guava 11, so evidently the guava 11 version of HashFunction was loaded into the JVM, leading to a NoSuchMethodError while initiating the Spark RDD. 
{code} java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102) at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210) at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169) at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161) at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155) at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75) at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661) at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812) at org.apache.spark.broadcast.HttpBroadcast.init(HttpBroadcast.scala:52) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776) at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:112) at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:527) at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:307) at 
org.apache.hadoop.hive.ql.exec.spark.SparkClient.createRDD(SparkClient.java:204) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:167) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:32) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:159) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72) {code} NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7387) Guava version conflict between hadoop and spark [Spark-Branch]
[ https://issues.apache.org/jira/browse/HIVE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7387. --- Resolution: Not a Problem Guava version conflict between hadoop and spark [Spark-Branch] -- Key: HIVE-7387 URL: https://issues.apache.org/jira/browse/HIVE-7387 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7387-spark.patch The guava conflict happens in the hive driver compile stage, as shown in the following exception stacktrace. The conflict occurs while initiating a Spark RDD in SparkClient: the hive driver has both guava 11 (from the hadoop classpath) and the spark assembly jar, which contains guava 14 classes, on its classpath. Spark invoked HashFunction.hashInt, a method that does not exist in guava 11, so evidently the guava 11 version of HashFunction was loaded into the JVM, leading to a NoSuchMethodError while initiating the Spark RDD. {code} java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode; at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165) at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102) at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210) at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169) at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161) at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155) at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75) at 
org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661) at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812) at org.apache.spark.broadcast.HttpBroadcast.init(HttpBroadcast.scala:52) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35) at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776) at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:112) at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:527) at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:307) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.createRDD(SparkClient.java:204) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:167) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:32) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:159) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72) {code} NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8561) Expose Hive optiq operator tree to be able to support other sql on hadoop query engines
[ https://issues.apache.org/jira/browse/HIVE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183151#comment-14183151 ] Laljo John Pullokkaran commented on HIVE-8561: -- Na Yang, if I understand correctly, the goal of this patch is to use Hive for query parsing, resolution, and cost-based optimization, and to use Drill as the execution engine. If my guess is right, this patch makes Hive's Optiq Op tree a public interface. Hive's Optiq Op tree is not meant to be a public interface, and it will go through many changes as we extend CBO support to more operators. Why can't Drill be plugged in as another execution engine, just like MR, Tez, and Spark? Expose Hive optiq operator tree to be able to support other sql on hadoop query engines --- Key: HIVE-8561 URL: https://issues.apache.org/jira/browse/HIVE-8561 Project: Hive Issue Type: Task Components: CBO Affects Versions: 0.14.0 Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-8561.patch Hive-0.14 added cost-based optimization, and an optiq operator tree is created for select queries. However, the optiq operator tree is not visible from the outside and is hard for other SQL-on-Hadoop query engines, such as Apache Drill, to use. To allow Drill to access the hive optiq operator tree, we need to add a public API that returns it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8533) Enable all q-tests for multi-insertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8533: --- Attachment: HIVE-8533.1-spark.patch Spark tests enabled in this patch: {noformat} auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q {noformat} The result for {{groupby_complex_types_multi_single_reducer.q}} is different from MR's, but this is because it uses {{limit 10}}. The result for {{groupby3_map_skew.q}} also is slightly different: {noformat} 130091.0 260.182 256.10355987055016 98.00.0 142.92680950752379 143.06995106518903 20428.07288 20469.0109 --- 130091.0 260.182 256.10355987055016 98.00.0 142.9268095075238 143.06995106518906 20428.07288 20469.0109 {noformat} I think this is just something about decimal precision, not a correctness issue. Enable all q-tests for multi-insertion [Spark Branch] - Key: HIVE-8533 URL: https://issues.apache.org/jira/browse/HIVE-8533 Project: Hive Issue Type: Test Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8533.1-spark.patch As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
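The tiny mismatch in the last digits of the {{groupby3_map_skew.q}} averages above (e.g. 142.92680950752379 vs. 142.9268095075238) is consistent with Chao's diagnosis: double summation is order sensitive, and Spark and MR may aggregate partial results in different orders. A minimal, self-contained illustration (not Hive code):

```java
// Demonstrates that double addition is not associative: summing the same
// values in a different order can change the last few digits of the result.
// That is presentation-level noise, not a correctness bug.
public class FpOrder {
    static double sumForward(double[] xs) {
        double s = 0.0;
        for (int i = 0; i < xs.length; i++) s += xs[i];
        return s;
    }

    static double sumBackward(double[] xs) {
        double s = 0.0;
        for (int i = xs.length - 1; i >= 0; i--) s += xs[i];
        return s;
    }

    public static void main(String[] args) {
        double[] xs = {1e16, 1.0, -1e16, 3.14};
        double a = sumForward(xs);  // the 1.0 is absorbed by 1e16, then cancels
        double b = sumBackward(xs); // a different rounding path
        System.out.println(a);
        System.out.println(b);
        System.out.println(a == b); // false: same values, different order
    }
}
```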
[jira] [Resolved] (HIVE-7916) Snappy-java error when running hive query on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-7916. --- Resolution: Not a Problem With the latest Spark-Hive integration, the problem seems to have disappeared. Snappy-java error when running hive query on spark [Spark Branch] - Key: HIVE-7916 URL: https://issues.apache.org/jira/browse/HIVE-7916 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 Recently the spark branch upgraded its dependency on Spark to 1.1.0-SNAPSHOT. While the new version addressed some lib conflicts (such as guava), I'm afraid that it also introduced new problems. The following might be one, seen when I set the master URL to a spark standalone cluster: {code} hive> set hive.execution.engine=spark; hive> set spark.serializer=org.apache.spark.serializer.KryoSerializer; hive> set spark.master=spark://xzdt:7077; hive> select name, avg(value) from dec group by name; 14/08/28 16:41:52 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 333.0 KB, free 128.0 MB) java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317) at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219) at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44) at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79) at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:124) at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83) at 
org.apache.spark.broadcast.TorrentBroadcast.init(TorrentBroadcast.scala:68) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809) at org.apache.spark.rdd.HadoopRDD.init(HadoopRDD.scala:116) at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:541) at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:318) at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateRDD(SparkPlanGenerator.java:160) at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:88) at org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:156) at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52) at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:77) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1537) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1304) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1116) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:940) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:930) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860) at java.lang.Runtime.loadLibrary0(Runtime.java:845) at java.lang.System.loadLibrary(System.java:1084) at
[jira] [Updated] (HIVE-8533) Enable all q-tests for multi-insertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8533: --- Status: Patch Available (was: Open) Enable all q-tests for multi-insertion [Spark Branch] - Key: HIVE-8533 URL: https://issues.apache.org/jira/browse/HIVE-8533 Project: Hive Issue Type: Test Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8533.1-spark.patch As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8426) paralle.q assert failed.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8426. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8362. paralle.q assert failed.[Spark Branch] -- Key: HIVE-8426 URL: https://issues.apache.org/jira/browse/HIVE-8426 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch parallel.q failed to assert output in qtests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 27148: HIVE-8533 - Enable all q-tests for multi-insertion [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27148/ --- Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8533 https://issues.apache.org/jira/browse/HIVE-8533 Repository: hive-git Description --- As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. Diffs - itests/src/test/resources/testconfiguration.properties db8866d ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby9.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/pcr.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION Diff: 
https://reviews.apache.org/r/27148/diff/ Testing --- auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q Thanks, Chao Sun
Re: Review Request 27148: HIVE-8533 - Enable all q-tests for multi-insertion [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27148/ --- (Updated Oct. 24, 2014, 6:03 p.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-8533 https://issues.apache.org/jira/browse/HIVE-8533 Repository: hive-git Description --- As HIVE-8436 is done, we should be able to enable all multi-insertion related tests. This JIRA is created to track this and record any potential issue encountered. Diffs - itests/src/test/resources/testconfiguration.properties db8866d ql/src/test/results/clientpositive/spark/auto_smb_mapjoin_14.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby3_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby7_noskew_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_map_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby8_noskew.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby9.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/groupby_multi_insert_common_distinct.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/pcr.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_13.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/smb_mapjoin_16.q.out PRE-CREATION 
ql/src/test/results/clientpositive/spark/table_access_keys_stats.q.out PRE-CREATION Diff: https://reviews.apache.org/r/27148/diff/ Testing --- auto_smb_mapjoin_14.q groupby10.q groupby11.q groupby3_map_skew.q groupby7.q groupby7_noskew_multi_single_reducer.q groupby8.q groupby8_map.q groupby8_map_skew.q groupby8_noskew.q groupby9.q groupby_complex_types.q groupby_complex_types_multi_single_reducer.q groupby_multi_insert_common_distinct.q pcr.q smb_mapjoin_13.q smb_mapjoin_15.q smb_mapjoin_16.q table_access_keys_stats.q Thanks, Chao Sun
[jira] [Resolved] (HIVE-8220) Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8220. --- Resolution: Won't Fix Not needed with HIVE-8118. Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch] --- Key: HIVE-8220 URL: https://issues.apache.org/jira/browse/HIVE-8220 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Labels: Spark-M1 This is a follow-up for HIVE-7053. Currently the code to split the operator tree and to generate tasks is mingled and thus hard to understand and maintain. Logically the two seem independent. This can be improved by modularizing both. The following might be helpful:
{code}
@Override
protected void generateTaskTree(List<Task<? extends Serializable>> rootTasks, ParseContext pCtx,
    List<Task<MoveWork>> mvTask, Set<ReadEntity> inputs, Set<WriteEntity> outputs)
    throws SemanticException {
  // 1. Identify if the plan is for multi-insert and split the plan if necessary
  List<Set<Operator>> operatorSets = multiInsertSplit(...);
  // 2. For each operator set, generate a task.
  for (Set<Operator> topOps : operatorSets) {
    SparkTask task = generateTask(topOps);
    ...
  }
  // 3. wire up the tasks
  ...
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
Vaibhav Gumashta created HIVE-8596: -- Summary: HiveServer2 dynamic service discovery: ZK throws too many connections error Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
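For context on the warning quoted above: it is emitted by ZooKeeper's NIOServerCnxnFactory when a single client IP exceeds the server's per-host connection cap, controlled by the {{maxClientCnxns}} property (the "max is 60" in the log is that setting's value). Independent of fixing the connection churn on the HiveServer2/JDBC side, the cap can be raised in the ZooKeeper server's zoo.cfg; the value below is only an example:

```
# zoo.cfg -- per-client-IP connection limit enforced by NIOServerCnxnFactory.
# The "Too many connections from /x.x.x.x - max is 60" warning fires when a
# single host exceeds this cap. Raising it masks, but does not fix, a
# connection leak in the client.
maxClientCnxns=200
```

Raising the limit is a mitigation only; the underlying issue tracked here is that clients should share or properly close their ZooKeeper connections.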
[jira] [Commented] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183172#comment-14183172 ] Hive QA commented on HIVE-8457: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676940/HIVE-8457.2-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6809 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_smb_1
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/260/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/260/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-260/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12676940 - PreCommit-HIVE-SPARK-Build MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. And, given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:
{noformat}
inputRDD    inputRDD
   |           |
 MT_11       MT_12
   |           |
  RT_1        RT_2
{noformat}
Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first, and it will ask for an iterator from the {{HadoopRDD}}, which will trigger the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
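The initialization gap described in the HIVE-8457 report above can be reduced to a small standalone sketch: a lazily-built, cached result whose construction has a thread-local side effect. All names below ({{CachedSource}}, {{CONTEXT_INITIALIZED}}) are hypothetical, not Hive's actual IOContext code; this only illustrates the failure mode, assuming the second consumer runs on a different thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only -- not Hive's IOContext implementation.
public class CachedSource {
    // Stand-in for the thread-local IOContext: set to true as a side effect
    // of building a fresh iterator over the input.
    static final ThreadLocal<AtomicBoolean> CONTEXT_INITIALIZED =
            ThreadLocal.withInitial(() -> new AtomicBoolean(false));

    private List<String> cached; // simulates the cached HadoopRDD result

    synchronized List<String> iterator() {
        if (cached == null) {
            // First caller: creating the iterator initializes this thread's
            // context, and the computed rows are cached.
            CONTEXT_INITIALIZED.get().set(true);
            cached = Arrays.asList("row1", "row2");
        }
        // Later callers (possibly on other threads) hit the cache and skip
        // context initialization entirely -- the bug described above.
        return cached;
    }

    public static void main(String[] args) throws InterruptedException {
        CachedSource src = new CachedSource();
        Thread a = new Thread(() -> {
            src.iterator();
            System.out.println("A initialized: " + CONTEXT_INITIALIZED.get().get());
        });
        a.start();
        a.join();
        // Thread B reuses the cache, so its own context never gets set up.
        Thread b = new Thread(() -> {
            src.iterator();
            System.out.println("B initialized: " + CONTEXT_INITIALIZED.get().get());
        });
        b.start();
        b.join();
    }
}
```

Running this prints `A initialized: true` but `B initialized: false`, mirroring how {{MT_12}} sees an uninitialized context.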
[jira] [Commented] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183169#comment-14183169 ] Vaibhav Gumashta commented on HIVE-8596: [~vikram.dixit] This will be an issue with concurrent use. I feel this should be resolved in 14. cc [~thejas] HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8583) HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist
[ https://issues.apache.org/jira/browse/HIVE-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183171#comment-14183171 ] Alan Gates commented on HIVE-8583: -- +1, patch looks fine. The statement "missorted modifiers" implies there is a correct order. If the compiler doesn't care about {{final static private}} versus {{private static final}}, why should we? HIVE-8341 Cleanup Test for hive.script.operator.env.blacklist --- Key: HIVE-8583 URL: https://issues.apache.org/jira/browse/HIVE-8583 Project: Hive Issue Type: Improvement Reporter: Lars Francke Assignee: Lars Francke Priority: Minor Attachments: HIVE-8583.1.patch [~alangates] added the following in HIVE-8341:
{code}
String bl = hconf.get(HiveConf.ConfVars.HIVESCRIPT_ENV_BLACKLIST.toString());
if (bl != null && bl.length() > 0) {
  String[] bls = bl.split(",");
  for (String b : bls) {
    b.replaceAll(".", "_");
    blackListedConfEntries.add(b);
  }
}
{code}
The {{replaceAll}} call is confusing as its result is not used at all. This patch contains the following:
* Minor style modification (missorted modifiers)
* Adds reading of the default value for HIVESCRIPT_ENV_BLACKLIST
* Removes replaceAll
* Lets blackListed take a Configuration job as parameter, which allowed me to add a test for this
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
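Both problems flagged in the comments above are easy to demonstrate in isolation: {{String.replaceAll}} returns a new string rather than mutating its receiver (Java strings are immutable), so a call whose result is discarded is a no-op; and its first argument is a regex, so an unescaped {{"."}} matches every character. A minimal standalone demo:

```java
public class ReplaceAllDemo {
    public static void main(String[] args) {
        String b = "hive.script.operator.env.blacklist";

        // Result discarded: b is unchanged, because String is immutable and
        // replaceAll returns a new string instead of modifying b in place.
        b.replaceAll(".", "_");
        System.out.println(b); // hive.script.operator.env.blacklist

        // replaceAll treats its first argument as a regex; the regex "."
        // matches any character, so the whole string becomes underscores.
        System.out.println(b.replaceAll(".", "_")); // 34 underscores

        // To replace only literal dots, escape the pattern -- or avoid the
        // regex machinery entirely with the char overload of replace.
        System.out.println(b.replaceAll("\\.", "_")); // hive_script_operator_env_blacklist
        System.out.println(b.replace('.', '_'));      // hive_script_operator_env_blacklist
    }
}
```

This is why removing the dead {{replaceAll}} call is the right fix: had its result actually been used, the unescaped pattern would have mangled every blacklist entry into underscores.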
[jira] [Updated] (HIVE-8588) sqoop REST endpoint fails to send appropriate JDBC driver to the cluster
[ https://issues.apache.org/jira/browse/HIVE-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-8588: - Attachment: HIVE-8588.2.patch sqoop REST endpoint fails to send appropriate JDBC driver to the cluster Key: HIVE-8588 URL: https://issues.apache.org/jira/browse/HIVE-8588 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Priority: Critical Attachments: HIVE-8588.1.patch, HIVE-8588.2.patch This is originally discovered by [~deepesh] When running a Sqoop integration test from WebHCat {noformat} curl --show-error -d command=export -libjars hdfs:///tmp/mysql-connector-java.jar --connect jdbc:mysql://deepesh-c6-1.cs1cloud.internal/sqooptest --username sqoop --password passwd --export-dir /tmp/templeton_test_data/sqoop --table person -d statusdir=sqoop.output -X POST http://deepesh-c6-1.cs1cloud.internal:50111/templeton/v1/sqoop?user.name=hrt_qa; {noformat} the job is failing with the following error: {noformat} $ hadoop fs -cat /user/hrt_qa/sqoop.output/stderr 14/10/15 23:52:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5.2.2.0.0-897 14/10/15 23:52:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 14/10/15 23:52:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
14/10/15 23:52:54 INFO tool.CodeGenTool: Beginning code generation
14/10/15 23:52:54 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
	at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:848)
	at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:736)
	at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:759)
	at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:269)
	at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)
	at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)
	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)
	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)
	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)
	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
{noformat}
Note that the Sqoop tar bundle does not contain the JDBC connector jar. I think the problem here may be that the mysql connector jar added to libjars isn't available to the Sqoop tool, which first connects to the database through the JDBC driver to collect some table information before running the MR job. libjars will only add the connector jar for the MR job and not the local one.
NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183197#comment-14183197 ] Vikram Dixit K commented on HIVE-8596: -- Ack for 0.14. HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6165: - Description: org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. was: org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. CLEAR LIBRARY CACHE Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8596: --- Fix Version/s: 0.14.0 HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183203#comment-14183203 ] Xuefu Zhang commented on HIVE-8457: --- +1 MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. And, given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:
{noformat}
inputRDD    inputRDD
   |           |
 MT_11       MT_12
   |           |
  RT_1        RT_2
{noformat}
Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first, and it will ask for an iterator from the {{HadoopRDD}}, which will trigger the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183213#comment-14183213 ] Gunther Hagleitner commented on HIVE-6165: -- [~xuefuz] - I've already done that. The last Hive QA entry is for the .2 patch (I stripped the .txt by mistake). Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8457) MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8457: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Patch committed to Spark branch. Thanks to Chao for the contribution. MapOperator initialization fails when multiple Spark threads is enabled [Spark Branch] -- Key: HIVE-8457 URL: https://issues.apache.org/jira/browse/HIVE-8457 Project: Hive Issue Type: Bug Components: Spark Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8457.1-spark.patch, HIVE-8457.2-spark.patch Currently, on the Spark branch, each thread is bound to a thread-local IOContext, which gets initialized when we generate an input {{HadoopRDD}}, and is later used in {{MapOperator}}, {{FilterOperator}}, etc. And, given the introduction of HIVE-8118, we may have multiple downstream RDDs that share the same input {{HadoopRDD}}, and we would like the {{HadoopRDD}} to be cached, to avoid scanning the same table multiple times. A typical case would be like the following:
{noformat}
inputRDD    inputRDD
   |           |
 MT_11       MT_12
   |           |
  RT_1        RT_2
{noformat}
Here, {{MT_11}} and {{MT_12}} are {{MapTran}} from a split {{MapWork}}, and {{RT_1}} and {{RT_2}} are two {{ReduceTran}}. Note that this example is simplified, as we may also have {{ShuffleTran}} between {{MapTran}} and {{ReduceTran}}. When multiple Spark threads are running, {{MT_11}} may be executed first, and it will ask for an iterator from the {{HadoopRDD}}, which will trigger the creation of the iterator, which in turn triggers the initialization of the {{IOContext}} associated with that particular thread. *Now, the problem is*: before {{MT_12}} starts executing, it will also ask for an iterator from the {{HadoopRDD}}, and since the RDD is already cached, instead of creating a new iterator, it will just fetch it from the cached result. However, *this will skip the initialization of the IOContext associated with this particular thread*. And, when {{MT_12}} starts executing, it will try to initialize the {{MapOperator}}, but since the {{IOContext}} is not initialized, this will fail miserably. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8437) Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8437. --- Resolution: Fixed Fix Version/s: spark-branch Fixed via HIVE-8457. Modify SparkPlan generation to set toCache flag to SparkTrans where caching is needed [Spark Branch] Key: HIVE-8437 URL: https://issues.apache.org/jira/browse/HIVE-8437 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Fix For: spark-branch HIVE-8436 may modify the SparkWork right before SparkPlan generation. When this happens, the output from some SparkTrans needs to be cached to avoid regenerating the RDD. For more information, please refer to the design doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8435) Add identity project remover optimization
[ https://issues.apache.org/jira/browse/HIVE-8435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183226#comment-14183226 ] Hive QA commented on HIVE-8435: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12676828/HIVE-8435.03.patch {color:red}ERROR:{color} -1 due to 539 failed/errored test(s), 6549 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_predicate_pushdown org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_queries org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ba_table_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
[jira] [Commented] (HIVE-6165) Unify HivePreparedStatement from jdbc:hive and jdbc:hive2
[ https://issues.apache.org/jira/browse/HIVE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183231#comment-14183231 ] Xuefu Zhang commented on HIVE-6165: --- Yeah, I knew. However, the test failures seem unrelated, but I'm not quite sure. Thus, I'd like to have another run to confirm. Thanks for pointing it out, though. Unify HivePreparedStatement from jdbc:hive and jdbc:hive2 - Key: HIVE-6165 URL: https://issues.apache.org/jira/browse/HIVE-6165 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Reporter: Helmut Zechmann Priority: Minor Attachments: HIVE-6165.1.patch.txt, HIVE-6165.1.patch.txt, HIVE-6165.2.patch, HIVE-6165.2.patch.txt, HIVE-6165.2.patch.txt org.apache.hadoop.hive.jdbc.HivePreparedStatement.class from the hive jdbc driver and org.apache.hive.jdbc.HivePreparedStatement.class from the hive2 jdbc drivers contain lots of duplicate code. Especially hive-HivePreparedStatement supports setObject, while the hive2 version does not. Share more code between the two to avoid duplicate work and to make sure that both support the broadest possible feature set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8596: --- Attachment: HIVE-8596.1.patch HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8596.1.patch {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8118) Support work that have multiple child works to work around SPARK-3622 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-8118. --- Resolution: Fixed Fix Version/s: spark-branch All sub tasks are completed. Thus, this JIRA is closed as fixed as well. Support work that have multiple child works to work around SPARK-3622 [Spark Branch] - Key: HIVE-8118 URL: https://issues.apache.org/jira/browse/HIVE-8118 Project: Hive Issue Type: Bug Components: Spark Reporter: Xuefu Zhang Assignee: Chao Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-8118.pdf In the current implementation, both SparkMapRecordHandler and SparkReduceRecorderHandler take only one result collector, which limits the corresponding map or reduce task to only one child. It's very common in multi-insert queries for a map/reduce task to have more than one child. A query like the following has two map tasks as parents:
{code}
select name, sum(value) from dec group by name
union all
select name, value from dec order by name
{code}
It's possible in the future an optimization may be implemented so that a map work is followed by two reduce works and then connected to a union work. Thus, we should take this as a general case. Tez is currently providing a collector for each child operator in the map-side or reduce-side operator tree. We can take Tez as a reference. Spark currently doesn't have a transformation that supports multiple output datasets from a single input dataset (SPARK-3622). This is a workaround for this gap. Likely this is a big change and subtasks are possible. With this, we can have a simpler and cleaner multi-insert implementation. This is also the problem observed in HIVE-7731 and HIVE-7503. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
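The single-collector limitation described in the HIVE-8118 summary above can be sketched in plain Java. This is an illustrative toy, not Hive code: {{MultiChildHandler}} and the {{Consumer}}-based collectors are hypothetical stand-ins for a record handler that holds one collector per child work, so one input row stream can feed several downstream children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the HIVE-8118 idea: instead of binding a record
// handler to exactly one output collector, give it a collector per child
// work, emulating a one-input/many-outputs transformation (cf. SPARK-3622).
public class MultiChildHandler {
    private final List<Consumer<String>> childCollectors = new ArrayList<>();

    public void addChild(Consumer<String> collector) {
        childCollectors.add(collector);
    }

    // Every processed row is forwarded to every registered child, so two
    // downstream works both see the full input without a second table scan.
    public void process(String row) {
        for (Consumer<String> collector : childCollectors) {
            collector.accept(row);
        }
    }
}
```

In the multi-insert query quoted above, the group-by branch and the order-by branch would each register as a child and both receive every input row.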
Re: Review Request 25550: HIVE-8021 CBO: support CTAS and insert ... select
On Oct. 24, 2014, 5:10 p.m., John Pullokkaran wrote: ql/src/test/queries/clientpositive/ctas_colname.q, line 9 https://reviews.apache.org/r/25550/diff/8/?file=731255#file731255line9 Why this change? See HIVE-8512; the original query is not valid and should fail - Sergey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/#review58290 --- On Oct. 23, 2014, 9:11 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25550/ --- (Updated Oct. 23, 2014, 9:11 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see JIRA Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteParseContextGenerator.java dee7d7e ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 37cbf7f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 ql/src/test/queries/clientpositive/ctas_colname.q 5322626 ql/src/test/queries/clientpositive/decimal_serde.q cf3a86c ql/src/test/queries/clientpositive/insert0.q PRE-CREATION ql/src/test/results/clientpositive/ctas_colname.q.out 97dacf6 ql/src/test/results/clientpositive/decimal_serde.q.out e461c2e ql/src/test/results/clientpositive/insert0.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25550/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Created] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks
Siddharth Seth created HIVE-8597: Summary: SMB join small table side should use the same set of serialized payloads across tasks Key: HIVE-8597 URL: https://issues.apache.org/jira/browse/HIVE-8597 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 0.14.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 0.14.0 Each task sees all splits belonging to the bucket being processed by the task. At the moment, we end up using different instances of the same serialized split, which adds unnecessary memory pressure.
[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks
[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-8597: - Attachment: HIVE-8597.1.patch Patch to create one set of serialized splits for each bucket, and re-use them across tasks processing the same bucket. Also removes some unused variables, and cleans up variables to allow for GC. [~vikram.dixit] - please review. SMB join small table side should use the same set of serialized payloads across tasks - Key: HIVE-8597 URL: https://issues.apache.org/jira/browse/HIVE-8597 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 0.14.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 0.14.0 Attachments: HIVE-8597.1.patch Each task sees all splits belonging to the bucket being processed by the task. At the moment, we end up using different instances of the same serialized split, which adds unnecessary memory pressure.
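The caching idea in the patch can be sketched roughly like this. The names (`BucketSplitCache`, `getOrSerialize`) and the use of `String` as a stand-in for a split are illustrative assumptions, not the patch's actual code: serialize each bucket's splits once and hand the same payload list to every task that processes that bucket.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: one set of serialized split payloads per bucket,
// shared by all tasks that process that bucket.
public class BucketSplitCache {
    private final Map<Integer, List<byte[]>> serializedByBucket = new HashMap<>();

    // computeIfAbsent guarantees the splits are serialized exactly once
    // per bucket; later callers get the already-built payload list.
    public List<byte[]> getOrSerialize(int bucket, List<String> splits) {
        return serializedByBucket.computeIfAbsent(bucket,
            b -> splits.stream()
                       .map(String::getBytes)
                       .collect(Collectors.toList()));
    }
}
```

A second task asking for the same bucket receives the identical list instance rather than a fresh serialization, which is the memory saving the patch describes.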
[jira] [Updated] (HIVE-8597) SMB join small table side should use the same set of serialized payloads across tasks
[ https://issues.apache.org/jira/browse/HIVE-8597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-8597: - Status: Patch Available (was: Open) SMB join small table side should use the same set of serialized payloads across tasks - Key: HIVE-8597 URL: https://issues.apache.org/jira/browse/HIVE-8597 Project: Hive Issue Type: Improvement Components: Tez Affects Versions: 0.14.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: 0.14.0 Attachments: HIVE-8597.1.patch Each task sees all splits belonging to the bucket being processed by the task. At the moment, we end up using different instances of the same serialized split, which adds unnecessary memory pressure.
[jira] [Created] (HIVE-8598) Push constant filters through joins
Ashutosh Chauhan created HIVE-8598: -- Summary: Push constant filters through joins Key: HIVE-8598 URL: https://issues.apache.org/jira/browse/HIVE-8598 Project: Hive Issue Type: Improvement Components: Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Will make {{NullScanOptimizer}} more effective.
[jira] [Commented] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183263#comment-14183263 ] Thejas M Nair commented on HIVE-8596: - Changes look good. Should we just catch Exception, so that any unchecked exceptions are also silenced and we don't lose the original exception if there is one? HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8596.1.patch {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat}
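The review suggestion can be illustrated with a small sketch. The class and method names here (`QuietCloser`, `closeQuietly`) are hypothetical, not from the patch: catching `Exception` rather than only a checked type ensures an unchecked failure during cleanup cannot replace the exception that triggered the cleanup in the first place.

```java
// Hypothetical sketch: catch Exception during cleanup so that even an
// unchecked exception from close() is silenced, and the caller's
// original exception (if any) is the one that propagates.
public class QuietCloser {
    public static void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception e) {
            // swallow: cleanup failures must not mask the root cause
        }
    }
}
```

Catching only `IOException` here would let a `RuntimeException` from `close()` escape and overwrite the exception already in flight, which is exactly the scenario the comment warns about.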
[jira] [Updated] (HIVE-8021) CBO: support CTAS and insert ... select
[ https://issues.apache.org/jira/browse/HIVE-8021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-8021: --- Attachment: HIVE-8021.07.patch Fix a silly NPE CBO: support CTAS and insert ... select --- Key: HIVE-8021 URL: https://issues.apache.org/jira/browse/HIVE-8021 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8021.01.patch, HIVE-8021.01.patch, HIVE-8021.02.patch, HIVE-8021.03.patch, HIVE-8021.04.patch, HIVE-8021.05.patch, HIVE-8021.06.patch, HIVE-8021.06.patch, HIVE-8021.07.patch, HIVE-8021.patch, HIVE-8021.preliminary.patch Need to send only the select part to CBO for now
Re: Review Request 26721: HIVE-8433 CBO loses a column during AST conversion
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26721/#review58319 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java https://reviews.apache.org/r/26721/#comment99274 This is unused. - John Pullokkaran On Oct. 22, 2014, 11:18 p.m., Sergey Shelukhin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26721/ --- (Updated Oct. 22, 2014, 11:18 p.m.) Review request for hive, Ashutosh Chauhan and John Pullokkaran. Repository: hive-git Description --- see jira Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java 0428263 ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/PlanModifierForASTConv.java 4f96d02 ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java 10ac4b2 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java d8c50e3 ql/src/test/queries/clientpositive/cbo_correctness.q 4d8f156 ql/src/test/queries/clientpositive/select_same_col.q PRE-CREATION ql/src/test/results/clientpositive/cbo_correctness.q.out 7c25e1f ql/src/test/results/clientpositive/select_same_col.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/cbo_correctness.q.out e467773 Diff: https://reviews.apache.org/r/26721/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-8596) HiveServer2 dynamic service discovery: ZK throws too many connections error
[ https://issues.apache.org/jira/browse/HIVE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-8596: --- Attachment: HIVE-8596.2.patch [~thejas] Done. HiveServer2 dynamic service discovery: ZK throws too many connections error --- Key: HIVE-8596 URL: https://issues.apache.org/jira/browse/HIVE-8596 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Priority: Critical Fix For: 0.14.0 Attachments: HIVE-8596.1.patch, HIVE-8596.2.patch {noformat} 2014-10-23 07:55:44,221 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /172.31.47.11 - max is 60 {noformat}