[jira] [Updated] (HIVE-15230) query form a view join another view happens "No work found for tablescan TS"

2016-11-17 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated HIVE-15230:
---
Affects Version/s: 2.1.0

> query form a view join another view happens "No work found for tablescan TS"
> ----------------------------------------------------------------------------
>
> Key: HIVE-15230
> URL: https://issues.apache.org/jira/browse/HIVE-15230
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
> Environment: hive2.1.0
> Tez0.8.4
> hadoop2.7.2
>Reporter: gaoyang
>
> Here is my sql:
> select count(0) from tbllog_level_up b,tbllog_gold a where b.server=a.server 
> and b.role_id=a.role_id;
> tbllog_level_up and tbllog_gold are views, each a UNION ALL of two tables, like this:
> create view tbllog_level_up as select * from tbllog_level_up_current union 
> all select * from tbllog_level_up_bak;
> create view tbllog_gold as select * from tbllog_gold_current union all select 
> * from tbllog_gold_bak;
> When I run the SQL on Hive 2.1 with the MR engine, it works fine.
> But it throws the following exception with the Tez 0.8.4 engine:
> 2016-11-17T13:55:48,623 WARN  [HiveServer2-Handler-Pool: Thread-39]: 
> thrift.ThriftCLIService (:()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.AssertionError: No work found for tablescan TS[8]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.AssertionError: No work found for tablescan TS[8]
> at 
> org.apache.hadoop.hive.ql.parse.GenTezUtils.processAppMasterEvent(GenTezUtils.java:398)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:397)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10857)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1145)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:184)
> ... 15 more
> In addition, when I swap the join order, like this: select count(0) from tbllog_gold 
> b,tbllog_level_up a where b.server=a.server and b.role_id=a.role_id
> it succeeds.
> Why?
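The assertion originates in GenTezUtils.processAppMasterEvent, which wires up dynamic partition pruning events for Tez. A hedged workaround sketch follows; the setting name is a real Hive configuration, but whether it avoids this particular failure is an assumption to verify:

{code:sql}
-- Sketch: disable Tez dynamic partition pruning for the session, which
-- bypasses the processAppMasterEvent code path where the assertion fires.
set hive.tez.dynamic.partition.pruning=false;

select count(0)
from tbllog_level_up b
join tbllog_gold a
  on b.server = a.server
 and b.role_id = a.role_id;
{code}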



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HIVE-15230) query form a view join another view happens "No work found for tablescan TS"

2016-11-17 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3541 to HIVE-15230:
-

Affects Version/s: (was: 0.8.4)
  Key: HIVE-15230  (was: TEZ-3541)
  Project: Hive  (was: Apache Tez)

> query form a view join another view happens "No work found for tablescan TS"
> ----------------------------------------------------------------------------
>
> Key: HIVE-15230
> URL: https://issues.apache.org/jira/browse/HIVE-15230
> Project: Hive
>  Issue Type: Bug
> Environment: hive2.1.0
> Tez0.8.4
> hadoop2.7.2
>Reporter: gaoyang
>
> Here is my sql:
> select count(0) from tbllog_level_up b,tbllog_gold a where b.server=a.server 
> and b.role_id=a.role_id;
> tbllog_level_up and tbllog_gold are views, each a UNION ALL of two tables, like this:
> create view tbllog_level_up as select * from tbllog_level_up_current union 
> all select * from tbllog_level_up_bak;
> create view tbllog_gold as select * from tbllog_gold_current union all select 
> * from tbllog_gold_bak;
> When I run the SQL on Hive 2.1 with the MR engine, it works fine.
> But it throws the following exception with the Tez 0.8.4 engine:
> 2016-11-17T13:55:48,623 WARN  [HiveServer2-Handler-Pool: Thread-39]: 
> thrift.ThriftCLIService (:()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.AssertionError: No work found for tablescan TS[8]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:218)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.AssertionError: No work found for tablescan TS[8]
> at 
> org.apache.hadoop.hive.ql.parse.GenTezUtils.processAppMasterEvent(GenTezUtils.java:398)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:397)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10857)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1145)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:184)
> ... 15 more
> In addition, when I swap the join order, like this: select count(0) from tbllog_gold 
> b,tbllog_level_up a where b.server=a.server and b.role_id=a.role_id
> it succeeds.
> Why?





[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2016-11-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645760#comment-15645760
 ] 

Hitesh Shah commented on HIVE-15144:


Tez uses jersey-json, which is CDDL-licensed and does not carry the "do no 
evil" clause. 

> JSON.org license is now CatX
> ----------------------------------------------------------------------------
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Moved] (HIVE-15103) Error when inserting into hive table with HBase storage handler

2016-11-01 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3506 to HIVE-15103:
-

Affects Version/s: (was: 0.7.0)
  Key: HIVE-15103  (was: TEZ-3506)
  Project: Hive  (was: Apache Tez)

> Error when inserting into hive table with HBase storage handler
> ---
>
> Key: HIVE-15103
> URL: https://issues.apache.org/jira/browse/HIVE-15103
> Project: Hive
>  Issue Type: Bug
> Environment: HDP 2.4.2 on centos 6.8 x64
>Reporter: Edward Chen
>
> Exceptions are returned when executing the following simple insert statement 
> in hive on tez.
> {code}
> insert into table accounts  select * from temp_account;
> {code}
> where table accounts is a Hive external table, stored in HBase:
> {code}
> create external table accounts (key string, value string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> with serdeproperties ("hbase.columns.mapping" = ":key,0:val")
> tblproperties ("hbase.table.name" = "accounts_hbase", 
> "hbase.mapred.output.outputtable" = "accounts_hbase");
> {code}
> The SQL executes successfully when hive.execution.engine=mr; however, with
> hive.execution.engine=tez, we get the following error:
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1475059990829_1927_1_00, 
> diagnostics=[Task failed, taskId=task_1475059990829_1927_1_00_99, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator 
> initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:265)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:149)
> ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Must specify table name
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.createHiveOutputFormat(FileSinkOperator.java:1139)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:346)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:363)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:482)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:439)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:486)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:234)
> ... 15 more
> Caused by: java.lang.IllegalArgumentException: Must specify table name
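The root cause, "Must specify table name", is thrown while the Tez task builds the output format, which suggests the HBase output table name from tblproperties is not reaching the task configuration. A hedged sketch of a workaround to try; whether a session-level setting propagates to the Tez vertex here is an assumption:

{code:sql}
-- Sketch: set the HBase output table explicitly in the session conf so the
-- Tez task does not depend on it being copied from table properties.
set hbase.mapred.output.outputtable=accounts_hbase;
insert into table accounts select * from temp_account;
{code}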

[jira] [Commented] (HIVE-14987) CombineHiveInputFormat with Tez fails to initiate vertex if table is empty

2016-10-17 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582878#comment-15582878
 ] 

Hitesh Shah commented on HIVE-14987:


Moved this to Hive as this seems to be Hive specific. 

> CombineHiveInputFormat with Tez fails to initiate vertex if table is empty
> --
>
> Key: HIVE-14987
> URL: https://issues.apache.org/jira/browse/HIVE-14987
> Project: Hive
>  Issue Type: Bug
>Reporter: Yi Zhang
>
> Sometimes users develop a custom InputFormat that extends
> CombineHiveInputFormat (because extending HiveInputFormat directly is
> difficult), for example to filter out old data files.
> In this use case, the vertex fails to initialize:
> {code:sql}
> SELECT city.cid
> FROM
>   (select city_id as cid,
>    row_number() over(partition by timezone order by population) rnum
>    from cities) city
> JOIN
>   (select datestr, id from yizhang.emptyparts where datestr >=
>    date_sub(current_date(),30)) emp
> on city.cid = emp.id;
> {code}
> {code}
> VERTICES   STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> Map 1      KILLED     -1          0        0       -1       0       0
> Map 3      FAILED     -1          0        0       -1       0       0
> Reducer 2  KILLED      1          0        0        1       0       0
> VERTICES: 00/03  [>>--------------------------] 0%  ELAPSED TIME: 0.34 s
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 3, vertexId=vertex_1476217616538_398108_1_01, 
> diagnostics=[Vertex vertex_1476217616538_398108_1_01 [Map 3] killed/failed 
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: emp initializer failed, 
> vertex=vertex_1476217616538_398108_1_01 [Map 3], 
> java.lang.IllegalArgumentException
>   at 
> java.util.concurrent.ThreadPoolExecutor.(ThreadPoolExecutor.java:1307)
>   at 
> java.util.concurrent.ThreadPoolExecutor.(ThreadPoolExecutor.java:1195)
>   at java.util.concurrent.Executors.newFixedThreadPool(Executors.java:89)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:519)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:447)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:299)
>   at 
> org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:121)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:264)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:258)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:258)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:245)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> ]
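The IllegalArgumentException is raised by the ThreadPoolExecutor constructor, which rejects a non-positive pool size; CombineHiveInputFormat.getSplits evidently sizes its pool from the number of input paths, which is zero for the empty table. A hedged session-level sketch (a real Hive setting, though it sidesteps the custom InputFormat rather than fixing the zero-path case):

{code:sql}
-- Sketch: fall back to HiveInputFormat so split computation does not go
-- through CombineHiveInputFormat.getSplits for this session.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
{code}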





[jira] [Moved] (HIVE-14987) CombineHiveInputFormat with Tez fails to initiate vertex if table is empty

2016-10-17 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3474 to HIVE-14987:
-

Affects Version/s: (was: 0.7.1)
  Key: HIVE-14987  (was: TEZ-3474)
  Project: Hive  (was: Apache Tez)

> CombineHiveInputFormat with Tez fails to initiate vertex if table is empty
> --
>
> Key: HIVE-14987
> URL: https://issues.apache.org/jira/browse/HIVE-14987
> Project: Hive
>  Issue Type: Bug
>Reporter: Yi Zhang
>
> Sometimes users develop a custom InputFormat that extends
> CombineHiveInputFormat (because extending HiveInputFormat directly is
> difficult), for example to filter out old data files.
> In this use case, the vertex fails to initialize:
> {code:sql}
> SELECT city.cid
> FROM
>   (select city_id as cid,
>    row_number() over(partition by timezone order by population) rnum
>    from cities) city
> JOIN
>   (select datestr, id from yizhang.emptyparts where datestr >=
>    date_sub(current_date(),30)) emp
> on city.cid = emp.id;
> {code}
> {code}
> VERTICES   STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> Map 1      KILLED     -1          0        0       -1       0       0
> Map 3      FAILED     -1          0        0       -1       0       0
> Reducer 2  KILLED      1          0        0        1       0       0
> VERTICES: 00/03  [>>--------------------------] 0%  ELAPSED TIME: 0.34 s
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 3, vertexId=vertex_1476217616538_398108_1_01, 
> diagnostics=[Vertex vertex_1476217616538_398108_1_01 [Map 3] killed/failed 
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: emp initializer failed, 
> vertex=vertex_1476217616538_398108_1_01 [Map 3], 
> java.lang.IllegalArgumentException
>   at 
> java.util.concurrent.ThreadPoolExecutor.(ThreadPoolExecutor.java:1307)
>   at 
> java.util.concurrent.ThreadPoolExecutor.(ThreadPoolExecutor.java:1195)
>   at java.util.concurrent.Executors.newFixedThreadPool(Executors.java:89)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:519)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:447)
>   at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:299)
>   at 
> org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:121)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:264)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:258)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:258)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:245)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> ]





[jira] [Comment Edited] (HIVE-14857) select count(*) fails with tez over cassandra

2016-09-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532983#comment-15532983
 ] 

Hitesh Shah edited comment on HIVE-14857 at 9/29/16 2:48 PM:
-

[~carlo_4002] I just moved this to Hive. Can you please update the affects 
version field to indicate what versions of Hive you were running. For Tez, I 
believe you had mentioned 0.7.0?


was (Author: hitesh):
[~carlo_4002] I just moved this to Hive. Can you please update the affects 
version field to indicate what versions of Hive and Tez you were running. 

> select count(*) fails with tez over cassandra
> -
>
> Key: HIVE-14857
> URL: https://issues.apache.org/jira/browse/HIVE-14857
> Project: Hive
>  Issue Type: Bug
>Reporter: jean carlo rivera ura
>
> Hello,
> We have a cluster whose nodes run Cassandra and Hadoop (Hortonworks 2.3.2),
> with Tez as our default engine.
> I have a table in Cassandra, and I use the hive-cassandra driver to run
> selects over it. This is the table:
> {code:sql}
> CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
> PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
> {code}
> And I have only 3 partitions:
> ||campaign_id ||   sid  ||  name  ||  ts||
> |45sqdqs| sqsd |  dea| NULL|
> |QSHJKA | sqsd |  dea| NULL|
> |45s-qs   | sqsd |  dea| NULL|
> When I run a "select count( * )" over the table using Hive:
> {code} hive -e "select count(*) from table1;" {code}
> I got this error:
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
> taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
> actual length: 9223372036854775711
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 
> 12416 actual length: 9223372036854775711
>at 
> org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
>at 
> org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
>at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
>at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>... 14 more
> {code}
> As far as I understand, readFields is receiving more data than we are 
> expecting. But considering the size of the 

[jira] [Commented] (HIVE-14857) select count(*) fails with tez over cassandra

2016-09-29 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532983#comment-15532983
 ] 

Hitesh Shah commented on HIVE-14857:


[~carlo_4002] I just moved this to Hive. Can you please update the affects 
version field to indicate what versions of Hive and Tez you were running. 

> select count(*) fails with tez over cassandra
> -
>
> Key: HIVE-14857
> URL: https://issues.apache.org/jira/browse/HIVE-14857
> Project: Hive
>  Issue Type: Bug
>Reporter: jean carlo rivera ura
>
> Hello,
> We have a cluster whose nodes run Cassandra and Hadoop (Hortonworks 2.3.2),
> with Tez as our default engine.
> I have a table in Cassandra, and I use the hive-cassandra driver to run
> selects over it. This is the table:
> {code:sql}
> CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
> PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
> {code}
> And I have only 3 partitions:
> ||campaign_id ||   sid  ||  name  ||  ts||
> |45sqdqs| sqsd |  dea| NULL|
> |QSHJKA | sqsd |  dea| NULL|
> |45s-qs   | sqsd |  dea| NULL|
> When I run a "select count( * )" over the table using Hive:
> {code} hive -e "select count(*) from table1;" {code}
> I got this error:
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
> taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
> actual length: 9223372036854775711
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 
> 12416 actual length: 9223372036854775711
>at 
> org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
>at 
> org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
>at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
>at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>... 14 more
> {code}
> As far as I understand, readFields is receiving more data than we are 
> expecting. But considering the size of the table (only 3 records), I don't 
> think the data is the problem.
> Another thing to add: if I do a "select *", it works perfectly fine 
> with Tez. Using the MR engine, select count(*) and select * work fine as 
> well.
> We are using hortonworks 
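Since the corruption shows up while deserializing a TezGroupedSplit, and the reporter confirms the MR engine works, a hedged isolation sketch follows (the per-query engine switch is a standard Hive setting; it only confirms that the failure is specific to Tez split grouping rather than to the data):

{code:sql}
-- Sketch: run the same query once per engine to isolate the failure to
-- Tez grouped-split deserialization.
set hive.execution.engine=mr;
select count(*) from table1;   -- reported to work
set hive.execution.engine=tez;
select count(*) from table1;   -- reported to fail in TezGroupedSplit.readFields
{code}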

[jira] [Moved] (HIVE-14857) select count(*) fails with tez over cassandra

2016-09-29 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3451 to HIVE-14857:
-

Affects Version/s: (was: 0.7.0)
  Key: HIVE-14857  (was: TEZ-3451)
  Project: Hive  (was: Apache Tez)

> select count(*) fails with tez over cassandra
> -
>
> Key: HIVE-14857
> URL: https://issues.apache.org/jira/browse/HIVE-14857
> Project: Hive
>  Issue Type: Bug
>Reporter: jean carlo rivera ura
>
> Hello,
> We have a cluster whose nodes run Cassandra and Hadoop (Hortonworks 2.3.2),
> with Tez as our default engine.
> I have a table in Cassandra, and I use the hive-cassandra driver to run
> selects over it. This is the table:
> {code:sql}
> CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, 
> PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)
> {code}
> And I have only 3 partitions:
> ||campaign_id ||   sid  ||  name  ||  ts||
> |45sqdqs| sqsd |  dea| NULL|
> |QSHJKA | sqsd |  dea| NULL|
> |45s-qs   | sqsd |  dea| NULL|
> When I run a "select count( * )" over the table using Hive:
> {code} hive -e "select count(*) from table1;" {code}
> I got this error:
> {code}
> Status: Failed
> Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, 
> taskId=task_1474275943985_0179_1_00_01, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: 
> org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 
> actual length: 9223372036854775711
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
>at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 
> 12416 actual length: 9223372036854775711
>at 
> org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
>at 
> org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
>at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
>at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
>at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
>at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
>at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
>... 14 more
> {code}
> As far as I understand, readFields is receiving more data than it expects. 
> But considering the size of the table (only 3 records), I don't think the 
> data itself is the problem.
> Another thing to add: if I do a "select *", it works perfectly fine with 
> Tez. Using the MR engine, both select count(*) and select * work fine as 
> well.
> We are using Hortonworks version 2.3.2
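One observation worth adding (mine, not from the thread): the reported "actual length" of 9223372036854775711 is only 96 below Java's Long.MAX_VALUE (9223372036854775807), which suggests readFields is deserializing an overflowed or corrupted length field rather than genuinely huge split data. A quick sanity check:

```shell
# The reported "actual length" sits just 96 below Long.MAX_VALUE,
# pointing at a corrupted/overflowed length rather than real data size.
echo $(( 9223372036854775807 - 9223372036854775711 ))
# prints 96
```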




[jira] [Commented] (HIVE-13446) LLAP: set default management protocol acls to deny all

2016-04-30 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265367#comment-15265367
 ] 

Hitesh Shah commented on HIVE-13446:


If you are using the Hadoop ACLs implementation, setting the ACL to a string 
containing a single space blocks everyone. 
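As a sketch of what that looks like (assuming the standard tez.am.view-acls / tez.am.modify-acls properties; verify the property names against your Tez version):

```
<!-- tez-site.xml sketch: under the Hadoop ACL string format, a value
     consisting of a single space matches no users and no groups,
     so it denies everyone -->
<property>
  <name>tez.am.view-acls</name>
  <value> </value>
</property>
<property>
  <name>tez.am.modify-acls</name>
  <value> </value>
</property>
```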

> LLAP: set default management protocol acls to deny all
> --
>
> Key: HIVE-13446
> URL: https://issues.apache.org/jira/browse/HIVE-13446
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13446.patch
>
>
> The user needs to set the acls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13446) LLAP: set default management protocol acls to deny all

2016-04-30 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265366#comment-15265366
 ] 

Hitesh Shah commented on HIVE-13446:


Setting the Tez ACLs to an empty string will allow only the AM user to view 
all details, and the DAG owner to view DAG-specific details. 

> LLAP: set default management protocol acls to deny all
> --
>
> Key: HIVE-13446
> URL: https://issues.apache.org/jira/browse/HIVE-13446
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13446.patch
>
>
> The user needs to set the acls.





[jira] [Commented] (HIVE-13239) "java.lang.OutOfMemoryError: unable to create new native thread" occurs at Hive on Tez

2016-03-08 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186330#comment-15186330
 ] 

Hitesh Shah commented on HIVE-13239:


Moved to Hive as this is a Hive issue and not a Tez one. 
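As general guidance (not a diagnosis from this thread): "unable to create new native thread" is frequently an OS-level limit rather than JVM heap exhaustion, so it is worth checking the per-user process/thread cap for the user running HiveServer2:

```shell
# Each Java thread counts against the per-user process limit; if the
# HiveServer2 process leaks threads, this limit is eventually exhausted.
ulimit -u                          # max user processes ("unlimited" or a number)
cat /proc/sys/kernel/threads-max   # system-wide thread cap (Linux only)
```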

> "java.lang.OutOfMemoryError: unable to create new native thread" occurs at 
> Hive on Tez
> --
>
> Key: HIVE-13239
> URL: https://issues.apache.org/jira/browse/HIVE-13239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
> Environment: HDP2.3.4
> JDK1.8
> CentOS 6
>Reporter: Wataru Yukawa
>
> The output of "ps -L $(pgrep -f hiveserver2) | wc -l" is more than 15,000, 
> and a HiveServer2 memory leak occurs.
> hive query
> {code}
>  FROM hoge_tmp
>  INSERT INTO TABLE hoge PARTITION (...)
>SELECT ...   WHERE ...
> {code}
> stacktrace
> {code}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. unable to create new native thread
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:183)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:410)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:391)
> at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:261)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2238)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1753)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:444)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:577)
> at 
> org.apache.tez.common.TezCommonUtils.createFileForAM(TezCommonUtils.java:310)
> at 
> org.apache.tez.client.TezClientUtils.createApplicationSubmissionContext(TezClientUtils.java:559)
> at org.apache.tez.client.TezClient.start(TezClient.java:395)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at 
> 

[jira] [Updated] (HIVE-13239) "java.lang.OutOfMemoryError: unable to create new native thread" occurs at Hive on Tez

2016-03-08 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated HIVE-13239:
---
Component/s: HiveServer2

> "java.lang.OutOfMemoryError: unable to create new native thread" occurs at 
> Hive on Tez
> --
>
> Key: HIVE-13239
> URL: https://issues.apache.org/jira/browse/HIVE-13239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
> Environment: HDP2.3.4
> JDK1.8
> CentOS 6
>Reporter: Wataru Yukawa
>
> The output of "ps -L $(pgrep -f hiveserver2) | wc -l" is more than 15,000, 
> and a HiveServer2 memory leak occurs.
> hive query
> {code}
>  FROM hoge_tmp
>  INSERT INTO TABLE hoge PARTITION (...)
>SELECT ...   WHERE ...
> {code}
> stacktrace
> {code}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. unable to create new native thread
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:183)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:410)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:391)
> at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:261)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2238)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1753)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:444)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:577)
> at 
> org.apache.tez.common.TezCommonUtils.createFileForAM(TezCommonUtils.java:310)
> at 
> org.apache.tez.client.TezClientUtils.createApplicationSubmissionContext(TezClientUtils.java:559)
> at org.apache.tez.client.TezClient.start(TezClient.java:395)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
> at 

[jira] [Moved] (HIVE-13239) "java.lang.OutOfMemoryError: unable to create new native thread" occurs at Hive on Tez

2016-03-08 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3158 to HIVE-13239:
-

Affects Version/s: (was: 0.7.0)
  Key: HIVE-13239  (was: TEZ-3158)
  Project: Hive  (was: Apache Tez)

> "java.lang.OutOfMemoryError: unable to create new native thread" occurs at 
> Hive on Tez
> --
>
> Key: HIVE-13239
> URL: https://issues.apache.org/jira/browse/HIVE-13239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
> Environment: HDP2.3.4
> JDK1.8
> CentOS 6
>Reporter: Wataru Yukawa
>
> The output of "ps -L $(pgrep -f hiveserver2) | wc -l" is more than 15,000, 
> and a HiveServer2 memory leak occurs.
> hive query
> {code}
>  FROM hoge_tmp
>  INSERT INTO TABLE hoge PARTITION (...)
>SELECT ...   WHERE ...
> {code}
> stacktrace
> {code}
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. unable to create new native thread
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:156)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:183)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:410)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:391)
> at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:261)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2238)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1753)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:444)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:577)
> at 
> org.apache.tez.common.TezCommonUtils.createFileForAM(TezCommonUtils.java:310)
> at 
> org.apache.tez.client.TezClientUtils.createApplicationSubmissionContext(TezClientUtils.java:559)
> at org.apache.tez.client.TezClient.start(TezClient.java:395)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:271)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:151)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
> at 
> 

[jira] [Commented] (HIVE-13238) "java.lang.OutOfMemoryError: Java heap space" occurs at Hive on Tez

2016-03-08 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186323#comment-15186323
 ] 

Hitesh Shah commented on HIVE-13238:


Such issues should be sent to u...@hive.apache.org for memory tuning guidance, 
as they are usually not bugs but rather misconfigurations or parameters that 
need tuning. 

> "java.lang.OutOfMemoryError: Java heap space" occurs at Hive on Tez
> ---
>
> Key: HIVE-13238
> URL: https://issues.apache.org/jira/browse/HIVE-13238
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: HDP2.3.4
> JDK1.8
> CentOS 6
>Reporter: Wataru Yukawa
>
> hive query
> {code}
> select
>   aaa,
>   sum(bbb)
> from (
>   select
> ...
>   from access_log
>   where mmdd='20160224' AND ...
> )x
> group by aaa
> {code}
> stacktrace
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.(PipelinedSorter.java:172)
> at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.(PipelinedSorter.java:116)
> at 
> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:142)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:142)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
> ... 14 more
> {code}





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060302#comment-15060302
 ] 

Hitesh Shah commented on HIVE-12683:


It seems like the application master is running out of memory. What memory and 
Xmx is the Tez AM configured with?

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started testing the Tez query engine. Initial results show a ~30% 
> performance boost over Hive on smaller data sets (1-10 GB), but Hive starts 
> to perform better than Tez as the data size increases. For example, when we 
> run a Hive query with Tez on about 2.3 TB of data, it performs roughly 20% 
> worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property or not configuring one 
> properly? Also, I am using an older version of Tez for now; could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test whether it does any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-16 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060465#comment-15060465
 ] 

Hitesh Shah commented on HIVE-12683:


The Tez AM resource sizing has no relation to the task container sizing. That 
said, across various benchmarks done in the past, I don't believe anyone has 
needed to go beyond 16 GB for the Tez AM, even for very large DAGs.

[~rohitgarg1989] What was the AM size configured to when the OOM happened? If 
you are running a version older than Tez 0.7.0, there were some memory issues 
that required a large AM size (large being, say, 16 GB), but for 0.7.0 and 
higher, even 4 GB should be sufficient for a decent-sized DAG. To be safe for 
now, you can set it to 8 GB with an Xmx of, say, 6.4 GB, and that should be 
sufficient. If you still hit an OOM with 8 GB, a JIRA against Tez with the 
heap dump would be helpful. 
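As a sketch, the sizing suggested above translates to settings like these (values are the comment's suggestion, not a verified recommendation; 6.4 GB is roughly 6553 MB):

```
set tez.am.resource.memory.mb=8192;    -- 8 GB AM container
set tez.am.launch.cmd-opts=-Xmx6553m;  -- ~6.4 GB heap, leaving headroom
```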

[~gopalv] anything to add? Any configs that need to be tuned or turned off for 
Hive that end up using more memory in the AM? Any implicit caching of splits, 
etc.?

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started testing the Tez query engine. Initial results show a ~30% 
> performance boost over Hive on smaller data sets (1-10 GB), but Hive starts 
> to perform better than Tez as the data size increases. For example, when we 
> run a Hive query with Tez on about 2.3 TB of data, it performs roughly 20% 
> worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property or not configuring one 
> properly? Also, I am using an older version of Tez for now; could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test whether it does any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058967#comment-15058967
 ] 

Hitesh Shah commented on HIVE-12683:


Additional info, such as the query text as well as the explain output, would 
be useful. 

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started testing the Tez query engine. Initial results show a ~30% 
> performance boost over Hive on smaller data sets (1-10 GB), but Hive starts 
> to perform better than Tez as the data size increases. For example, when we 
> run a Hive query with Tez on about 2.3 TB of data, it performs roughly 20% 
> worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property or not configuring one 
> properly? Also, I am using an older version of Tez for now; could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test whether it does any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Moved] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-3002 to HIVE-12683:
-

Key: HIVE-12683  (was: TEZ-3002)
Project: Hive  (was: Apache Tez)

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started testing the Tez query engine. Initial results show a ~30% 
> performance boost over Hive on smaller data sets (1-10 GB), but Hive starts 
> to perform better than Tez as the data size increases. For example, when we 
> run a Hive query with Tez on about 2.3 TB of data, it performs roughly 20% 
> worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property or not configuring one 
> properly? Also, I am using an older version of Tez for now; could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test whether it does any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058963#comment-15058963
 ] 

Hitesh Shah commented on HIVE-12683:


\cc [~hagleitn] [~gopalv]

[~rohitgarg1989] Can you attach the YARN application logs for the query that 
was slow?

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started testing the Tez query engine. Initial results show a ~30% 
> performance boost over Hive on smaller data sets (1-10 GB), but Hive starts 
> to perform better than Tez as the data size increases. For example, when we 
> run a Hive query with Tez on about 2.3 TB of data, it performs roughly 20% 
> worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property or not configuring one 
> properly? Also, I am using an older version of Tez for now; could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test whether it does any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/





[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058968#comment-15058968
 ] 

Hitesh Shah commented on HIVE-12683:


Never mind; I just noticed that the query text is in the blog post. 

> Does Tez run slower than hive on larger dataset (~2.5 TB)?
> --
>
> Key: HIVE-12683
> URL: https://issues.apache.org/jira/browse/HIVE-12683
> Project: Hive
>  Issue Type: Bug
>Reporter: rohit garg
>
> We have started testing the Tez query engine. Initial results show a ~30% 
> performance boost over Hive on smaller data sets (1-10 GB), but Hive starts 
> to perform better than Tez as the data size increases. For example, when we 
> run a Hive query with Tez on about 2.3 TB of data, it performs roughly 20% 
> worse than Hive alone. Details are in the post below.
> On a cluster with 1.3 TB of RAM, I set the following properties:
> set tez.task.resource.memory.mb=1; set tez.am.resource.memory.mb=59205; 
> set tez.am.launch.cmd-opts=-Xmx47364m; set hive.tez.container.size=59205; 
> set hive.tez.java.opts=-Xmx47364m; set tez.am.grouping.max-size=3670016;
> Is this normal, or am I missing some property or not configuring one 
> properly? Also, I am using an older version of Tez for now; could that be 
> the issue too? I still have to bootstrap the latest version of Tez on EMR 
> and test whether it does any better.
> Thought of asking here too:
> http://www.jwplayer.com/blog/hive-with-tez-on-emr/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12683) Does Tez run slower than hive on larger dataset (~2.5 TB)?

2015-12-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059037#comment-15059037
 ] 

Hitesh Shah commented on HIVE-12683:


With SSDs, you should be able to run a few more containers per node. Maybe try 
with, say, 8 containers sized to 20 GB each (-Xmx 16 GB) as a start. 

Also, you may want to try the large group-by query with "hive.map.aggr" set to 
false to help with the OOMs. 
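The suggestion above, expressed as Hive session settings (the container and heap values are the comment's suggested starting point; exact values depend on node capacity, and 20480 MB / -Xmx16g are illustrative conversions of "20 GB" and "16G"):

```sql
set hive.tez.container.size=20480;  -- ~20 GB per container, 8 per node
set hive.tez.java.opts=-Xmx16g;     -- heap at roughly 80% of the container
set hive.map.aggr=false;            -- disable map-side aggregation to avoid map OOMs
```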


 



[jira] [Commented] (HIVE-12357) Allow user to set tez job name

2015-11-05 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992798#comment-14992798
 ] 

Hitesh Shah commented on HIVE-12357:


Changes look good. Maybe change version to an int and set value to 2?

> Allow user to set tez job name
> --
>
> Key: HIVE-12357
> URL: https://issues.apache.org/jira/browse/HIVE-12357
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-12357.1.patch
>
>
> Need something like mapred.job.name.





[jira] [Moved] (HIVE-11270) Tez gives different responses when run on Physical tables and logical views

2015-07-15 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah moved TEZ-2622 to HIVE-11270:
-

Affects Version/s: (was: 0.7.0)
  Key: HIVE-11270  (was: TEZ-2622)
  Project: Hive  (was: Apache Tez)

 Tez gives different responses when run on Physical tables and logical views
 ---

 Key: HIVE-11270
 URL: https://issues.apache.org/jira/browse/HIVE-11270
 Project: Hive
  Issue Type: Bug
 Environment: Hive 1.2.0 and Tez 0.7.0,
Reporter: Soundararajan Velu
Priority: Critical

 The same query run against a view and against the underlying physical table 
 yields different results: the query on the view returns no or very few records.
 CBO is turned on, and the following flags are used:
 set hive.cli.print.current.db=true;
 set hive.cli.print.header=true;
 set hive.execution.engine=tez;
 set mapreduce.job.queuename=admin;
 set tez.queue.name=admin;
 set hive.tez.container.size=5096;
 set tez.task.resource.memory.mb=5096;
 set hive.auto.convert.join=true;
 set hive.auto.convert.sortmerge.join.to.mapjoin=true;
 set hive.auto.convert.sortmerge.join=true;
 set hive.enforce.bucketmapjoin=true;
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 set hive.enforce.sortmergebucketmapjoin=true;
 set hive.optimize.bucketmapjoin.sortedmerge=true; 
 set hive.optimize.skewjoin=true;
 set hive.optimize.skewjoin.compiletime=true;
 set hive.groupby.skewindata=true;
 set hive.convert.join.bucket.mapjoin.tez=true;
 set hive.exec.parallel=true;
 set hive.vectorized.execution.enabled=true;
 set hive.vectorized.groupby.maxentries=10240;
 set hive.vectorized.groupby.flush.percent=0.1;
 set hive.tez.auto.reducer.parallelism=true;
 set hive.tez.min.partition.factor=50;
 set hive.tez.max.partition.factor=100;
 set io.sort.mb=400;
 set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.hashtable.initialCapacity=100;
 set hive.hashtable.key.count.adjustment=1.0;
 set hive.hashtable.loadfactor=0.99;
 set tez.runtime.io.sort.mb=1800;
 set tez.runtime.sort.threads=4;
 set tez.runtime.io.sort.factor=200;
 set tez.runtime.shuffle.memory-to-memory.enable=false;
 set tez.runtime.shuffle.memory-to-memory.segments=4;
 set tez.runtime.pipelined-shuffle.enable=true;
 set tez.runtime.optimize.shared.fetch=true;
 set tez.runtime.shuffle.keep-alive.enabled=true;
 set tez.runtime.optimize.local.fetch=false;
 set hive.exec.reducers.max=300;
 set hive.mapjoin.hybridgrace.hashtable=true;
 set hive.mapjoin.hybridgrace.memcheckfrequency=1024;
 set hive.mapjoin.optimized.hashtable=true;
 set hive.mapjoin.optimized.hashtable.wbsize=88;
 set hive.mapjoin.localtask.max.memory.usage=0.99;
 set hive.optimize.skewjoin.compiletime=false;
 set hive.skewjoin.key=1000;
 set hive.skewjoin.mapjoin.map.tasks=200;
 set hive.skewjoin.mapjoin.min.split=134217728;
 set hive.compute.query.using.stats=true;





[jira] [Updated] (HIVE-11270) Tez gives different responses when run on Physical tables and logical views

2015-07-15 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated HIVE-11270:
---
Affects Version/s: 1.2.0



[jira] [Commented] (HIVE-11270) Tez gives different responses when run on Physical tables and logical views

2015-07-15 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628367#comment-14628367
 ] 

Hitesh Shah commented on HIVE-11270:


Moved jira to Hive. \cc [~hagleitn] 



[jira] [Commented] (HIVE-10274) Send context and description to tez via dag info

2015-04-09 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488044#comment-14488044
 ] 

Hitesh Shah commented on HIVE-10274:


Looks fine. Maybe put LOG.debug("DagInfo: " + dagInfo); within an if 
(LOG.isDebugEnabled()) check? 
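The review comment is the standard guarded-logging pattern: build the debug message only when debug logging is on, so the string concatenation (and any expensive toString()) is skipped otherwise. A self-contained sketch; a real implementation would use the project's logger (e.g. SLF4J's LOG.isDebugEnabled()), and the tiny stand-in logger here is an assumption:

```java
// Sketch of the guarded-logging pattern from the review comment, with a
// minimal stand-in for the logger so the example is self-contained.
public class GuardedLogging {
    static boolean debugEnabled = false; // stand-in for LOG.isDebugEnabled()
    static int messagesBuilt = 0;        // counts expensive message builds

    // Stand-in for an expensive toString(), e.g. serializing a DagInfo.
    static String describeDag() {
        messagesBuilt++;
        return "DagInfo: {name: dag_1}";
    }

    static void logDag() {
        // The guard: the message (and its concatenation cost) is only
        // built when debug logging is actually enabled.
        if (debugEnabled) {
            System.out.println("DEBUG " + describeDag());
        }
    }

    public static void main(String[] args) {
        logDag(); // debug off: describeDag() never runs
        debugEnabled = true;
        logDag(); // debug on: message built and printed once
        System.out.println(messagesBuilt); // prints 1
    }
}
```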

 Send context and description to tez via dag info
 

 Key: HIVE-10274
 URL: https://issues.apache.org/jira/browse/HIVE-10274
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-10274.1.patch


 Tez has a way to specify a context and description (shown in the UI) for each 
 DAG.





[jira] [Commented] (HIVE-10145) set Tez ACLs appropriately in hive

2015-04-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394139#comment-14394139
 ] 

Hitesh Shah commented on HIVE-10145:


No open jira. Can you please go ahead and create one? Thanks.

 set Tez ACLs appropriately in hive
 --

 Key: HIVE-10145
 URL: https://issues.apache.org/jira/browse/HIVE-10145
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-10145.1.patch


 Hive should make the necessary changes to integrate with Tez and the Timeline 
 server. It should pass the necessary ACL-related params to ensure that query 
 execution + logs are only visible to the relevant users.
 Proposed change:
 Set a DAG-level ACL for the user running the query (the end user), allowing 
 modify + view.





[jira] [Updated] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-27 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated HIVE-10050:
---
Attachment: HIVE-10050.2.patch

 Support overriding memory configuration for AM launched for 
 TempletonControllerJob
 --

 Key: HIVE-10050
 URL: https://issues.apache.org/jira/browse/HIVE-10050
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: HIVE-10050.1.patch, HIVE-10050.2.patch


 The MR AM launched for the TempletonControllerJob does not do any heavy 
 lifting and can therefore be configured with a small memory footprint (as 
 compared to potentially using the default footprint for most MR jobs on a 
 cluster). 





[jira] [Commented] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376736#comment-14376736
 ] 

Hitesh Shah commented on HIVE-10050:


[~thejas] [~ekoifman] Please take a look.



[jira] [Updated] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-22 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated HIVE-10050:
---
Component/s: WebHCat



[jira] [Updated] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-22 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated HIVE-10050:
---
Attachment: HIVE-10050.1.patch



[jira] [Commented] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375238#comment-14375238
 ] 

Hitesh Shah commented on HIVE-10050:


Attached patch. No tests at the moment. If someone can give me a pointer to 
where existing tests exist for the TempletonControllerJob, I can modify as 
needed to test these changes. 



[jira] [Commented] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375266#comment-14375266
 ] 

Hitesh Shah commented on HIVE-10050:


New configs introduced:
  - templeton.mr.am.memory.mb
  - templeton.controller.mr.am.java.opts
