Re: hive on spark job not start enough executors

2016-09-09 Thread 明浩 冯
All the parameters except spark.executor.instances are specified in 
spark-defaults.conf in Hive's conf folder, so I think the answer is yes.

I also checked Spark's web UI while a Hive on Spark job was running; the 
parameters shown there are exactly what I specified in the config file, 
including spark.shuffle.service.enabled and spark.dynamicAllocation.enabled.


Should I specify a fixed spark.executor.instances in the file? That would not 
work well for me.
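
For what it's worth, here is a sketch of spark-defaults.conf entries that bound 
dynamic allocation instead of pinning the instance count; the property names 
are standard Spark 1.6, and the values are placeholders, not recommendations:

# placeholders - tune to the cluster's capacity
spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      2
spark.dynamicAllocation.initialExecutors  5
spark.dynamicAllocation.maxExecutors      20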


By the way, the data source of my query is Parquet files. On the Hive side I 
just created an external table over the Parquet files.
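
Presumably something along these lines; the table name, schema, and location 
here are hypothetical:

-- hypothetical table name, schema, and HDFS path
CREATE EXTERNAL TABLE my_events (c1 string)
STORED AS PARQUET
LOCATION '/data/parquet/my_events';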



Thanks,

Minghao Feng


From: Mich Talebzadeh 
Sent: Friday, September 9, 2016 4:49:55 PM
To: user
Subject: Re: hive on spark job not start enough executors

When you start Hive on Spark, do you set any parameters for the submitted job 
(or are they read from an init file)?

set spark.master=yarn;
set spark.deploy.mode=client;
set spark.executor.memory=3g;
set spark.driver.memory=3g;
set spark.executor.instances=2;
set spark.ui.port=;


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 9 September 2016 at 09:30, 明浩 冯 wrote:

Hi there,


I encountered a problem that makes Hive on Spark perform very poorly.

I'm using spark 1.6.2 and hive 2.1.0, I specified


spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true

in my spark-defaults.conf file (the file is in both Spark's and Hive's conf 
folders) so that Spark jobs acquire executors dynamically.
The configuration works correctly when I run plain Spark jobs, but with Hive on 
Spark only a few executors are started, although there are more than enough 
cores and memory to start more.
For example, for the same SQL query, Spark SQL can start more than 20 
executors, but Hive on Spark starts only 3.
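
For illustration, such settings can also be overridden per session from the 
Hive CLI, assuming Hive on Spark forwards spark.* settings to the submitted 
job; the values are placeholders:

-- hypothetical floor and starting count for dynamic allocation
set spark.dynamicAllocation.minExecutors=10;
set spark.dynamicAllocation.initialExecutors=10;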

How can I improve the performance of Hive on Spark? Any suggestions, please.

Thanks,
Minghao Feng




Re: hive on spark job not start enough executors

2016-09-09 Thread Mich Talebzadeh
When you start Hive on Spark, do you set any parameters for the submitted job 
(or are they read from an init file)?

set spark.master=yarn;
set spark.deploy.mode=client;
set spark.executor.memory=3g;
set spark.driver.memory=3g;
set spark.executor.instances=2;
set spark.ui.port=;

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 9 September 2016 at 09:30, 明浩 冯 wrote:

> Hi there,
>
>
> I encountered a problem that makes Hive on Spark perform very poorly.
>
> I'm using spark 1.6.2 and hive 2.1.0, I specified
>
>
> spark.shuffle.service.enabled    true
> spark.dynamicAllocation.enabled  true
>
> in my spark-defaults.conf file (the file is in both Spark's and Hive's conf
> folders) so that Spark jobs acquire executors dynamically.
> The configuration works correctly when I run plain Spark jobs, but with Hive
> on Spark only a few executors are started, although there are more than
> enough cores and memory to start more.
> For example, for the same SQL query, Spark SQL can start more than 20
> executors, but Hive on Spark starts only 3.
>
> How can I improve the performance of Hive on Spark? Any suggestions, please.
>
> Thanks,
> Minghao Feng
>
>


Re: Quota for rogue ad-hoc queries

2016-09-09 Thread ravi teja
Hi,

I am trying to add this feature in Hive (HIVE-11735:
https://issues.apache.org/jira/browse/HIVE-11735).
But I hit a roadblock while setting the quota during session folder creation,
as the quota can only be set by a superuser in HDFS.
Any thoughts on how to avoid this issue?
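
For context, the operation that needs superuser rights is the HDFS quota 
command; a sketch, with a hypothetical session scratch directory and a 
placeholder byte limit:

# requires an HDFS superuser; ordinary users get a permission error
hdfs dfsadmin -setSpaceQuota 10737418240 /tmp/hive/session_xxx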

Thanks,
Ravi

On Fri, Sep 2, 2016 at 2:35 PM, ravi teja  wrote:

> Hi Gopal,
>
> We are using MR not Tez.
> Since the output size of ad-hoc queries is something we can determine,
> rather than the time the job takes, I was wondering more about a quota on
> output size / number of rows.
>
> Thanks,
> Ravi
>
> On Fri, Sep 2, 2016 at 2:57 AM, Gopal Vijayaraghavan wrote:
>
>>
>> > Are there any other ways?
>>
>> Are you running Tez?
>>
>> Tez heartbeats counters back to the AppMaster every few seconds, so the
>> AppMaster has an accurate (but delayed) count of HDFS_BYTES_WRITTEN.
>>
>> Cheers,
>> Gopal


Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-09 Thread Prasanth Jayachandran
You are hitting this issue: https://issues.apache.org/jira/browse/HIVE-13185
(fixed in the latest Hive release, 2.1.0).

Thanks
Prasanth

> On Sep 9, 2016, at 2:21 AM, Gopal Vijayaraghavan  wrote:
> 
> 
>> It works if the file has more than two characters, which is a little
>> interesting. I cannot understand why the result of checkInputFormat is
>> OrcInputFormat; maybe that is expected.
> 
> My guess is that it is trying to read the 3 letter string "ORC" from that 
> file and failing.
> 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L471
> 
> Cheers,
> Gopal



Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-09 Thread Gopal Vijayaraghavan

> It works if the file has more than two characters, which is a little
> interesting. I cannot understand why the result of checkInputFormat is
> OrcInputFormat; maybe that is expected.

My guess is that it is trying to read the 3 letter string "ORC" from that file 
and failing.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L471

Cheers,
Gopal






Re: Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-09 Thread C R

drop table ods.loadtest;
create external table ods.loadtest
(
c1 string
)
stored as textfile
location '/tmp/loadtest';


hive> show create table ods.loadtest;
OK
CREATE EXTERNAL TABLE `ods.loadtest`(
  `c1` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://bidc/tmp/loadtest'
TBLPROPERTIES (
  'numFiles'='1',
  'totalSize'='4',
  'transient_lastDdlTime'='1473400143')


hive.default.fileformat = TextFile
  Expects one of [textfile, sequencefile, rcfile, orc].
  Default file format for CREATE TABLE statement. Users can explicitly
  override it by CREATE TABLE ... STORED AS [FORMAT]

> LOAD DATA LOCAL INPATH '1.dat' overwrite INTO TABLE ODS.loadtest;
Loading data to table ods.loadtest
Failed with exception java.lang.IndexOutOfBoundsException
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask


It works if the file has more than two characters, which is a little
interesting. I cannot understand why the result of checkInputFormat is
OrcInputFormat; maybe that is expected.

Thanks.


From: Stephen Sprague
Date: 2016-09-09 12:47
To: user@hive.apache.org
Subject: Re: Re: load data Failed with exception 
java.lang.IndexOutOfBoundsException
> at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.validateInput(OrcInputFormat.java:508)

would it be safe to assume that you are trying to load a text file into a 
table stored as ORC?

your create table doesn't specify the format explicitly, so that means you 
have a setting in your configs that says new tables are to be stored as ORC 
unless specified otherwise.

too bad there isn't an error message like: "loading text data into a 
non-TEXTFILE table generally isn't a good idea". :)

then again maybe somebody knows something i don't.

Cheers,
Stephen.





On Thu, Sep 8, 2016 at 7:37 PM, C R wrote:

Yes, based on my testing, it fails whenever the content of 1.dat is anything 
from 0 to 99 (one or two characters), whether the column type is string or int.

hive.log:

2016-09-09T09:10:40,978 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
CliDriver (SessionState.java:printInfo(1029)) - Hive-on-MR is deprecated in 
Hive 2 and may not be available in the future versions. Consider using a 
different execution engine (i.e. tez, spark) or using Hive 1.X releases.
2016-09-09T09:11:17,433 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
conf.HiveConf (HiveConf.java:getLogIdVar(3177)) - Using the default value 
passed in for log id: d1e08abd-5f8b-4149-a679-00ba6b4f4ab9
2016-09-09T09:11:17,462 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:compile(409)) - Compiling 
command(queryId=hadoop_20160909091117_2f9e8e3b-b2e8-4312-b473-535881c1d726): 
LOAD DATA LOCAL INPATH '1.dat' overwrite INTO TABLE ODS.loadtest
2016-09-09T09:11:18,016 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 0: get_table : 
db=ODS tbl=loadtest
2016-09-09T09:11:18,016 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hadoop   
ip=unknown-ip-addr  cmd=get_table : db=ODS tbl=loadtest
2016-09-09T09:11:18,162 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:compile(479)) - Semantic Analysis Completed
2016-09-09T09:11:18,163 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:getSchema(251)) - Returning Hive schema: 
Schema(fieldSchemas:null, properties:null)
2016-09-09T09:11:18,167 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:compile(551)) - Completed compiling 
command(queryId=hadoop_20160909091117_2f9e8e3b-b2e8-4312-b473-535881c1d726); 
Time taken: 0.725 seconds
2016-09-09T09:11:18,167 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:checkConcurrency(171)) - Concurrency mode is disabled, 
not creating a lock manager
2016-09-09T09:11:18,167 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:execute(1493)) - Executing 
command(queryId=hadoop_20160909091117_2f9e8e3b-b2e8-4312-b473-535881c1d726): 
LOAD DATA LOCAL INPATH '1.dat' overwrite INTO TABLE ODS.loadtest
2016-09-09T09:11:18,172 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
ql.Driver (Driver.java:launchTask(1832)) - Starting task [Stage-0:MOVE] in 
serial mode
2016-09-09T09:11:18,172 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
exec.Task (SessionState.java:printInfo(1029)) - Loading data to table 
ods.loadtest from file:1.dat
2016-09-09T09:11:18,172 INFO  [d1e08abd-5f8b-4149-a679-00ba6b4f4ab9 main]: 
metastore.HiveMetaStore (HiveMetaStore.java:logInfo(670)) - 0: get_table : 
db=ods tbl=loadtest