[Spark SQL] Direct write on Hive and S3 while executing a CTAS in Spark SQL

2019-10-24 Thread francexo83
Hi all,
I'm using Spark 2.4.0; my spark.sql.catalogImplementation is set to hive,
and spark.sql.warehouse.dir points to a specific S3 bucket.

I want to execute a CTAS statement in Spark SQL like the one below.

*create table db_name.table_name as (select ..)*
When writing, Spark always uses the Hive staging folder on S3 as a scratch
dir. Once the executors finish their computation, Spark moves the files
from the staging dir to the final location.
This causes performance degradation in the write phase because of the
nature of object storage, where rename is not a native operation: it is
implemented as a copy followed by a delete.

Is it possible to enable a direct write to the S3 bucket when performing a
CTAS in the scenario described above?

I performed the write operation using the DataFrameWriter.saveAsTable
API and obtained the desired result.
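For reference, a minimal sketch of that workaround, with hypothetical bucket
and table names; setting an explicit "path" makes saveAsTable create an
external table whose files are written directly under that location:

import org.apache.spark.sql.SparkSession

object CtasDirectWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ctas-direct-write")
      .enableHiveSupport()
      .getOrCreate()

    // Stand-in for the real SELECT; any DataFrame works here.
    val df = spark.range(100).toDF("id")

    df.write
      .option("path", "s3a://my-bucket/db_name/table_name") // hypothetical location
      .saveAsTable("db_name.table_name")

    spark.stop()
  }
}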

Thank you in advance


Re: Monitor executor and task memory getting used

2019-10-24 Thread Sriram Ganesh
I was wrong here.

I am using a Spark standalone cluster, not YARN or Mesos. Is it
possible to track Spark execution memory?
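
A minimal sketch of two things that do work without YARN or Mesos, from
inside the application (the object name and the sample job are illustrative):
a listener that reports each task's peakExecutionMemory metric, and the
per-executor storage-memory snapshot exposed by SparkContext:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.sql.SparkSession

object MemoryMonitor {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("memory-monitor").getOrCreate()
    val sc = spark.sparkContext

    // Log each task's peak execution memory (memory used by internal
    // structures during shuffles, aggregations and joins) as it finishes.
    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        // taskMetrics can be null for failed tasks, so guard before reading it.
        Option(taskEnd.taskMetrics).foreach { m =>
          println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
            s"peakExecutionMemory=${m.peakExecutionMemory} bytes")
        }
      }
    })

    // Run the job under observation.
    spark.range(1000000).selectExpr("id % 10 as k").groupBy("k").count().collect()

    // Per-executor storage memory: (max available, still free). This covers
    // the cache side only, not execution memory.
    sc.getExecutorMemoryStatus.foreach { case (exec, (max, free)) =>
      println(s"$exec storageMax=$max storageFree=$free")
    }

    spark.stop()
  }
}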

On Mon, Oct 21, 2019 at 5:42 PM Sriram Ganesh  wrote:

> I looked into this, and I found it is possible like this:
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L229
>
> Line no 230 is for executors.
>
> I just want to cross-verify: is that right? (see the REST API sketch after
> the quoted thread below)
>
>
>
> On Mon, 21 Oct 2019, 17:24 Alonso Isidoro Roman, 
> wrote:
>
>> Take a look at this thread:
>> 
>>
>> On Mon, Oct 21, 2019 at 1:45 PM, Sriram Ganesh ()
>> wrote:
>>
>>> Hi,
>>>
>>> I want to monitor how much memory each executor and task uses for a given
>>> job. Is there any direct method available to track this metric?
>>>
>>> --
>>> *Sriram G*
>>> *Tech*
>>>
>>>
>>
>> --
>> Alonso Isidoro Roman
>> https://about.me/alonso.isidoro.roman
>>
>> 
>>
>
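
On the AppStatusListener question quoted above: the per-executor memory
fields that listener maintains back the driver's REST API, which also works
on standalone clusters. A hedged sketch, assuming the driver UI is reachable
at localhost:4040 (the object name and the crude JSON handling are
illustrative, not a real client):

import scala.io.Source

object RestMemoryProbe {
  def main(args: Array[String]): Unit = {
    // List the applications known to this driver's UI.
    val appsJson = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
    // Crude extraction of the first application id; a real client would parse the JSON.
    val appId = """"id"\s*:\s*"([^"]+)"""".r
      .findFirstMatchIn(appsJson).map(_.group(1)).get
    // Each executor entry carries memoryUsed and maxMemory, among other fields.
    val executorsJson =
      Source.fromURL(s"http://localhost:4040/api/v1/applications/$appId/executors").mkString
    println(executorsJson)
  }
}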

-- 
*Sriram G*
*Tech*