Re: How executors understand which RDDs need to be persisted from the submitted Task

2020-01-09 Thread Jack Kolokasis

Thanks for your help!

Iacovos

On 1/9/20 5:49 PM, Wenchen Fan wrote:

You can take a look at ShuffleMapTask.runTask. A task is not just a function; it carries the serialized RDD with it.

On Thu, Jan 9, 2020 at 11:25 PM Jack Kolokasis <koloka...@ics.forth.gr> wrote:


Thanks for the help. I read that the driver only sends a function
(task) to the executors, and that the executors apply this function to
their local RDD partitions.

Iacovos

On 1/9/20 5:03 PM, Wenchen Fan wrote:

RDD has a flag, `storageLevel`, which is set by calling persist().
The RDD is serialized and sent to the executors for running tasks, so
the executors just look at RDD.storageLevel and store the output in
their block managers when needed.
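
In other words, the persistence decision travels with the serialized
RDD itself rather than as an explicit persist() call inside the task.
A minimal sketch of the executor-side check, paraphrased from
RDD.iterator in the Spark source (simplified, not the exact code):

    // Runs on the executor for every partition the task touches.
    final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
      if (storageLevel != StorageLevel.NONE) {
        // Ask the block manager for the cached block; compute and
        // store the partition at the requested level if it is missing.
        getOrCompute(split, context)
      } else {
        // Not persisted: just compute (or read the checkpoint).
        computeOrReadCheckpoint(split, context)
      }
    }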

On Thu, Jan 9, 2020 at 5:53 PM Jack Kolokasis <koloka...@ics.forth.gr> wrote:

Hello all,

I want to find out when a Task that is sent by the Driver to an
executor contains a call to the persist() function. I tried to read
the submitted function that the driver sends to the executor, but I
could not find any call to the persist() method. Do you know how an
executor understands which RDDs need to be persisted?

Thanks,
Iacovos Kolokasis


Re: Tungsten Memory Consumer

2019-02-12 Thread Jack Kolokasis

Hello,

    I am sorry, my first explanation was not concrete. I will explain
further. TaskMemoryManager manages the execution memory of each task
as follows (a sketch follows the list):


    1. MemoryConsumer is the entry point for a Spark task to obtain
memory: the consumer requests execution memory from TaskMemoryManager.

    2. TaskMemoryManager requests memory from ExecutionMemoryPool. If
the execution memory is insufficient, it borrows storage memory; if
that is still not enough, it forces cached data in storage to be
flushed to disk to free memory.

    3. If the memory returned by ExecutionMemoryPool is insufficient,
the MemoryConsumer.spill method is called to flush the memory occupied
by a MemoryConsumer to disk and free it.

    4. Then, based on the MemoryMode, HeapMemoryAllocator is used to
allocate on-heap memory for the MemoryConsumer, or UnsafeMemoryAllocator
to allocate off-heap memory.

    5. The allocated memory is wrapped in a MemoryBlock, and each
MemoryBlock corresponds to a page.

    6. TaskMemoryManager maintains the task's page table, and the task
can look up the corresponding MemoryBlock by page number.
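
A toy sketch that makes steps 1-3 concrete, written against the public
MemoryConsumer API (my own illustrative example, not code from Spark
itself):

    import org.apache.spark.memory.{MemoryConsumer, MemoryMode, TaskMemoryManager}

    class ToyConsumer(tmm: TaskMemoryManager)
        extends MemoryConsumer(tmm, tmm.pageSizeBytes, MemoryMode.ON_HEAP) {

      def grab(bytes: Long): Unit = {
        // Steps 1-2: request execution memory from TaskMemoryManager,
        // which may borrow storage memory or evict cached blocks.
        val granted = acquireMemory(bytes)
        // ... use the granted bytes, then release them.
        freeMemory(granted)
      }

      // Step 3: called back by TaskMemoryManager under memory pressure.
      // A real consumer writes its in-memory data to disk here and
      // returns the number of bytes freed.
      override def spill(size: Long, trigger: MemoryConsumer): Long = 0L
    }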


So by the name "TungstenConsumer" I mean a MemoryConsumer that uses
off-heap execution memory. Running some tests to see when
HeapMemoryAllocator is called, I see that it is called for some
applications but not for others. Could you please explain why this
happens? Is HeapMemoryAllocator not always called by a MemoryConsumer?


--Iacovos

On 11/02/2019 11:06 AM, Wenchen Fan wrote:

what do you mean by "Tungsten Consumer"?

On Fri, Feb 8, 2019 at 6:11 PM Jack Kolokasis <koloka...@ics.forth.gr> wrote:


Hello all,
    I am studying the Tungsten project and I am wondering when Spark
creates a Tungsten consumer. While running some applications, I see
that Spark creates a Tungsten consumer in some of them but not in
others (using the same configuration). When does this happen?

I am looking forward to your reply.

--Jack Kolokasis




--
Iacovos Kolokasis
Email: koloka...@ics.forth.gr
Postgraduate Student CSD, University of Crete
Researcher in CARV Lab ICS FORTH



TaskMemoryManager

2019-02-07 Thread Jack Kolokasis

Hello all,

    I am trying to profile Spark to see which functions are called
during the execution of an application. Based on my results, I see
that in the SVM benchmark TaskMemoryManager is called to allocate
extra memory using HeapMemoryAllocator, while in the Linear Regression
application TaskMemoryManager does not need to allocate extra memory.


Can anyone explain why this happens? Thanks a lot; I am looking
forward to your reply.


--Jack Kolokasis





Profile Spark Executors

2019-01-18 Thread Jack Kolokasis

Hi all,

    I am trying to profile my Spark executors' performance when using
the on-heap persistence level compared to the off-heap persistence
level. I use statsd-jvm-profiler to profile each executor.
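
For reference, the profiler is attached to each executor JVM roughly as
follows (a sketch: the server/port agent arguments follow the
statsd-jvm-profiler README as I recall, and the jar path, host, and
application names are placeholders):

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-javaagent:/opt/profilers/statsd-jvm-profiler.jar=server=statsd-host,port=8125" \
      --class my.app.Main my-app.jar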


From the results I see that the application spends 71.92% of its
thread time in the method sun.nio.ch.EPollArrayWrapper.epollWait().
This happens in all benchmark executions (Matrix Factorization, Linear
Regression, etc.).


Can anyone explain why this happens?

Thanks for your help,
--Iacovos Kolokasis




Maven

2018-11-20 Thread Jack Kolokasis

Hello,

   is there any way to use my local custom-built Spark as a dependency
while I am using Maven to compile my applications?
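
For reference, the usual approach is to install the custom build into
the local Maven repository and then depend on its version string. A
sketch, assuming the custom Spark declares version 2.4.0-SNAPSHOT and
Scala 2.11 in its POMs (adjust both to match your build):

    # From the custom Spark source tree: installs spark-core, spark-sql,
    # etc. into ~/.m2/repository under the version declared in the POMs.
    ./build/mvn -DskipTests clean install

The application's pom.xml can then reference that version:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.0-SNAPSHOT</version>
    </dependency>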


Thanks for your reply,
--Iacovos




Re: welcome a new batch of committers

2018-10-03 Thread Jack Kolokasis

Congratulations to all !!

-Iacovos


On 03/10/2018 12:54 PM, Ted Yu wrote:

Congratulations to all !

-------- Original message --------
From: Jungtaek Lim 
Date: 10/3/18 2:41 AM (GMT-08:00)
To: Marco Gaido 
Cc: dev 
Subject: Re: welcome a new batch of committers

Congrats all! You all deserved it.
On Wed, 3 Oct 2018 at 6:35 PM Marco Gaido wrote:


Congrats you all!

On Wed, Oct 3, 2018 at 11:29 AM Liang-Chi Hsieh <vii...@gmail.com> wrote:


Congratulations to all new committers!


rxin wrote
> Hi all,
>
> The Apache Spark PMC has recently voted to add several new committers
> to the project, for their contributions:
>
> - Shane Knapp (contributor to infra)
> - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
> - Kazuaki Ishizaki (contributor to Spark SQL)
> - Xingbo Jiang (contributor to Spark Core and SQL)
> - Yinan Li (contributor to Spark on Kubernetes)
> - Takeshi Yamamuro (contributor to Spark SQL)
>
> Please join me in welcoming them!





--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/





--
Iacovos Kolokasis
Email: koloka...@ics.forth.gr
Postgraduate Student CSD, University of Crete
Researcher in CARV Lab ICS FORTH



Off Heap Memory

2018-09-11 Thread Jack Kolokasis

Hello,
    I recently started studying Spark's memory management system. More
specifically, I want to understand how Spark uses off-heap memory.
Internally I saw that there are two off-heap memory pools
(offHeapExecutionMemoryPool and offHeapStorageMemoryPool).


    How does Spark use the off-heap memory (see the configuration
sketch below)?

    How does Spark use the offHeapExecutionMemoryPool, and how the
offHeapStorageMemoryPool?

    Is there any good tutorial or guide that explains the internals of
Spark's memory management?
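
For context, both pools are only sized when off-heap memory is
explicitly enabled; a minimal sketch of the relevant settings (standard
Spark configuration keys, with example values):

    val spark = org.apache.spark.sql.SparkSession.builder()
      .appName("offheap-demo")
      .config("spark.memory.offHeap.enabled", "true") // activates both off-heap pools
      .config("spark.memory.offHeap.size", "2g")      // total, shared by execution and storage
      .getOrCreate()

    // Storage side: OFF_HEAP persistence stores serialized blocks
    // through the block manager in the off-heap storage pool.
    import org.apache.spark.storage.StorageLevel
    spark.sparkContext.parallelize(1 to 1000).persist(StorageLevel.OFF_HEAP)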


Thanks for your reply,
-Iacovos




Off Heap Memory

2018-08-24 Thread Jack Kolokasis

Hello,
    I recently started studying Spark's memory management system. My
question is about the offHeapExecutionMemoryPool and
offHeapStorageMemoryPool.


    1. How does Spark use the offHeapExecutionMemoryPool?
    2. How does Spark use the off-heap memory? I understand the
allocation side, but it is not clear to me how data is stored to or
loaded from this memory (e.g. how an object is saved in off-heap
memory). See the sketch below.
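
On the store/load side, Spark does not place Java objects off-heap as
objects; Tungsten writes their fields as raw bytes through
org.apache.spark.unsafe.Platform, a thin wrapper over sun.misc.Unsafe.
A rough illustration of the idea (my own example, not actual Spark
code):

    import org.apache.spark.unsafe.Platform

    // Allocate 16 raw off-heap bytes: no object header, invisible to GC.
    val addr: Long = Platform.allocateMemory(16)

    // "Store an object": write its fields at fixed offsets. With a
    // null base object, the offset is an absolute memory address.
    Platform.putLong(null, addr, 42L)         // e.g. an id field
    Platform.putDouble(null, addr + 8, 3.14)  // e.g. a score field

    // "Load": read the fields back from the same offsets.
    val id = Platform.getLong(null, addr)
    val score = Platform.getDouble(null, addr + 8)

    Platform.freeMemory(addr)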


Thanks for your reply,
-Iacovos
