Re: cache OS memory and spark usage of it

2018-04-11 Thread yncxcw
Hi, Raúl,

(1) & (2) Yes, the OS needs some pressure before it releases it. For example,
if your machine has 16GB of RAM in total and you read an 8GB file and
immediately close it, the page cache now holds the 8GB of file data. If you
then start a program that requests memory from the OS, the OS will release
the page cache once your requests go beyond the remaining 8GB of free memory.
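
If you want to watch this happen on Linux, the page-cache size is reported
in the "Cached:" field of /proc/meminfo. A minimal Scala sketch (Linux-only;
run it before and after reading the large file):

    import scala.io.Source

    // Linux-only: the "Cached:" line in /proc/meminfo reports the current
    // page-cache size. Check it before and after reading a large file to
    // see the cache grow, and again after heavy allocation to see it shrink.
    val meminfo = Source.fromFile("/proc/meminfo")
    try {
      meminfo.getLines().find(_.startsWith("Cached:")).foreach(println)
    } finally {
      meminfo.close()
    }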

(3) I think you can configure your JVM with a maximum heap size of 14GB
(-Xmx) and leave 2GB of memory for the OS. You get memory elasticity with
this configuration: the JVM grows its allocation from the OS as new objects
are created, but it is bounded by 14GB, which avoids memory swapping. For
example, if your application only needs 8GB of memory, the remaining 8GB can
serve as page cache, improving your I/O performance. Conversely, if your
application needs the full 14GB, the JVM will force the OS to release almost
all of the page cache. In that situation your I/O performance may suffer,
but you can hold more data (e.g., RDDs) in your application.
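
For Spark specifically, the executor heap cap is set with
spark.executor.memory, which becomes the -Xmx of each executor JVM. A
minimal sketch for the 16GB node above (the app name is arbitrary):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch for a 16GB node: cap each executor JVM at 14g, leaving ~2GB
    // of headroom for the OS and its page cache.
    val conf = new SparkConf()
      .setAppName("HeapBoundExample")
      .set("spark.executor.memory", "14g") // becomes -Xmx14g for executors
    val sc = new SparkContext(conf)

Note that the driver's own heap must be fixed before its JVM starts, so for
the driver you would pass --driver-memory 14g to spark-submit rather than
setting it in code.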


Wei






Re: cache OS memory and spark usage of it

2018-04-11 Thread Jose Raul Perez Rodriguez

That was helpful.

So the OS needs to feel some pressure from applications requesting memory
before it frees some of the memory cache?


Under exactly which circumstances does the OS free that memory and hand it
to the applications requesting it?


I mean, if the total memory is 16GB and 10GB is used for the OS cache, how
can the JVM obtain memory from that?


Thanks,





Re: cache OS memory and spark usage of it

2018-04-10 Thread yncxcw
Hi, Raúl,

First, most of the OS memory cache is the page cache, which the OS uses to
cache recent read/write I/O.

I think OS memory cache should be viewed from two different perspectives.
From the perspective of user space (e.g., a Spark application), it is not
used, since Spark does not allocate memory from this part of memory. From
the perspective of the OS, however, it is actually used, because those
memory pages are already allocated for caching I/O pages. For each I/O
request, the OS allocates memory pages to cache the data, expecting the
cached pages to be reused in the near future. Recall what happens when you
open a large file in vim/emacs: it is quite slow the first time, but much
faster if you close it and immediately reopen it, because the file was
cached in the page cache on the first open.
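
You can reproduce this programmatically too. A rough Scala sketch, where
/tmp/big.bin is a placeholder for any large local file that still fits in
the JVM heap:

    import java.nio.file.{Files, Paths}

    // The first (cold) read hits the disk; the second (warm) read is
    // typically served from the page cache and finishes much faster.
    def timedReadMs(path: String): Long = {
      val start = System.nanoTime()
      Files.readAllBytes(Paths.get(path)) // file must fit in the heap
      (System.nanoTime() - start) / 1000000
    }

    val path = "/tmp/big.bin" // placeholder path
    println(s"cold read: ${timedReadMs(path)} ms")
    println(s"warm read: ${timedReadMs(path)} ms")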

It is hard for Spark to use this part of memory, because it is managed by
the OS and is transparent to applications. The only thing you can do is keep
allocating memory from the OS (e.g., via malloc()) until the OS senses
memory pressure, at which point it voluntarily releases page cache to
satisfy your allocation. Note also that Spark's memory limit is bounded by
the maximum JVM heap size, so memory requests from your Spark application
are actually handled by the JVM, not the OS.
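
To see that heap bound in action, here is a toy Scala sketch (run it with a
small heap, e.g. -Xmx1g, so it stops quickly):

    import scala.collection.mutable.ArrayBuffer

    // The JVM grows the heap as we allocate, which pressures the OS to
    // reclaim page cache, but allocation stops hard at the -Xmx bound.
    val chunks = ArrayBuffer.empty[Array[Byte]]
    try {
      while (true) chunks += new Array[Byte](64 * 1024 * 1024) // 64MB steps
    } catch {
      case _: OutOfMemoryError =>
        val mb = chunks.size * 64
        chunks.clear() // release the arrays so printing is safe
        println(s"-Xmx bound reached after ~$mb MB")
    }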


Hope this helps!


Wei



