Re: Pyspark worker memory
Yeah, this is definitely confusing. The motivation for this was that different users of the same cluster may want to set different memory sizes for their apps, so we decided to put this setting in the driver. However, if you put SPARK_JAVA_OPTS in spark-env.sh, it also applies to executors, which is confusing (though in this case it wouldn't overwrite spark.executor.memory, AFAIK). We want to clean a bunch of this stuff up for 1.0, or at least document it better. Thanks for the suggestions.

Matei

On Mar 19, 2014, at 12:53 AM, Jim Blomo wrote:
Re: Pyspark worker memory
Jim, I'm starting to document the heap size settings all in one place, which has been a source of confusion for a lot of my peers. Maybe you can take a look at this ticket? https://spark-project.atlassian.net/browse/SPARK-1264

On Wed, Mar 19, 2014 at 12:53 AM, Jim Blomo wrote:
Re: Pyspark worker memory
To document this, it would be nice to clarify what environment variables should be used to set which Java system properties, and what type of process they affect. I'd be happy to start a page if you can point me to the right place:

SPARK_JAVA_OPTS:
-Dspark.executor.memory can be set on the machine running the driver (typically the master host) and will affect the memory available to the Executor running on a slave node
-D

SPARK_DAEMON_OPTS:

On Wed, Mar 19, 2014 at 12:48 AM, Jim Blomo wrote:
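For context on the `-D` entries above: SPARK_JAVA_OPTS is just a string of JVM flags, and `-Dkey=value` tokens become Java system properties. A minimal sketch of that breakdown (illustrative only; real JVM option handling also deals with quoting and many non-`-D` flags that this ignores):

```python
def parse_java_opts(opts):
    """Split a SPARK_JAVA_OPTS-style string into system properties.

    Illustrative sketch only: real JVM option parsing handles
    quoting and the many non -D flags that this ignores.
    """
    props = {}
    for token in opts.split():
        if token.startswith("-D") and "=" in token:
            key, value = token[2:].split("=", 1)
            props[key] = value
    return props

print(parse_java_opts("-Xmx512m -Dspark.executor.memory=5g"))
# {'spark.executor.memory': '5g'}
```

Note that a bare `-Xmx512m` is not a system property, which is part of why a memory cap can sneak in separately from spark.executor.memory.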
Re: Pyspark worker memory
Thanks for the suggestion, Matei. I've tracked this down to a setting I had to make on the Driver. It looks like spark-env.sh has no impact on the Executor, which confused me for a long while with settings like SPARK_EXECUTOR_MEMORY. The only setting that mattered was setting the system property in the *driver* (in this case pyspark/shell.py) or using -Dspark.executor.memory in SPARK_JAVA_OPTS *on the master*. I'm not sure how this differs from the 0.9.0 release, but it seems to work on SNAPSHOT.

On Tue, Mar 18, 2014 at 11:52 PM, Matei Zaharia wrote:
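Jim's finding boils down to a precedence question. The resolution order sketched below is an assumption pieced together from this thread alone (a driver-side spark.executor.memory property wins, the worker's environment is ignored, and 512m is the fallback cap he was hitting) — it models the observed behavior, not the actual Spark source:

```python
def resolve_executor_memory(driver_props, worker_env):
    """Sketch of the executor heap-size resolution this thread observed:
    a spark.executor.memory system property set on the driver wins;
    otherwise a 512m default applies. The worker_env argument is
    deliberately unused, matching the report that worker-side
    SPARK_EXECUTOR_MEMORY had no effect (an assumption from the
    thread, not the Spark source).
    """
    return driver_props.get("spark.executor.memory", "512m")

# Only the driver-side property matters in this model:
print(resolve_executor_memory({"spark.executor.memory": "5g"}, {}))  # 5g
print(resolve_executor_memory({}, {"SPARK_EXECUTOR_MEMORY": "5g"}))  # 512m
```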
Re: Pyspark worker memory
Try checking spark-env.sh on the workers as well. Maybe code there is somehow overriding the spark.executor.memory setting.

Matei

On Mar 18, 2014, at 6:17 PM, Jim Blomo wrote:
Pyspark worker memory
Hello, I'm using the GitHub snapshot of PySpark and having trouble setting the worker memory correctly. I've set spark.executor.memory to 5g, but somewhere along the way Xmx is getting capped to 512M. This was not occurring with the same setup and 0.9.0. How many places do I need to configure the memory? Thank you!