[jira] [Created] (SPARK-29148) Modify dynamic allocation manager for stage level scheduling
Thomas Graves created SPARK-29148: - Summary: Modify dynamic allocation manager for stage level scheduling Key: SPARK-29148 URL: https://issues.apache.org/jira/browse/SPARK-29148 Project: Spark Issue Type: Bug Components: Scheduler Affects Versions: 3.0.0 Reporter: Thomas Graves To support stage level scheduling, the dynamic allocation manager has to track the usage and need of executors per ResourceProfile. We will have to figure out what to do with the metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
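The per-ResourceProfile bookkeeping described above can be sketched as a toy model. This is illustrative only: the class and method names are hypothetical and not Spark's actual ExecutorAllocationManager; it just shows the core idea of tracking pending tasks and a target executor count separately for each profile.

```python
from collections import defaultdict

class AllocationTracker:
    """Toy model: track pending tasks per ResourceProfile id and derive a
    per-profile target executor count, so each profile scales independently.
    (Hypothetical sketch, not Spark's ExecutorAllocationManager.)"""
    def __init__(self, slots_per_executor):
        # rp_id -> number of tasks one executor built from that profile can run
        self.slots = slots_per_executor
        self.pending = defaultdict(int)  # rp_id -> pending task count

    def add_pending(self, rp_id, n):
        self.pending[rp_id] += n

    def target_executors(self, rp_id):
        # ceiling division: enough executors to cover all pending tasks
        return -(-self.pending[rp_id] // self.slots[rp_id])

tracker = AllocationTracker({0: 4, 1: 1})  # profile 0: 4 task slots/executor
tracker.add_pending(0, 10)  # ETL-style stage: many small tasks
tracker.add_pending(1, 3)   # ML-style stage: few GPU-heavy tasks
print(tracker.target_executors(0), tracker.target_executors(1))  # 3 3
```

The open question from the issue (what to do with the metrics) is untouched here; the sketch only shows why the manager's counters have to become per-profile maps rather than single scalars.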
[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930600#comment-16930600 ] Thomas Graves commented on SPARK-27495: --- The SPIP vote passed - https://mail-archives.apache.org/mod_mbox/spark-dev/201909.mbox/%3ccactgibw44ggts5apnlkuqsurnwqdejeadeiwfvdwl7ic3eh...@mail.gmail.com%3e > SPIP: Support Stage level resource configuration and scheduling > --- > > Key: SPARK-27495 > URL: https://issues.apache.org/jira/browse/SPARK-27495 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > Objectives: > # Allow users to specify task and executor resource requirements at the > stage level. > # Spark will use the stage level requirements to acquire the necessary > resources/executors and schedule tasks based on the per stage requirements. > Many times users have different resource requirements for different stages of > their application so they want to be able to configure resources at the stage > level. For instance, you have a single job that has 2 stages. The first stage > does some ETL which requires a lot of tasks, each with a small amount of > memory and 1 core each. Then you have a second stage where you feed that ETL > data into an ML algorithm. The second stage only requires a few executors but > each executor needs a lot of memory, GPUs, and many cores. This feature > allows the user to specify the task and executor resource requirements for > the ETL Stage and then change them for the ML stage of the job. > Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and > extra Resources (GPU/FPGA/etc). It has the potential to allow for other > things like limiting the number of tasks per stage, specifying other > parameters for things like shuffle, etc. 
Initially I would propose we only > support resources as they are now. So Task resources would be cpu and other > resources (GPU, FPGA), that way we aren't adding in extra scheduling things > at this point. Executor resources would be cpu, memory, and extra > resources(GPU,FPGA, etc). Changing the executor resources will rely on > dynamic allocation being enabled. > Main use cases: > # ML use case where user does ETL and feeds it into an ML algorithm where > it’s using the RDD API. This should work with barrier scheduling as well once > it supports dynamic allocation. > # This adds the framework/api for Spark's own internal use. In the future > (not covered by this SPIP), Catalyst could control the stage level resources > as it finds the need to change it between stages for different optimizations. > For instance, with the new columnar plugin to the query planner we can insert > stages into the plan that would change running something on the CPU in row > format to running it on the GPU in columnar format. This API would allow the > planner to make sure the stages that run on the GPU get the corresponding GPU > resources it needs to run. Another possible use case for catalyst is that it > would allow catalyst to add in more optimizations to where the user doesn’t > need to configure container sizes at all. If the optimizer/planner can handle > that for the user, everyone wins. > This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I > think the DataSet API will require more changes because it specifically hides > the RDD from the users via the plans and catalyst can optimize the plan and > insert things into the plan. The only way I’ve found to make this work with > the Dataset API would be modifying all the plans to be able to get the > resource requirements down into where it creates the RDDs, which I believe > would be a lot of change. If other people know better options, it would be > great to hear them. 
> *Q2.* What problem is this proposal NOT designed to solve? > The initial implementation is not going to add Dataset APIs. > We are starting with allowing users to specify a specific set of > task/executor resources and plan to design it to be extendable, but the first > implementation will not support changing generic SparkConf configs and only > specific limited resources. > This initial version will have a programmatic API for specifying the resource > requirements per stage, we can add the ability to perhaps have profiles in > the configs later if it's useful. > *Q3.* How is it done today, and what are the limits of current practice? > Currently this is either done by having multiple spark jobs or requesting > containers with the max resources needed for any part of the job. To do this > today, you can break it into separate jobs where each job requests the > corresponding resources needed, but then you have to write the data out > somewhere and then read it back in between jobs.
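The ETL-then-ML scenario in Q1 can be made concrete with a toy model of a per-stage resource profile. All names here are hypothetical, chosen only to mirror the SPIP's description of task and executor requirements; they are not the API Spark proposed or shipped.

```python
from dataclasses import dataclass

# Hypothetical sketch of what a per-stage profile could carry; the field and
# class names are illustrative only, not Spark's actual API.
@dataclass(frozen=True)
class StageResourceProfile:
    executor_cores: int
    executor_memory_gb: int
    executor_gpus: int = 0
    task_cpus: int = 1
    task_gpus: int = 0

# ETL stage: lots of cheap tasks on small executors (1 core, little memory each).
etl = StageResourceProfile(executor_cores=4, executor_memory_gb=4)
# ML stage: a few large, GPU-equipped executors with many cores.
ml = StageResourceProfile(executor_cores=16, executor_memory_gb=64,
                          executor_gpus=4, task_gpus=1)
# Hypothetical usage: something like rdd.with_resources(ml) would then tell
# the scheduler to acquire ml-shaped executors for just that stage.
print(etl.executor_gpus, ml.executor_gpus)  # 0 4
```

The point of the sketch is the shape of the data: the same job carries two different profiles, and changing the executor-side fields is what requires dynamic allocation, as the SPIP notes.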
[jira] [Resolved] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27492. --- Fix Version/s: 3.0.0 Resolution: Fixed > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > For the SPIP - Accelerator-aware task scheduling for Spark, > https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user > documentation about how this feature works together and point to things like > the example discovery script, etc. > > - make sure to document the discovery script, what permissions are needed, > and any security implications > - Document the standalone/local-cluster mode limitation of only a single > resource file or discovery script, so coordination is required for it to > work right.
[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927558#comment-16927558 ] Thomas Graves commented on SPARK-27495: --- [~felixcheung] [~jiangxb1987] I put this up for vote on the dev mailing list. Could you please take a look and comment there? > SPIP: Support Stage level resource configuration and scheduling > --- > > Key: SPARK-27495 > URL: https://issues.apache.org/jira/browse/SPARK-27495 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major >
[jira] [Commented] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923528#comment-16923528 ] Thomas Graves commented on SPARK-27492: --- I think we talked about this a bit in the PRs for the main changes. I think it's better to leave it as "resource" since cpu/memory could really be rolled into this; they are just special because they are pre-existing. Also, this could be extended to any resource type that wouldn't necessarily be an accelerator. And if https://issues.apache.org/jira/browse/SPARK-27495 stage scheduling gets approved, everything will be under a single API there and just referred to as "resource". > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major >
[jira] [Assigned] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27492: - Assignee: Thomas Graves > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major >
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495: -- Description: *Q1.* What are you trying to do? Articulate your objectives using absolutely no jargon. Objectives: # Allow users to specify task and executor resource requirements at the stage level. # Spark will use the stage level requirements to acquire the necessary resources/executors and schedule tasks based on the per stage requirements. Many times users have different resource requirements for different stages of their application so they want to be able to configure resources at the stage level. For instance, you have a single job that has 2 stages. The first stage does some ETL which requires a lot of tasks, each with a small amount of memory and 1 core each. Then you have a second stage where you feed that ETL data into an ML algorithm. The second stage only requires a few executors but each executor needs a lot of memory, GPUs, and many cores. This feature allows the user to specify the task and executor resource requirements for the ETL Stage and then change them for the ML stage of the job. Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and extra Resources (GPU/FPGA/etc). It has the potential to allow for other things like limiting the number of tasks per stage, specifying other parameters for things like shuffle, etc. Initially I would propose we only support resources as they are now. So Task resources would be cpu and other resources (GPU, FPGA), that way we aren't adding in extra scheduling things at this point. Executor resources would be cpu, memory, and extra resources(GPU,FPGA, etc). Changing the executor resources will rely on dynamic allocation being enabled. Main use cases: # ML use case where user does ETL and feeds it into an ML algorithm where it’s using the RDD API. This should work with barrier scheduling as well once it supports dynamic allocation. 
# This adds the framework/api for Spark's own internal use. In the future (not covered by this SPIP), Catalyst could control the stage level resources as it finds the need to change it between stages for different optimizations. For instance, with the new columnar plugin to the query planner we can insert stages into the plan that would change running something on the CPU in row format to running it on the GPU in columnar format. This API would allow the planner to make sure the stages that run on the GPU get the corresponding GPU resources it needs to run. Another possible use case for catalyst is that it would allow catalyst to add in more optimizations to where the user doesn’t need to configure container sizes at all. If the optimizer/planner can handle that for the user, everyone wins. This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I think the DataSet API will require more changes because it specifically hides the RDD from the users via the plans and catalyst can optimize the plan and insert things into the plan. The only way I’ve found to make this work with the Dataset API would be modifying all the plans to be able to get the resource requirements down into where it creates the RDDs, which I believe would be a lot of change. If other people know better options, it would be great to hear them. *Q2.* What problem is this proposal NOT designed to solve? The initial implementation is not going to add Dataset APIs. We are starting with allowing users to specify a specific set of task/executor resources and plan to design it to be extendable, but the first implementation will not support changing generic SparkConf configs and only specific limited resources. This initial version will have a programmatic API for specifying the resource requirements per stage, we can add the ability to perhaps have profiles in the configs later if its useful. *Q3.* How is it done today, and what are the limits of current practice? 
Currently this is either done by having multiple spark jobs or requesting containers with the max resources needed for any part of the job. To do this today, you can break it into separate jobs where each job requests the corresponding resources needed, but then you have to write the data out somewhere and then read it back in between jobs. This takes longer and also requires coordination between those jobs to make sure everything works smoothly. Another option would be to request executors with your largest need up front and potentially waste those resources when they aren't being used, which in turn wastes money. For instance, for an ML application where it does ETL first, many times people request containers with GPUs and the GPUs sit idle while the ETL is happening. This is wasting those GPU resources and in turn money because those GPUs could have been used by other applications.
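The cost of the "size everything for the largest need" option described above can be quantified. The numbers below are hypothetical, chosen only to illustrate the arithmetic of idle GPU-hours when containers are sized for the ML phase but held through the ETL phase.

```python
# Hypothetical numbers: an app whose ETL phase runs 3 hours and ML phase runs
# 1 hour, on 8 executors that are each sized for the ML phase with 4 GPUs.
etl_hours, ml_hours = 3.0, 1.0
executors, gpus_per_executor = 8, 4

held = (etl_hours + ml_hours) * executors * gpus_per_executor  # GPU-hours reserved
used = ml_hours * executors * gpus_per_executor                # GPU-hours doing work
print(held - used)  # 96.0 GPU-hours idle during ETL
```

With per-stage profiles, the GPUs would only be requested for the ML hour, so those 96 GPU-hours could go to other applications on the cluster.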
[jira] [Resolved] (SPARK-28577) Ensure executorMemoryOverhead requested value is not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true
[ https://issues.apache.org/jira/browse/SPARK-28577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-28577. --- Fix Version/s: 3.0.0 Assignee: Yang Jie Resolution: Fixed > Ensure executorMemoryOverhead requested value is not less than > MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true > --- > > Key: SPARK-28577 > URL: https://issues.apache.org/jira/browse/SPARK-28577 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > If MEMORY_OFFHEAP_ENABLED is true, we should ensure that the requested > executorMemoryOverhead is not less than MEMORY_OFFHEAP_SIZE; otherwise the > memory resource requested for the executor may not be enough.
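The invariant in this issue can be sketched as a small check. This is a minimal model of the constraint described above, not Spark's actual code: the function name is hypothetical, and Spark's real handling (how it validates or combines these configs) may differ in detail.

```python
def checked_overhead_mb(overhead_mb, offheap_enabled, offheap_size_mb):
    # Minimal sketch of the SPARK-28577 invariant: when off-heap memory is
    # enabled, the requested executor memory overhead must cover the
    # off-heap size, or the container would be undersized at runtime.
    # (Hypothetical helper; Spark's actual validation may differ.)
    if offheap_enabled and overhead_mb < offheap_size_mb:
        raise ValueError(
            f"executor memory overhead {overhead_mb}MB is less than "
            f"off-heap size {offheap_size_mb}MB")
    return overhead_mb

print(checked_overhead_mb(2048, True, 1024))  # 2048 - overhead covers off-heap
```

Without such a check, YARN would grant a container too small to hold both the heap and the off-heap region, and the executor could be killed for exceeding its allocation.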
[jira] [Resolved] (SPARK-28960) maven 3.6.1 gone from apache maven repo
[ https://issues.apache.org/jira/browse/SPARK-28960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-28960. --- Resolution: Duplicate > maven 3.6.1 gone from apache maven repo > --- > > Key: SPARK-28960 > URL: https://issues.apache.org/jira/browse/SPARK-28960 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Blocker > > It looks like maven 3.6.1 is missing from the apache maven repo: > [http://apache.claz.org/maven/maven-3/] > This is causing PR build failures: > [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110045/console] > > exec: curl -s -L > [https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz] > gzip: stdin: not in gzip format > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Error while getting version string from Maven:
[jira] [Updated] (SPARK-28961) Upgrade Maven to 3.6.2
[ https://issues.apache.org/jira/browse/SPARK-28961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-28961: -- Description: It looks like maven 3.6.1 is missing from the apache maven repo: [http://apache.claz.org/maven/maven-3/] This is causing PR build failures: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110045/console] exec: curl -s -L [https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz] gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.1/bin/mvn build/mvn: line 163: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.1/bin/mvn: No such file or directory Error while getting version string from Maven: > Upgrade Maven to 3.6.2 > -- > > Key: SPARK-28961 > URL: https://issues.apache.org/jira/browse/SPARK-28961 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major >
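The "gzip: stdin: not in gzip format" failure above happens because the mirror returned something other than the tarball (typically an HTML error page once the 3.6.1 artifact was removed), which `tar` then tried to decompress. A sanity check of the kind sketched below would catch this; the function is a hypothetical illustration in Python, not part of the actual `build/mvn` shell script.

```python
import gzip, os, tempfile

def looks_like_gzip(path):
    # A real gzip stream starts with the magic bytes 0x1f 0x8b; a mirror's
    # HTML error page (as in the failure above) will not, so a download
    # script could check this before handing the file to tar.
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# Demo: a genuine .gz file passes, a saved error page does not.
d = tempfile.mkdtemp()
good = os.path.join(d, "apache-maven-bin.tar.gz")
with gzip.open(good, "wb") as f:
    f.write(b"tarball bytes")
bad = os.path.join(d, "error.html")
with open(bad, "wb") as f:
    f.write(b"<html>404 Not Found</html>")
print(looks_like_gzip(good), looks_like_gzip(bad))  # True False
```

Failing fast with a clear "download is not a gzip file" message would have made this Jenkins breakage much easier to diagnose than the cascade of tar and "No such file or directory" errors shown above.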
[jira] [Commented] (SPARK-28960) maven 3.6.1 gone from apache maven repo
[ https://issues.apache.org/jira/browse/SPARK-28960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921611#comment-16921611 ] Thomas Graves commented on SPARK-28960: --- It's a bit odd; I don't see 3.6.2 on their page as released yet: [https://maven.apache.org/docs/history.html] Anyone happen to know what the deal is? > maven 3.6.1 gone from apache maven repo > --- > > Key: SPARK-28960 > URL: https://issues.apache.org/jira/browse/SPARK-28960 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Blocker >
[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921608#comment-16921608 ] Thomas Graves commented on SPARK-27495: --- Thanks for the feedback [~felixcheung] 1. Yes, this is out of scope of this SPIP; my statement was intended to just say this would allow for that in the future if we decided to do it. I updated the description, hopefully that is more clear. 2. We could certainly fail if there is a conflict for now and ask the user to resolve it and fix their code. I am a little worried users will hit this fairly easily once it starts to get used, and in most cases I would think doing a simple max or sum would be sufficient. Perhaps we start with a config for this and have it fail by default, but allow them to turn it off so it does something static like a max. Then we can see how things go and potentially add in other options. Thoughts? 3. Yeah, I agree that for ML use cases the strict mode makes more sense to me. I think the hint could come into play more on the ETL side, and I think even that would be unusual. If you are using those extra resources it's more than likely to make things go faster, so I would think you would want all things to run on those, but I guess if you have a shared cluster and those resources are in high demand it could be faster just to run on whatever is available. The API would allow us to add it in the future but it's not included in the first go here. > SPIP: Support Stage level resource configuration and scheduling > --- > > Key: SPARK-27495 > URL: https://issues.apache.org/jira/browse/SPARK-27495 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major >
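The "simple max" conflict-resolution strategy floated in point 2 of the comment above can be sketched in a few lines. This is a hypothetical helper over plain resource dictionaries, not Spark code; it only illustrates the static merge being discussed.

```python
def merge_profiles(a, b):
    # "Simple max" strategy: when two stages contribute conflicting resource
    # requirements, take the per-resource maximum. (Hypothetical helper, not
    # Spark code; the "sum" variant mentioned in the comment would instead
    # use a.get(k, 0) + b.get(k, 0).)
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

etl = {"cores": 4, "memory_gb": 4, "gpus": 0}
ml = {"cores": 16, "memory_gb": 64, "gpus": 4}
merged = merge_profiles(etl, ml)
```

A max merge guarantees every stage gets at least what it asked for, at the cost of over-provisioning the smaller stage, which is why the comment suggests failing by default and making the static merge opt-in.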
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495: -- Description: *Q1.* What are you trying to do? Articulate your objectives using absolutely no jargon. Objectives: # Allow users to specify task and executor resource requirements at the stage level. # Spark will use the stage level requirements to acquire the necessary resources/executors and schedule tasks based on the per stage requirements. Many times users have different resource requirements for different stages of their application so they want to be able to configure resources at the stage level. For instance, you have a single job that has 2 stages. The first stage does some ETL which requires a lot of tasks, each with a small amount of memory and 1 core each. Then you have a second stage where you feed that ETL data into an ML algorithm. The second stage only requires a few executors but each executor needs a lot of memory, GPUs, and many cores. This feature allows the user to specify the task and executor resource requirements for the ETL Stage and then change them for the ML stage of the job. Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and extra Resources (GPU/FPGA/etc). It has the potential to allow for other things like limiting the number of tasks per stage, specifying other parameters for things like shuffle, etc. Initially I would propose we only support resources as they are now. So Task resources would be cpu and other resources (GPU, FPGA), that way we aren't adding in extra scheduling things at this point. Executor resources would be cpu, memory, and extra resources(GPU,FPGA, etc). Changing the executor resources will rely on dynamic allocation being enabled. Main use cases: # ML use case where user does ETL and feeds it into an ML algorithm where it’s using the RDD API. This should work with barrier scheduling as well once it supports dynamic allocation. 
# This adds the framework/api for Spark's own internal use. In the future (not covered by this SPIP), Catalyst could control the stage level resources as it finds the need to change it between stages for different optimizations. For instance, with the new columnar plugin to the query planner we can insert stages into the plan that would change running something on the CPU in row format to running it on the GPU in columnar format. This API would allow the planner to make sure the stages that run on the GPU get the corresponding GPU resources it needs to run. Another possible use case for catalyst is that it would allow catalyst to add in more optimizations to where the user doesn’t need to configure container sizes at all. If the optimizer/planner can handle that for the user, everyone wins. This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I think the DataSet API will require more changes because it specifically hides the RDD from the users via the plans and catalyst can optimize the plan and insert things into the plan. The only way I’ve found to make this work with the Dataset API would be modifying all the plans to be able to get the resource requirements down into where it creates the RDDs, which I believe would be a lot of change. If other people know better options, it would be great to hear them. *Q2.* What problem is this proposal NOT designed to solve? The initial implementation is not going to add Dataset APIs. We are starting with allowing users to specify a specific set of task/executor resources and plan to design it to be extendable, but the first implementation will not support changing generic SparkConf configs and only specific limited resources. This initial version will have a programmatic API for specifying the resource requirements per stage, we can add the ability to perhaps have profiles in the configs later if its useful. *Q3.* How is it done today, and what are the limits of current practice? 
Currently this is either done by having multiple spark jobs or requesting containers with the max resources needed for any part of the job. To do this today, you can break it into separate jobs where each job requests the corresponding resources needed, but then you have to write the data out somewhere and then read it back in between jobs. This is going to take longer as well as require job coordination between those jobs to make sure everything works smoothly. Another option would be to request executors with your largest need up front and potentially waste those resources when they aren't being used, which in turn wastes money. For instance, for an ML application where it does ETL first, many times people request containers with GPUs and the GPUs sit idle while the ETL is happening. This is wasting those GPU resources and in turn money because those GPUs could have been used by other applications until they were really needed.
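The ETL-then-ML scenario in the SPIP above can be sketched with a toy model of per-stage resource profiles. All names here (`ResourceProfile`, `executors_needed`) are illustrative Python, not the proposed Spark API; the point is only to show why the two stages want very different executor shapes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical per-stage resource bundle (illustrative names only)."""
    executor_cores: int
    executor_memory_gb: int
    executor_gpus: int = 0
    task_cpus: int = 1
    task_gpus: float = 0.0

# Stage 1: wide ETL -- many small tasks, no accelerators.
etl_profile = ResourceProfile(executor_cores=4, executor_memory_gb=4)

# Stage 2: ML training -- few executors, large memory, GPUs per task.
ml_profile = ResourceProfile(
    executor_cores=16, executor_memory_gb=64, executor_gpus=2, task_gpus=1.0)

def executors_needed(profile, num_tasks):
    """Executors required when each executor's task slots are limited by
    its CPUs and (if requested) its GPUs."""
    slots = profile.executor_cores // profile.task_cpus
    if profile.task_gpus > 0:
        slots = min(slots, int(profile.executor_gpus / profile.task_gpus))
    return -(-num_tasks // slots)  # ceiling division

print(executors_needed(etl_profile, 1000))  # many small-memory executors
print(executors_needed(ml_profile, 8))      # a few large GPU executors
```

With one static configuration you would have to provision the ML shape for the whole job; per-stage profiles let the ETL stage run on cheap executors first.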
[jira] [Created] (SPARK-28960) maven 3.6.1 gone from apache maven repo
Thomas Graves created SPARK-28960: - Summary: maven 3.6.1 gone from apache maven repo Key: SPARK-28960 URL: https://issues.apache.org/jira/browse/SPARK-28960 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.0.0 Reporter: Thomas Graves It looks like maven 3.6.1 is missing from the apache maven repo: [http://apache.claz.org/maven/maven-3/] This is causing PR build failures: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110045/console] exec: curl -s -L [https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz] gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.1/bin/mvn build/mvn: line 163: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.1/bin/mvn: No such file or directory Error while getting version string from Maven: -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28917) Jobs can hang because of race of RDD.dependencies
[ https://issues.apache.org/jira/browse/SPARK-28917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921424#comment-16921424 ] Thomas Graves commented on SPARK-28917: --- So for this to happen, you essentially have to have that RDD referenced in 2 separate threads and the second thread to start operating on it before it has been materialized in the first thread, correct? I see how it can change with a checkpoint, but like you say I'm not sure how the cache would cause this. Anyway you can see if they are checkpointing? > Jobs can hang because of race of RDD.dependencies > - > > Key: SPARK-28917 > URL: https://issues.apache.org/jira/browse/SPARK-28917 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.3, 2.4.3 >Reporter: Imran Rashid >Priority: Major > > {{RDD.dependencies}} stores the precomputed cache value, but it is not > thread-safe. This can lead to a race where the value gets overwritten, but > the DAGScheduler gets stuck in an inconsistent state. In particular, this > can happen when there is a race between the DAGScheduler event loop, and > another thread (eg. a user thread, if there is multi-threaded job submission). > First, a job is submitted by the user, which then computes the result Stage > and its parents: > https://github.com/apache/spark/blob/24655583f1cb5dae2e80bb572604fb4a9761ec07/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L983 > Which eventually makes a call to {{rdd.dependencies}}: > https://github.com/apache/spark/blob/24655583f1cb5dae2e80bb572604fb4a9761ec07/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L519 > At the same time, the user could also touch {{rdd.dependencies}} in another > thread, which could overwrite the stored value because of the race. 
> Then the DAGScheduler checks the dependencies *again* later on in the job > submission, via {{getMissingParentStages}} > https://github.com/apache/spark/blob/24655583f1cb5dae2e80bb572604fb4a9761ec07/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1025 > Because it will find new dependencies, it will create entirely different > stages. Now the job has some orphaned stages which will never run. > The symptoms of this are seeing disjoint sets of stages in the "Parents of > final stage" and the "Missing parents" messages on job submission, as well as > seeing repeated messages "Registered RDD X" for the same RDD id. eg: > {noformat} > [INFO] 2019-08-15 23:22:31,570 org.apache.spark.SparkContext logInfo - > Starting job: count at XXX.scala:462 > ... > [INFO] 2019-08-15 23:22:31,573 org.apache.spark.scheduler.DAGScheduler > logInfo - Registering RDD 14 (repartition at XXX.scala:421) > ... > ... > [INFO] 2019-08-15 23:22:31,582 org.apache.spark.scheduler.DAGScheduler > logInfo - Got job 1 (count at XXX.scala:462) with 40 output partitions > [INFO] 2019-08-15 23:22:31,582 org.apache.spark.scheduler.DAGScheduler > logInfo - Final stage: ResultStage 5 (count at XXX.scala:462) > [INFO] 2019-08-15 23:22:31,582 org.apache.spark.scheduler.DAGScheduler > logInfo - Parents of final stage: List(ShuffleMapStage 4) > [INFO] 2019-08-15 23:22:31,599 org.apache.spark.scheduler.DAGScheduler > logInfo - Registering RDD 14 (repartition at XXX.scala:421) > [INFO] 2019-08-15 23:22:31,599 org.apache.spark.scheduler.DAGScheduler > logInfo - Missing parents: List(ShuffleMapStage 6) > {noformat} > Note that there is a similar issue w/ {{rdd.partitions}}. I don't see a way > it could mess up the scheduler (seems its only used for > {{rdd.partitions.length}}). There is also an issue that {{rdd.storageLevel}} > is read and cached in the scheduler, but it could be modified simultaneously > by the user in another thread. 
Similarly, I can't see a way it could affect > the scheduler. > *WORKAROUND*: > (a) call {{rdd.dependencies}} while you know that RDD is only getting touched > by one thread (eg. in the thread that created it, or before you submit > multiple jobs touching that RDD from other threads). Then that value will get > cached. > (b) don't submit jobs from multiple threads.
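The unsafe check-then-act caching at the heart of this race can be modeled in a few lines of Python (illustrative only; Spark's actual code is Scala, and here an explicit cache reset stands in for the second thread recomputing the value, e.g. after checkpointing):

```python
class RDD:
    """Minimal model of a lazily cached, non-thread-safe field."""
    def __init__(self):
        self._deps = None

    def dependencies(self):
        # Check-then-act with no lock: two threads can both observe None
        # and each install their own freshly built dependency list.
        if self._deps is None:
            self._deps = [object()]  # stands in for building ShuffleDependency
        return self._deps

rdd = RDD()
first = rdd.dependencies()   # what the DAGScheduler used to build stages
rdd._deps = None             # models the user thread clobbering the cache
second = rdd.dependencies()  # getMissingParentStages now sees *new* deps

assert first is not second   # two different dependency objects -> orphaned stages
```

Because the scheduler registered stages against `first` but later resolves missing parents against `second`, it creates a disjoint set of stages, matching the "Registering RDD 14" appearing twice in the log above.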
[jira] [Resolved] (SPARK-27360) Standalone cluster mode support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27360. --- Fix Version/s: 3.0.0 Resolution: Fixed All subtasks are finished, so resolving. > Standalone cluster mode support for GPU-aware scheduling > > > Key: SPARK-27360 > URL: https://issues.apache.org/jira/browse/SPARK-27360 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Major > Fix For: 3.0.0 > > > Design and implement standalone manager support for GPU-aware scheduling: > 1. static conf to describe resources > 2. spark-submit to request resources > 3. auto discovery of GPUs > 4. executor process isolation
[jira] [Resolved] (SPARK-28414) Standalone worker/master UI updates for Resource scheduling
[ https://issues.apache.org/jira/browse/SPARK-28414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-28414. --- Fix Version/s: 3.0.0 Assignee: wuyi Resolution: Fixed > Standalone worker/master UI updates for Resource scheduling > --- > > Key: SPARK-28414 > URL: https://issues.apache.org/jira/browse/SPARK-28414 > Project: Spark > Issue Type: Sub-task > Components: Deploy >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > https://issues.apache.org/jira/browse/SPARK-27360 is adding Resource > scheduling to standalone mode. Update the UI with the resource information > for master/worker as appropriate.
[jira] [Updated] (SPARK-28577) Ensure executorMemoryOverhead requested value not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true
[ https://issues.apache.org/jira/browse/SPARK-28577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-28577: -- Docs Text: On YARN, the off-heap memory size is now separately included for the user in the container size it requests from YARN. Previously you had to add that into the overhead memory you requested; that is no longer needed. > Ensure executorMemoryOverhead requested value not less than MEMORY_OFFHEAP_SIZE > when MEMORY_OFFHEAP_ENABLED is true > --- > > Key: SPARK-28577 > URL: https://issues.apache.org/jira/browse/SPARK-28577 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Yang Jie >Priority: Major > Labels: release-notes > > If MEMORY_OFFHEAP_ENABLED is true, we should ensure executorMemoryOverhead is > not less than MEMORY_OFFHEAP_SIZE, otherwise the memory resource requested > for the executor may not be enough.
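The sizing change described in the Docs Text above amounts to simple arithmetic; a hypothetical helper (not Spark's actual code, and with made-up example sizes) makes the before/after difference concrete:

```python
def yarn_container_mb(executor_memory_mb, overhead_mb, offheap_mb, offheap_enabled):
    """Sketch of YARN container sizing: after this change, the off-heap size
    is added to the request itself when off-heap is enabled, so users no
    longer fold it into the overhead by hand. (Illustrative arithmetic.)"""
    total = executor_memory_mb + overhead_mb
    if offheap_enabled:
        total += offheap_mb
    return total

# Before: users inflated overhead manually to cover 2 GiB of off-heap.
old_style = yarn_container_mb(8192, 1024 + 2048, 0, offheap_enabled=False)
# After: off-heap is accounted for explicitly; overhead stays at its real value.
new_style = yarn_container_mb(8192, 1024, 2048, offheap_enabled=True)
print(old_style, new_style)  # same container size either way
```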
[jira] [Updated] (SPARK-28577) Ensure executorMemoryOverhead requested value not less than MEMORY_OFFHEAP_SIZE when MEMORY_OFFHEAP_ENABLED is true
[ https://issues.apache.org/jira/browse/SPARK-28577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-28577: -- Labels: release-notes (was: ) > Ensure executorMemoryOverhead requested value not less than MEMORY_OFFHEAP_SIZE > when MEMORY_OFFHEAP_ENABLED is true > --- > > Key: SPARK-28577 > URL: https://issues.apache.org/jira/browse/SPARK-28577 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Yang Jie >Priority: Major > Labels: release-notes > > If MEMORY_OFFHEAP_ENABLED is true, we should ensure executorMemoryOverhead is > not less than MEMORY_OFFHEAP_SIZE, otherwise the memory resource requested > for the executor may not be enough.
[jira] [Commented] (SPARK-27361) YARN support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907431#comment-16907431 ] Thomas Graves commented on SPARK-27361: --- Since that was done prior to this feature, I think it's OK to leave it alone. It worked just fine on its own to purely request that YARN get the resources (with no integration with the Spark scheduler, etc.). We did modify how that worked in https://issues.apache.org/jira/browse/SPARK-27959 so that one should be linked here, I think. > YARN support for GPU-aware scheduling > - > > Key: SPARK-27361 > URL: https://issues.apache.org/jira/browse/SPARK-27361 > Project: Spark > Issue Type: Story > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > Design and implement YARN support for GPU-aware scheduling: > * User can request GPU resources at Spark application level. > * How the Spark executor discovers GPUs when run on YARN > * Integrate with YARN 3.2 GPU support.
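For context, the YARN GPU support discussed in this ticket is driven from configuration at submit time; the sketch below assembles a typical invocation (conf names follow Spark 3.0's resource scheduling; the discovery-script path is a placeholder, not a real file):

```python
# Build a spark-submit command line for a GPU job on YARN.
confs = {
    "spark.executor.resource.gpu.amount": "2",
    "spark.executor.resource.gpu.discoveryScript": "/opt/spark/getGpus.sh",
    "spark.task.resource.gpu.amount": "1",
}
cmd = ["spark-submit", "--master", "yarn"]
for key, value in confs.items():
    cmd += ["--conf", f"{key}={value}"]
print(" ".join(cmd))
```

Each executor requests two GPUs from YARN and discovers their addresses via the script, while the scheduler assigns one GPU per task.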
[jira] [Commented] (SPARK-23151) Provide a distribution of Spark with Hadoop 3.0
[ https://issues.apache.org/jira/browse/SPARK-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906218#comment-16906218 ] Thomas Graves commented on SPARK-23151: --- Hey, so for the hadoop-3.2 profile, the dependencies are quite a bit different since we are using the hive version of orc. The things it pulls in are quite different than the nohive version of orc. It would be really nice if we had a different classifier or something so users can actually tell the difference in what is published on mvn. Right now all we know is it's a Spark jar with version X.x and Scala version y.y, but we don't know which hadoop profile, and thus which dependencies, got pulled in. Something to think about anyway. > Provide a distribution of Spark with Hadoop 3.0 > --- > > Key: SPARK-23151 > URL: https://issues.apache.org/jira/browse/SPARK-23151 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 2.2.0, 2.2.1 >Reporter: Louis Burke >Priority: Major > > Provide a Spark package that supports Hadoop 3.0.0. Currently the Spark > package > only supports Hadoop 2.7 i.e. spark-2.2.1-bin-hadoop2.7.tgz. The implication > is > that using up to date Kinesis libraries alongside s3 causes a clash w.r.t > aws-java-sdk.
[jira] [Created] (SPARK-28679) Spark Yarn ResourceRequestHelper shouldn't look up setResourceInformation if no resources are specified
Thomas Graves created SPARK-28679: - Summary: Spark Yarn ResourceRequestHelper shouldn't look up setResourceInformation if no resources are specified Key: SPARK-28679 URL: https://issues.apache.org/jira/browse/SPARK-28679 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 3.0.0 Reporter: Thomas Graves In the Spark Yarn ResourceRequestHelper, it uses reflection to look up setResourceInformation. We should skip that lookup if the resource Map is empty. [https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala#L154]
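The guard being proposed can be sketched as a simple early return before the reflective lookup. This is illustrative Python, not the real ResourceRequestHelper (which is Scala using Java reflection); `FakeYarnResource` and `apply_resources` are made up for the example:

```python
class FakeYarnResource:
    """Stand-in for YARN's Resource object (hypothetical)."""
    def __init__(self):
        self.info = {}

    def setResourceInformation(self, name, amount):
        self.info[name] = amount

def apply_resources(resource, requests):
    if not requests:  # empty map: skip the "reflection" path entirely
        return resource
    method = getattr(resource, "setResourceInformation")  # reflective lookup
    for name, amount in requests.items():
        method(name, amount)
    return resource

untouched = apply_resources(FakeYarnResource(), {})
gpu_req = apply_resources(FakeYarnResource(), {"yarn.io/gpu": 2})
print(untouched.info, gpu_req.info)
```

The point of the JIRA is that the lookup itself has a cost (and can fail on older YARN versions), so it should not happen at all when nothing was requested.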
[jira] [Assigned] (SPARK-27371) Standalone master receives resource info from worker and allocate driver/executor properly
[ https://issues.apache.org/jira/browse/SPARK-27371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27371: - Assignee: wuyi > Standalone master receives resource info from worker and allocate > driver/executor properly > -- > > Key: SPARK-27371 > URL: https://issues.apache.org/jira/browse/SPARK-27371 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: wuyi >Priority: Major > > As an admin, I can let Spark Standalone worker automatically discover GPUs > installed on worker nodes. So I don't need to manually configure them.
[jira] [Resolved] (SPARK-27368) Design: Standalone supports GPU scheduling
[ https://issues.apache.org/jira/browse/SPARK-27368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27368. --- Resolution: Fixed Resolving since the code was committed. > Design: Standalone supports GPU scheduling > -- > > Key: SPARK-27368 > URL: https://issues.apache.org/jira/browse/SPARK-27368 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Major > > Design draft: > Scenarios: > * client-mode, worker might create one or more executor processes, from > different Spark applications. > * cluster-mode, worker might create driver process as well. > * local-cluster mode, there could be multiple worker processes on the same > node. This is an undocumented use of standalone mode, which is mainly for > tests. > * Resource isolation is not considered here. > Because executor and driver processes on the same node will share the > accelerator resources, the worker must take the role of allocating resources. So > we will add spark.worker.resource.[resourceName].discoveryScript conf for > workers to discover resources. Users need to match the resourceName in driver > and executor requests. Besides CPU cores and memory, worker now also > considers resources in creating new executors or drivers. > Example conf: > {code} > # static worker conf > spark.worker.resource.gpu.discoveryScript=/path/to/list-gpus.sh > # application conf > spark.driver.resource.gpu.amount=4 > spark.executor.resource.gpu.amount=2 > spark.task.resource.gpu.amount=1 > {code} > In client mode, the driver process is not launched by the worker, so the user can specify > a driver resource discovery script. In cluster mode, if the user still specifies a > driver resource discovery script, it is ignored with a warning. 
> Supporting resource isolation is tricky because the Spark worker doesn't know how > to isolate resources unless we hardcode some resource names like GPU support > in YARN, which is less ideal. Supporting resource isolation for multiple resource > types is even harder. In the first version, we will implement accelerator > support without resource isolation. > Timeline: > 1. Worker starts. > 2. Worker loads `work.source.*` conf and runs discovery scripts to discover > resources. > 3. Worker reports to master cores, memory, and resources (new) and registers. > 4. An application starts. > 5. Master finds workers with sufficient available resources and lets the worker > start an executor or driver process. > 6. Worker assigns executor / driver resources by passing the resource info > from the command line. > 7. Application ends. > 8. Master requests worker to kill driver/executor process. > 9. Master updates available resources.
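The discovery script in step 2 of the timeline has to tell the worker what it found by printing a small JSON document naming the resource and its addresses (format as used by Spark's accelerator-aware scheduling, SPARK-24615; the device list below is made up for illustration):

```python
import json

def discover_gpus():
    """Sketch of a GPU discovery script's output. In practice the addresses
    would be parsed from something like `nvidia-smi --list-gpus`."""
    addresses = ["0", "1"]  # hypothetical: two GPUs found on this worker
    return json.dumps({"name": "gpu", "addresses": addresses})

# A real script would simply print this to stdout for the worker to consume.
print(discover_gpus())
```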
[jira] [Resolved] (SPARK-27371) Standalone master receives resource info from worker and allocate driver/executor properly
[ https://issues.apache.org/jira/browse/SPARK-27371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27371. --- Resolution: Fixed Fix Version/s: 3.0.0 > Standalone master receives resource info from worker and allocate > driver/executor properly > -- > > Key: SPARK-27371 > URL: https://issues.apache.org/jira/browse/SPARK-27371 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > As an admin, I can let Spark Standalone worker automatically discover GPUs > installed on worker nodes. So I don't need to manually configure them.
[jira] [Updated] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27492: -- Description: For the SPIP - Accelerator-aware task scheduling for Spark, https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user documentation about how this feature works together and point to things like the example discovery script, etc. - make sure to document the discovery script and what permissions are needed and any security implications - Document standalone - local-cluster mode limitation of only a single resource file or discovery script so you have to have coordination on for it to work right. was: For the SPIP - Accelerator-aware task scheduling for Spark, https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user documentation about how this feature works together and point to things like the example discovery script, etc. - make sure to document the discovery script and what permissions are needed and any security implications > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > For the SPIP - Accelerator-aware task scheduling for Spark, > https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user > documentation about how this feature works together and point to things like > the example discovery script, etc. > > - make sure to document the discovery script and what permissions are needed > and any security implications > - Document standalone - local-cluster mode limitation of only a single > resource file or discovery script so you have to have coordination on for it > to work right. 
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495: -- Description: *Q1.* What are you trying to do? Articulate your objectives using absolutely no jargon. Objectives: # Allow users to specify task and executor resource requirements at the stage level. # Spark will use the stage level requirements to acquire the necessary resources/executors and schedule tasks based on the per stage requirements. Many times users have different resource requirements for different stages of their application so they want to be able to configure resources at the stage level. For instance, you have a single job that has 2 stages. The first stage does some ETL which requires a lot of tasks, each with a small amount of memory and 1 core each. Then you have a second stage where you feed that ETL data into an ML algorithm. The second stage only requires a few executors but each executor needs a lot of memory, GPUs, and many cores. This feature allows the user to specify the task and executor resource requirements for the ETL Stage and then change them for the ML stage of the job. Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and extra Resources (GPU/FPGA/etc). It has the potential to allow for other things like limiting the number of tasks per stage, specifying other parameters for things like shuffle, etc. Initially I would propose we only support resources as they are now. So Task resources would be cpu and other resources (GPU, FPGA), that way we aren't adding in extra scheduling things at this point. Executor resources would be cpu, memory, and extra resources(GPU,FPGA, etc). Changing the executor resources will rely on dynamic allocation being enabled. Main use cases: # ML use case where user does ETL and feeds it into an ML algorithm where it’s using the RDD API. This should work with barrier scheduling as well once it supports dynamic allocation. 
# Spark internal use by catalyst. Catalyst could control the stage level resources as it finds the need to change it between stages for different optimizations. For instance, with the new columnar plugin to the query planner we can insert stages into the plan that would change running something on the CPU in row format to running it on the GPU in columnar format. This API would allow the planner to make sure the stages that run on the GPU get the corresponding GPU resources it needs to run. Another possible use case for catalyst is that it would allow catalyst to add in more optimizations to where the user doesn’t need to configure container sizes at all. If the optimizer/planner can handle that for the user, everyone wins. This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I think the DataSet API will require more changes because it specifically hides the RDD from the users via the plans and catalyst can optimize the plan and insert things into the plan. The only way I’ve found to make this work with the Dataset API would be modifying all the plans to be able to get the resource requirements down into where it creates the RDDs, which I believe would be a lot of change. If other people know better options, it would be great to hear them. *Q2.* What problem is this proposal NOT designed to solve? The initial implementation is not going to add Dataset APIs. We are starting with allowing users to specify a specific set of task/executor resources and plan to design it to be extendable, but the first implementation will not support changing generic SparkConf configs and only specific limited resources. This initial version will have a programmatic API for specifying the resource requirements per stage, we can add the ability to perhaps have profiles in the configs later if its useful. *Q3.* How is it done today, and what are the limits of current practice? 
Currently this is either done by having multiple spark jobs or requesting containers with the max resources needed for any part of the job. To do this today, you can break it into separate jobs where each job requests the corresponding resources needed, but then you have to write the data out somewhere and then read it back in between jobs. This is going to take longer as well as require that job coordination between those to make sure everything works smoothly. Another option would be to request executors with your largest need up front and potentially waste those resources when they aren't being used, which in turn wastes money. For instance, for an ML application where it does ETL first, many times people request containers with GPUs and the GPUs sit idle while the ETL is happening. This is wasting those GPU resources and in turn money because those GPUs could have been used by other applications until they were really needed. Note for the catalyst internal
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495:
--
Description:
*Q1.* What are you trying to do? Articulate your objectives using absolutely no jargon.
Objectives:
# Allow users to specify task and executor resource requirements at the stage level.
# Spark will use the stage level requirements to acquire the necessary resources/executors and schedule tasks based on the per stage requirements.
Many times users have different resource requirements for different stages of their application, so they want to be able to configure resources at the stage level. For instance, you have a single job that has 2 stages. The first stage does some ETL, which requires a lot of tasks, each with a small amount of memory and 1 core. Then you have a second stage where you feed that ETL data into an ML algorithm. The second stage only requires a few executors, but each executor needs a lot of memory, GPUs, and many cores. This feature allows the user to specify the task and executor resource requirements for the ETL stage and then change them for the ML stage of the job.
Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and extra resources (GPU/FPGA/etc). It has the potential to allow for other things like limiting the number of tasks per stage, specifying other parameters for things like shuffle, etc. Initially I would propose we only support resources as they are now, so task resources would be cpu and other resources (GPU, FPGA); that way we aren't adding in extra scheduling things at this point. Executor resources would be cpu, memory, and extra resources (GPU, FPGA, etc). Changing the executor resources will rely on dynamic allocation being enabled.
Main use cases:
# ML use case where the user does ETL and feeds it into an ML algorithm where it’s using the RDD API. This should work with barrier scheduling as well once it supports dynamic allocation.
# Spark internal use by catalyst. Catalyst could control the stage level resources as it finds the need to change them between stages for different optimizations. For instance, with the new columnar plugin to the query planner we can insert stages into the plan that would change running something on the CPU in row format to running it on the GPU in columnar format. This API would allow the planner to make sure the stages that run on the GPU get the corresponding GPU resources they need to run. Another possible use case for catalyst is that it would allow catalyst to add in more optimizations where the user doesn’t need to configure container sizes at all. If the optimizer/planner can handle that for the user, everyone wins.
This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I think the Dataset API will require more changes, because it specifically hides the RDD from the users via the plans, and catalyst can optimize the plan and insert things into it. The only way I’ve found to make this work with the Dataset API would be modifying all the plans to be able to get the resource requirements down into where it creates the RDDs, which I believe would be a lot of change. If other people know better options, it would be great to hear them.
*Q2.* What problem is this proposal NOT designed to solve?
The initial implementation is not going to add Dataset APIs.
We are starting with allowing users to specify a specific set of task/executor resources, and we plan to design it to be extendable, but the first implementation will not support changing generic SparkConf configs, only specific limited resources.
This initial version will have a programmatic API for specifying the resource requirements per stage; we can add the ability to have profiles in the configs later if it's useful.
*Q3.* How is it done today, and what are the limits of current practice?
Currently this is done either by having multiple Spark jobs or by requesting containers with the max resources needed for any part of the job. To do this today, you can break it into separate jobs where each job requests the corresponding resources needed, but then you have to write the data out somewhere and read it back in between jobs. This takes longer and also requires coordination between those jobs to make sure everything works smoothly. Another option would be to request executors with your largest need up front and potentially waste those resources when they aren't being used, which in turn wastes money. For instance, for an ML application that does ETL first, many times people request containers with GPUs and the GPUs sit idle while the ETL is happening. This wastes those GPU resources, and in turn money, because those GPUs could have been used by other applications until they were really needed. Note for the catalyst internal use, that can’t be done today.
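The ETL-vs-ML example above can be sketched concretely. This is a plain-Python model of the idea only; the class and field names (`ResourceProfile`, `TaskRequirements`, `ExecutorRequirements`) are hypothetical illustrations, not the API this SPIP ultimately defines:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TaskRequirements:
    cpus: int = 1
    resources: dict = field(default_factory=dict)  # e.g. {"gpu": 1}

@dataclass(frozen=True)
class ExecutorRequirements:
    cores: int
    memory_gb: int
    resources: dict = field(default_factory=dict)

@dataclass(frozen=True)
class ResourceProfile:
    executor: ExecutorRequirements
    task: TaskRequirements

# ETL stage: many cheap tasks, 1 core each, modest executor memory.
etl_profile = ResourceProfile(
    executor=ExecutorRequirements(cores=8, memory_gb=8),
    task=TaskRequirements(cpus=1),
)

# ML stage: few executors, each with many cores, lots of memory, and GPUs.
ml_profile = ResourceProfile(
    executor=ExecutorRequirements(cores=16, memory_gb=64, resources={"gpu": 4}),
    task=TaskRequirements(cpus=4, resources={"gpu": 1}),
)

def tasks_per_executor(p: ResourceProfile) -> int:
    """Task slots per executor, limited by the scarcest requested resource."""
    slots = p.executor.cores // p.task.cpus
    for name, amount in p.task.resources.items():
        slots = min(slots, p.executor.resources.get(name, 0) // amount)
    return slots
```

With these numbers the ETL executors run 8 concurrent tasks each, while the ML executors run 4, bounded equally by cores and GPUs; switching profiles between stages is exactly the flexibility the SPIP describes.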
[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902076#comment-16902076 ] Thomas Graves commented on SPARK-27495:
---
Thanks for the comments.
# I was trying to leave the API open to allow both. Not sure you had a chance to look at the design/API doc, but with the ResourceProfile I have a .require function, which would be a must-have, and I mentioned in there that we could add a .prefer function that would be more of a hint or a nice-to-have. Also, I should have clarified (and I'll update the description) that in this initial implementation I only intend to support scheduling on the same things we do now. So for tasks it would only be CPU and other resources (GPU, FPGA, etc). For executor resources we would support requesting cpu, memory (offheap, onheap, overhead, pyspark), and other resources. So I wouldn't add support for task memory per stage at this point. It would also take other changes to enforce task memory usage. But your point is valid if that gets added, so I think we need to be careful about what we allow for each.
# I think you have brought up 2 issues here.
## One is how we do resource merging/conflict resolution. My proposal was to simply look at the RDDs within the stage and merge any resource profiles. There are multiple ways we could do the merge; one is to naively just use a max, but as you pointed out some operations might be better done as a sum, so I was hoping to make it configurable, with perhaps some defaults for different operations. I guess the question is whether we need that for the initial implementation or we just make sure to design for it. I think you are also proposing to use the parent RDDs (do you mean just parents within a stage, or always?), but I think that all depends on how we define it and what the user is going to expect. I might have a very large RDD that needs processing but shrinks down to a very small one. To process that large RDD I need a lot of resources, but once it's shrunk I don't need that many. So the child of that RDD wouldn't need the same resources as its parent. That is why I was proposing to set the resources specifically for that RDD. We do need to handle operations that have a shuffle, like groupBy, to make sure it's carried over.
## The other thing you brought up is talking more about task resources, since you are saying per partition, correct? There are definitely still issues even within the same RDD where some partitions may be larger than others. I'm not really trying to solve that problem here. This feature may open it up so that Spark can do more intelligent things there, but in this initial round users would still have to configure the resources for the worst case. I'm essentially leaving the configs the same as what they have now. Users are used to defining the executor resources based on the worst case any task within the worst stage will consume; this just allows them to go one step further and control it per stage. It also hopefully opens it up to do more of that configuration for them in the future. For now, the idea is users would only configure the resources for the stages they want; all other stages would default back to the global configs. I'll clarify this in the SPIP as well. We could change this, but I think it gets hard for the user to know the scoping.
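The merge question debated in the comment above (max vs. sum when several RDDs in one stage carry profiles) can be sketched as follows. The per-key strategy idea comes from the comment; the function name and dict shape are hypothetical, not Spark's actual implementation:

```python
def merge_profiles(profiles, strategies=None):
    """Merge per-RDD resource requests for one stage.

    profiles: list of dicts, e.g. [{"cores": 2, "memory_gb": 4}, ...].
    strategies: optional per-key merge function; defaults to max, the
    conservative choice, while something like memory might merge as a sum.
    """
    strategies = strategies or {}
    merged = {}
    for prof in profiles:
        for key, value in prof.items():
            if key in merged:
                merged[key] = strategies.get(key, max)(merged[key], value)
            else:
                merged[key] = value
    return merged

stage_rdd_profiles = [
    {"cores": 2, "memory_gb": 4, "gpu": 1},
    {"cores": 4, "memory_gb": 8},
]

# Default: take the max of each requirement across the stage's RDDs.
by_max = merge_profiles(stage_rdd_profiles)

# Alternative: sum memory (as the reviewer suggested), max everything else.
by_sum_mem = merge_profiles(
    stage_rdd_profiles, strategies={"memory_gb": lambda a, b: a + b}
)
```

Making the strategy pluggable per resource key is what lets the design defer the "max or sum?" decision without blocking the initial implementation.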
[jira] [Assigned] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27495: - Assignee: Thomas Graves
[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901253#comment-16901253 ] Thomas Graves commented on SPARK-27495: --- I updated the Jira to have the SPIP text in description. I split the appendices out into a google doc because it was becoming long and to allow inline comments in the doc. I can pull them back in here if people prefer.
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495:
--
Description:
*Q1.* What are you trying to do? Articulate your objectives using absolutely no jargon.
Objectives:
# Allow users to specify task and executor resource requirements at the stage level.
# Spark will use the stage level requirements to acquire the necessary resources/executors and schedule tasks based on the per stage requirements.
Many times users have different resource requirements for different stages of their application so they want to be able to configure resources at the stage level. For instance, you have a single job that has 2 stages. The first stage does some ETL which requires a lot of tasks, each with a small amount of memory and 1 core each. Then you have a second stage where you feed that ETL data into an ML algorithm. The second stage only requires a few executors but each executor needs a lot of memory, GPUs, and many cores. This feature allows the user to specify the task and executor resource requirements for the ETL Stage and then change them for the ML stage of the job.
Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and extra Resources (GPU/FPGA/etc). It has the potential to allow for other things like limiting the number of tasks per stage, specifying other parameters for things like shuffle, etc. Changing the executor resources will rely on dynamic allocation being enabled.
Main use cases:
# ML use case where user does ETL and feeds it into an ML algorithm where it’s using the RDD API. This should work with barrier scheduling as well once it supports dynamic allocation.
# Spark internal use by catalyst. Catalyst could control the stage level resources as it finds the need to change it between stages for different optimizations. For instance, with the new columnar plugin to the query planner we can insert stages into the plan that would change running something on the CPU in row format to running it on the GPU in columnar format. This API would allow the planner to make sure the stages that run on the GPU get the corresponding GPU resources it needs to run. Another possible use case for catalyst is that it would allow catalyst to add in more optimizations to where the user doesn’t need to configure container sizes at all. If the optimizer/planner can handle that for the user, everyone wins.
This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I think the DataSet API will require more changes because it specifically hides the RDD from the users via the plans and catalyst can optimize the plan and insert things into the plan. The only way I’ve found to make this work with the Dataset API would be modifying all the plans to be able to get the resource requirements down into where it creates the RDDs, which I believe would be a lot of change. If other people know better options, it would be great to hear them.
*Q2.* What problem is this proposal NOT designed to solve?
The initial implementation is not going to add Dataset APIs.
We are starting with allowing users to specify a specific set of task/executor resources and plan to design it to be extendable, but the first implementation will not support changing generic SparkConf configs and only specific limited resources.
This initial version will have a programmatic API for specifying the resource requirements per stage, we can add the ability to perhaps have profiles in the configs later if its useful.
*Q3.* How is it done today, and what are the limits of current practice?
Currently this is either done by having multiple spark jobs or requesting containers with the max resources needed for any part of the job. To do this today, you can break it into separate jobs where each job requests the corresponding resources needed, but then you have to write the data out somewhere and then read it back in between jobs. This is going to take longer as well as require that job coordination between those to make sure everything works smoothly. Another option would be to request executors with your largest need up front and potentially waste those resources when they aren't being used, which in turn wastes money. For instance, for an ML application where it does ETL first, many times people request containers with GPUs and the GPUs sit idle while the ETL is happening. This is wasting those GPU resources and in turn money because those GPUs could have been used by other applications until they were really needed. Note for the catalyst internal use, that can’t be done today.
*Q4.* What is new in your approach and why do you think it will be successful?
This is a new way for users to specify the per stage resource requirements. This will give users and Spark a lot more flexibility within a job and get better utilization
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495: -- Issue Type: Epic (was: Story)
> SPIP: Support Stage level resource configuration and scheduling
> ---
>
> Key: SPARK-27495
> URL: https://issues.apache.org/jira/browse/SPARK-27495
> Project: Spark
> Issue Type: Epic
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Thomas Graves
> Priority: Major
>
> Currently Spark supports CPU level scheduling, and we are adding accelerator aware scheduling with https://issues.apache.org/jira/browse/SPARK-24615, but both of those are scheduled via application level configurations. Meaning there is one configuration that is set for the entire lifetime of the application and the user can't change it between Spark jobs/stages within that application.
> Many times users have different requirements for different stages of their application, so they want to be able to configure at the stage level what resources are required for that stage.
> For example, I might start a Spark application which first does some ETL work that needs lots of cores to run many tasks in parallel; then once that is done I want to run an ML job, and at that point I want GPUs, fewer CPUs, and more memory.
> With this Jira we want to add the ability for users to specify the resources for different stages.
> Note that https://issues.apache.org/jira/browse/SPARK-24615 had some discussions on this but this part of it was removed from that.
> We should come up with a proposal on how to do this.
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27495: -- Summary: SPIP: Support Stage level resource configuration and scheduling (was: Support Stage level resource configuration and scheduling)
[jira] [Created] (SPARK-28414) Standalone worker/master UI updates for Resource scheduling
Thomas Graves created SPARK-28414: - Summary: Standalone worker/master UI updates for Resource scheduling Key: SPARK-28414 URL: https://issues.apache.org/jira/browse/SPARK-28414 Project: Spark Issue Type: Sub-task Components: Deploy Affects Versions: 3.0.0 Reporter: Thomas Graves https://issues.apache.org/jira/browse/SPARK-27360 is adding Resource scheduling to standalone mode. Update the UI with the resource information for master/worker as appropriate.
[jira] [Created] (SPARK-28403) Executor Allocation Manager can add an extra executor when there are speculative tasks
Thomas Graves created SPARK-28403: - Summary: Executor Allocation Manager can add an extra executor when there are speculative tasks Key: SPARK-28403 URL: https://issues.apache.org/jira/browse/SPARK-28403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.0 Reporter: Thomas Graves It looks like SPARK-19326 introduced a bug in the executor allocation manager where it adds an extra executor when it shouldn't: when we have pending speculative tasks but the target number didn't change. [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L377] This doesn't look necessary, since the pendingSpeculative tasks are already counted in. See the questioning of this on the PR at: https://github.com/apache/spark/pull/18492/files#diff-b096353602813e47074ace09a3890d56R379
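The over-allocation described above can be illustrated with a simplified sketch of the target arithmetic (this is a plain-Python approximation of the idea, not the actual ExecutorAllocationManager code): the upper bound on executors needed is computed from all outstanding tasks, speculative ones included, so a separate "add one more executor because speculative tasks exist" step double-counts them.

```python
import math

def max_executors_needed(pending, running, pending_speculative, tasks_per_executor):
    """Upper bound on executors needed for the outstanding work.

    Speculative tasks are already included in the total here, which is
    why bumping the target by one extra executor on top of an unchanged
    bound (the bug described above) over-allocates.
    """
    total_tasks = pending + running + pending_speculative
    return math.ceil(total_tasks / tasks_per_executor)

# 10 pending + 5 running + 1 speculative task, 4 task slots per executor:
# the bound of 4 executors already covers the speculative task, so no
# extra executor should be requested when this value hasn't changed.
needed = max_executors_needed(10, 5, 1, 4)
```

In this example `needed` is 4 with or without rechecking the speculative task separately; requesting a fifth executor would be the wasted allocation the bug report describes.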
[jira] [Assigned] (SPARK-28213) Remove duplication between columnar and ColumnarBatchScan
[ https://issues.apache.org/jira/browse/SPARK-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-28213: - Assignee: Robert Joseph Evans > Remove duplication between columnar and ColumnarBatchScan > - > > Key: SPARK-28213 > URL: https://issues.apache.org/jira/browse/SPARK-28213 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > > There is a lot of duplicate code between Columnar.scala and > ColumnarBatchScan. This should fix that.
[jira] [Resolved] (SPARK-28213) Remove duplication between columnar and ColumnarBatchScan
[ https://issues.apache.org/jira/browse/SPARK-28213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-28213. --- Resolution: Fixed Fix Version/s: 3.0.0 > Remove duplication between columnar and ColumnarBatchScan > - > > Key: SPARK-28213 > URL: https://issues.apache.org/jira/browse/SPARK-28213 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > Fix For: 3.0.0 > > > There is a lot of duplicate code between Columnar.scala and > ColumnarBatchScan. This should fix that. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24615) SPIP: Accelerator-aware task scheduling for Spark
[ https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882954#comment-16882954 ] Thomas Graves commented on SPARK-24615: --- Hey [~jomach], thanks for the offer. You can find the SPIP attached and the designs are in the EPIC jiras. So yes, we are working on the issues in the Epic, and most of them are complete; see the links above. I think most of them are already assigned; currently we are waiting on the standalone mode implementation PR to be reviewed. I think the only one not assigned would be the mesos one, but we were waiting on someone with mesos experience who wanted it; if that is your background you could take a look at that. > SPIP: Accelerator-aware task scheduling for Spark > - > > Key: SPARK-24615 > URL: https://issues.apache.org/jira/browse/SPARK-24615 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Saisai Shao >Assignee: Thomas Graves >Priority: Major > Labels: Hydrogen, SPIP > Attachments: Accelerator-aware scheduling in Apache Spark 3.0.pdf, > SPIP_ Accelerator-aware scheduling.pdf > > > (The JIRA received a major update on 2019/02/28. Some comments were based on > an earlier version. Please ignore them. New comments start at > [#comment-16778026].) > h2. Background and Motivation > GPUs and other accelerators have been widely used for accelerating special > workloads, e.g., deep learning and signal processing. While users from the AI > community use GPUs heavily, they often need Apache Spark to load and process > large datasets and to handle complex data scenarios like streaming. YARN and > Kubernetes already support GPUs in their recent releases. Although Spark > supports those two cluster managers, Spark itself is not aware of GPUs > exposed by them and hence Spark cannot properly request GPUs and schedule > them for users. 
This leaves a critical gap to unify big data and AI workloads > and make life simpler for end users. > To make Spark be aware of GPUs, we shall make two major changes at high level: > * At cluster manager level, we update or upgrade cluster managers to include > GPU support. Then we expose user interfaces for Spark to request GPUs from > them. > * Within Spark, we update its scheduler to understand available GPUs > allocated to executors, user task requests, and assign GPUs to tasks properly. > Based on the work done in YARN and Kubernetes to support GPUs and some > offline prototypes, we could have necessary features implemented in the next > major release of Spark. You can find a detailed scoping doc here, where we > listed user stories and their priorities. > h2. Goals > * Make Spark 3.0 GPU-aware in standalone, YARN, and Kubernetes. > * No regression on scheduler performance for normal jobs. > h2. Non-goals > * Fine-grained scheduling within one GPU card. > ** We treat one GPU card and its memory together as a non-divisible unit. > * Support TPU. > * Support Mesos. > * Support Windows. > h2. Target Personas > * Admins who need to configure clusters to run Spark with GPU nodes. > * Data scientists who need to build DL applications on Spark. > * Developers who need to integrate DL features on Spark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28234) Spark Resources - add python support to get resources
[ https://issues.apache.org/jira/browse/SPARK-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882092#comment-16882092 ] Thomas Graves commented on SPARK-28234: --- Testing driver side: {code:java} >>> sc.resources['gpu'].addresses ['0', '1'] >>> sc.resources['gpu'].name 'gpu'{code} basic code for testing executor side: {code:java} from pyspark import TaskContext import socket def task_info(*_): ctx = TaskContext() return ["addrs: {0}".format(ctx.resources()['gpu'].addresses)] for x in sc.parallelize([], 8).mapPartitions(task_info).collect(): print(x) {code} > Spark Resources - add python support to get resources > - > > Key: SPARK-28234 > URL: https://issues.apache.org/jira/browse/SPARK-28234 > Project: Spark > Issue Type: Story > Components: PySpark, Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > Add the equivalent python api for sc.resources and TaskContext.resources -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28234) Spark Resources - add python support to get resources
[ https://issues.apache.org/jira/browse/SPARK-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-28234: - Assignee: Thomas Graves > Spark Resources - add python support to get resources > - > > Key: SPARK-28234 > URL: https://issues.apache.org/jira/browse/SPARK-28234 > Project: Spark > Issue Type: Story > Components: PySpark, Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > Add the equivalent python api for sc.resources and TaskContext.resources -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28234) Spark Resources - add python support to get resources
Thomas Graves created SPARK-28234: - Summary: Spark Resources - add python support to get resources Key: SPARK-28234 URL: https://issues.apache.org/jira/browse/SPARK-28234 Project: Spark Issue Type: Story Components: Spark Core Affects Versions: 3.0.0 Reporter: Thomas Graves Add the equivalent python api for sc.resources and TaskContext.resources -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27360) Standalone cluster mode support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877086#comment-16877086 ] Thomas Graves commented on SPARK-27360: --- Are you going to handle updating the master/worker UI to include resources? > Standalone cluster mode support for GPU-aware scheduling > > > Key: SPARK-27360 > URL: https://issues.apache.org/jira/browse/SPARK-27360 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Major > > Design and implement standalone manager support for GPU-aware scheduling: > 1. static conf to describe resources > 2. spark-submit to request resources > 2. auto discovery of GPUs > 3. executor process isolation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27945) Make minimal changes to support columnar processing
[ https://issues.apache.org/jira/browse/SPARK-27945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27945: - Assignee: Robert Joseph Evans > Make minimal changes to support columnar processing > --- > > Key: SPARK-27945 > URL: https://issues.apache.org/jira/browse/SPARK-27945 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > > As the first step for SPARK-27396 this is to put in the minimum changes > needed to allow a plugin to support columnar processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27945) Make minimal changes to support columnar processing
[ https://issues.apache.org/jira/browse/SPARK-27945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27945. --- Resolution: Fixed Fix Version/s: 3.0.0 > Make minimal changes to support columnar processing > --- > > Key: SPARK-27945 > URL: https://issues.apache.org/jira/browse/SPARK-27945 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > Fix For: 3.0.0 > > > As the first step for SPARK-27396 this is to put in the minimum changes > needed to allow a plugin to support columnar processing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28116) Fix Flaky `SparkContextSuite.test resource scheduling under local-cluster mode`
[ https://issues.apache.org/jira/browse/SPARK-28116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869803#comment-16869803 ] Thomas Graves commented on SPARK-28116: --- Sorry for my delay on this, I'll take a look. Haven't seen that one before. > Fix Flaky `SparkContextSuite.test resource scheduling under local-cluster > mode` > --- > > Key: SPARK-28116 > URL: https://issues.apache.org/jira/browse/SPARK-28116 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > This test suite has two kinds of failures. > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/6486/testReport/org.apache.spark/SparkContextSuite/test_resource_scheduling_under_local_cluster_mode/history/ > {code} > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 3921 times over > 1.88681567 minutes. Last failure message: > Array(org.apache.spark.SparkExecutorInfoImpl@533793be, > org.apache.spark.SparkExecutorInfoImpl@5018e57d, > org.apache.spark.SparkExecutorInfoImpl@dea2485, > org.apache.spark.SparkExecutorInfoImpl@9e63ecd) had size 4 instead of > expected size 3. 
> at > org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391) > at > org.apache.spark.SparkContextSuite.eventually(SparkContextSuite.scala:50) > at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:337) > at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:336) > at > org.apache.spark.SparkContextSuite.eventually(SparkContextSuite.scala:50) > at > org.apache.spark.SparkContextSuite.$anonfun$new$93(SparkContextSuite.scala:885){code} > {code} > org.scalatest.exceptions.TestFailedException: Array("0", "0", "1", "1", "1", > "2", "2", "2", "2") did not equal List("0", "0", "0", "1", "1", "1", "2", > "2", "2") > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.SparkContextSuite.$anonfun$new$93(SparkContextSuite.scala:894) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27823) Add an abstraction layer for accelerator resource handling to avoid manipulating raw confs
[ https://issues.apache.org/jira/browse/SPARK-27823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27823: - Assignee: Thomas Graves > Add an abstraction layer for accelerator resource handling to avoid > manipulating raw confs > -- > > Key: SPARK-27823 > URL: https://issues.apache.org/jira/browse/SPARK-27823 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > > In SPARK-27488, we extract resource requests and allocation by parsing raw > Spark confs. This hurts readability because we didn't have the abstraction at > resource level. After we merge the core changes, we should do a refactoring > and make the code more readable. > See https://github.com/apache/spark/pull/24615#issuecomment-494580663. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27760) Spark resources - user configs change .count to be .amount
[ https://issues.apache.org/jira/browse/SPARK-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27760. --- Resolution: Fixed Fix Version/s: 3.0.0 > Spark resources - user configs change .count to be .amount > -- > > Key: SPARK-27760 > URL: https://issues.apache.org/jira/browse/SPARK-27760 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > For the Spark resources, we created the config > spark.\{driver/executor}.resource.\{resourceName}.count > I think we should change .count to be .amount. That more easily allows users > to specify things with suffix like memory in a single config and they can > combine the value and unit. Without this they would have to specify 2 > separate configs (like .count and .unit) which seems more of a hassle for the > user. > Note the yarn configs for resources use amount: > spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. I think that makes a lot of sense. Filed a > separate Jira to add .amount to the yarn configs as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
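The motivation for a single `.amount` config is that a value and its unit can travel together (e.g. "2g" instead of separate `.count` and `.unit` configs). A minimal sketch of how such a combined string could be split is below; this is illustrative only, not Spark's actual config parser, and the function name is hypothetical.

```python
import re

def parse_amount(amount):
    """Split a combined value+unit string like '2g' or '4' into (value, unit).

    Illustrative sketch only: Spark's real parsing lives in its config layer,
    and the accepted unit suffixes there may differ.
    """
    m = re.fullmatch(r"(\d+)\s*([A-Za-z]*)", amount.strip())
    if m is None:
        raise ValueError(f"malformed resource amount: {amount!r}")
    return int(m.group(1)), m.group(2)
```

With this shape, `spark.executor.resource.gpu.amount=2` and a memory-style `...amount=512m` can share one key, which is exactly what separate `.count`/`.unit` configs could not do.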
[jira] [Assigned] (SPARK-24615) SPIP: Accelerator-aware task scheduling for Spark
[ https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-24615: - Assignee: Thomas Graves (was: Xingbo Jiang) > SPIP: Accelerator-aware task scheduling for Spark > - > > Key: SPARK-24615 > URL: https://issues.apache.org/jira/browse/SPARK-24615 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Saisai Shao >Assignee: Thomas Graves >Priority: Major > Labels: Hydrogen, SPIP > Attachments: Accelerator-aware scheduling in Apache Spark 3.0.pdf, > SPIP_ Accelerator-aware scheduling.pdf > > > (The JIRA received a major update on 2019/02/28. Some comments were based on > an earlier version. Please ignore them. New comments start at > [#comment-16778026].) > h2. Background and Motivation > GPUs and other accelerators have been widely used for accelerating special > workloads, e.g., deep learning and signal processing. While users from the AI > community use GPUs heavily, they often need Apache Spark to load and process > large datasets and to handle complex data scenarios like streaming. YARN and > Kubernetes already support GPUs in their recent releases. Although Spark > supports those two cluster managers, Spark itself is not aware of GPUs > exposed by them and hence Spark cannot properly request GPUs and schedule > them for users. This leaves a critical gap to unify big data and AI workloads > and make life simpler for end users. > To make Spark be aware of GPUs, we shall make two major changes at high level: > * At cluster manager level, we update or upgrade cluster managers to include > GPU support. Then we expose user interfaces for Spark to request GPUs from > them. > * Within Spark, we update its scheduler to understand available GPUs > allocated to executors, user task requests, and assign GPUs to tasks properly. 
> Based on the work done in YARN and Kubernetes to support GPUs and some > offline prototypes, we could have necessary features implemented in the next > major release of Spark. You can find a detailed scoping doc here, where we > listed user stories and their priorities. > h2. Goals > * Make Spark 3.0 GPU-aware in standalone, YARN, and Kubernetes. > * No regression on scheduler performance for normal jobs. > h2. Non-goals > * Fine-grained scheduling within one GPU card. > ** We treat one GPU card and its memory together as a non-divisible unit. > * Support TPU. > * Support Mesos. > * Support Windows. > h2. Target Personas > * Admins who need to configure clusters to run Spark with GPU nodes. > * Data scientists who need to build DL applications on Spark. > * Developers who need to integrate DL features on Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27760) Spark resources - user configs change .count to be .amount
[ https://issues.apache.org/jira/browse/SPARK-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27760: -- Description: For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count I think we should change .count to be .amount. That more easily allows users to specify things with suffix like memory in a single config and they can combine the value and unit. Without this they would have to specify 2 separate configs (like .count and .unit) which seems more of a hassle for the user. Note the yarn configs for resources use amount: spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. I think that makes a lot of sense. Filed a separate Jira to add .amount to the yarn configs as well. was: For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count I think we should change .count to be .amount. That more easily allows users to specify things with suffix like memory in a single config and they can combine the value and unit. Without this they would have to specify 2 separate configs which seems more of a hassle for the user. Note the yarn configs for resources use amount: spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. I think that makes a lot of sense. Filed a separate Jira to add .amount to the yarn configs as well. > Spark resources - user configs change .count to be .amount > -- > > Key: SPARK-27760 > URL: https://issues.apache.org/jira/browse/SPARK-27760 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > For the Spark resources, we created the config > spark.\{driver/executor}.resource.\{resourceName}.count > I think we should change .count to be .amount. 
That more easily allows users > to specify things with suffix like memory in a single config and they can > combine the value and unit. Without this they would have to specify 2 > separate configs (like .count and .unit) which seems more of a hassle for the > user. > Note the yarn configs for resources use amount: > spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. I think that makes a lot of sense. Filed a > separate Jira to add .amount to the yarn configs as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27760) Spark resources - user configs change .count to be .amount
[ https://issues.apache.org/jira/browse/SPARK-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27760: -- Description: For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count I think we should change .count to be .amount. That more easily allows users to specify things with suffix like memory in a single config and they can combine the value and unit. Without this they would have to specify 2 separate configs which seems more of a hassle for the user. Note the yarn configs for resources use amount: spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. I think that makes a lot of sense. Filed a separate Jira to add .amount to the yarn configs as well. was: For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count I think we should change .count to be .amount. That more easily allows users to specify things with suffix like memory in a single config and they can combine the value and unit. Without this they would have to specify 2 separate configs which seems more of a hassle for the user. > Spark resources - user configs change .count to be .amount > -- > > Key: SPARK-27760 > URL: https://issues.apache.org/jira/browse/SPARK-27760 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > For the Spark resources, we created the config > spark.\{driver/executor}.resource.\{resourceName}.count > I think we should change .count to be .amount. That more easily allows users > to specify things with suffix like memory in a single config and they can > combine the value and unit. Without this they would have to specify 2 > separate configs which seems more of a hassle for the user. 
> Note the yarn configs for resources use amount: > spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. I think that makes a lot of sense. Filed a > separate Jira to add .amount to the yarn configs as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27760) Spark resources - user configs change .count to be .amount
[ https://issues.apache.org/jira/browse/SPARK-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27760: - Assignee: Thomas Graves > Spark resources - user configs change .count to be .amount > -- > > Key: SPARK-27760 > URL: https://issues.apache.org/jira/browse/SPARK-27760 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > For the Spark resources, we created the config > spark.\{driver/executor}.resource.\{resourceName}.count > I think we should change .count to be .amount. That more easily allows users > to specify things with suffix like memory in a single config and they can > combine the value and unit. Without this they would have to specify 2 > separate configs which seems more of a hassle for the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27760) Spark resources - user configs change .count to be .amount
[ https://issues.apache.org/jira/browse/SPARK-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27760: -- Summary: Spark resources - user configs change .count to be .amount (was: Spark resources - user configs change .count to be .amount, and yarn configs should match) > Spark resources - user configs change .count to be .amount > -- > > Key: SPARK-27760 > URL: https://issues.apache.org/jira/browse/SPARK-27760 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > For the Spark resources, we created the config > spark.\{driver/executor}.resource.\{resourceName}.count > I think we should change .count to be .amount. That more easily allows users > to specify things with suffix like memory in a single config and they can > combine the value and unit. Without this they would have to specify 2 > separate configs which seems more of a hassle for the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27959) Change YARN resource configs to use .amount
Thomas Graves created SPARK-27959: - Summary: Change YARN resource configs to use .amount Key: SPARK-27959 URL: https://issues.apache.org/jira/browse/SPARK-27959 Project: Spark Issue Type: Story Components: YARN Affects Versions: 3.0.0 Reporter: Thomas Graves We are adding generic resource support into Spark where we have a suffix for the amount of the resource so that we can support other configs. Spark on YARN already added configs to request resources via spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. We should change those configs to have a .amount suffix on them to match the spark configs and to allow future configs to be more easily added. YARN itself already supports tags and attributes so if we want the user to be able to pass those from spark at some point having a suffix makes sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
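The rename described in this ticket amounts to appending a `.amount` suffix to the existing YARN resource config keys. A hedged sketch of that mapping (the helper function is hypothetical; Spark applies the rename internally in its YARN support):

```python
def to_amount_key(key):
    """Map an old-style YARN resource config key to the new `.amount` form,
    e.g. 'spark.yarn.executor.resource.gpu'
      -> 'spark.yarn.executor.resource.gpu.amount'.

    Hypothetical helper for illustration; already-suffixed keys pass through
    unchanged so the mapping is idempotent.
    """
    return key if key.endswith(".amount") else key + ".amount"
```

Keeping the suffix idempotent means configs written against either naming convention normalize to the same key, which is also why a suffix leaves room for future per-resource settings (like the tags and attributes the ticket mentions).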
[jira] [Updated] (SPARK-27760) Spark resources - user configs change .count to be .amount, and yarn configs should match
[ https://issues.apache.org/jira/browse/SPARK-27760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27760: -- Description: For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count I think we should change .count to be .amount. That more easily allows users to specify things with suffix like memory in a single config and they can combine the value and unit. Without this they would have to specify 2 separate configs which seems more of a hassle for the user. was: For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count I think we should change .count to be .amount. That more easily allows users to specify things with suffix like memory in a single config and they can combine the value and unit. Without this they would have to specify 2 separate configs which seems more of a hassle for the user. Spark on yarn already had added configs to request resources via the configs spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is value and unit together. We should change those configs to have a .amount suffix on them to match the spark configs and to allow future configs to be more easily added. YARN itself already supports tags and attributes so if we want the user to be able to pass those from spark at some point having a suffix makes sense. > Spark resources - user configs change .count to be .amount, and yarn configs > should match > - > > Key: SPARK-27760 > URL: https://issues.apache.org/jira/browse/SPARK-27760 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > For the Spark resources, we created the config > spark.\{driver/executor}.resource.\{resourceName}.count > I think we should change .count to be .amount. That more easily allows users > to specify things with suffix like memory in a single config and they can > combine the value and unit. 
Without this they would have to specify 2 > separate configs which seems more of a hassle for the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27364) User-facing APIs for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27364. --- Resolution: Fixed Fix Version/s: 3.0.0 > User-facing APIs for GPU-aware scheduling > - > > Key: SPARK-27364 > URL: https://issues.apache.org/jira/browse/SPARK-27364 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > Design and implement: > * General guidelines for cluster managers to understand resource requests at > application start. The concrete conf/param will be under the design of each > cluster manager. > * APIs to fetch assigned resources from task context. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27364) User-facing APIs for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856895#comment-16856895 ] Thomas Graves commented on SPARK-27364: --- User-facing changes are all committed, so going to close this. A few changes from above: getResources was just called resources, and the driver config for standalone mode takes a json file rather than individual address configs (spark.driver.resourceFile). > User-facing APIs for GPU-aware scheduling > - > > Key: SPARK-27364 > URL: https://issues.apache.org/jira/browse/SPARK-27364 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > > Design and implement: > * General guidelines for cluster managers to understand resource requests at > application start. The concrete conf/param will be under the design of each > cluster manager. > * APIs to fetch assigned resources from task context. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27374) Fetch assigned resources from TaskContext
[ https://issues.apache.org/jira/browse/SPARK-27374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27374. --- Resolution: Duplicate > Fetch assigned resources from TaskContext > - > > Key: SPARK-27374 > URL: https://issues.apache.org/jira/browse/SPARK-27374 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27362) Kubernetes support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27362. --- Resolution: Fixed Fix Version/s: 3.0.0 > Kubernetes support for GPU-aware scheduling > --- > > Key: SPARK-27362 > URL: https://issues.apache.org/jira/browse/SPARK-27362 > Project: Spark > Issue Type: Story > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > Design and implement k8s support for GPU-aware scheduling. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27373) Design: Kubernetes support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27373. --- Resolution: Fixed Fix Version/s: 3.0.0 > Design: Kubernetes support for GPU-aware scheduling > --- > > Key: SPARK-27373 > URL: https://issues.apache.org/jira/browse/SPARK-27373 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27897) GPU Scheduling - move example discovery Script to scripts directory
[ https://issues.apache.org/jira/browse/SPARK-27897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27897. --- Resolution: Fixed Fix Version/s: 3.0.0 > GPU Scheduling - move example discovery Script to scripts directory > --- > > Key: SPARK-27897 > URL: https://issues.apache.org/jira/browse/SPARK-27897 > Project: Spark > Issue Type: Story > Components: Examples >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor > Fix For: 3.0.0 > > > SPARK-27725 GPU Scheduling - add an example discovery Script added a script > at > [https://github.com/apache/spark/blob/master/examples/src/main/resources/getGpusResources.sh.] > Instead of having it in the resources directory lets move it to the scripts > directory -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27897) GPU Scheduling - move example discovery Script to scripts directory
Thomas Graves created SPARK-27897: - Summary: GPU Scheduling - move example discovery Script to scripts directory Key: SPARK-27897 URL: https://issues.apache.org/jira/browse/SPARK-27897 Project: Spark Issue Type: Story Components: Examples Affects Versions: 3.0.0 Reporter: Thomas Graves Assignee: Thomas Graves SPARK-27725 (GPU Scheduling - add an example discovery Script) added a script at [https://github.com/apache/spark/blob/master/examples/src/main/resources/getGpusResources.sh]. Instead of having it in the resources directory, let's move it to the scripts directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27361) YARN support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27361. --- Resolution: Fixed Fix Version/s: 3.0.0 All the subtasks are finished, and the parts of the Spark on Hadoop 3.x work that we need are also merged, so resolving this. > YARN support for GPU-aware scheduling > - > > Key: SPARK-27361 > URL: https://issues.apache.org/jira/browse/SPARK-27361 > Project: Spark > Issue Type: Story > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > Design and implement YARN support for GPU-aware scheduling: > * User can request GPU resources at Spark application level. > * How the Spark executor discovers GPU's when run on YARN > * Integrate with YARN 3.2 GPU support. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27835) Resource Scheduling: change driver config from addresses to resourcesFile
[ https://issues.apache.org/jira/browse/SPARK-27835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27835. --- Resolution: Fixed Fix Version/s: 3.0.0 > Resource Scheduling: change driver config from addresses to resourcesFile > - > > Key: SPARK-27835 > URL: https://issues.apache.org/jira/browse/SPARK-27835 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > currently we added a driver config > spark.driver.resources..addresses for standalone mode. > We should change that to be consistent with the executor side changes and > make it spark.driver.resourcesFile. This makes it more flexible in that we > wouldn't have to add more configs for different types of resources, > standalone mode would just need to output different json in that file. It > will also make some of the parsing code common between the 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851445#comment-16851445 ] Thomas Graves commented on SPARK-27396: --- The vote passed. > SPIP: Public APIs for extended Columnar Processing Support > -- > > Key: SPARK-27396 > URL: https://issues.apache.org/jira/browse/SPARK-27396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Priority: Major > > *SPIP: Columnar Processing Without Arrow Formatting Guarantees.* > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > The Dataset/DataFrame API in Spark currently only exposes to users one row at > a time when processing data. The goals of this are to > # Add to the current sql extensions mechanism so advanced users can have > access to the physical SparkPlan and manipulate it to provide columnar > processing for existing operators, including shuffle. This will allow them > to implement their own cost based optimizers to decide when processing should > be columnar and when it should not. > # Make any transitions between the columnar memory layout and a row based > layout transparent to the users so operations that are not columnar see the > data as rows, and operations that are columnar see the data as columns. > > Not Requirements, but things that would be nice to have. > # Transition the existing in memory columnar layouts to be compatible with > Apache Arrow. This would make the transformations to Apache Arrow format a > no-op. The existing formats are already very close to those layouts in many > cases. This would not be using the Apache Arrow java library, but instead > being compatible with the memory > [layout|https://arrow.apache.org/docs/format/Layout.html] and possibly only a > subset of that layout. > > *Q2.* What problem is this proposal NOT designed to solve? 
> The goal of this is not for ML/AI but to provide APIs for accelerated > computing in Spark primarily targeting SQL/ETL like workloads. ML/AI already > have several mechanisms to get data into/out of them. These can be improved > but will be covered in a separate SPIP. > This is not trying to implement any of the processing itself in a columnar > way, with the exception of examples for documentation. > This does not cover exposing the underlying format of the data. The only way > to get at the data in a ColumnVector is through the public APIs. Exposing > the underlying format to improve efficiency will be covered in a separate > SPIP. > This is not trying to implement new ways of transferring data to external > ML/AI applications. That is covered by separate SPIPs already. > This is not trying to add in generic code generation for columnar processing. > Currently code generation for columnar processing is only supported when > translating columns to rows. We will continue to support this, but will not > extend it as a general solution. That will be covered in a separate SPIP if > we find it is helpful. For now columnar processing will be interpreted. > This is not trying to expose a way to get columnar data into Spark through > DataSource V2 or any other similar API. That would be covered by a separate > SPIP if we find it is needed. > > *Q3.* How is it done today, and what are the limits of current practice? > The current columnar support is limited to 3 areas. > # Internal implementations of FileFormats, optionally can return a > ColumnarBatch instead of rows. The code generation phase knows how to take > that columnar data and iterate through it as rows for stages that wants rows, > which currently is almost everything. The limitations here are mostly > implementation specific. The current standard is to abuse Scala’s type > erasure to return ColumnarBatches as the elements of an RDD[InternalRow]. 
The > code generation can handle this because it is generating java code, so it > bypasses scala’s type checking and just casts the InternalRow to the desired > ColumnarBatch. This makes it difficult for others to implement the same > functionality for different processing because they can only do it through > code generation. There really is no clean separate path in the code > generation for columnar vs row based. Additionally, because it is only > supported through code generation if for any reason code generation would > fail there is no backup. This is typically fine for input formats but can be > problematic when we get into more extensive processing. > # When caching data it can optionally be cached in a columnar format if the > input is also columnar. This is similar to the first area and has the same > limitations because the
[jira] [Resolved] (SPARK-27376) Design: YARN supports Spark GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27376. --- Resolution: Fixed Resolving this since there seem to be no objections. > Design: YARN supports Spark GPU-aware scheduling > > > Key: SPARK-27376 > URL: https://issues.apache.org/jira/browse/SPARK-27376 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27361) YARN support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851298#comment-16851298 ] Thomas Graves commented on SPARK-27361: --- Note the design Jira covers more details. > YARN support for GPU-aware scheduling > - > > Key: SPARK-27361 > URL: https://issues.apache.org/jira/browse/SPARK-27361 > Project: Spark > Issue Type: Story > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > > Design and implement YARN support for GPU-aware scheduling: > * User can request GPU resources at Spark application level. > * How the Spark executor discovers GPU's when run on YARN > * Integrate with YARN 3.2 GPU support. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27361) YARN support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27361: -- Description: Design and implement YARN support for GPU-aware scheduling: * User can request GPU resources at Spark application level. * How the Spark executor discovers GPU's when run on YARN * Integrate with YARN 3.2 GPU support. was: Design and implement YARN support for GPU-aware scheduling: * User can request GPU resources at Spark application level. * YARN can pass GPU info to Spark executor. * Integrate with YARN 3.2 GPU support. > YARN support for GPU-aware scheduling > - > > Key: SPARK-27361 > URL: https://issues.apache.org/jira/browse/SPARK-27361 > Project: Spark > Issue Type: Story > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > > Design and implement YARN support for GPU-aware scheduling: > * User can request GPU resources at Spark application level. > * How the Spark executor discovers GPU's when run on YARN > * Integrate with YARN 3.2 GPU support. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27362) Kubernetes support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27362: - Assignee: Thomas Graves > Kubernetes support for GPU-aware scheduling > --- > > Key: SPARK-27362 > URL: https://issues.apache.org/jira/browse/SPARK-27362 > Project: Spark > Issue Type: Story > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > > Design and implement k8s support for GPU-aware scheduling. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27725) GPU Scheduling - add an example discovery Script
[ https://issues.apache.org/jira/browse/SPARK-27725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27725: - Assignee: Thomas Graves > GPU Scheduling - add an example discovery Script > > > Key: SPARK-27725 > URL: https://issues.apache.org/jira/browse/SPARK-27725 > Project: Spark > Issue Type: Story > Components: Examples >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > We should add an example script that can be used to discovery GPU's and > output the correctly formatted JSON. > Something like below, but it needs to be tested on various systems with more > then 2 gpu's: > ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed > 'N;s/\n/\",\"/'` > COUNT=`echo $ADDRS | tr -cd , | wc -c` > ALLCOUNT=`expr $COUNT + 1` > #echo \{\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27835) Resource Scheduling: change driver config from addresses to resourcesFile
[ https://issues.apache.org/jira/browse/SPARK-27835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27835: - Assignee: Thomas Graves > Resource Scheduling: change driver config from addresses to resourcesFile > - > > Key: SPARK-27835 > URL: https://issues.apache.org/jira/browse/SPARK-27835 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > currently we added a driver config > spark.driver.resources..addresses for standalone mode. > We should change that to be consistent with the executor side changes and > make it spark.driver.resourcesFile. This makes it more flexible in that we > wouldn't have to add more configs for different types of resources, > standalone mode would just need to output different json in that file. It > will also make some of the parsing code common between the 2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27835) Resource Scheduling: change driver config from addresses to resourcesFile
Thomas Graves created SPARK-27835: - Summary: Resource Scheduling: change driver config from addresses to resourcesFile Key: SPARK-27835 URL: https://issues.apache.org/jira/browse/SPARK-27835 Project: Spark Issue Type: Story Components: Spark Core Affects Versions: 3.0.0 Reporter: Thomas Graves Currently we added a driver config spark.driver.resources..addresses for standalone mode. We should change that to be consistent with the executor-side changes and make it spark.driver.resourcesFile. This makes it more flexible in that we wouldn't have to add more configs for different types of resources; standalone mode would just need to output different JSON in that file. It will also make some of the parsing code common between the two. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
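As a sketch of the before/after being proposed (gpu is a stand-in resource name and the file path is illustrative; the JSON shape follows the discovery-script format discussed in SPARK-27725 and is an assumption here):

```properties
# Before: one address config per resource type (standalone mode)
spark.driver.resources.gpu.addresses=0,1

# After: a single file of JSON that standalone mode writes out
spark.driver.resourcesFile=/tmp/driver-resources.json
```

where the file would carry entries like {"name": "gpu", "addresses": ["0","1"]}.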
[jira] [Commented] (SPARK-27495) Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845219#comment-16845219 ] Thomas Graves commented on SPARK-27495: --- I'm working through the design of this and there are definitely a lot of things to think about here. I would like to get other people's input before going much further. I think there are a few main points we need to decide on: 1) What resources can users specify per stage - the more I think about this, the more things I can think of people wanting to change. For instance, normally in Spark you don't specify the task requirements; you specify the executor requirements and then possibly the cores per task. So if someone is requesting resources per stage, I think we need a way to specify both task and executor requirements. You could specify executor requirements based on task requirements (say I want 4 tasks per executor, then multiply the task requirements), but then you have things like overhead memory, and users aren't used to specifying at the task level, so I think it's better to separate those out. Then going beyond that, we know people want to limit the # of total running tasks per stage, and looking at memory, there is off-heap, heap, and overhead memory. I can envision people wanting to change retries or shuffle parameters per stage. It's definitely more stuff than just a few resources. Basically it comes down to a lot of things in SparkConf. Now whether that is the interface we show to users or not is another question. You could, for instance, let them pass an entire SparkConf in, set the configs they are used to setting, and deal with that. 
But then you have to error out on or ignore the configs we don't support dynamically changing, and you have to deal with resolving conflicts on an unknown set of things if they have specified different confs for multiple operations that get combined into a single stage (i.e. like map.filter.groupby where they specified conflicting resources for map and groupby). Or you could make an interface that only gives them specific options and keep adding to that as people request more things. The latter I think is cleaner in some ways, but is also less flexible and requires a new API vs. possibly using the configs users are already used to. 2) API. I think ideally each of the operators (RDD.*, Dataset.*, where * is map, filter, groupby, sort, join, etc.) would have an optional parameter to specify the resources you want to use. I think this would make it clear to the user that for at least that operation these will be applied. It also helps with cases where you don't have an RDD yet, like on the initial read of files. This however could mean a lot of API changes. Another way, which was originally proposed in SPARK-24615, is adding something like a .withResource API, but then you have to deal with whether you prefix it, postfix it, etc. If you postfix it, what about things like eager execution mode? Prefix seems to make more sense. But then you still don't have an option for the readers. I think this also makes the scoping less clear, although you still have some of that with adding it to the individual operators. 3) Scoping. The scoping could be confusing to the users. Ideally I want to cover the RDD/Dataset/DataFrame APIs (I realize the Jira was initially more in the scope of barrier scheduling, but if we are going to do it, it seems like we should make it generic). With the RDD it is a bit more obvious where the stage boundaries might be, but Catalyst can do any sort of optimizations that could lead to stage boundaries the user doesn't expect. 
In either case you also have cases where things are in two stages, like groupby, where it does the partial aggregation, the shuffle, then the full aggregation. The withResources would have to apply to both stages. Then you have questions like: do you keep using that resource profile until they change it, or is it just those stages, after which it goes back to the default application-level configs? You could also go back to what I mentioned on the Jira where the withResources would be like a function scope {}... withResources() \{ everything that should be done within that resource profile } but that syntax isn't like anything we have now that I'm aware of. 4) How to deal with multiple, possibly conflicting resource requirements. You can go with the max in some cases, but for some cases that might not make sense. For instance, if you are doing memory you might actually want them to be the sum: you know this operation needs x memory, then Catalyst combines that with another operation that needs y memory. You would want to sum those, or you would have to have the user again realize those will get combined and have them do it. The latter isn't ideal either. Really, the dataframe/dataset api shouldn't need to have any api to specify
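The API shapes being weighed above could look roughly like this (illustrative pseudocode only, not a real Spark API; withResources and the mlProfile object are hypothetical names):

```
// Option 1: an optional resources parameter on each operator
rdd.map(f, resources = mlProfile)

// Option 2: a prefix .withResources call scoping the operations that follow
rdd.withResources(mlProfile).map(f).groupBy(g)

// The block-scoped variant mentioned above
withResources(mlProfile) {
  // everything in here runs under that resource profile
}
```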
[jira] [Commented] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries
[ https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844845#comment-16844845 ] Thomas Graves commented on SPARK-26989: --- Seeing the same error intermittently. The weird thing is I sometimes also see it error with the below, like it's not converting the collection, or maybe it's still a timing issue, but I had increased the timeout from 10 seconds to 30 seconds as well. org.scalatest.exceptions.TestFailedException: ArrayBuffer(0) did not equal List(0) at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2656) > Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage > attempt don't trigger multiple stage retries > --- > > Key: SPARK-26989 > URL: https://issues.apache.org/jira/browse/SPARK-26989 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/ > {noformat} > org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the > same stage attempt don't trigger multiple stage retries > Error Message > org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal > List(0) > Stacktrace > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: > ArrayBuffer() did not equal > List(0) > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528) > at > 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27374) Fetch assigned resources from TaskContext
[ https://issues.apache.org/jira/browse/SPARK-27374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844067#comment-16844067 ] Thomas Graves commented on SPARK-27374: --- I believe this is being done under: https://issues.apache.org/jira/browse/SPARK-27366 > Fetch assigned resources from TaskContext > - > > Key: SPARK-27374 > URL: https://issues.apache.org/jira/browse/SPARK-27374 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27725) GPU Scheduling - add an example discovery Script
[ https://issues.apache.org/jira/browse/SPARK-27725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844066#comment-16844066 ] Thomas Graves commented on SPARK-27725: --- The above sed command doesn't quite handle it; here is a command that works for many GPUs: ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/","/g'` echo \{\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]} > GPU Scheduling - add an example discovery Script > > > Key: SPARK-27725 > URL: https://issues.apache.org/jira/browse/SPARK-27725 > Project: Spark > Issue Type: Story > Components: Examples >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > We should add an example script that can be used to discovery GPU's and > output the correctly formatted JSON. > Something like below, but it needs to be tested on various systems with more > then 2 gpu's: > ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed > 'N;s/\n/\",\"/'` > COUNT=`echo $ADDRS | tr -cd , | wc -c` > ALLCOUNT=`expr $COUNT + 1` > #echo \{\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
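Since a GPU machine isn't always handy, the fixed pipeline can be exercised by substituting a sample index list for the nvidia-smi output. A minimal sketch, assuming nvidia-smi prints one GPU index per line (the sample_indices function below is a stand-in, not part of the real script):

```shell
#!/bin/sh
# Stand-in for: nvidia-smi --query-gpu=index --format=csv,noheader
sample_indices() { printf '0\n1\n2\n'; }

# Join the newline-separated indices with "," (the fixed sed command above)
ADDRS=$(sample_indices | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/","/g')

# Emit the JSON addresses document expected of a discovery script
echo "{\"name\": \"gpu\", \"addresses\": [\"$ADDRS\"]}"
# prints: {"name": "gpu", "addresses": ["0","1","2"]}
```

Swapping sample_indices back to the real nvidia-smi invocation recovers the command above.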
[jira] [Commented] (SPARK-27376) Design: YARN supports Spark GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842262#comment-16842262 ] Thomas Graves commented on SPARK-27376: --- I filed https://issues.apache.org/jira/browse/SPARK-27760 to change the config from .count to .amount and have yarn configs match. If you have other thoughts we can document there. > Design: YARN supports Spark GPU-aware scheduling > > > Key: SPARK-27376 > URL: https://issues.apache.org/jira/browse/SPARK-27376 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27760) Spark resources - user configs change .count to be .amount, and yarn configs should match
Thomas Graves created SPARK-27760: - Summary: Spark resources - user configs change .count to be .amount, and yarn configs should match Key: SPARK-27760 URL: https://issues.apache.org/jira/browse/SPARK-27760 Project: Spark Issue Type: Story Components: Spark Core Affects Versions: 3.0.0 Reporter: Thomas Graves For the Spark resources, we created the config spark.\{driver/executor}.resource.\{resourceName}.count. I think we should change .count to be .amount. That more easily allows users to specify things with a suffix, like memory, in a single config, since they can combine the value and unit. Without this they would have to specify two separate configs, which seems like more of a hassle for the user. Spark on YARN already added configs to request resources via spark.yarn.\{executor/driver/am}.resource=, where the value and unit are given together. We should change those configs to have a .amount suffix on them to match the Spark configs and to allow future configs to be more easily added. YARN itself already supports tags and attributes, so if we want the user to be able to pass those from Spark at some point, having a suffix makes sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
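A sketch of the proposed rename (gpu is an example resource name and the values are illustrative, not from this issue):

```properties
# Current:
spark.executor.resource.gpu.count=2
# Proposed:
spark.executor.resource.gpu.amount=2

# YARN-side request configs would gain the matching suffix:
spark.yarn.executor.resource.gpu.amount=2
```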
[jira] [Commented] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841614#comment-16841614 ] Thomas Graves commented on SPARK-27736: --- To clarify my last suggestion: I mean each executor reports back to the driver about the fetch failure, and the driver could see that multiple fetch failures happened on that same host for different executors' output, and then choose to invalidate all the output on that host if X number have already failed to fetch. There are other things the driver could use the information for. > Improve handling of FetchFailures caused by ExternalShuffleService losing > track of executor registrations > - > > Key: SPARK-27736 > URL: https://issues.apache.org/jira/browse/SPARK-27736 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: Josh Rosen >Priority: Minor > > This ticket describes a fault-tolerance edge-case which can cause Spark jobs > to fail if a single external shuffle service process reboots and fails to > recover the list of registered executors (something which can happen when > using YARN if NodeManager recovery is disabled) _and_ the Spark job has a > large number of executors per host. > I believe this problem can be worked around today via a change of > configurations, but I'm filing this issue to (a) better document this > problem, and (b) propose either a change of default configurations or > additional DAGScheduler logic to better handle this failure mode. > h2. Problem description > The external shuffle service process is _mostly_ stateless except for a map > tracking the set of registered applications and executors. 
> When processing a shuffle fetch request, the shuffle service first checks > whether the requested block ID's executor is registered; if it's not > registered then the shuffle service throws an exception like > {code:java} > java.lang.RuntimeException: Executor is not registered > (appId=application_1557557221330_6891, execId=428){code} > and this exception becomes a {{FetchFailed}} error in the executor requesting > the shuffle block. > In normal operation this error should not occur because executors shouldn't > be mis-routing shuffle fetch requests. However, this _can_ happen if the > shuffle service crashes and restarts, causing it to lose its in-memory > executor registration state. With YARN this state can be recovered from disk > if YARN NodeManager recovery is enabled (using the mechanism added in > SPARK-9439), but I don't believe that we perform state recovery in Standalone > and Mesos modes (see SPARK-24223). > If state cannot be recovered then map outputs cannot be served (even though > the files probably still exist on disk). In theory, this shouldn't cause > Spark jobs to fail because we can always redundantly recompute lost / > unfetchable map outputs. > However, in practice this can cause total job failures in deployments where > the node with the failed shuffle service was running a large number of > executors: by default, the DAGScheduler unregisters map outputs _only from > the individual executor whose shuffle blocks could not be fetched_ (see > [code|https://github.com/apache/spark/blame/bfb3ffe9b33a403a1f3b6f5407d34a477ce62c85/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1643]), > so it can take several rounds of failed stage attempts to fail and clear > output from all executors on the faulty host. If the number of executors on a > host is greater than the stage retry limit then this can exhaust stage retry > attempts and cause job failures. 
> This "multiple rounds of recomputation to discover all failed executors on a > host" problem was addressed by SPARK-19753, which added a > {{spark.files.fetchFailure.unRegisterOutputOnHost}} configuration which > promotes executor fetch failures into host-wide fetch failures (clearing > output from all neighboring executors upon a single failure). However, that > configuration is {{false}} by default. > h2. Potential solutions > I have a few ideas about how we can improve this situation: > - Update the [YARN external shuffle service > documentation|https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service] > to recommend enabling node manager recovery. > - Consider defaulting {{spark.files.fetchFailure.unRegisterOutputOnHost}} to > {{true}}. This would improve out-of-the-box resiliency for large clusters. > The trade-off here is a reduction of efficiency in case there are transient > "false positive" fetch failures, but I suspect this case may be unlikely in > practice (so the change of default could be an acceptable trade-off). See
[jira] [Commented] (SPARK-27736) Improve handling of FetchFailures caused by ExternalShuffleService losing track of executor registrations
[ https://issues.apache.org/jira/browse/SPARK-27736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841593#comment-16841593 ] Thomas Graves commented on SPARK-27736: --- Yeah, we always ran YARN with NodeManager recovery on, but that doesn't help standalone mode unless you implement something similar. Either way, I think documenting it for YARN is a good idea. We used to see transient fetch failures all the time because of temporary spikes in disk usage, so I would be hesitant to turn on spark.files.fetchFailure.unRegisterOutputOnHost by default; on the other hand users could turn it back off, so it depends on what people think is most common. I don't think you can assume the death of the shuffle service (the NM on YARN) implies the death of the executor. We have seen NodeManagers go down with OOM while the executors stay up. Without the NM there, there isn't really anything to clean up the containers on it. You will obviously get fetch failures from that node if it does go down. Your last option seems like the best of those, but as you mention it could get a bit ugly with the string matching. The other thing you can do is start tracking those fetch failures and have the driver make a more informed decision based on them. This is work we had started at my previous employer but never had time to finish. It's a much bigger change but really what we should be doing: it would allow us to make better decisions about blacklisting and to see whether it was the map or reduce node that had issues, etc. 
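The driver-side tracking idea sketched in the comments above could look roughly like this (a hypothetical illustration, not Spark's DAGScheduler code: class name, threshold, and return convention are all assumptions):

```python
from collections import defaultdict

class FetchFailureTracker:
    """Hypothetical driver-side tracker: count fetch failures per host
    across distinct executors, and signal that all shuffle output on a
    host should be invalidated once X executors' output has failed."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # host -> set of executor ids whose output failed to fetch from it
        self.failures = defaultdict(set)

    def record_failure(self, host: str, exec_id: str) -> bool:
        """Record one fetch failure; return True when the whole host's
        map output should be unregistered."""
        self.failures[host].add(exec_id)
        return len(self.failures[host]) >= self.threshold

tracker = FetchFailureTracker(threshold=2)
tracker.record_failure("host1", "exec-1")                 # below threshold
should_clear = tracker.record_failure("host1", "exec-2")  # threshold hit
```

This gives the driver a middle ground between the current per-executor unregistration and the all-or-nothing unRegisterOutputOnHost flag.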
> Improve handling of FetchFailures caused by ExternalShuffleService losing > track of executor registrations > - > > Key: SPARK-27736 > URL: https://issues.apache.org/jira/browse/SPARK-27736 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: Josh Rosen >Priority: Minor > 
[jira] [Comment Edited] (SPARK-27373) Design: Kubernetes support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841506#comment-16841506 ] Thomas Graves edited comment on SPARK-27373 at 5/16/19 4:21 PM: for the kubernetes side, it has 2 options for requesting containers: 1) pod templates, 2) through normal spark and spark.kubernetes configs. For adding in the spark resource support, we can take the spark configs spark.\{driver/executor}.resource.\{resourceName}.count and combine them with a new config for the vendor name, like spark.\{driver/executor}.resource.\{resourceName}.vendor, to match the device plugin support in k8s ([https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/]) and add that to the PodBuilder. We could make the vendor config kubernetes specific, but I'm thinking we leave it generic and just state it's only supported on kubernetes right now. Depending on the setup, I could see this being useful for, say, YARN, since YARN supports attributes and vendor could be an attribute. Spark already has functionality to override and add certain things in the pod templates, so we can use similar functionality with the resources. That way we can support both the pod templates and the configs the same way. was (Author: tgraves): for the kubernetes side, it has 2 options for requesting containers: 1) pod templates, 2) through normal spark and spark.kubernetes configs For adding in the spark resource support, we can take the spark configs spark.\{driver/executor}.resource.\{resourceName}.count and combine this with a new config for the vendor name like spark.\{driver/executor}.resource.\{resourceName}.vendor to match the device plugin support from k8s ( [https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)] and add that to the PodBuilder. 
spark already has functionality to override and add certain things in the pod templates so we can use similar functionality with the resources. So we can support both the pod templates and the configs the same way. > Design: Kubernetes support for GPU-aware scheduling > --- > > Key: SPARK-27373 > URL: https://issues.apache.org/jira/browse/SPARK-27373 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > 
[jira] [Commented] (SPARK-27373) Design: Kubernetes support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841506#comment-16841506 ] Thomas Graves commented on SPARK-27373: --- for the kubernetes side, it has 2 options for requesting containers: 1) pod templates, 2) through normal spark and spark.kubernetes configs For adding in the spark resource support, we can take the spark configs spark.\{driver/executor}.resource.\{resourceName}.count and combine this with a new config for the vendor name like spark.\{driver/executor}.resource.\{resourceName}.vendor to match the device plugin support from k8s ( [https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)] and add that to the PodBuilder. spark already has functionality to override and add certain things in the pod templates so we can use similar functionality with the resources. So we can support both the pod templates and the configs the same way. > Design: Kubernetes support for GPU-aware scheduling > --- > > Key: SPARK-27373 > URL: https://issues.apache.org/jira/browse/SPARK-27373 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > 
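The count + vendor combination described above maps naturally onto the Kubernetes device-plugin resource naming scheme (e.g. nvidia.com/gpu). A rough, hypothetical sketch of that mapping (the function name and config-dict shape are illustrative, not Spark's actual k8s backend):

```python
def k8s_resource_limits(spark_conf: dict, role: str = "executor") -> dict:
    """Sketch: turn spark.<role>.resource.<name>.{count,vendor} config
    pairs into Kubernetes device-plugin resource limits for the pod spec,
    e.g. {'nvidia.com/gpu': 2}. Illustrative only."""
    prefix = f"spark.{role}.resource."
    # collect the distinct resource names that appear in the configs
    names = {k[len(prefix):].split(".")[0] for k in spark_conf if k.startswith(prefix)}
    limits = {}
    for name in names:
        count = spark_conf.get(f"{prefix}{name}.count")
        vendor = spark_conf.get(f"{prefix}{name}.vendor")
        if count and vendor:
            # device plugins advertise resources as <vendor-domain>/<resource>
            limits[f"{vendor}/{name}"] = int(count)
    return limits

conf = {
    "spark.executor.resource.gpu.count": "2",
    "spark.executor.resource.gpu.vendor": "nvidia.com",
}
limits = k8s_resource_limits(conf)
```

The resulting limits dict is what would be attached to the container spec via the PodBuilder.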
[jira] [Assigned] (SPARK-27373) Design: Kubernetes support for GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27373: - Assignee: Thomas Graves > Design: Kubernetes support for GPU-aware scheduling > --- > > Key: SPARK-27373 > URL: https://issues.apache.org/jira/browse/SPARK-27373 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > 
[jira] [Resolved] (SPARK-27377) Upgrade YARN to 3.1.2+ to support GPU
[ https://issues.apache.org/jira/browse/SPARK-27377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27377. --- Resolution: Fixed > Upgrade YARN to 3.1.2+ to support GPU > - > > Key: SPARK-27377 > URL: https://issues.apache.org/jira/browse/SPARK-27377 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > > This task should be covered by SPARK-23710. Just a placeholder here. 
[jira] [Commented] (SPARK-27376) Design: YARN supports Spark GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841404#comment-16841404 ] Thomas Graves commented on SPARK-27376: --- [~mengxr] [~jiangxb] Thoughts on my proposal above to rename the user facing resource config from .count to .amount and also adding it to the existing yarn configs? > Design: YARN supports Spark GPU-aware scheduling > > > Key: SPARK-27376 > URL: https://issues.apache.org/jira/browse/SPARK-27376 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > 
[jira] [Commented] (SPARK-27378) spark-submit requests GPUs in YARN mode
[ https://issues.apache.org/jira/browse/SPARK-27378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841412#comment-16841412 ] Thomas Graves commented on SPARK-27378: --- Spark 3.0 already added support for requesting any resource from YARN via the configs: spark.yarn.\{executor/driver/am}.resource, so the changes required for this Jira are simply to map the new spark configs spark.\{executor/driver}.resource.\{fpga/gpu}.count into the corresponding yarn configs. For other resource types we can't do the mapping, though, because we don't know what they are called on the YARN side. So for any other resource users will have to specify both configs: spark.yarn.\{executor/driver/am}.resource and spark.\{executor/driver}.resource.\{fpga/gpu}. That isn't ideal, but the only other option would be some sort of mapping the user would pass in. We can always map more YARN resource types if YARN adds them. The main 2 resources people are interested in seem to be GPU and FPGA anyway, so I think this is fine for now. > spark-submit requests GPUs in YARN mode > --- > > Key: SPARK-27378 > URL: https://issues.apache.org/jira/browse/SPARK-27378 > Project: Spark > Issue Type: Sub-task > Components: Spark Submit, YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > 
[jira] [Commented] (SPARK-27379) YARN passes GPU info to Spark executor
[ https://issues.apache.org/jira/browse/SPARK-27379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841409#comment-16841409 ] Thomas Graves commented on SPARK-27379: --- The way YARN works is that it actually doesn't tell the application any info about what it was allocated. If you have Hadoop 3.1+ and it is set up for docker and isolation, then it's up to the user to discover what the container has. So based on that, I'm going to close this. > YARN passes GPU info to Spark executor > -- > > Key: SPARK-27379 > URL: https://issues.apache.org/jira/browse/SPARK-27379 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > 
[jira] [Resolved] (SPARK-27379) YARN passes GPU info to Spark executor
[ https://issues.apache.org/jira/browse/SPARK-27379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27379. --- Resolution: Invalid Assignee: Thomas Graves > YARN passes GPU info to Spark executor > -- > > Key: SPARK-27379 > URL: https://issues.apache.org/jira/browse/SPARK-27379 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > 
[jira] [Commented] (SPARK-27377) Upgrade YARN to 3.1.2+ to support GPU
[ https://issues.apache.org/jira/browse/SPARK-27377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841407#comment-16841407 ] Thomas Graves commented on SPARK-27377: --- there are enough pieces of the hadoop 3.2 support implemented that this is no longer blocking us, so I'm going to close this. > Upgrade YARN to 3.1.2+ to support GPU > - > > Key: SPARK-27377 > URL: https://issues.apache.org/jira/browse/SPARK-27377 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > > This task should be covered by SPARK-23710. Just a placeholder here. 
[jira] [Assigned] (SPARK-27376) Design: YARN supports Spark GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-27376: - Assignee: Thomas Graves > Design: YARN supports Spark GPU-aware scheduling > > > Key: SPARK-27376 > URL: https://issues.apache.org/jira/browse/SPARK-27376 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Thomas Graves >Priority: Major > 
[jira] [Commented] (SPARK-27376) Design: YARN supports Spark GPU-aware scheduling
[ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841402#comment-16841402 ] Thomas Graves commented on SPARK-27376: --- The design is pretty straightforward; there is really only 1 question, which is consistency between the yarn resource configs and the new spark resource configs, see the last paragraph for more details. Require Hadoop 3.1 and above to get official GPU support. Hadoop can be configured to use docker with isolation so that the containers YARN hands back have the requested GPUs and other resources. YARN does not give you information about what it allocated for GPUs; you have to discover it. YARN has hardcoded resource types for fpga and gpu; anything else is a user-defined type. Spark 3.0 already added support for requesting any resource from YARN via the configs: spark.yarn.\{executor/driver/am}.resource, so the changes required for this Jira are simply to map the new spark configs spark.\{executor/driver}.resource.\{fpga/gpu}.count into the corresponding yarn configs. For other resource types we can't do the mapping because we don't know what they are called on the YARN side. So for any other resource users will have to specify both configs: spark.yarn.\{executor/driver/am}.resource and spark.\{executor/driver}.resource.\{fpga/gpu}. That isn't ideal, but the only other option would be some sort of mapping the user would pass in. We can always map more YARN resource types if YARN adds them. The main 2 resources people are interested in seem to be GPU and FPGA anyway, so I think this is fine for now. 
For versions < Hadoop 3.1 it won't allocate based on GPU, so if they are using Hadoop 2.7, 2.8, etc. they could still allocate nodes with GPUs via YARN node labels or other hacks, tell Spark the count, and have it auto-discover them; Spark will pick up whatever it sees in the container - or really whatever the discoveryScript returns, so people could potentially write that script to match whatever hacks they have for sharing GPU nodes now. The flow from the user's point of view would be: For GPU and FPGA: the user will specify spark.\{executor/driver}.resource.\{gpu/fpga}.count and spark.\{executor/driver}.resource.\{gpu/fpga}.discoveryScript. The spark yarn code maps these into the corresponding yarn resource config and asks yarn for the containers. Yarn allocates the containers and Spark runs the discovery script to figure out what it has for allocations. For other resource types the user will have to specify: spark.yarn.\{executor/driver/am}.resource and spark.\{executor/driver}.resource.\{gpu/fpga}.count and the spark.\{executor/driver}.resource.\{gpu/fpga}.discoveryScript. The only other thing that is inconsistent is that the spark.yarn.\{executor/driver/am}.resource configs don't have a .count on the end. Right now that config takes a string as a value and splits it into an actual count and a unit. The yarn resource configs were just added in 3.0 and haven't been released, so we could potentially change them. We could change the spark user-facing configs (spark.\{executor/driver}.resource.\{gpu/fpga}.count) to be similar, making it easier for the user to specify both a count and unit in 1 config instead of 2, but I like the ability to separate them on the discovery side as well. We took the .unit support out in the executor pull request so it isn't there right now anyway. We could do the opposite and change the yarn ones to have a .count and .unit as well just to make things consistent, but that makes the user specify 2 configs instead of 1. 
Or the third option would be to have the .count and .unit and then eventually a third config that lets the user specify them together, if we add resources that actually use units. My thoughts are that for the user-facing configs we change .count to .amount and let the user specify units on it. This makes it easier for the user and allows us to extend later if we want. I think we should also change the spark.yarn configs to have a .amount suffix, because yarn has already added other things like tags and attributes, so if we want to extend the spark support for those it makes more sense to have them as another postfix option: spark.yarn...resource.tags= We can leave everything else that is internal as separate count and units, and since gpu/fpga don't need units we don't need to actually add it to our ResourceInformation since we already removed it. > Design: YARN supports Spark GPU-aware scheduling > > > Key: SPARK-27376 > URL: https://issues.apache.org/jira/browse/SPARK-27376 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Priority: Major > 
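For the GPU/FPGA case the config-to-YARN mapping discussed above is mechanical, since YARN hardcodes those two resource names (yarn.io/gpu and yarn.io/fpga). A hedged sketch of the translation, assuming the proposed .amount suffix (the function itself is hypothetical, not Spark's yarn backend code):

```python
# YARN's built-in resource names for its two hardcoded types
YARN_RESOURCE_NAMES = {"gpu": "yarn.io/gpu", "fpga": "yarn.io/fpga"}

def yarn_resource_requests(spark_conf: dict, role: str = "executor") -> dict:
    """Sketch: translate spark.<role>.resource.{gpu,fpga}.amount configs
    into YARN-side resource requests. Other resource names can't be
    translated automatically because we don't know their YARN names,
    which is why users must set the spark.yarn.* configs for those."""
    prefix = f"spark.{role}.resource."
    suffix = ".amount"
    requests = {}
    for key, value in spark_conf.items():
        if key.startswith(prefix) and key.endswith(suffix):
            name = key[len(prefix):-len(suffix)]
            if name in YARN_RESOURCE_NAMES:
                requests[YARN_RESOURCE_NAMES[name]] = value
    return requests

conf = {"spark.executor.resource.gpu.amount": "2"}
requests = yarn_resource_requests(conf)
```

A user-defined resource type (say "widget") would simply fall through the mapping, matching the behavior described in the comment.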
[jira] [Created] (SPARK-27725) GPU Scheduling - add an example discovery Script
Thomas Graves created SPARK-27725: - Summary: GPU Scheduling - add an example discovery Script Key: SPARK-27725 URL: https://issues.apache.org/jira/browse/SPARK-27725 Project: Spark Issue Type: Story Components: Examples Affects Versions: 3.0.0 Reporter: Thomas Graves We should add an example script that can be used to discover GPUs and output the correctly formatted JSON. Something like below, but it needs to be tested on various systems with more than 2 GPUs:
ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed ':a;N;$!ba;s/\n/","/g'`
echo \{\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]}
[jira] [Resolved] (SPARK-27024) Executor interface for cluster managers to support GPU resources
[ https://issues.apache.org/jira/browse/SPARK-27024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27024. --- Resolution: Fixed > Executor interface for cluster managers to support GPU resources > > > Key: SPARK-27024 > URL: https://issues.apache.org/jira/browse/SPARK-27024 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Assignee: Thomas Graves >Priority: Major > > The executor interface shall deal with the resources allocated to the > executor by cluster managers (Standalone, YARN, Kubernetes). The Executor > either needs to be told the resources it was given or it needs to discover > them in order for the executor to sync with the driver to expose available > resources to support task scheduling. > Note this is part of a bigger feature for gpu-aware scheduling and is just > how the executor finds the resources. The general flow: > * users ask for a certain set of resources, for instance number of gpus - > each cluster manager has a specific way to do this. > * cluster manager allocates a container or set of resources (standalone mode) > * When spark launches the executor in that container, the executor either > has to be told what resources it has or it has to auto discover them. > * Executor has to register with Driver and tell the driver the set of > resources it has so the scheduler can use that to schedule tasks that > require a certain amount of each of those resources 
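The auto-discovery step in the flow above boils down to running a script and parsing the JSON it prints. A minimal sketch of that executor-side parsing (illustrative helper, not Spark's implementation; the field names follow the example discovery script in SPARK-27725):

```python
import json

def parse_discovery_output(output: str):
    """Sketch: take the JSON a discovery script prints, e.g.
    {"name": "gpu", "addresses": ["0", "1"]}, and turn it into the
    (name, addresses) pair the executor would register with the driver."""
    info = json.loads(output)
    name = info["name"]
    addresses = list(info["addresses"])
    if not addresses:
        raise ValueError(f"discovery script returned no addresses for {name!r}")
    return name, addresses

# e.g. the output of the nvidia-smi based script for a 2-GPU container
name, addrs = parse_discovery_output('{"name": "gpu", "addresses": ["0", "1"]}')
```

The driver can then schedule tasks against those addresses, which is the registration step described in the last bullet.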
[jira] [Commented] (SPARK-27488) Driver interface to support GPU resources
[ https://issues.apache.org/jira/browse/SPARK-27488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839402#comment-16839402 ] Thomas Graves commented on SPARK-27488: --- With this Jira we need to move the function that does the checks for various configs to a common place, and we need to address the comment here: [https://github.com/apache/spark/pull/24406/files/b9dacef1d7d47d19df300628d3841b3a13c03547#r283617243] to make the variable names clearer. > Driver interface to support GPU resources > -- > > Key: SPARK-27488 > URL: https://issues.apache.org/jira/browse/SPARK-27488 > Project: Spark > Issue Type: Story > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > We want to have an interface to allow the users on the driver to get what > resources are allocated to them. This is mostly to handle the case where the > cluster manager does not launch the driver in an isolated environment and > where users could be sharing hosts. For instance in standalone mode it > doesn't support container isolation so a host may have 4 gpu's, but only 2 of > them could be assigned to the driver. In this case we need an interface for > the cluster manager to specify what gpu's for the driver to use and an > interface for the user to get the resource information 
[jira] [Commented] (SPARK-27520) Introduce a global config system to replace hadoopConfiguration
[ https://issues.apache.org/jira/browse/SPARK-27520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836730#comment-16836730 ] Thomas Graves commented on SPARK-27520: --- Can we add more to the description to explain why we are doing this? I'm assuming this is to allow users to more easily change it. > Introduce a global config system to replace hadoopConfiguration > --- > > Key: SPARK-27520 > URL: https://issues.apache.org/jira/browse/SPARK-27520 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xingbo Jiang >Priority: Major > > hadoopConf can be accessed via `SparkContext.hadoopConfiguration` from both > user code and Spark internals. The configuration is mainly used to read files > from hadoop-supported file systems (e.g. get URI/get FileSystem/add security > credentials/get metastore connect url/etc.) > We shall keep a global config that users can set and use that to track the > hadoop configurations, and avoid using `SparkContext.hadoopConfiguration`; > maybe we shall mark it as deprecated. 
[jira] [Updated] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-27492: -- Description: For the SPIP - Accelerator-aware task scheduling for Spark, https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user documentation about how this feature works together and point to things like the example discovery script, etc. - make sure to document the discovery script and what permissions are needed and any security implications was:For the SPIP - Accelerator-aware task scheduling for Spark, https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user documentation about how this feature works together and point to things like the example discovery script, etc. > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > For the SPIP - Accelerator-aware task scheduling for Spark, > https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user > documentation about how this feature works together and point to things like > the example discovery script, etc. > > - make sure to document the discovery script and what permissions are needed > and any security implications 
[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835014#comment-16835014 ] Thomas Graves commented on SPARK-27396: --- I just started a vote thread on this SPIP please take a look and vote. > SPIP: Public APIs for extended Columnar Processing Support > -- > > Key: SPARK-27396 > URL: https://issues.apache.org/jira/browse/SPARK-27396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Priority: Major > > *SPIP: Columnar Processing Without Arrow Formatting Guarantees.* > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > The Dataset/DataFrame API in Spark currently only exposes to users one row at > a time when processing data. The goals of this are to > # Add to the current sql extensions mechanism so advanced users can have > access to the physical SparkPlan and manipulate it to provide columnar > processing for existing operators, including shuffle. This will allow them > to implement their own cost based optimizers to decide when processing should > be columnar and when it should not. > # Make any transitions between the columnar memory layout and a row based > layout transparent to the users so operations that are not columnar see the > data as rows, and operations that are columnar see the data as columns. > > Not Requirements, but things that would be nice to have. > # Transition the existing in memory columnar layouts to be compatible with > Apache Arrow. This would make the transformations to Apache Arrow format a > no-op. The existing formats are already very close to those layouts in many > cases. This would not be using the Apache Arrow java library, but instead > being compatible with the memory > [layout|https://arrow.apache.org/docs/format/Layout.html] and possibly only a > subset of that layout. 
> > *Q2.* What problem is this proposal NOT designed to solve? > The goal of this is not for ML/AI but to provide APIs for accelerated > computing in Spark, primarily targeting SQL/ETL-like workloads. ML/AI already > have several mechanisms to get data into/out of them. These can be improved > but will be covered in a separate SPIP. > This is not trying to implement any of the processing itself in a columnar > way, with the exception of examples for documentation. > This does not cover exposing the underlying format of the data. The only way > to get at the data in a ColumnVector is through the public APIs. Exposing > the underlying format to improve efficiency will be covered in a separate > SPIP. > This is not trying to implement new ways of transferring data to external > ML/AI applications. That is covered by separate SPIPs already. > This is not trying to add in generic code generation for columnar processing. > Currently code generation for columnar processing is only supported when > translating columns to rows. We will continue to support this, but will not > extend it as a general solution. That will be covered in a separate SPIP if > we find it is helpful. For now columnar processing will be interpreted. > This is not trying to expose a way to get columnar data into Spark through > DataSource V2 or any other similar API. That would be covered by a separate > SPIP if we find it is needed. > > *Q3.* How is it done today, and what are the limits of current practice? > The current columnar support is limited to 3 areas. > # Internal implementations of FileFormats can optionally return a > ColumnarBatch instead of rows. The code generation phase knows how to take > that columnar data and iterate through it as rows for stages that want rows, > which currently is almost everything. The limitations here are mostly > implementation-specific. The current standard is to abuse Scala's type > erasure to return ColumnarBatches as the elements of an RDD[InternalRow]. 
The > code generation can handle this because it is generating Java code, so it > bypasses Scala's type checking and just casts the InternalRow to the desired > ColumnarBatch. This makes it difficult for others to implement the same > functionality for different processing because they can only do it through > code generation. There really is no clean separate path in the code > generation for columnar vs. row-based processing. Additionally, because it is > only supported through code generation, there is no fallback if code > generation fails for any reason. This is typically fine for input formats but > can be problematic when we get into more extensive processing. > # When caching data it can optionally be cached in a columnar format if the > input is also columnar. This is similar to the f
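For illustration, the type-erasure trick described above can be reproduced outside Spark with plain Java collections. This is a minimal sketch, assuming stand-in classes: `InternalRow` and `ColumnarBatch` here are NOT the real Spark types, just placeholders that mimic the relevant relationship (the batch class does not implement the row interface).

```java
import java.util.List;

public class ErasureTrick {
    interface InternalRow {}

    // Stand-in: like Spark's ColumnarBatch, this does NOT implement InternalRow.
    static class ColumnarBatch {
        final int numRows;
        ColumnarBatch(int numRows) { this.numRows = numRows; }
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static int demo() {
        List<ColumnarBatch> batches = List.of(new ColumnarBatch(1024));
        // Erasure makes this cast unchecked: nothing verifies that the
        // elements are really InternalRows.
        List<InternalRow> disguised = (List) batches;
        // Generated Java code can fetch elements as Object and cast straight
        // back to ColumnarBatch, sidestepping any InternalRow check.
        Object element = ((List<Object>) (List) disguised).get(0);
        return ((ColumnarBatch) element).numRows;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1024
    }
}
```

This only works because the casts happen at the `Object` level; code that touched the disguised list through its `InternalRow`-typed view would hit a ClassCastException, which is why Spark confines the trick to generated code.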
[jira] [Commented] (SPARK-27396) SPIP: Public APIs for extended Columnar Processing Support
[ https://issues.apache.org/jira/browse/SPARK-27396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823097#comment-16823097 ] Thomas Graves commented on SPARK-27396: --- Thanks for the questions and comments. Please also vote on the DEV list email chain - subject: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support. I'm going to extend that vote by a few days to give more people time to comment, as I know it's a busy time of year. > SPIP: Public APIs for extended Columnar Processing Support > -- > > Key: SPARK-27396 > URL: https://issues.apache.org/jira/browse/SPARK-27396 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Priority: Major > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > > The Dataset/DataFrame API in Spark currently exposes data to users only one > row at a time when processing it. The goals of this are to > > # Expose to end users a new option of processing the data in a columnar > format, multiple rows at a time, with the data organized into contiguous > arrays in memory. > # Make any transitions between the columnar memory layout and a row-based > layout transparent to the end user. > # Allow for simple data exchange with other systems, DL/ML libraries, > pandas, etc. by having clean APIs to transform the columnar data into an > Apache Arrow compatible layout. > # Provide a plugin mechanism for columnar processing support so an advanced > user could avoid data transition between columnar and row-based processing > even through shuffles. This means we should at least support pluggable APIs > so an advanced end user can implement the columnar partitioning themselves, > and provide the glue necessary to shuffle the data still in a columnar format. 
> # Expose new APIs that allow advanced users or frameworks to implement > columnar processing either as UDFs, or by adjusting the physical plan to do > columnar processing. If the latter is too controversial we can move it to > another SPIP, but we plan to implement some accelerated computing in parallel > with this feature to be sure the APIs work, and without this feature that > would be impossible. > > Not requirements, but things that would be nice to have: > # Provide default implementations for partitioning columnar data, so users > don't have to. > # Transition the existing in-memory columnar layouts to be compatible with > Apache Arrow. This would make the transformations to Apache Arrow format a > no-op. The existing formats are already very close to those layouts in many > cases. This would not be using the Apache Arrow Java library, but instead > being compatible with the memory > [layout|https://arrow.apache.org/docs/format/Layout.html] and possibly only a > subset of that layout. > # Provide a clean transition from the existing code to the new one. The > existing APIs, which are public but evolving, are not that far off from what > is being proposed. We should be able to create a new parallel API that can > wrap the existing one. This means any file format that is trying to support > columnar can still do so until we make a conscious decision to deprecate and > then turn off the old APIs. > > *Q2.* What problem is this proposal NOT designed to solve? > This is not trying to implement any of the processing itself in a columnar > way, with the exception of examples for documentation, and possibly default > implementations for partitioning of columnar shuffle. > > *Q3.* How is it done today, and what are the limits of current practice? > The current columnar support is limited to 3 areas. > # Input formats can optionally return a ColumnarBatch instead of rows. 
The > code generation phase knows how to take that columnar data and iterate > through it as rows for stages that want rows, which currently is almost > everything. The limitations here are mostly implementation-specific. The > current standard is to abuse Scala's type erasure to return ColumnarBatches > as the elements of an RDD[InternalRow]. The code generation can handle this > because it is generating Java code, so it bypasses Scala's type checking and > just casts the InternalRow to the desired ColumnarBatch. This makes it > difficult for others to implement the same functionality for different > processing because they can only do it through code generation. There really > is no clean separate path in the code generation for columnar vs. row-based > processing. Additionally, because it is only supported through code > generation, there is no fallback if code generation fails for any reason. > This is typically fine for input formats but can
[jira] [Comment Edited] (SPARK-27495) Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822166#comment-16822166 ] Thomas Graves edited comment on SPARK-27495 at 4/19/19 8:29 PM: Unfortunately the link to the original design doc was removed from https://issues.apache.org/jira/browse/SPARK-24615, but for reference there were some good discussions about this in the comments, from https://issues.apache.org/jira/browse/SPARK-24615?focusedCommentId=16528293&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16528293 all the way through https://issues.apache.org/jira/browse/SPARK-24615?focusedCommentId=16567393&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16567393 To summarize, after those discussions the proposal was something like: {code:java} val rdd2 = rdd1.withResources().mapPartitions(){code} Possibly getting more detailed: {code:java} rdd.withResources .prefer("/gpu/k80", 2) // prefix of resource logical name, amount .require("/cpu", 1) .require("/memory", 819200) .require("/disk", 1){code} The withResources would apply to just that stage. If there were conflicting resources in, say, a join: {code:java} val rddA = rdd.withResources.mapPartitions() val rddB = rdd.withResources.mapPartitions() val rddC = rddA.join(rddB) {code} Then you would have to resolve that conflict, either by failing outright or by merging the requirements, taking the larger of the two values. There are also a lot of corner cases we need to handle, such as mentioned: {code:java} So which means RDDs with different resources requirements in one stage may have conflicts. For example: rdd1.withResources.mapPartitions { xxx }.withResources.mapPartitions { xxx }.collect, resources in rdd1 may be different from map rdd, so currently what I can think is that: 1. always pick the latter with warning log to say that multiple different resources in one stage is illegal. 2. 
fail the stage with warning log to say that multiple different resources in one stage is illegal. 3. merge conflicts with maximum resources needs. For example rdd1 requires 3 gpus per task, rdd2 requires 4 gpus per task, then the merged requirement would be 4 gpus per task. (This is the high level description, details will be per partition based merging) [chosen]. {code} One thing that was brought up multiple times is how a user will really know which stage the settings apply to. Users aren't necessarily going to realize where the stage boundaries are, and I don't think anyone has a good solution to this. Also, I think for this to really be useful it has to be tied into dynamic allocation; without that, users can just use the application-level task configs we are adding in SPARK-24615. Of course, the original proposal was only for RDDs as well. That was because it goes with barrier scheduling, and with the dataset/dataframe API it is even harder to know where stage boundaries are because Catalyst can optimize a bunch of things. 
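The merge option marked [chosen] above (option 3: combine conflicting requirements by taking the per-resource maximum) can be sketched with plain maps. This is illustrative only; the `merge` helper and the map-of-amounts representation are assumptions for the example, not real Spark APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceMerge {
    // Merge two per-task resource requirement maps by keeping, for each
    // resource, the larger of the two requested amounts.
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new HashMap<>(a);
        b.forEach((resource, amount) -> merged.merge(resource, amount, Math::max));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> rdd1Reqs = Map.of("/gpu", 3, "/cpu", 1);
        Map<String, Integer> rdd2Reqs = Map.of("/gpu", 4, "/memory", 819200);
        // The conflicting /gpu requirement (3 vs 4) resolves to the maximum.
        System.out.println(merge(rdd1Reqs, rdd2Reqs).get("/gpu")); // prints 4
    }
}
```

Non-conflicting resources (here /cpu and /memory) simply pass through, matching the example in the quoted comment where 3 gpus vs 4 gpus merges to 4 gpus per task.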