[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-11-04 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r342089932
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala
 ##
 @@ -86,8 +103,8 @@ trait ResourceAllocator {
   s"address $address doesn't exist.")
   }
   val isAvailable = addressAvailabilityMap(address)
-  if (!isAvailable) {
-addressAvailabilityMap(address) = true
+  if (isAvailable < slotsPerAddress) {
+addressAvailabilityMap(address) = addressAvailabilityMap(address) + 1
 
 Review comment:
   yeah I think the only issue here are with UI accessing, and I probably 
missed this in the reviews for the standalone support. I was thinking it was 
just read but its doing a flatMap in there.
   But I do think we should protect that. That said it is a separate issue not 
introduced by this so I think we should file a separate jira for it, 
@jiangxb1987 are you ok with that?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-22 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r337558880
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala
 ##
 @@ -30,27 +30,36 @@ trait ResourceAllocator {
 
   protected def resourceName: String
   protected def resourceAddresses: Seq[String]
+  protected def slotsPerAddress: Int
 
   /**
-   * Map from an address to its availability, the value `true` means the 
address is available,
-   * while value `false` means the address is assigned.
+   * Map from an address to its availability, a value > 0 means the address is 
available,
+   * while value of 0 means the address is fully assigned.
+   *
+   * For task resources ([[org.apache.spark.scheduler.ExecutorResourceInfo]]), 
this value
+   * can be a multiple, such that each address can be allocated up to 
[[slotsPerAddress]]
+   * times.
+   *
* TODO Use [[OpenHashMap]] instead to gain better performance.
*/
-  private lazy val addressAvailabilityMap = 
mutable.HashMap(resourceAddresses.map(_ -> true): _*)
+  private lazy val addressAvailabilityMap = {
+mutable.HashMap(resourceAddresses.map(_ -> slotsPerAddress): _*)
+  }
 
   /**
* Sequence of currently available resource addresses.
*/
-  def availableAddrs: Seq[String] = addressAvailabilityMap.flatMap { case 
(addr, available) =>
-if (available) Some(addr) else None
-  }.toSeq
+  def availableAddrs: Seq[String] = addressAvailabilityMap
+.flatMap { case (addr, available) =>
+  (0 until available).map(_ => addr)
+}.toSeq
 
   /**
* Sequence of currently assigned resource addresses.
 
 Review comment:
   can you add comment saying can see the same addr multiple times when 
slotsPeraddress > 1


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-22 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r337546377
 
 

 ##
 File path: 
core/src/test/scala/org/apache/spark/scheduler/ExecutorResourceInfoSuite.scala
 ##
 @@ -80,12 +80,29 @@ class ExecutorResourceInfoSuite extends SparkFunSuite {
 
   test("Don't allow release address that doesn't exist") {
 // Init Executor Resource.
-val info = new ExecutorResourceInfo(GPU, ArrayBuffer("0", "1", "2", "3"))
+val info = new ExecutorResourceInfo(GPU, ArrayBuffer("0", "1", "2", "3"), 
1)
 assert(!info.assignedAddrs.contains("4"))
 // Release an address that doesn't exist
 val e = intercept[SparkException] {
   info.release(Array("4"))
 }
 assert(e.getMessage.contains("Try to release an address that doesn't 
exist."))
   }
+
+  test("Ensure that we can acquire the same fractions of a resource from an 
executor") {
+val slotSeq = Seq(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
+val addresses = ArrayBuffer("0", "1", "2", "3")
+slotSeq.foreach { slots =>
+  val info = new ExecutorResourceInfo(GPU, addresses, slots)
+  for (_ <- 0 until slots) {
+addresses.foreach(addr => info.acquire(Seq(addr)))
+  }
+  addresses.foreach { addr =>
+assertThrows[SparkException] {
+  info.acquire(Seq(addr))
+}
+assert(!info.availableAddrs.contains(addr))
 
 Review comment:
   might be nie to have assert for assignedAddrs where it returns more then 1.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-22 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r337094423
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
 ##
 @@ -47,7 +48,25 @@ private[spark] case class ResourceRequest(
 discoveryScript: Option[String],
 vendor: Option[String])
 
-private[spark] case class ResourceRequirement(resourceName: String, amount: 
Int)
+/**
+ * Case class that represents a user's resource requirement as given by 
configuration
+ * (e.g spark.task.resource.[resource type].amount = 4)
+ *
+ * A configuration of spark.task.resource.[resource type].amount = 4, equates 
to:
+ * amount = 4, and numParts = 1.
+ *
+ * A configuration of spark.task.resource.[resource type].amount = 0.25, 
equates to:
+ * amount = 1, and numParts = 4.
+ *
+ * @param resourceName gpu, fpga, etc.
+ * @param amount whole units of the resource we expect (e.g. 1 gpus, 2 fpgas)
+ * @param numParts if not 1, the number of ways a whole resource is subdivided.
+ */
+private[spark] case class ResourceRequirement(
+resourceName: String,
+amount: Int,
+numParts: Int = 1) {
 
 Review comment:
   remove emtpy {}


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333715025
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
   extends ResourceInformation(name, addresses.toArray) with ResourceAllocator {
 
   override protected def resourceName = this.name
   override protected def resourceAddresses = this.addresses
+  override protected def resourcesPerAddress = numParts
 
 Review comment:
   sure slotsPerAddress sounds good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333581119
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/SparkContext.scala
 ##
 @@ -2775,7 +2775,10 @@ object SparkContext extends Logging {
 s" = ${taskReq.amount}")
 }
 // Compare and update the max slots each executor can provide.
-val resourceNumSlots = execAmount / taskReq.amount
+// If the configured amount per task was < 1.0, a task is subdividing
+// executor resources. If the amount per task was > 1.0, the task wants
+// multiple executor resources.
+val resourceNumSlots = Math.floor(execAmount * 
taskReq.numParts/taskReq.amount).toInt
 
 Review comment:
   add spaces around the "/"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333539938
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -1982,9 +1982,13 @@ Apart from these, the following properties are also 
available, and may be useful
   spark.task.resource.{resourceName}.amount
   1
   
-Amount of a particular resource type to allocate for each task. If this is 
specified
-you must also provide the executor config 
spark.executor.resource.{resourceName}.amount
-and any corresponding discovery configs so that your executors are created 
with that resource type.
+Amount of a particular resource type to allocate for each task, note that 
this can be a double.
+If this is specified you must also provide the executor config 
+spark.executor.resource.{resourceName}.amount and any 
corresponding discovery configs 
+so that your executors are created with that resource type. In addition to 
whole amounts, 
+a fractional amount (for example, 0.25, or 1/4th of a resource) may be 
specified. 
 
 Review comment:
   change "or 1/4th" to something like "which means 1/4th".  They way I read 
the or makes me question is I can pass in "1/4" to the config, just want it to 
be clear.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333571682
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
 
 Review comment:
   fix formatting, should be 4 spaces idented from left


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333586973
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ##
 @@ -215,7 +217,14 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
   totalCoreCount.addAndGet(cores)
   totalRegisteredExecutors.addAndGet(1)
   val resourcesInfo = resources.map{ case (k, v) =>
-(v.name, new ExecutorResourceInfo(v.name, v.addresses))}
+(v.name,
+ new ExecutorResourceInfo(v.name, v.addresses,
+   taskResources
 
 Review comment:
   instead of doing this filter each time how about we convert taskResources 
into a Map (and rename it) one time at the top when its declared and then just 
lookup in the map here. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333538698
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala
 ##
 @@ -47,7 +48,9 @@ private[spark] case class ResourceRequest(
 discoveryScript: Option[String],
 vendor: Option[String])
 
-private[spark] case class ResourceRequirement(resourceName: String, amount: 
Int)
+private[spark] case class ResourceRequirement(resourceName: String, amount: 
Int,
 
 Review comment:
   lets add description, especially explaining numParts and amount. Mention 
fractional use case where amount is user specified get changed to 1 and 
numParts is the number per resource equivalent.
   
   Also need to fix formatting.  Please use formatting like ResourceRequest 
above


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333564756
 
 

 ##
 File path: docs/configuration.md
 ##
 @@ -1982,9 +1982,13 @@ Apart from these, the following properties are also 
available, and may be useful
   spark.task.resource.{resourceName}.amount
   1
   
-Amount of a particular resource type to allocate for each task. If this is 
specified
-you must also provide the executor config 
spark.executor.resource.{resourceName}.amount
-and any corresponding discovery configs so that your executors are created 
with that resource type.
+Amount of a particular resource type to allocate for each task, note that 
this can be a double.
+If this is specified you must also provide the executor config 
+spark.executor.resource.{resourceName}.amount and any 
corresponding discovery configs 
+so that your executors are created with that resource type. In addition to 
whole amounts, 
+a fractional amount (for example, 0.25, or 1/4th of a resource) may be 
specified. 
+Fractional amounts must be less than or equal to 0.5, or in other words, 
the minimum amount of
 
 Review comment:
   we should also add that any fractional value specified gets rounded to have 
an even number of tasks per resource.  ie 0. end up with 4 tasks per 
resource 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333582986
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala
 ##
 @@ -30,27 +30,35 @@ trait ResourceAllocator {
 
   protected def resourceName: String
   protected def resourceAddresses: Seq[String]
+  protected def resourcesPerAddress: Int
 
   /**
-   * Map from an address to its availability, the value `true` means the 
address is available,
-   * while value `false` means the address is assigned.
+   * Map from an address to its availability, a value > 0 means the address is 
available,
+   * while value of 0 means the address is fully assigned.
+   *
+   * For task resources ([[org.apache.spark.scheduler.ExecutorResourceInfo]]), 
this value
+   * can be fractional.
 
 Review comment:
   I think we should clarify, here its actually not fractional its a multiple. 
I think if its called tasksPerAddress that will help 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] Support fractional resources for task resource scheduling

2019-10-10 Thread GitBox
tgravescs commented on a change in pull request #26078: [SPARK-29151][CORE] 
Support fractional resources for task resource scheduling
URL: https://github.com/apache/spark/pull/26078#discussion_r333574367
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorResourceInfo.scala
 ##
 @@ -25,10 +25,13 @@ import org.apache.spark.resource.{ResourceAllocator, 
ResourceInformation}
  * information.
  * @param name Resource name
  * @param addresses Resource addresses provided by the executor
+ * @param numParts Number of ways each resource is subdivided when scheduling 
tasks
  */
-private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String])
+private[spark] class ExecutorResourceInfo(name: String, addresses: Seq[String],
+  numParts: Int)
   extends ResourceInformation(name, addresses.toArray) with ResourceAllocator {
 
   override protected def resourceName = this.name
   override protected def resourceAddresses = this.addresses
+  override protected def resourcesPerAddress = numParts
 
 Review comment:
   I think we should rename. resourcesPerAddress could be confusing since an 
address is a single resource.  
   I'm thinking we just call this tasksPerAddress and perhaps update this to be 
consistent everywhere - ie rename numParts as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org