[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.

2018-12-07 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23252
  
How is this file protected in Kubernetes?
I actually don't like this idea, at least for YARN and other deployments; I 
can see people abusing it (accidentally) and using it in a non-secure manner. 
I realize it's up to the user if they want to shoot themselves, but I would 
like to keep the ways they can do that to a minimum.

Do you need updates to the Kubernetes-specific docs on how users should use this?


---




[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-07 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23242
  
ok to test


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-06 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23223
  
If you aren't seeing actual issues with this, I guess it would be 
interesting to test it further to see if it does cause problems.  I can see Spark 
blacklisting when it shouldn't for exit codes like the one you mention 
(KILLED_BY_RESOURCEMANAGER), so I would like to see someone test this further and 
determine whether that happens.  If it does, we should change this to a bug and put 
the fix into 2.4.1.


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-05 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23223
  
The approach here makes sense.  Are you seeing actual issues with this 
blacklisting when it shouldn't?  I could see that being possible here, and if so we 
should move this to a defect and make sure it goes into 2.4.1.


---




[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...

2018-12-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/23223#discussion_r239110361
  
--- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -612,11 +612,14 @@ private[yarn] class YarnAllocator(
 val message = "Container killed by YARN for exceeding physical memory limits. " +
   s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}."
 (true, message)
+  case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) =>
--- End diff --

Yeah, I agree this should be cleaned up; we already handle cases above that 
are in the NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS set.
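
For illustration, a rough sketch of that cleanup (a simplified shape, not the 
actual YarnAllocator code; it only assumes the NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS 
set shown in the diff above):

    // hypothetical: let the single set-based case own every exit code that is
    // not the application's fault, instead of repeating them one by one above
    def classifyExit(exitStatus: Int): (Boolean, String) = exitStatus match {
      case status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(status) =>
        // preemption / killed-by-RM style exits: don't count against the app
        (false, s"Container exited with status $status; not counted for blacklisting.")
      case status =>
        (true, s"Container exited with unexpected status $status.")
    }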


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-05 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23223
  
ok to test


---




[GitHub] spark issue #23166: [SPARK-26201] Fix python broadcast with encryption

2018-11-30 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23166
  
thanks @redsanket @squito, committed to master, 2.4, and 2.3.2


---




[GitHub] spark pull request #23166: [SPARK-26201] Fix python broadcast with encryptio...

2018-11-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/23166#discussion_r237875851
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -708,16 +709,36 @@ private[spark] class PythonBroadcast(@transient var path: String) extends Serial
   override def handleConnection(sock: Socket): Unit = {
 val env = SparkEnv.get
 val in = sock.getInputStream()
-val dir = new File(Utils.getLocalDir(env.conf))
-val file = File.createTempFile("broadcast", "", dir)
-path = file.getAbsolutePath
-val out = env.serializerManager.wrapForEncryption(new FileOutputStream(path))
+val abspath = new File(path).getAbsolutePath
+val out = env.serializerManager.wrapForEncryption(new FileOutputStream(abspath))
--- End diff --

Ok, I think we agree it's good this way (just to verify though, I won't 
commit until you +1 it).  You are correct that, now that we are using the 
decryption server which reads from the path in PythonBroadcast, the path change 
isn't strictly necessary, but the value of self._path in broadcast.py doesn't 
match the path in PythonBroadcast, so I think it's better to have those match.


---




[GitHub] spark issue #23160: [SPARK-26196][WebUI] Total tasks title in the stage page...

2018-11-30 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23160
  
Reviewing.  To clarify @pgandhi999's question, I assume the screenshots above 
are cut off and the task table is really showing 7 failed tasks in total?


---




[GitHub] spark pull request #23166: [SPARK-26201] Fix python broadcast with encryptio...

2018-11-28 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/23166#discussion_r237248890
  
--- Diff: python/pyspark/broadcast.py ---
@@ -134,7 +137,15 @@ def value(self):
 """ Return the broadcasted value
 """
 if not hasattr(self, "_value") and self._path is not None:
-self._value = self.load_from_path(self._path)
+# we only need to decrypt it here when encryption is enabled and
--- End diff --

I think you can get in here when encryption is off on the executors, and then 
self._sc would not be defined.


---




[GitHub] spark pull request #23166: [SPARK-26201] Fix python broadcast with encryptio...

2018-11-28 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/23166#discussion_r237212353
  
--- Diff: python/pyspark/tests/test_broadcast.py ---
@@ -67,6 +67,20 @@ def test_broadcast_with_encryption(self):
 def test_broadcast_no_encryption(self):
 self._test_multiple_broadcasts()
 
+def _test_broadcast_on_driver(self, *extra_confs):
+conf = SparkConf()
+for key, value in extra_confs:
+conf.set(key, value)
+conf.setMaster("local-cluster[2,1,1024]")
+self.sc = SparkContext(conf=conf)
+bs = self.sc.broadcast(value=5)
+self.assertEqual(5, bs.value)
+
+def test_broadcast_value_driver_no_encryption(self):
+self._test_broadcast_on_driver()
+
+def test_broadcast_value_driver_encryption(self):
+self.self._test_broadcast_on_driver(("spark.io.encryption.enabled", "true"))
--- End diff --

You have an extra .self here.


---




[GitHub] spark issue #23166: [SPARK-26201] Fix python broadcast with encryption

2018-11-28 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23166
  
Yeah, this certainly seems like a good idea.  The only question I have is whether 
this causes more memory usage on the driver because it holds a reference to that 
broadcast value, or is something else already holding on to it?


---




[GitHub] spark issue #23166: [SPARK-26201] Fix python broadcast with encryption

2018-11-28 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23166
  
cc @squito 


---




[GitHub] spark issue #23166: [SPARK-26201] Fix python broadcast with encryption

2018-11-28 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23166
  
ok to test


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
+1, going to merge to master.

There are a few follow-up JIRAs on this:
1) make the timeline visualization better: 
https://issues.apache.org/jira/browse/SPARK-26130
2) improve search functionality: 
https://issues.apache.org/jira/browse/SPARK-25719


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test this please




---




[GitHub] spark issue #23103: [SPARK-26121] [Structured Streaming] Allow users to defi...

2018-11-22 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23103
  
LGTM


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test this please


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
test this please


---




[GitHub] spark issue #23103: [SPARK-26121] [Structured Streaming] Allow users to defi...

2018-11-21 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23103
  
ok to test


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-20 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r235164507
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -92,6 +92,14 @@ private[ui] class StagePage(parent: StagesTab, store: AppStatusStore) extends We
 val parameterTaskSortDesc = UIUtils.stripXSS(request.getParameter("task.desc"))
 val parameterTaskPageSize = UIUtils.stripXSS(request.getParameter("task.pageSize"))
--- End diff --

We need to clean this up as it's not really used anymore.  Can you file a 
JIRA to change the timeline to use the REST API or the data from the other tables?


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-20 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r235157033
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -259,7 +278,8 @@ private[ui] class StagePage(parent: StagesTab, store: AppStatusStore) extends We
 
   }
 
-  def makeTimeline(tasks: Seq[TaskData], currentTime: Long): Seq[Node] = {
+  def makeTimeline(tasks: Seq[TaskData], currentTime: Long, page: Int, pageSize: Int,
+totalPages: Int, stageId: Int, stageAttemptId: Int, totalTasks: Int): Seq[Node] = {
--- End diff --

Fix the formatting of the arguments to use the multi-line style, like:

def runJob(
  sc: SparkContext,
  rdd: JavaRDD[Array[Byte]],
  partitions: JArrayList[Int]): Array[Any] = {


---




[GitHub] spark issue #19840: [SPARK-22640][PYSPARK][YARN]switch python exec on execut...

2018-11-20 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/19840
  
I didn't read the entire thread here but what you want is this:

 --archives hdfs:///python36/python36.tgz#python36 \
   --conf spark.pyspark.python=./python36/bin/python3.6 \
   --conf spark.executorEnv.LD_LIBRARY_PATH=./python36/lib \
   --driver-library-path /opt/python36/lib \
   --conf spark.pyspark.driver.python=/opt/python36/bin/python3.6


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r234765781
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -315,187 +241,22 @@ private[ui] class StagePage(parent: StagesTab, 
store: AppStatusStore) extends We
 }
   
 
-val metricsSummary = store.taskSummary(stageData.stageId, 
stageData.attemptId,
-  Array(0, 0.25, 0.5, 0.75, 1.0))
-
-val summaryTable = metricsSummary.map { metrics =>
-  def timeQuantiles(data: IndexedSeq[Double]): Seq[Node] = {
-data.map { millis =>
-  {UIUtils.formatDuration(millis.toLong)}
-}
-  }
-
-  def sizeQuantiles(data: IndexedSeq[Double]): Seq[Node] = {
-data.map { size =>
-  {Utils.bytesToString(size.toLong)}
-}
-  }
-
-  def sizeQuantilesWithRecords(
-  data: IndexedSeq[Double],
-  records: IndexedSeq[Double]) : Seq[Node] = {
-data.zip(records).map { case (d, r) =>
-  {s"${Utils.bytesToString(d.toLong)} / ${r.toLong}"}
-}
-  }
-
-  def titleCell(title: String, tooltip: String): Seq[Node] = {
-
-  
-{title}
-  
-
-  }
-
-  def simpleTitleCell(title: String): Seq[Node] = {title}
-
-  val deserializationQuantiles = titleCell("Task Deserialization Time",
-ToolTips.TASK_DESERIALIZATION_TIME) ++ 
timeQuantiles(metrics.executorDeserializeTime)
-
-  val serviceQuantiles = simpleTitleCell("Duration") ++ 
timeQuantiles(metrics.executorRunTime)
-
-  val gcQuantiles = titleCell("GC Time", ToolTips.GC_TIME) ++ 
timeQuantiles(metrics.jvmGcTime)
-
-  val serializationQuantiles = titleCell("Result Serialization Time",
-ToolTips.RESULT_SERIALIZATION_TIME) ++ 
timeQuantiles(metrics.resultSerializationTime)
-
-  val gettingResultQuantiles = titleCell("Getting Result Time", 
ToolTips.GETTING_RESULT_TIME) ++
-timeQuantiles(metrics.gettingResultTime)
-
-  val peakExecutionMemoryQuantiles = titleCell("Peak Execution Memory",
-ToolTips.PEAK_EXECUTION_MEMORY) ++ 
sizeQuantiles(metrics.peakExecutionMemory)
-
-  // The scheduler delay includes the network delay to send the task 
to the worker
-  // machine and to send back the result (but not the time to fetch 
the task result,
-  // if it needed to be fetched from the block manager on the worker).
-  val schedulerDelayQuantiles = titleCell("Scheduler Delay", 
ToolTips.SCHEDULER_DELAY) ++
-timeQuantiles(metrics.schedulerDelay)
-
-  def inputQuantiles: Seq[Node] = {
-simpleTitleCell("Input Size / Records") ++
-  sizeQuantilesWithRecords(metrics.inputMetrics.bytesRead, 
metrics.inputMetrics.recordsRead)
-  }
-
-  def outputQuantiles: Seq[Node] = {
-simpleTitleCell("Output Size / Records") ++
-  sizeQuantilesWithRecords(metrics.outputMetrics.bytesWritten,
-metrics.outputMetrics.recordsWritten)
-  }
-
-  def shuffleReadBlockedQuantiles: Seq[Node] = {
-titleCell("Shuffle Read Blocked Time", 
ToolTips.SHUFFLE_READ_BLOCKED_TIME) ++
-  timeQuantiles(metrics.shuffleReadMetrics.fetchWaitTime)
-  }
-
-  def shuffleReadTotalQuantiles: Seq[Node] = {
-titleCell("Shuffle Read Size / Records", ToolTips.SHUFFLE_READ) ++
-  sizeQuantilesWithRecords(metrics.shuffleReadMetrics.readBytes,
-metrics.shuffleReadMetrics.readRecords)
-  }
-
-  def shuffleReadRemoteQuantiles: Seq[Node] = {
-titleCell("Shuffle Remote Reads", 
ToolTips.SHUFFLE_READ_REMOTE_SIZE) ++
-  sizeQuantiles(metrics.shuffleReadMetrics.remoteBytesRead)
-  }
-
-  def shuffleWriteQuantiles: Seq[Node] = {
-simpleTitleCell("Shuffle Write Size / Records") ++
-  sizeQuantilesWithRecords(metrics.shuffleWriteMetrics.writeBytes,
-metrics.shuffleWriteMetrics.writeRecords)
-  }
-
-  def memoryBytesSpilledQuantiles: Seq[Node] = {
-simpleTitleCell("Shuffle spill (memory)") ++ 
sizeQuantiles(metrics.memoryBytesSpilled)
-  }
-
-  def diskBytesSpilledQuantiles: Seq[Node] = {
-simpleTitleCell("Shuffle spill (disk)") ++ 
sizeQuantiles(metrics.diskBytesSpilled)
-  }
-
-  val listings: Seq[Seq[Node]] = Seq(
-{serviceQuantiles},
-{sche

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r234757635
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: Option[String] = None
+  var filteredRecords = totalRecords
+  // The datatables client API sends a list of query parameters to the 
server which contain
+  // information like the columns to be sorted, search value typed by 
the user in the search
+  // box, pagination index etc. For more information on these query 
parameters,
+  // refer https://datatables.net/manual/server-side.
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
+isSearch = true
+searchValue = Some(uriQueryParameters.getFirst("search[value]"))
+  }
+  val _tasksToShow: Seq[TaskData] = doPagination(uriQueryParameters, 
stageId, stageAttemptId,
+isSearch, totalRecords.toInt)
+  val ret = new HashMap[String, Object]()
+  if (_tasksToShow.nonEmpty) {
+// Performs server-side search based on input from user
+if (isSearch) {
+  val filteredTaskList = filterTaskList(_tasksToShow, 
searchValue.get)
--- End diff --

Sorry, things have changed a bit since my first comment, but there is no 
reason to have searchValue be an Option anymore.  All you do is set Some and then 
.get it, so just use a String.  Here we only reference it where you know it's 
valid, since you check it's not empty and length > 0.
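
For example, a rough fragment (reusing the names from the diff above, not an exact 
patch) of the plain-String shape being suggested:

    // hypothetical: keep searchValue a plain String and only use it on the
    // branch where it is already known to be non-null and non-empty
    val searchValue: String = uriQueryParameters.getFirst("search[value]")
    if (searchValue != null && searchValue.nonEmpty) {
      val filteredTaskList = filterTaskList(_tasksToShow, searchValue)
      ret.put("aaData", filteredTaskList)
    }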


---




[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-19 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23013
  
Sorry for my delay, as I was on vacation.  I would like to clarify whether k8s 
supports spark.authenticate and, if so, how it does it.  Without that, Spark 
is not secure on k8s unless you are securing things via SSL or other means.

Is there someone familiar with Spark on k8s who would know the answer?


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232832492
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -162,26 +165,29 @@ private[v1] class StagesResource extends 
BaseAppResource {
   // Performs pagination on the server side
   def doPagination(queryParameters: MultivaluedMap[String, String], 
stageId: Int,
 stageAttemptId: Int, isSearch: Boolean, totalRecords: Int): 
Seq[TaskData] = {
-val queryParams = queryParameters.keySet()
 var columnNameToSort = queryParameters.getFirst("columnNameToSort")
+// Sorting on Logs column will default to Index column sort
 if (columnNameToSort.equalsIgnoreCase("Logs")) {
   columnNameToSort = "Index"
 }
 val isAscendingStr = queryParameters.getFirst("order[0][dir]")
 var pageStartIndex = 0
 var pageLength = totalRecords
+// We fetch only the desired rows upto the specified page length for 
all cases except when a
+// search query is present, in that case, we need to fetch all the 
rows to perform the search
+// on the entire table
 if (!isSearch) {
   pageStartIndex = queryParameters.getFirst("start").toInt
   pageLength = queryParameters.getFirst("length").toInt
 }
-return withUI(_.store.taskList(stageId, stageAttemptId, 
pageStartIndex, pageLength,
+withUI(_.store.taskList(stageId, stageAttemptId, pageStartIndex, 
pageLength,
   indexName(columnNameToSort), isAscendingStr.equalsIgnoreCase("asc")))
   }
 
   // Filters task list based on search parameter
   def filterTaskList(
 taskDataList: Seq[TaskData],
-searchValue: String): Seq[TaskData] = {
+searchValue: String): Option[Seq[TaskData]] = {
--- End diff --

Sorry, my comment was confusing; don't make the return value an Option.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232738700
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
+  // The datatables client API sends a list of query parameters to the 
server which contain
+  // information like the columns to be sorted, search value typed by 
the user in the search
+  // box, pagination index etc. For more information on these query 
parameters,
+  // refer https://datatables.net/manual/server-side.
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, true,
+  totalRecords.toInt)
+isSearch = true
+searchValue = uriQueryParameters.getFirst("search[value]")
+  } else {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, false,
+  totalRecords.toInt)
+  }
+  val ret = new HashMap[String, Object]()
+  if (_tasksToShow.nonEmpty) {
+// Performs server-side search based on input from user
+if (isSearch) {
+  val filteredTaskList = filterTaskList(_tasksToShow, searchValue)
+  filteredRecords = filteredTaskList.length.toString
+  if (filteredTaskList.length > 0) {
+val pageStartIndex = uriQueryParameters.getFirst("start").toInt
+val pageLength = uriQueryParameters.getFirst("length").toInt
+ret.put("aaData", filteredTaskList.slice(pageStartIndex, 
pageStartIndex + pageLength))
+  } else {
+ret.put("aaData", filteredTaskList)
+  }
+} else {
+  ret.put("aaData", _tasksToShow)
+}
+  } else {
+ret.put("aaData", _tasksToShow)
+  }
+  ret.put("recordsTotal", totalRecords)
+  ret.put("recordsFiltered", filteredRecords)
+  ret
+}
+  }
+
+  // Performs pagination on the server side
+  def doPagination(queryParameters: MultivaluedMap[String, String], 
stageId: Int,
+stageAttemptId: Int, isSearch: Boolean, totalRecords: Int): 
Seq[TaskData] = {
+val queryParams = queryParameters.keySet()
+var columnNameToSort = queryParameters.getFirst("columnNameToSort")
+if (columnNameToSort.equalsIgnoreCase("Logs")) {
+  columnNameToSort = "Index"
+}
+val isAscendingStr = queryParameters.getFirst("order[0][dir]")
+var pageStartIndex = 0
+var pageLength = totalRecords
+if (!isSearch) {
+  pageStartIndex = queryParameters.getFirst("start").toInt
+  pageLength = queryParameters.getFirst("length").toInt
+}
+return withUI(_.store.taskList(stageId, stageAttemptId, 
pageStartIndex, pageLength,
+  indexName(columnNameToSort), isAscendingStr.equalsIgnoreCase("asc")))
+  }
+
+  // Filters task list based on search parameter
+  def filterTaskList(
+taskDataList: Seq[TaskData],
+searchValue: String): Seq[TaskData] = {
--- End diff --

Change this to an Option and you can just do a map on it to handle the case 
where it might be empty.
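
For example, a rough fragment (reusing the names from the diff above, not an exact 
patch) of the Option-plus-map shape being suggested:

    // hypothetical: wrap the raw query parameter in an Option and map over it,
    // so the "no search value" case needs no explicit null/empty check
    val searchValue: Option[String] =
      Option(uriQueryParameters.getFirst("search[value]")).filter(_.nonEmpty)
    val filtered: Seq[TaskData] =
      searchValue.map(v => filterTaskList(_tasksToShow, v)).getOrElse(_tasksToShow)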


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232736598
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
--- End diff --

Don't pre-initialize this; just set it as the result of the call to doPagination, 
which should no longer be inside the if (see the comment below).


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232736621
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
+  // The datatables client API sends a list of query parameters to the 
server which contain
+  // information like the columns to be sorted, search value typed by 
the user in the search
+  // box, pagination index etc. For more information on these query 
parameters,
+  // refer https://datatables.net/manual/server-side.
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, true,
+  totalRecords.toInt)
+isSearch = true
--- End diff --

We set isSearch here but pass in the literal true above; just set this above and 
pass it into doPagination.

Really, I think you could simplify this: do the if to set isSearch, and then have 
only one call to doPagination outside the if (see the sketch below).
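
A rough sketch of that simplification (reusing the names from the diff above, not a 
drop-in patch):

    // hypothetical: decide isSearch up front, then make exactly one call to
    // doPagination instead of one call per branch of the if
    val rawSearch = uriQueryParameters.getFirst("search[value]")
    val isSearch = rawSearch != null && rawSearch.nonEmpty
    val tasksToShow: Seq[TaskData] =
      doPagination(uriQueryParameters, stageId, stageAttemptId, isSearch, totalRecords.toInt)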


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232736375
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
--- End diff --

Use an Option instead of null.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232735802
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
+  // The datatables client API sends a list of query parameters to the 
server which contain
+  // information like the columns to be sorted, search value typed by 
the user in the search
+  // box, pagination index etc. For more information on these query 
parameters,
+  // refer https://datatables.net/manual/server-side.
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, true,
+  totalRecords.toInt)
+isSearch = true
+searchValue = uriQueryParameters.getFirst("search[value]")
+  } else {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, false,
+  totalRecords.toInt)
+  }
+  val ret = new HashMap[String, Object]()
+  if (_tasksToShow.nonEmpty) {
+// Performs server-side search based on input from user
+if (isSearch) {
+  val filteredTaskList = filterTaskList(_tasksToShow, searchValue)
+  filteredRecords = filteredTaskList.length.toString
+  if (filteredTaskList.length > 0) {
+val pageStartIndex = uriQueryParameters.getFirst("start").toInt
+val pageLength = uriQueryParameters.getFirst("length").toInt
+ret.put("aaData", filteredTaskList.slice(pageStartIndex, 
pageStartIndex + pageLength))
+  } else {
+ret.put("aaData", filteredTaskList)
+  }
+} else {
+  ret.put("aaData", _tasksToShow)
+}
+  } else {
+ret.put("aaData", _tasksToShow)
+  }
+  ret.put("recordsTotal", totalRecords)
+  ret.put("recordsFiltered", filteredRecords)
+  ret
+}
+  }
+
+  // Performs pagination on the server side
+  def doPagination(queryParameters: MultivaluedMap[String, String], 
stageId: Int,
+stageAttemptId: Int, isSearch: Boolean, totalRecords: Int): 
Seq[TaskData] = {
+val queryParams = queryParameters.keySet()
+var columnNameToSort = queryParameters.getFirst("columnNameToSort")
+if (columnNameToSort.equalsIgnoreCase("Logs")) {
+  columnNameToSort = "Index"
+}
+val isAscendingStr = queryParameters.getFirst("order[0][dir]")
+var pageStartIndex = 0
+var pageLength = totalRecords
+if (!isSearch) {
--- End diff --

Let's add a comment here about not trimming, since a search wants to look at all 
the results (see the sketch below).
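
Something along these lines (wording and placement are illustrative only, reusing 
the names from the diff above):

    if (!isSearch) {
      // Only trim to the requested page when there is no search; a search has to
      // scan the entire task list, so keep pageStartIndex = 0 and
      // pageLength = totalRecords in that case.
      pageStartIndex = queryParameters.getFirst("start").toInt
      pageLength = queryParameters.getFirst("length").toInt
    }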


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-11-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r232711679
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,120 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
+  // The datatables client API sends a list of query parameters to the 
server which contain
+  // information like the columns to be sorted, search value typed by 
the user in the search
+  // box, pagination index etc. For more information on these query 
parameters,
+  // refer https://datatables.net/manual/server-side.
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, true,
+  totalRecords.toInt)
+isSearch = true
+searchValue = uriQueryParameters.getFirst("search[value]")
+  } else {
+_tasksToShow = doPagination(uriQueryParameters, stageId, 
stageAttemptId, false,
+  totalRecords.toInt)
+  }
+  val ret = new HashMap[String, Object]()
+  if (_tasksToShow.nonEmpty) {
+// Performs server-side search based on input from user
+if (isSearch) {
+  val filteredTaskList = filterTaskList(_tasksToShow, searchValue)
+  filteredRecords = filteredTaskList.length.toString
+  if (filteredTaskList.length > 0) {
+val pageStartIndex = uriQueryParameters.getFirst("start").toInt
+val pageLength = uriQueryParameters.getFirst("length").toInt
+ret.put("aaData", filteredTaskList.slice(pageStartIndex, 
pageStartIndex + pageLength))
+  } else {
+ret.put("aaData", filteredTaskList)
+  }
+} else {
+  ret.put("aaData", _tasksToShow)
+}
+  } else {
+ret.put("aaData", _tasksToShow)
+  }
+  ret.put("recordsTotal", totalRecords)
+  ret.put("recordsFiltered", filteredRecords)
+  ret
+}
+  }
+
+  // Performs pagination on the server side
+  def doPagination(queryParameters: MultivaluedMap[String, String], 
stageId: Int,
+stageAttemptId: Int, isSearch: Boolean, totalRecords: Int): 
Seq[TaskData] = {
+val queryParams = queryParameters.keySet()
--- End diff --

Not used; remove it.


---




[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-11-12 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22144
  
I think we should add the support back.  It sounded like some people didn't 
like this PR as a fix, so we would need to investigate something else. 
@cloud-fan, did you have more specifics about what is missing from the PR 
(other than a test) or what approach should be taken here?


---




[GitHub] spark issue #23013: [SPARK-25023] More detailed security guidance for K8S

2018-11-12 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/23013
  
ok to test


---




[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-11-06 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22288
  
merged to master and 2.4 branch, thanks @dhruve 


---




[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-11-06 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22288
  
+1 


---




[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-11-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r230487066
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---
@@ -503,6 +507,181 @@ class TaskSchedulerImplSuite extends SparkFunSuite 
with LocalSparkContext with B
 verify(tsm).abort(anyString(), anyObject())
   }
 
+  test("SPARK-22148 abort timer should kick in when task is completely 
blacklisted & no new " +
+  "executor can be acquired") {
+// set the abort timer to fail immediately
+taskScheduler = setupSchedulerWithMockTaskSetBlacklist(
+  config.UNSCHEDULABLE_TASKSET_TIMEOUT.key -> "0")
+
+// We have only 1 task remaining with 1 executor
+val taskSet = FakeTask.createTaskSet(numTasks = 1, stageAttemptId = 0)
+taskScheduler.submitTasks(taskSet)
+val tsm = stageToMockTaskSetManager(0)
+
+// submit an offer with one executor
+val firstTaskAttempts = taskScheduler.resourceOffers(IndexedSeq(
+  WorkerOffer("executor0", "host0", 1)
+)).flatten
+
+// Fail the running task
+val failedTask = firstTaskAttempts.find(_.executorId == 
"executor0").get
+taskScheduler.statusUpdate(failedTask.taskId, TaskState.FAILED, 
ByteBuffer.allocate(0))
+// we explicitly call the handleFailedTask method here to avoid adding 
a sleep in the test suite
+// Reason being - handleFailedTask is run by an executor service and 
there is a momentary delay
+// before it is launched and this fails the assertion check.
+tsm.handleFailedTask(failedTask.taskId, TaskState.FAILED, 
UnknownReason)
+when(tsm.taskSetBlacklistHelperOpt.get.isExecutorBlacklistedForTask(
+  "executor0", failedTask.index)).thenReturn(true)
+
+// make an offer on the blacklisted executor.  We won't schedule 
anything, and set the abort
+// timer to kick in immediately
+assert(taskScheduler.resourceOffers(IndexedSeq(
+  WorkerOffer("executor0", "host0", 1)
+)).flatten.size === 0)
+// Wait for the abort timer to kick in. Without sleep the test exits 
before the timer is
+// triggered.
--- End diff --

This comment is still out of date.


---




[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-11-02 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
merged to master and 2.4


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-01 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21688
  
test this please


---




[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-10-31 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22288
  
@dhruve  is this ready to review again?


---




[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-31 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
If there are no other comments, I'll commit this.  I'll leave it open for a bit longer.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r229363078
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,959 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var indexOfProxy = words.indexOf("proxy");
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (indexOfProxy > 0) {
+var appId = words[indexOfProxy + 1];
+var newBaseURI = words.slice(0, words.indexOf("proxy") + 
2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+var indexOfHistory = words.indexOf("history");
+if (indexOfHistory > 0) {
+var appId = words[indexOfHistory + 1];
+var appAttemptId = words[indexOfHistory + 2];
+var newBaseURI = words.slice(0, 
words.indexOf("history")).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay

[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-30 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
Added sections under each of the resource manager sections.


---




[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-30 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
I can add a note here for deployments, and then we can do version-specific ones after.


---




[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-30 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
Updated to have a section on security in the quickstart and overview; let 
me know what you think and whether the wording needs updating.  If this is ok, I can 
follow up with something on the website.


---




[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r229002442
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var indexOfProxy = words.indexOf("proxy");
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (indexOfProxy > 0) {
+var appId = words[indexOfProxy + 1];
+var newBaseURI = words.slice(0, words.indexOf("proxy") + 
2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+var indexOfHistory = words.indexOf("history");
+if (indexOfHistory > 0) {
+var appId = words[indexOfHistory + 1];
+var appAttemptId = words[indexOfHistory + 2];
+var newBaseURI = words.slice(0, 
words.indexOf("history")).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r228994601
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var indexOfProxy = words.indexOf("proxy");
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (indexOfProxy > 0) {
+var appId = words[indexOfProxy + 1];
+var newBaseURI = words.slice(0, words.indexOf("proxy") + 
2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+var indexOfHistory = words.indexOf("history");
+if (indexOfHistory > 0) {
+var appId = words[indexOfHistory + 1];
+var appAttemptId = words[indexOfHistory + 2];
+var newBaseURI = words.slice(0, 
words.indexOf("history")).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-29 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r228982925
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,944 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var indexOfProxy = words.indexOf("proxy");
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (indexOfProxy > 0) {
+var appId = words[indexOfProxy + 1];
+var newBaseURI = words.slice(0, words.indexOf("proxy") + 
2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+var indexOfHistory = words.indexOf("history");
+if (indexOfHistory > 0) {
+var appId = words[indexOfHistory + 1];
+var appAttemptId = words[indexOfHistory + 2];
+var newBaseURI = words.slice(0, 
words.indexOf("history")).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializationTime":
+return "Result Serialization Time";
+break;
+
+case "schedulerDelay":
+return "Scheduler Delay

[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-29 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
I would be fine with adding it in more places, including perhaps the overview 
page: http://spark.apache.org/docs/latest/ and the quick start pages.

Perhaps we should agree upon the wording here first though.  I'm not 
exactly sure where this pr stands honestly.  @srowen  are you going to put up a 
different one with wording you prefer?

> If someone lands on this page, do they pretty easily come away with the 
impression they need to set spark.authenticate and network security if they 
care about security? 

Everyone reads text slightly differently and I'm by no means a doc expert, so 
I'm definitely open to rewording if there is consensus on it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-10-29 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21588
  
Can you clarify what you mean by drop builtin metastore support?  Are you 
just saying users must always provide jars to use it or something more?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r228678209
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +420,54 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case Some(taskIndex) => // Returns the taskIndex which was 
unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which if it doesn't 
schedule a task within the
+  // the timeout will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there are 
more than one idle
+  // blacklisted executors and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, resulting in the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some ((executorId, _)) =>
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+val timeout = 
conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis() + timeout
+logInfo(s"Waiting for $timeout ms for completely "
+  + s"blacklisted task to be schedulable again before 
aborting $taskSet.")
+abortTimer.schedule(
+  createUnschedulableTaskSetAbortTimer(taskSet, 
taskIndex), timeout)
+  }
+case _ => // Abort Immediately
+  logInfo("Cannot schedule any task because of complete 
blacklisting. No idle" +
+s" executors can be found to kill. Aborting $taskSet." 
)
+  taskSet.abortSinceCompletelyBlacklisted(taskIndex)
+  }
+case _ => // Do nothing if no tasks completely blacklisted.
+  }
+} else {
+  // We want to defer killing any taskSets as long as we have a 
non blacklisted executor
+  // which can be used to schedule a task from any active 
taskSets. This ensures that the
+  // job can make progress and if we encounter a flawed taskSet it 
will eventually either
+  // fail or abort due to being completely blacklisted.
--- End diff --

ok, yeah it seems like it would have to be very timing dependent for taskset1 
to never get a chance at that executor; really that would just be a normal 
indefinite postponement problem in the scheduler regardless of blacklisting. 
I don't think it's a problem with FIFO since the first taskset should always 
be first. With the Fair scheduler perhaps it could happen, but that probably 
depends on a much more specific scenario.

I guess I'm ok with this if you are.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
I would rather see someone who is more familiar with K8s, and actually uses it, 
document it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
so I had filed a jira to update the mesos docs with more detail about security 
things (https://issues.apache.org/jira/browse/SPARK-25024) which I need to 
follow up on, but I didn't file one for k8s.  It would be good to have one for 
k8s as well if it's not clear.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r228668756
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +420,54 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case Some(taskIndex) => // Returns the taskIndex which was 
unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which if it doesn't 
schedule a task within the
+  // the timeout will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there are 
more than one idle
+  // blacklisted executors and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, resulting in the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some ((executorId, _)) =>
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+val timeout = 
conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis() + timeout
+logInfo(s"Waiting for $timeout ms for completely "
+  + s"blacklisted task to be schedulable again before 
aborting $taskSet.")
+abortTimer.schedule(
+  createUnschedulableTaskSetAbortTimer(taskSet, 
taskIndex), timeout)
+  }
+case _ => // Abort Immediately
+  logInfo("Cannot schedule any task because of complete 
blacklisting. No idle" +
+s" executors can be found to kill. Aborting $taskSet." 
)
+  taskSet.abortSinceCompletelyBlacklisted(taskIndex)
+  }
+case _ => // Do nothing if no tasks completely blacklisted.
+  }
+} else {
+  // We want to defer killing any taskSets as long as we have a 
non blacklisted executor
+  // which can be used to schedule a task from any active 
taskSets. This ensures that the
+  // job can make progress and if we encounter a flawed taskSet it 
will eventually either
+  // fail or abort due to being completely blacklisted.
--- End diff --

Thanks for pointing this out, but if I'm reading the discussion properly, I 
don't think you will actually wait indefinitely.  Eventually you will either 
abort immediately or you should fail due to max number of task failures.  Let 
me know if I'm missing something from the scenario.

Let's say you have taskset1 that is blacklisted on all nodes (let's say we 
have 3). 3 cases can happen at this point:

- taskset 2 hasn't started, so it tries to kill an executor and starts the 
timer.
- taskset 2 has started and it's running on all nodes, then we abort 
immediately because there are no executors to kill
- taskset 2 has started but it's not running on all blacklisted nodes, then 
we will kill an executor

At this point let's say we didn't abort, so we killed an executor.  Taskset 1 
will get a chance to run on the new executor and either work or have a task 
failure.  If it has a task failure and it gets blacklisted, we go back into the 
case above.  But the # of task failures gets one closer to the max.

so it seems like eventually you would either abort immediately if there 
aren't any executors to kill or you would eventually fail with max number of 
task attempts.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
we can add stronger wording for standalone if you want,  I know the text 
was recently updated (I believe by you) to have the below:

> For other resource managers, spark.authenticate.secret must be configured 
on each of the nodes. This secret will be shared by all the daemons and 
applications, so this deployment configuration is not as secure as the above, 
especially when considering multi-tenant clusters. In this configuration, a 
user with the secret can effectively impersonate any other user.

Do you have a specific suggestion on where you want to put that?  The reason 
I didn't use stronger wording is that if you are running it in an isolated, 
single-client environment then the authentication part via the secret doesn't 
matter that much.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
the intention is not a "we told you so", it's meant to grab their attention 
and to get people to think about it, because in the end it is their 
responsibility in my opinion. 

I'm fine if you want to take a crack at listing a few things or adding a 
table of the bullets we have in the below sections.  I just want to make sure 
we also say that this is not a comprehensive list.  Many users who don't read 
all the docs look at the cheat sheet, stop there, and think it's 
comprehensive.  There is no way we could list all the variations of people's 
environments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22852: [SPARK-25023] Clarify Spark security documentatio...

2018-10-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22852#discussion_r228544730
  
--- Diff: docs/security.md ---
@@ -6,7 +6,20 @@ title: Security
 * This will become a table of contents (this text will be scraped).
 {:toc}
 
-# Spark RPC
+# Spark Security Overview
+
+Security in Spark is OFF by default. This could mean you are vulnerable to 
attack by default.
+Spark supports multiple deployments types and each one supports different 
levels of security. Not
+all deployment types will be secure in all environments and none are 
secure by default. Be
+sure to evaluate your environment, what Spark supports, and take the 
appropriate measure to secure
+your Spark deployment
--- End diff --

I thought about this, but this is very specific to a user's environment.  The 
rest of the doc has the things you listed.  We aren't responsible for securing 
other things in their environment, so I didn't want to make it sound like it 
was a comprehensive list.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-26 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22852
  
@vanzin @srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22852: [SPARK-25023] Clarify Spark security documentatio...

2018-10-26 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/22852

[SPARK-25023] Clarify Spark security documentation

## What changes were proposed in this pull request?

Clarify documentation about security.

## How was this patch tested?

None, just documentation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-25023

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22852


commit 8b4aaf5a85f7f925baf7365283e950b9d7676a4b
Author: Thomas Graves 
Date:   2018-10-26T13:45:58Z

[SPARK-25023] Clarify Spark security documentation




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-25 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22771
  
no other comments, looks good.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22144
  
@cloud-fan 

I agree with you that most issues do not have to block a release, but I 
don't agree with most of your criteria for making that decision.   I've already 
talked about the other points you raised that I disagree with, so no point in 
going over them again. 

The one I agree should go into the decision is your point 2:
> It fails the job instead of returning wrong result

I think at this point it makes sense to start a discussion on the dev list to 
make sure people are in sync, and in fact there is no written policy that I am 
aware of.  I want to make sure we don't have hard rules that state things like 
"since it was a bug in a previous release it shouldn't be a blocker now".  It's 
always going to be at people's discretion, as I think it's impossible to have 
exact rules for this, but that is part of what we trust committers and PMC 
members to do.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-24 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22144
  
while I'm ok with not blocking 2.4 for this as well, it's not for many of the 
reasons stated though. Note the jira was filed as Major, not a blocker. Based on 
the information we have, the impact on the number of users for this issue seems 
low, it doesn't seem to cause a correctness issue as it appears to fail when the 
issue is hit, and the 2.4 release is far enough underway and has other things 
users are waiting on that we don't want to delay.  But I think we need to 
investigate more and make a decision about what we are doing with it; if we find 
that this does have higher impact we can do a 2.4.1, and really we would want it 
in previous versions as well.

I think the overall decision has to be based on the impact of the issue.  
As far as I know we don't have any written rules about this, but perhaps we 
need some.   The ultimate decision is basically whether the release vote passes.  
If the PMC members pass it, they think it's sufficient as a release.

I do also agree with @markhamstra about our criteria for calling it a 
non-blocker.  We should not be making that decision based on whether it was a 
regression from only the last previous release.

I do NOT agree with @cloud-fan on most of his points as to why this is ok.

"After all, this is a bug and a regression from previous releases, like 
other 1000 we've fixed before. "

I'll state this again, this should have very little to do with the decision 
on whether it's a blocker. If the question is whether we are going to ignore a 
correctness bug because it's been wrong for multiple releases, the answer is NO, 
we better not.  Many people don't upgrade immediately, so things aren't found 
right away, or it's an obscure thing that only happens occasionally so it takes 
time for it to be reported.  I do agree that the time the issue has been around 
does go into the calculation of the impact though. 

 - is a hive compatibility bug. Spark fails to run some Hive UDAFs

Really?  What does a hive compatibility bug have to do with anything?  We 
state in our docs we support hive UDFs and UDAFs.  You seem very unconcerned 
with this, which concerns me ("hive compatibility is not that important to Spark 
at this point").  Was there an official decision to drop this?  If so please 
point it out, as I would strongly -1 this; otherwise anyone making changes here 
should keep compatibility, and our committers are the ones that should enforce 
this and make sure it happens.  This is the basics of api compatibility. If we 
drop support for this, many users will be hurt.  Just because your particular 
users don't use this, others do, and as a member of Apache you should be 
concerned with the community, not just your company's users.

@cloud-fan  you are the one that removed the supportPartial flag here: 
https://issues.apache.org/jira/browse/SPARK-19060 so we were assuming you had 
some knowledge of the code in this area and might have the background on it.

@srowen  your statement here: "Dropping support for something in a minor 
release isn't crazy though."  also concerns me.  We should not be dropping 
features on purpose in minor releases. This again is an api compatibility 
thing, unless it's a developer api or experimental. Obviously things get dropped 
by accident, but we should not be doing this on purpose. Otherwise why do we 
have minor vs major releases at all?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22771: [SPARK-25773][Core]Cancel zombie tasks in a resul...

2018-10-23 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22771#discussion_r227503789
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1364,6 +1385,19 @@ private[spark] class DAGScheduler(
   if (job.numFinished == job.numPartitions) {
 markStageAsFinished(resultStage)
 cleanupStateForJobAndIndependentStages(job)
+try { // cancelTasks will fail if a SchedulerBackend 
does not implement killTask
--- End diff --

comment "cancelTasks" no longer valid


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22144
  
The fact that it isn't a new regression shouldn't determine whether it's a 
blocker; there are lots of things you don't hit right away. 

The impact does affect whether we decide this is a blocker.  Here it is 
definitely impacting the data sketches folks, and it was reported a long time 
ago; I'll check back with them to see if they have worked around the issue.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-23 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22771
  
@markhamstra thanks for the reference, I was looking for some background on 
this.   I agree those are still issues, like the ones mentioned in SPARK-17064, 
but I don't think that directly impacts this.  We can at least try to abort the 
tasks and still honor the interrupt-on-cancel flag.  It seems like the best case 
is that things actually get killed and we free up resources; the worst case 
seems to be that the task ignores the interrupt and continues just like now.  I 
guess if the user code spawns other threads it's possible that you clean up the 
main thread and leave other threads running, but short of killing the executor 
jvm I don't think there is a way around that. 
 We now have the task reaper functionality as well, which at least gives the 
user some options.  

Do you have specific concerns where this would actually cause problems?  
There is a lot of discussion there so I want to make sure I didn't miss 
something. In the jira, you mention "possibility of nodes being marked dead 
when a Task thread is interrupted".   What exactly do you mean by that?   Do 
you mean user code handling it badly and exiting the jvm? 
Reynold mentions something about storage clients not handling interrupts 
well; do you know if that means it was actually causing corruptions or was it 
just ignoring them?

I didn't thoroughly go through the spark code to ensure it doesn't cause 
our resource accounting to get off. 






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...

2018-10-23 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22144
  
@cloud-fan  I disagree, it seems like you broke functionality that was 
there, and you can't do that in a minor release.   Dropping it in 3.0 would be 
fine I think, but we should fix it for 2.4.  This PR was put up in hopes of 
getting people to respond and see if the change was close to a fix; if we think 
this will work then he can add unit tests, otherwise I don't see a reason to 
waste time writing unit tests if the code changes are not going to be accepted.

If I'm missing something please explain, as you state this is a new feature 
but it certainly seems like a broken feature to me.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r227117575
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -547,6 +519,24 @@ private[spark] class AppStatusStore(
 store.close()
   }
 
+  def interceptAndModifyTaskData(taskDataWrapper: TaskDataWrapper) : 
v1.TaskData = {
--- End diff --

rename this function to constructTaskData


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r227116341
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -52,22 +52,20 @@ $.extend( $.fn.dataTable.ext.type.order, {
 function stageEndPoint(appId) {
 var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
 var urlArray = urlRegex.exec(document.baseURI);
-var ind = urlArray.indexOf("proxy");
+var indexOfProxy = urlArray.indexOf("proxy");
 var queryString = document.baseURI.split('?');
 var words = document.baseURI.split('/');
 var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
-if (ind > 0) {
+if (indexOfProxy > 0) {
 var appId = urlArray[2];
-var indexOfProxy = words.indexOf("proxy");
-var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+var newBaseURI = words.slice(0, words.indexOf("proxy") + 
2).join('/');
 return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
 }
-ind = urlArray.indexOf("history");
-if (ind > 0) {
+var indexOfHistory = urlArray.indexOf("history");
+if (indexOfHistory > 0) {
 var appId = urlArray[2];
-var appAttemptId = urlArray[ind + 2];
-var indexOfHistory = words.indexOf("history");
-var newBaseURI = words.slice(0, indexOfHistory).join('/');
+var appAttemptId = urlArray[indexOfHistory + 2];
--- End diff --

same here just use words


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r227116251
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -52,22 +52,20 @@ $.extend( $.fn.dataTable.ext.type.order, {
 function stageEndPoint(appId) {
 var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
 var urlArray = urlRegex.exec(document.baseURI);
-var ind = urlArray.indexOf("proxy");
+var indexOfProxy = urlArray.indexOf("proxy");
--- End diff --

we seem to be using urlArray and then words; we don't really need both.  
Just get everything from words if that is the format you need.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22504: [SPARK-25118][Submit] Persist Driver Logs in Clie...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22504#discussion_r227036915
  
--- Diff: docs/configuration.md ---
@@ -266,6 +266,41 @@ of the most common options to set are:
 Only has effect in Spark standalone mode or Mesos cluster deploy mode.
   
 
+
+  spark.driver.log.dfsDir
+  (none)
+  
+Base directory in which Spark driver logs are synced, if 
spark.driver.log.persistToDfs.enabled
+is true. Within this base directory, Spark creates a sub-directory for 
each application, and logs the driver
+logs specific to the application in this directory. Users may want to 
set this to a unified location like an
+HDFS directory so driver log files can be persisted for later usage. 
This directory should allow any Spark
+user to read/write files and the Spark History Server user to delete 
files. Additionally, older logs from
--- End diff --

we should add something about this to the security doc with specific 
information on permissions, like for event logging: 
https://spark.apache.org/docs/latest/security.html


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r226999715
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +420,54 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case Some(taskIndex) => // Returns the taskIndex which was 
unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which if it doesn't 
schedule a task within the
+  // the timeout will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there are 
more than one idle
+  // blacklisted executors and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, resulting in the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some (x) =>
+  val executorId = x._1
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis()
+val timeout = 
conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
+logInfo(s"Waiting for $timeout ms for completely "
+  + s"blacklisted task to be schedulable again before 
aborting $taskSet.")
+abortTimer.schedule(getAbortTimer(taskSet, taskIndex, 
timeout), timeout)
+  }
+case _ => // Abort Immediately
+  logInfo("Cannot schedule any task because of complete 
blacklisting. No idle" +
+s" executors can be found to kill. Aborting $taskSet." 
)
+  taskSet.abortSinceCompletelyBlacklisted(taskIndex)
+  }
+case _ => // Do nothing if no tasks completely blacklisted.
+  }
+} else {
+  // We want to differ killing any taskSets as long as we have a 
non blacklisted executor
--- End diff --

s/differ/defer/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r227005479
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---
@@ -503,6 +505,145 @@ class TaskSchedulerImplSuite extends SparkFunSuite 
with LocalSparkContext with B
 verify(tsm).abort(anyString(), anyObject())
   }
 
+  test("SPARK-22148 abort timer should kick in when task is completely 
blacklisted & no new " +
+"executor can be acquired") {
+// set the abort timer to fail immediately
+taskScheduler = setupSchedulerWithMockTaskSetBlacklist(
+  config.UNSCHEDULABLE_TASKSET_TIMEOUT.key -> "0")
+
+// We have only 1 task remaining with 1 executor
+val taskSet = FakeTask.createTaskSet(numTasks = 1, stageAttemptId = 0)
+taskScheduler.submitTasks(taskSet)
+val tsm = stageToMockTaskSetManager(0)
+
+// submit an offer with one executor
+val firstTaskAttempts = taskScheduler.resourceOffers(IndexedSeq(
+  WorkerOffer("executor0", "host0", 1)
+)).flatten
+
+// Fail the running task
+val failedTask = firstTaskAttempts.find(_.executorId == 
"executor0").get
+taskScheduler.statusUpdate(failedTask.taskId, TaskState.FAILED, 
ByteBuffer.allocate(0))
+// we explicitly call the handleFailedTask method here to avoid adding 
a sleep in the test suite
+// Reason being - handleFailedTask is run by an executor service and 
there is a momentary delay
+// before it is launched and this fails the assertion check.
+tsm.handleFailedTask(failedTask.taskId, TaskState.FAILED, 
UnknownReason)
+when(tsm.taskSetBlacklistHelperOpt.get.isExecutorBlacklistedForTask(
+  "executor0", failedTask.index)).thenReturn(true)
+
+// make an offer on the blacklisted executor.  We won't schedule 
anything, and set the abort
+// timer to kick in immediately
+assert(taskScheduler.resourceOffers(IndexedSeq(
+  WorkerOffer("executor0", "host0", 1)
+)).flatten.size === 0)
+// Wait for the abort timer to kick in. Without sleep the test exits 
before the timer is
+// triggered.
+Thread.sleep(500)
--- End diff --

instead of the sleep could you do something like the following?  Note I haven't 
used eventually myself and the scaladoc seems to be down, but it's used in other 
places like SparkContextSuite:

    eventually(timeout(1.seconds)) {
      assert(tsm.isZombie)
    }
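
(For completeness, the imports needed would be the ScalaTest helpers, assuming 
the same pattern the other suites use:

    import org.scalatest.concurrent.Eventually._
    import org.scalatest.time.SpanSugar._

and an interval(...) argument can be passed as well if the default polling 
interval turns out to be too coarse.)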


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r226999389
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -453,6 +503,22 @@ private[spark] class TaskSchedulerImpl(
 return tasks
   }
 
+  private def getAbortTimer(taskSet: TaskSetManager, taskIndex: Int, 
timeout: Long): TimerTask = {
--- End diff --

perhaps rename to createUnschedulableTaskSetAbortTimer


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r227000658
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +420,54 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case Some(taskIndex) => // Returns the taskIndex which was 
unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which if it doesn't 
schedule a task within the
+  // the timeout will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there are 
more than one idle
+  // blacklisted executors and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, resulting in the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some (x) =>
+  val executorId = x._1
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis()
--- End diff --

 just put in the actual expiry time here rather than the current time
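
Roughly this (a sketch using the names from the diff above), so the timer task 
only needs to compare against the stored value:

    // store the absolute expiry time instead of the time the task set became unschedulable
    val timeout = conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
    unschedulableTaskSetToExpiryTime(taskSet) = clock.getTimeMillis() + timeout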


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...

2018-10-22 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22288
  
test this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22725: [SPARK-25753][CORE]fix reading small files via BinaryFil...

2018-10-22 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22725
  
merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-19 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22771
  
Actually, as part of the jiras I mentioned above we were looking at killing 
other task attempts as soon as one task attempt succeeds, rather than waiting 
for the entire job to finish.  Thoughts on that?  It does require a new api.  We 
can put up a pr next week for that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22771: [SPARK-25773][Core]Cancel zombie tasks in a resul...

2018-10-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22771#discussion_r226651308
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1364,6 +1385,16 @@ private[spark] class DAGScheduler(
   if (job.numFinished == job.numPartitions) {
 markStageAsFinished(resultStage)
 cleanupStateForJobAndIndependentStages(job)
+try { // cancelTasks will fail if a SchedulerBackend 
does not implement killTask
+  logInfo(
+s"Job ${job.jobId} is finished. Killing 
speculative tasks for this job")
--- End diff --

the message should be updated, as this covers more than speculative tasks since 
it could be tasks in other attempts.
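
For example, something along these lines (just a wording suggestion):

    logInfo(s"Job ${job.jobId} is finished. Cancelling potential speculative " +
      "or zombie tasks for this job")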


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22771: [SPARK-25773][Core]Cancel zombie tasks in a result stage...

2018-10-19 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22771
  
the change itself seems reasonable; we were actually looking at this 
recently as there are a couple of other bugs around this: 
https://issues.apache.org/jira/browse/SPARK-25250 and 
https://issues.apache.org/jira/browse/SPARK-24622 (probably a dup of this jira).

I have to think about it a bit more, but I'm not sure this one solves 
https://issues.apache.org/jira/browse/SPARK-25250 .  I think that one requires 
tasks to be marked as successful in other task sets, so that can be a separate 
jira.

My only other question is why the interrupt flag is only associated with a 
job group?  Seems like this could be global. Does anyone know the history on 
this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-17 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r225998426
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +420,65 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case taskIndex: Some[Int] => // Returns the taskIndex which 
was unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which if it doesn't 
schedule a task within the
+  // the timeout will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there are 
more than one idle
+  // blacklisted executors and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, resulting in the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some (x) =>
+  val executorId = x._1
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis()
+val timeout = 
conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
+logInfo(s"Waiting for $timeout ms for completely "
+  + s"blacklisted task to be schedulable again before 
aborting $taskSet.")
+abortTimer.schedule(new TimerTask() {
+  override def run() {
+if 
(unschedulableTaskSetToExpiryTime.contains(taskSet) &&
+  (unschedulableTaskSetToExpiryTime(taskSet) + 
timeout)
+<= clock.getTimeMillis()
+) {
+  logInfo("Cannot schedule any task because of 
complete blacklisting. " +
+s"Wait time for scheduling expired. Aborting 
$taskSet.")
+  
taskSet.abortSinceCompletelyBlacklisted(taskIndex.get)
+} else {
+  this.cancel()
+}
+  }
+}, timeout)
+  }
+case _ => // Abort Immediately
+  logInfo("Cannot schedule any task because of complete 
blacklisting. No idle" +
+  s" executors can be found to kill. Aborting $taskSet." )
--- End diff --

```suggestion
s" executors can be found to kill. Aborting $taskSet." )
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22725: [SPARK-25753][[CORE][FOLLOW-UP]fix reading small files v...

2018-10-17 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22725
  
+1 Looks good, thanks @10110346 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22725: [SPARK-24610][[CORE][FOLLOW-UP]fix reading small files v...

2018-10-16 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22725
  
SPARK-24610 is the original issue, please file a new jira for 
StreamFileInputFormat


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-15 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r225198423
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---
@@ -503,6 +505,89 @@ class TaskSchedulerImplSuite extends SparkFunSuite 
with LocalSparkContext with B
 verify(tsm).abort(anyString(), anyObject())
   }
 
+  test("SPARK-22148 abort timer should kick in when task is completely 
blacklisted & no new " +
+"executor can be acquired") {
+// set the abort timer to fail immediately
+taskScheduler = setupSchedulerWithMockTaskSetBlacklist(
+  config.UNSCHEDULABLE_TASKSET_TIMEOUT.key -> "0")
+
+// We have only 1 task remaining with 1 executor
+val taskSet = FakeTask.createTaskSet(numTasks = 1, stageAttemptId = 0)
+taskScheduler.submitTasks(taskSet)
+val tsm = stageToMockTaskSetManager(0)
+
+// submit an offer with one executor
+val firstTaskAttempts = taskScheduler.resourceOffers(IndexedSeq(
+  WorkerOffer("executor0", "host0", 1)
+)).flatten
+
+// Fail the running task
+val failedTask = firstTaskAttempts.find(_.executorId == 
"executor0").get
+taskScheduler.statusUpdate(
+  tid = failedTask.taskId,
+  state = TaskState.FAILED,
+  serializedData = ByteBuffer.allocate(0)
+)
+// Wait for the failed task to propagate.
+Thread.sleep(500)
+
+
when(stageToMockTaskSetBlacklist(0).isExecutorBlacklistedForTask("executor0", 
failedTask.index))
--- End diff --

Can we perhaps mock the blacklist tracker to mark it as blacklisted and 
get rid of the sleep here?
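
For example, something along these lines (a sketch reusing the mocks this suite 
already sets up; it processes the failure synchronously instead of waiting for 
the executor service):

    // handle the failure directly so the test doesn't need to sleep, then mark
    // the task as blacklisted on executor0 via the mocked blacklist helper
    tsm.handleFailedTask(failedTask.taskId, TaskState.FAILED, UnknownReason)
    when(tsm.taskSetBlacklistHelperOpt.get.isExecutorBlacklistedForTask(
      "executor0", failedTask.index)).thenReturn(true)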


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-15 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r225193634
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +419,61 @@ private[spark] class TaskSchedulerImpl(
 launchedAnyTask |= launchedTaskAtCurrentMaxLocality
   } while (launchedTaskAtCurrentMaxLocality)
 }
+
 if (!launchedAnyTask) {
-  taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match 
{
+case taskIndex: Some[Int] => // Returns the taskIndex which 
was unschedulable
+
+  // If the taskSet is unschedulable we try to find an 
existing idle blacklisted
+  // executor. If we cannot find one, we abort immediately. 
Else we kill the idle
+  // executor and kick off an abortTimer which if it doesn't 
schedule a task within the
+  // the timeout will abort the taskSet if we were unable to 
schedule any task from the
+  // taskSet.
+  // Note 1: We keep track of schedulability on a per taskSet 
basis rather than on a per
+  // task basis.
+  // Note 2: The taskSet can still be aborted when there are 
more than one idle
+  // blacklisted executors and dynamic allocation is on. This 
can happen when a killed
+  // idle executor isn't replaced in time by 
ExecutorAllocationManager as it relies on
+  // pending tasks and doesn't kill executors on idle 
timeouts, resulting in the abort
+  // timer to expire and abort the taskSet.
+  executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) 
match {
+case Some (x) =>
+  val executorId = x._1
+  if (!unschedulableTaskSetToExpiryTime.contains(taskSet)) 
{
+blacklistTrackerOpt.foreach(blt => 
blt.killBlacklistedIdleExecutor(executorId))
+
+unschedulableTaskSetToExpiryTime(taskSet) = 
clock.getTimeMillis()
+val timeout = 
conf.get(config.UNSCHEDULABLE_TASKSET_TIMEOUT) * 1000
+logInfo(s"Waiting for $timeout ms for completely "
+  + s"blacklisted task to be schedulable again before 
aborting $taskSet.")
+abortTimer.schedule(new TimerTask() {
+  override def run() {
+if 
(unschedulableTaskSetToExpiryTime.contains(taskSet) &&
+  (unschedulableTaskSetToExpiryTime(taskSet) + 
timeout)
+<= clock.getTimeMillis()
+) {
+  logInfo("Cannot schedule any task because of 
complete blacklisting. " +
+s"Wait time for scheduling expired. Aborting 
$taskSet.")
+  
taskSet.abortSinceCompletelyBlacklisted(taskIndex.get)
+} else {
+  this.cancel()
+}
+  }
+}, timeout)
+  }
+case _ => // Abort Immediately
+  logInfo("Cannot schedule any task because of complete 
blacklisting. No idle" +
+  s" executors can be found to kill. Aborting $taskSet." )
+  taskSet.abortSinceCompletelyBlacklisted(taskIndex.get)
+  }
+case _ => // Do nothing if no tasks completely blacklisted.
+  }
+} else {
+  // If a task was scheduled, we clear the expiry time for the 
taskSet. The abort timer
+  // checks this entry to decide if we want to abort the taskSet.
+  unschedulableTaskSetToExpiryTime.remove(taskSet)
--- End diff --

Here we have to handle the situation where, if you have 2 tasksets, they may 
have both chosen the same executor to kill.  If one of the tasksets kills the 
executor and launches a task it clears its expiry, but if a second taskset had 
tried to kill the same executor we don't clear it, and it could end up aborting 
the second taskset and killing the job even though it shouldn't have.
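
One way to handle that (just a sketch of the idea, not necessarily the final 
shape): once any task launches, clear the expiry entries for every unschedulable 
taskset instead of only the current one, since the resource that freed up could 
have served the others as well:

    } else {
      // a task was launched, so none of the pending abort timers should be allowed to fire
      if (unschedulableTaskSetToExpiryTime.nonEmpty) {
        logInfo("Clearing the expiry times for all unschedulable taskSets as a task was launched")
        unschedulableTaskSetToExpiryTime.clear()
      }
    }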


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...

2018-10-15 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/22288#discussion_r225185389
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -597,6 +597,16 @@ package object config {
   .checkValue(v => v > 0, "The value should be a positive time value.")
   .createWithDefaultString("365d")
 
+  // Threshold above which we abort the TaskSet if a task could not be 
scheduled because of complete
--- End diff --

I don't think we need the extra comment, the doc section should be 
sufficient


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224864545
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -341,7 +341,9 @@ private class LiveExecutorStageSummary(
   metrics.shuffleWriteMetrics.recordsWritten,
   metrics.memoryBytesSpilled,
   metrics.diskBytesSpilled,
-  isBlacklisted)
+  isBlacklisted,
--- End diff --

We can look into it more; I can't imagine it's that hard to join, it's just 
that much more data going over the wire: you are sending the entire 
ExecutorSummary for all executors when you really just need 2 fields out of it 
for some executors.  Previously this was happening on the driver side so it 
didn't have to transfer the data.  We can look at the data size, and if it 
doesn't seem too bad we can do that for now and can always revisit if needed.  
The stage page executor table which uses this I wouldn't expect to be used as 
much as the task one, so it's probably not bad for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224850397
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -341,7 +341,9 @@ private class LiveExecutorStageSummary(
   metrics.shuffleWriteMetrics.recordsWritten,
   metrics.memoryBytesSpilled,
   metrics.diskBytesSpilled,
-  isBlacklisted)
+  isBlacklisted,
--- End diff --

@vanzin ideas on how to better handle this?

I don't see a really clean way to populate these fields from the 
AppStatusListener before being written.  For context, in this PR these are 
currently being populated in the AppStatusStore.executorSummary call before 
going back to the user.

We could potentially split this into a separate api, or on the UI side query 
both that and the executor info and join them, but that seems like a lot more data.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224251226
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -278,7 +198,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
   }
 }
 val currentTime = System.currentTimeMillis()
-val (taskTable, taskTableHTML) = try {
+val taskTable = try {
   val _taskTable = new TaskPagedTable(
--- End diff --

ok, might be something we should look at more closely to see if we want to 
clean this up and use the REST API for it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224239984
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala ---
@@ -102,4 +103,124 @@ private[v1] class StagesResource extends 
BaseAppResource {
 withUI(_.store.taskList(stageId, stageAttemptId, offset, length, 
sortBy))
   }
 
+  // This api needs to stay formatted exactly as it is below, since, it is 
being used by the
+  // datatables for the stages page.
+  @GET
+  @Path("{stageId: \\d+}/{stageAttemptId: \\d+}/taskTable")
+  def taskTable(
+@PathParam("stageId") stageId: Int,
+@PathParam("stageAttemptId") stageAttemptId: Int,
+@QueryParam("details") @DefaultValue("true") details: Boolean,
+@Context uriInfo: UriInfo):
+  HashMap[String, Object] = {
+withUI { ui =>
+  val uriQueryParameters = uriInfo.getQueryParameters(true)
+  val totalRecords = uriQueryParameters.getFirst("numTasks")
+  var isSearch = false
+  var searchValue: String = null
+  var filteredRecords = totalRecords
+  var _tasksToShow: Seq[TaskData] = null
+  if (uriQueryParameters.getFirst("search[value]") != null &&
+uriQueryParameters.getFirst("search[value]").length > 0) {
--- End diff --

can we easily point somewhere to the DataTables API docs so that if someone 
tries to read this they can see the values?  Or put an example here
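
For instance, something along these lines (the values are purely illustrative; 
apart from numTasks and search[value], these are the standard DataTables 
server-side processing parameters, not something defined in this PR):

    .../api/v1/applications/<appId>/stages/<stageId>/<stageAttemptId>/taskTable
        ?numTasks=200&draw=1&start=0&length=20
        &search[value]=FAILED&search[regex]=false
        &order[0][column]=0&order[0][dir]=asc

Even just linking https://datatables.net/manual/server-side from a comment 
would help.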


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224238974
  
--- Diff: core/src/main/scala/org/apache/spark/status/LiveEntity.scala ---
@@ -341,7 +341,9 @@ private class LiveExecutorStageSummary(
   metrics.shuffleWriteMetrics.recordsWritten,
   metrics.memoryBytesSpilled,
   metrics.diskBytesSpilled,
-  isBlacklisted)
+  isBlacklisted,
--- End diff --

these are actually going to be written to the tracking store.  we don't 
want that if the values aren't set.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224221297
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -349,7 +349,23 @@ private[spark] class AppStatusStore(
   def taskList(stageId: Int, stageAttemptId: Int, maxTasks: Int): 
Seq[v1.TaskData] = {
 val stageKey = Array(stageId, stageAttemptId)
 
store.view(classOf[TaskDataWrapper]).index("stage").first(stageKey).last(stageKey).reverse()
-  .max(maxTasks).asScala.map(_.toApi).toSeq.reverse
+  .max(maxTasks).asScala.map { taskDataWrapper =>
+  val taskDataOld: v1.TaskData = taskDataWrapper.toApi
+  val executorLogs: Option[Map[String, String]] = try {
+Some(executorSummary(taskDataOld.executorId).executorLogs)
+  } catch {
+case e: NoSuchElementException => e.getMessage
+  None
+  }
+  new v1.TaskData(taskDataOld.taskId, taskDataOld.index,
+taskDataOld.attempt, taskDataOld.launchTime, 
taskDataOld.resultFetchStart,
+taskDataOld.duration, taskDataOld.executorId, taskDataOld.host, 
taskDataOld.status,
+taskDataOld.taskLocality, taskDataOld.speculative, 
taskDataOld.accumulatorUpdates,
+taskDataOld.errorMessage, taskDataOld.taskMetrics,
+executorLogs.getOrElse(Map[String, String]()),
+AppStatusUtils.schedulerDelay(taskDataOld),
+AppStatusUtils.gettingResultTime(taskDataOld))
--- End diff --

make a helper function to do this since it's the same between the taskList 
functions
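
Something along these lines, as a rough sketch lifted from the block above 
(constructTaskData is just a placeholder name), so both taskList overloads can 
share it:

    // Shared between the two taskList methods in AppStatusStore.
    private def constructTaskData(taskDataWrapper: TaskDataWrapper): v1.TaskData = {
      val taskDataOld: v1.TaskData = taskDataWrapper.toApi
      // The executor may no longer be in the store; fall back to empty logs.
      val executorLogs: Option[Map[String, String]] = try {
        Some(executorSummary(taskDataOld.executorId).executorLogs)
      } catch {
        case _: NoSuchElementException => None
      }
      new v1.TaskData(taskDataOld.taskId, taskDataOld.index,
        taskDataOld.attempt, taskDataOld.launchTime, taskDataOld.resultFetchStart,
        taskDataOld.duration, taskDataOld.executorId, taskDataOld.host,
        taskDataOld.status, taskDataOld.taskLocality, taskDataOld.speculative,
        taskDataOld.accumulatorUpdates, taskDataOld.errorMessage,
        taskDataOld.taskMetrics,
        executorLogs.getOrElse(Map[String, String]()),
        AppStatusUtils.schedulerDelay(taskDataOld),
        AppStatusUtils.gettingResultTime(taskDataOld))
    }

Then both variants just map over the wrappers with constructTaskData.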


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224217796
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/utils.js ---
@@ -74,3 +74,99 @@ function getTimeZone() {
 return new Date().toString().match(/\((.*)\)/)[1];
   }
 }
+
+function formatLogsCells(execLogs, type) {
+  if (type !== 'display') return Object.keys(execLogs);
+  if (!execLogs) return;
+  var result = '';
+  $.each(execLogs, function (logName, logUrl) {
+result += '' + logName + ''
+  });
+  return result;
+}
+
+function getStandAloneAppId(cb) {
+  var words = document.baseURI.split('/');
+  var ind = words.indexOf("proxy");
+  if (ind > 0) {
+var appId = words[ind + 1];
+cb(appId);
+return;
+  }
+  ind = words.indexOf("history");
+  if (ind > 0) {
+var appId = words[ind + 1];
+cb(appId);
+return;
+  }
+  //Looks like Web UI is running in standalone mode
--- End diff --

add a space after //


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224178567
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
+var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = urlArray.indexOf("history");
+if (ind > 0) {
+var appId = urlArray[2];
+var appAttemptId = urlArray[ind + 2];
+var indexOfHistory = words.indexOf("history");
+var newBaseURI = words.slice(0, indexOfHistory).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializatio

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224176616
  
--- Diff: 
core/src/main/resources/org/apache/spark/ui/static/stagespage-template.html ---
@@ -0,0 +1,124 @@
+
+
+

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224166110
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
+var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = urlArray.indexOf("history");
+if (ind > 0) {
+var appId = urlArray[2];
+var appAttemptId = urlArray[ind + 2];
+var indexOfHistory = words.indexOf("history");
+var newBaseURI = words.slice(0, indexOfHistory).join('/');
+if (isNaN(appAttemptId) || appAttemptId == "0") {
+return newBaseURI + "/api/v1/applications/" + appId + 
"/stages/" + stageId;
+} else {
+return newBaseURI + "/api/v1/applications/" + appId + "/" + 
appAttemptId + "/stages/" + stageId;
+}
+}
+return location.origin + "/api/v1/applications/" + appId + "/stages/" 
+ stageId;
+}
+
+function getColumnNameForTaskMetricSummary(columnKey) {
+switch(columnKey) {
+case "executorRunTime":
+return "Duration";
+break;
+
+case "jvmGcTime":
+return "GC Time";
+break;
+
+case "gettingResultTime":
+return "Getting Result Time";
+break;
+
+case "inputMetrics":
+return "Input Size / Records";
+break;
+
+case "outputMetrics":
+return "Output Size / Records";
+break;
+
+case "peakExecutionMemory":
+return "Peak Execution Memory";
+break;
+
+case "resultSerializatio

[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224164134
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
+var newBaseURI = words.slice(0, indexOfProxy + 2).join('/');
+return newBaseURI + "/api/v1/applications/" + appId + "/stages/" + 
stageId;
+}
+ind = urlArray.indexOf("history");
+if (ind > 0) {
+var appId = urlArray[2];
+var appAttemptId = urlArray[ind + 2];
+var indexOfHistory = words.indexOf("history");
--- End diff --

similar here: can't we get rid of one of the indexes and just use words to 
look up appId and appAttemptId?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224163278
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
+if (ind > 0) {
+var appId = urlArray[2];
+var indexOfProxy = words.indexOf("proxy");
--- End diff --

we have the index of "proxy" in 2 places; can we get rid of one?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224162695
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
+var queryString = document.baseURI.split('?');
+var words = document.baseURI.split('/');
+var stageId = queryString[1].split("&").filter(word => 
word.includes("id="))[0].split("=")[1];
--- End diff --

all of these should be constants (const), not var


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21688: [SPARK-21809] : Change Stage Page to use datatabl...

2018-10-10 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/21688#discussion_r224162463
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
---
@@ -0,0 +1,872 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+var blockUICount = 0;
+
+$(document).ajaxStop(function () {
+if (blockUICount == 0) {
+$.unblockUI();
+blockUICount++;
+}
+});
+
+$(document).ajaxStart(function () {
+if (blockUICount == 0) {
+$.blockUI({message: 'Loading Stage Page...'});
+}
+});
+
+$.extend( $.fn.dataTable.ext.type.order, {
+"duration-pre": ConvertDurationString,
+
+"duration-asc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? -1 : ((a > b) ? 1 : 0));
+},
+
+"duration-desc": function ( a, b ) {
+a = ConvertDurationString( a );
+b = ConvertDurationString( b );
+return ((a < b) ? 1 : ((a > b) ? -1 : 0));
+}
+} );
+
+// This function will only parse the URL under certain format
+// e.g. (history) 
https://domain:50509/history/application_1536254569791_3806251/1/stages/stage/?id=4&attempt=1
+// e.g. (proxy) 
https://domain:50505/proxy/application_1502220952225_59143/stages/stage?id=4&attempt=1
+function stageEndPoint(appId) {
+var urlRegex = 
/https\:\/\/[^\/]+\/([^\/]+)\/([^\/]+)\/([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?\/?([^\/]+)?/gm;
+var urlArray = urlRegex.exec(document.baseURI);
+var ind = urlArray.indexOf("proxy");
--- End diff --

use a constant, name it proxyInd or something, and use a different variable 
for the history lookup


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


