[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/16975#discussion_r102569401

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -470,12 +470,25 @@ class SparkContext(config: SparkConf) extends Logging {
       files.foreach(addFile)
     }

-    _executorMemory = _conf.getOption("spark.executor.memory")
-      .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
-      .orElse(Option(System.getenv("SPARK_MEM"))
-        .map(warnSparkMem))
-      .map(Utils.memoryStringToMb)
-      .getOrElse(1024)
+    _executorMemory = {
+      val defaultMemory = 1024
+      val configuredMemory = _conf.getOption("spark.executor.memory")
+        .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
+        .orElse(Option(System.getenv("SPARK_MEM")).map(warnSparkMem))
+        .map(Utils.memoryStringToMb)
+      // In local-cluster mode, always use the slave memory specified in the master string
+      // In other modes, use the configured memory if it exists
+      master match {
+        case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, em) =>
+          if (configuredMemory.isDefined) {
--- End diff --

sure

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
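The precedence being discussed in this diff — in local-cluster mode the per-slave memory embedded in the master string always wins, otherwise the configured value applies with a 1024 MB default — can be sketched in isolation. This is a hedged illustration only: `resolveExecutorMemoryMb` and the simplified `LocalCluster` regex below are hypothetical stand-ins, not Spark's actual `SparkMasterRegex.LOCAL_CLUSTER_REGEX` or `SparkContext` code.

```scala
object ExecutorMemorySketch {
  // Simplified stand-in for SparkMasterRegex.LOCAL_CLUSTER_REGEX:
  // matches e.g. "local-cluster[2,1,2048]" as (workers, coresPerWorker, memoryPerWorkerMB)
  private val LocalCluster =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*\]""".r

  /** Resolve executor memory in MB; the master string wins in local-cluster mode. */
  def resolveExecutorMemoryMb(master: String, configuredMb: Option[Int]): Int =
    master match {
      case LocalCluster(_, _, mem) =>
        if (configuredMb.isDefined) {
          // The real patch logs a warning when the configured value is overridden
          Console.err.println(
            "Warning: executor memory is also configured; using the master string value")
        }
        mem.toInt
      case _ =>
        // Fall back to the configured value, else Spark's historical 1024 MB default
        configuredMb.getOrElse(1024)
    }
}
```

With this shape, `local-cluster[2,1,2048]` yields 2048 MB even if `spark.executor.memory` is also set, matching the out-of-the-box behavior the patch argues for.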
[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/16975#discussion_r101887609

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

Also, we're talking about a net addition of 7 LOC in `SparkContext.scala`, about half of which are comments and warning logs. It's really not that much code.
[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/16975#discussion_r101887589

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

The inconsistency is already inherent in the parameters of `local-cluster[]`, so I'm not introducing it here with this change. I personally think it's a really bad interface to force the user to set executor memory in two different places and require that the two values match.
[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/16975#discussion_r101857385

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

```
You may, for whatever reason, want to run executors with less than that, which your change doesn't seem to allow.
```

Yeah, I thought about this long and hard but I just couldn't come up with a case where you would possibly want the worker size to differ from the executor size in local-cluster mode. If you want to launch 5 workers (2GB), each with 2 executors (1GB), then you might as well just launch 10 executors (1GB) or run real standalone mode locally. I think it's better to fix the out-of-the-box case than to try to cover all potentially non-existent corner cases.
[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/16975#discussion_r101857230

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
       // Other options
       OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
         sysProp = "spark.executor.cores"),
-      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
+      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
--- End diff --

If this was the only change then specifying `local-cluster[2,1,2048]` doesn't actually do anything because we're not setting `spark.executor.memory` anywhere. You could do `--master local-cluster[2,1,2048] --conf spark.executor.memory=2g` but that's cumbersome and now there are two ways to set the executor memory.
[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/16975

[SPARK-19522] Fix executor memory in local-cluster mode

## What changes were proposed in this pull request?

```
bin/spark-shell --master local-cluster[2,1,2048]
```

is supposed to launch 2 executors, each with 2GB of memory. However, when I ran this in master, I only got executors with 1GB of memory. This patch fixes the problem.

## How was this patch tested?

`SparkSubmitSuite`, manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark local-cluster-executor-mem

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16975.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16975

commit db3773c8aeb44e69610c17ff5bc169d7085333ee
Author: Andrew Or <andrewo...@gmail.com>
Date: 2017-02-08T23:27:07Z
    Fix propagation of executor memory in local-cluster mode

commit 1a5bdfeca931a624b82933772c57c605e02dc58d
Author: Andrew Or <andrewo...@gmail.com>
Date: 2017-02-17T16:47:44Z
    Log warning if memory is explicitly set

commit b1a13dc3b1a59593945f0fb6d13683ed81acf9b7
Author: Andrew Or <andrewo...@gmail.com>
Date: 2017-02-17T16:48:16Z
    Merge branch 'master' of github.com:apache/spark into local-cluster-executor-mem
[GitHub] spark pull request #13899: [SPARK-16196][SQL] Codegen in-memory scan with Co...
Github user andrewor14 closed the pull request at: https://github.com/apache/spark/pull/13899
[GitHub] spark issue #13899: [SPARK-16196][SQL] Codegen in-memory scan with ColumnarB...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/13899

Closing for now; too many conflicts.
[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/16819

I agree. Resource managers generally expect applications to request more than what's available already, so we don't have to do it again ourselves in Spark.
[GitHub] spark issue #16823: [SPARK] Config methods simplification at SparkSession#Bu...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/16823

This is a bad idea! First, it breaks backward compatibility; second, we intentionally didn't want to make it so general that the user can pass in any objects. Can you please close this PR?
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r98218538

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
    * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the
    * return value. Exposed for testing.
    */
-  private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+    checkpointData.exists(_.isCheckpointed)
--- End diff --

looks good, but you forgot to make this final
[GitHub] spark issue #16081: [SPARK][EXAMPLE] Added missing semicolon in quick-start-...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/16081

and 2.0
[GitHub] spark issue #16073: [SPARK-18640] Add synchronization to TaskScheduler.runni...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/16073

and 2.0
[GitHub] spark issue #16073: [SPARK-18640] Add synchronization to TaskScheduler.runni...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/16073

LGTM, merging into master and 2.1.
[GitHub] spark issue #16081: [SPARK][EXAMPLE] Added missing semicolon in quick-start-...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/16081

OK, merging into master and 2.1.
[GitHub] spark issue #15462: [SPARK-17680] [SQL] [TEST] Added test cases for InMemory...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15462

@kiszk is there a JIRA associated specifically with adding tests for `InMemoryRelation`?
[GitHub] spark issue #15462: [SPARK-17680] [SQL] [TEST] Added test cases for InMemory...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15462

LGTM, merging into master and 2.1. Thanks.
[GitHub] spark issue #15993: [SPARK-18050][SQL] do not create default database if it ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15993

Sounds good. Merging into master and 2.1.
[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15462#discussion_r89205894

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -20,18 +20,83 @@ package org.apache.spark.sql.execution.columnar

 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}

-import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.test.SQLTestData._
 import org.apache.spark.sql.types._
-import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
+import org.apache.spark.storage.StorageLevel._

 class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
   import testImplicits._

   setupTestData()

+  def cachePrimitiveTest(data: DataFrame, dataType: String) {
--- End diff --

minor: `private def`
[GitHub] spark issue #15462: [SPARK-17680] [SQL] [TEST] Added test cases for InMemory...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15462

retest this please
[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15462#discussion_r89205780

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -58,6 +123,12 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
     }.map(Row.fromTuple))
   }

+  test("access only some column of the all of columns") {
+    val df = spark.range(1, 100).map(i => (i, (i + 1).toFloat)).toDF("i", "f").cache
+    df.count
--- End diff --

please be explicit with the actions here: `count()`, `cache()`
[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15462#discussion_r89205541

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -20,18 +20,83 @@ package org.apache.spark.sql.execution.columnar

 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}

-import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.test.SQLTestData._
 import org.apache.spark.sql.types._
-import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
+import org.apache.spark.storage.StorageLevel._

 class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
   import testImplicits._

   setupTestData()

+  def cachePrimitiveTest(data: DataFrame, dataType: String) {
+    data.createOrReplaceTempView(s"testData$dataType")
+    val storageLevel = MEMORY_ONLY
+    val plan = spark.sessionState.executePlan(data.logicalPlan).sparkPlan
+    val inMemoryRelation = InMemoryRelation(useCompression = true, 5, storageLevel, plan, None)
+
+    assert(inMemoryRelation.cachedColumnBuffers.getStorageLevel == storageLevel)
+    inMemoryRelation.cachedColumnBuffers.collect().head match {
+      case _: CachedBatch => assert(true)
--- End diff --

no need to do assert true here
[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15462#discussion_r89205861

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -246,4 +317,59 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
     assert(cached.batchStats.value === expectedAnswer.size * INT.defaultSize)
   }

+  test("access columns in CachedBatch without whole stage codegen") {
+    // whole stage codegen is not applied to a row with more than WHOLESTAGE_MAX_NUM_FIELDS fields
+    withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "2") {
+      val data = Seq(null, true, 1.toByte, 3.toShort, 7, 15.toLong,
+        31.25.toFloat, 63.75, new Date(127), new Timestamp(25500L), null)
+      val dataTypes = Seq(NullType, BooleanType, ByteType, ShortType, IntegerType, LongType,
+        FloatType, DoubleType, DateType, TimestampType, IntegerType)
+      val schemas = dataTypes.zipWithIndex.map { case (dataType, index) =>
+        StructField(s"col$index", dataType, true)
+      }
+      val rdd = sparkContext.makeRDD(Seq(Row.fromSeq(data)))
+      val df = spark.createDataFrame(rdd, StructType(schemas))
+      val row = df.persist.take(1).apply(0)
+      checkAnswer(df, row)
+    }
+
+    withSQLConf(SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key -> "2") {
--- End diff --

similarly, can you split these into multiple smaller unit tests?
[GitHub] spark pull request #15462: [SPARK-17680] [SQL] [TEST] Added test cases for I...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15462#discussion_r89205730

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala ---
@@ -20,18 +20,83 @@ package org.apache.spark.sql.execution.columnar

 import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}

-import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSQLContext
 import org.apache.spark.sql.test.SQLTestData._
 import org.apache.spark.sql.types._
-import org.apache.spark.storage.StorageLevel.MEMORY_ONLY
+import org.apache.spark.storage.StorageLevel._

 class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
   import testImplicits._

   setupTestData()

+  def cachePrimitiveTest(data: DataFrame, dataType: String) {
+    data.createOrReplaceTempView(s"testData$dataType")
+    val storageLevel = MEMORY_ONLY
+    val plan = spark.sessionState.executePlan(data.logicalPlan).sparkPlan
+    val inMemoryRelation = InMemoryRelation(useCompression = true, 5, storageLevel, plan, None)
+
+    assert(inMemoryRelation.cachedColumnBuffers.getStorageLevel == storageLevel)
+    inMemoryRelation.cachedColumnBuffers.collect().head match {
+      case _: CachedBatch => assert(true)
+      case other => fail(s"Unexpected cached batch type: ${other.getClass.getName}")
+    }
+    checkAnswer(inMemoryRelation, data.collect().toSeq)
+  }
+
+  test("all data type w && w/o nullability") {
+    // all primitives
+    Seq(true, false).map { nullability =>
--- End diff --

can you split these into 2 separate tests? It's more debuggable that way. I would use a private helper method to abstract the logic.
[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15978

(Oops never mind, not my fault! :p)
[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15978

@cloud-fan can you make a patch for 2.0?
[GitHub] spark issue #15978: [SPARK-18507][SQL] HiveExternalCatalog.listPartitions sh...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15978

Oops, that was my fault. Thanks, merging into master and 2.1.
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15811

I did
[GitHub] spark issue #15896: [SPARK-18465] Uncache table shouldn't throw an exception...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15896

I personally think `UNCACHE TABLE IF EXISTS` is best. It preserves the old behavior but lets the user make sure a table is not cached if they really want to.
[GitHub] spark issue #15953: [SPARK-18517][SQL] DROP TABLE IF EXISTS should not warn ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15953

Makes sense. LGTM, merging into master and 2.0. Thanks.
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15811

LGTM, merging into master. Thanks.
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15811 Looks good, just one question.
[GitHub] spark pull request #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoin...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15811#discussion_r88115031
--- Diff: python/pyspark/rdd.py ---
@@ -181,6 +181,7 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri
        self._jrdd = jrdd
        self.is_cached = False
        self.is_checkpointed = False
+       self.is_locally_checkpointed = False
--- End diff --
is this read anywhere?
[GitHub] spark issue #15811: [SPARK-18361] [PySpark] Expose RDD localCheckpoint in Py...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15811 ok to test
[GitHub] spark pull request #15833: [SPARK-18353][CORE] spark.rpc.askTimeout defalut ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15833#discussion_r88113719
--- Diff: core/src/main/scala/org/apache/spark/deploy/Client.scala ---
@@ -221,7 +221,9 @@ object Client {
     val conf = new SparkConf()
     val driverArgs = new ClientArguments(args)
-    conf.set("spark.rpc.askTimeout", "10")
+    if (!conf.contains("spark.rpc.askTimeout")) {
--- End diff --
also we should probably add this to `config/package.scala` so we don't duplicate it
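A minimal sketch of the centralization being suggested here. The names `RpcConfSketch`, `RPC_ASK_TIMEOUT_KEY`, and the plain-`Map` conf are illustrative stand-ins for this comment only; Spark's real `config/package.scala` uses `ConfigBuilder` entries and `SparkConf`, not this API.

```scala
// Hypothetical sketch: keep the key and its default in one shared place so
// Client.scala (and any other caller) never hard-codes the literal again.
// The Map stands in for SparkConf; this is not Spark's actual config API.
object RpcConfSketch {
  val RPC_ASK_TIMEOUT_KEY = "spark.rpc.askTimeout" // assumed key name
  val RPC_ASK_TIMEOUT_DEFAULT = "10s"              // assumed default value

  // Respect a user-set value, otherwise fall back to the shared default.
  def askTimeout(conf: Map[String, String]): String =
    conf.getOrElse(RPC_ASK_TIMEOUT_KEY, RPC_ASK_TIMEOUT_DEFAULT)
}
```

With this shape, the `if (!conf.contains(...))` guard in `Client.scala` collapses to a single lookup, and the default lives next to every other config default.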
[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15766 retest this please
[GitHub] spark issue #15756: [SPARK-18256] Improve the performance of event log repla...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15756 retest this please
[GitHub] spark issue #15739: [SPARK-16808][Core] prepend base URI for links on main h...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15739 Also there's another patch trying to solve the same issue: #15742
[GitHub] spark issue #15756: [SPARK-18256] Improve the performance of event log repla...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15756 LGTM. That's a massive amount of time spent in `Class.getSimpleName`!
[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15756#discussion_r86422590
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {
   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)
--- End diff --
if so maybe we should add a scalastyle rule to ban it
[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15756#discussion_r86422652
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {
   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)
--- End diff --
oh wait you already did 😄
[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15756#discussion_r86422521
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -540,7 +544,8 @@ private[spark] object JsonProtocol {
   def taskStartFromJson(json: JValue): SparkListenerTaskStart = {
     val stageId = (json \ "Stage ID").extract[Int]
-    val stageAttemptId = (json \ "Stage Attempt ID").extractOpt[Int].getOrElse(0)
+    val stageAttemptId =
+      Utils.jsonOption(json \ "Stage Attempt ID").map(_.extract[Int]).getOrElse(0)
--- End diff --
what's the difference? Is `extractOpt` really slow?
[GitHub] spark issue #15739: [SPARK-16808][Core] prepend base URI for links on main h...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15739 ok to test @vanzin
[GitHub] spark issue #15698: [SPARK-18182] Expose ReplayListenerBus.read() overload w...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15698 LGTM, retest this please
[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15410 We shouldn't display file names, but we should display application names and IDs, something the user understands. We don't have to do that as part of this issue.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15458 I see. Then maybe we should add a comment above the config to note that several commands don't work (e.g. ALTER TABLE) if this is turned on, even if it's only internal.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15458 Yes that's why it's `internal`
[GitHub] spark issue #15456: [SPARK-17686][Core] Support printing out scala and java ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15456 Merging into master
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15458 JK, actually it doesn't merge in 2.0.
[GitHub] spark issue #15458: [SPARK-17899][SQL] add a debug mode to keep raw table pr...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15458 Cool beans. Merging into master and 2.0.
[GitHub] spark issue #15400: [SPARK-11272] [Web UI] Add support for downloading event...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15400 This one LGTM; I'm merging it into master. Thanks for working on this.
[GitHub] spark issue #15400: [SPARK-11272] [Web UI] Add support for downloading event...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15400 Usually we retest a PR if it's been a few days since it last ran tests. We have had build breaks before where we merged a PR that passed tests a long time ago.
[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15410
ok to test
I think the idea is good, but it would be a better UX if we display the pending applications as rows in the existing table (or a new one) and indicate there that they're still being processed. It might be more code change but I think it's worth it.
[GitHub] spark pull request #15410: [SPARK-17843][Web UI] Indicate event logs pending...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15410#discussion_r83044768
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala ---
@@ -38,6 +39,13 @@ private[history] class HistoryPage(parent: HistoryServer) extends WebUIPage("") {
 {providerConfig.map { case (k, v) => {k}: {v} }}
 {
+    if (eventLogsUnderProcessCount > 0) {
+      There are {eventLogsUnderProcessCount} event log(s) currently being
+        processed which may result in additional applications getting listed on this page.
--- End diff --
Instead of just telling the user how many logs are pending, it would be a lot more useful to show which ones are pending, so they can wait before clicking on a specific application
[GitHub] spark issue #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of Vertex...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15396 Looks good. I left a suggestion that I think will make the code cleaner.
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r83043442
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
  * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the
  * return value. Exposed for testing.
  */
-  private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed
+  private[spark] def isCheckpointedAndMaterialized: Boolean =
+    checkpointData.exists(_.isCheckpointed)
--- End diff --
One way to fix the duplicate code is to make this one final and have `isCheckpointed` call this one; then implementations of RDD can feel free to override `isCheckpointed` without affecting our internal implementation.
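A minimal, self-contained sketch of that suggestion. The `RddSketch`, `VertexRddSketch`, and `CheckpointDataSketch` classes below are simplified stand-ins for `RDD`, `VertexRDD`, and Spark's checkpoint state, not the actual implementation:

```scala
// Simplified stand-in for RDDCheckpointData's state.
case class CheckpointDataSketch(isCheckpointed: Boolean)

abstract class RddSketch {
  protected var checkpointData: Option[CheckpointDataSketch] = None

  def markMaterialized(): Unit =
    checkpointData = Some(CheckpointDataSketch(isCheckpointed = true))

  // Final: internal callers always get the materialized-checkpoint answer,
  // regardless of what subclasses do to the public method.
  final def isCheckpointedAndMaterialized: Boolean =
    checkpointData.exists(_.isCheckpointed)

  // Public method delegates by default; subclasses may freely override it.
  def isCheckpointed: Boolean = isCheckpointedAndMaterialized
}

// A VertexRDD-like subclass that reports checkpointed-ness differently
// (e.g. based on its parent) without affecting the internal check.
class VertexRddSketch extends RddSketch {
  var parentCheckpointed = false
  override def isCheckpointed: Boolean =
    isCheckpointedAndMaterialized || parentCheckpointed
}
```

The point of the `final` modifier is exactly the one made in the comment: overriding `isCheckpointed` in a subclass can no longer change what internal code observes.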
[GitHub] spark pull request #15396: [SPARK-14804][Spark][Graphx] Fix checkpointing of...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15396#discussion_r83042522
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1589,7 +1589,8 @@ abstract class RDD[T: ClassTag](
  * This is introduced as an alias for `isCheckpointed` to clarify the semantics of the
--- End diff --
oh wait, I just realized this is exactly the same as `isCheckpointed`. It's kind of confusing why we have two methods that do the same thing. Is it because VertexRDD or something overrides one but not the other?
[GitHub] spark issue #15400: [SPARK-11272] [Web UI] Add support for downloading event...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15400 retest this please
[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15353 @keypointt by "working" I mean it should be replaced by a line break, not a space
[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15405 Thanks for working on this. It's great to see how small the patch turned out to be!
[GitHub] spark pull request #15405: [SPARK-15917][CORE] Added support for number of e...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15405#discussion_r82671287
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -637,6 +637,16 @@ private[deploy] class Master(
   }
   freeWorkers = freeWorkers.filter(canLaunchExecutor)
 }
+
+    val numExecutorsScheduled = assignedExecutors.sum
+    val numExecutorsLaunched = app.executors.size
+    // Check to see if we managed to launch the requested number of executors
+    if(numUsable != 0 && numExecutorsLaunched != app.executorLimit &&
+      numExecutorsScheduled != app.executorLimit) {
--- End diff --
Another thing is, how noisy is this? Do we log this if dynamic allocation is turned on (we shouldn't)?
[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15405 add to whitelist
[GitHub] spark issue #15347: [SPARK-16827] Stop reporting spill metrics as shuffle me...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15347 OK, this change by itself LGTM. @dafrista would you mind creating a separate JIRA (or point me to an existing one) about the TODO then? Merging this into master
[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15353 But this isn't the original intention, which is to actually add a line break where `\n` is today. IIRC this works correctly on Chrome but not on Safari (or the other way round?). If you can make it work across all browsers then I will merge it. :)
[GitHub] spark issue #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new lines in...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15353 Also this is a more general problem, not just for streaming
[GitHub] spark pull request #15347: [SPARK-16827] Stop reporting spill metrics as shu...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15347#discussion_r82042420
--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java ---
@@ -145,7 +145,9 @@ private UnsafeExternalSorter(
     // Use getSizeAsKb (not bytes) to maintain backwards compatibility for units
     // this.fileBufferSizeBytes = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
     this.fileBufferSizeBytes = 32 * 1024;
-    this.writeMetrics = taskContext.taskMetrics().shuffleWriteMetrics();
+    // The spill metrics are stored in a new ShuffleWriteMetrics, and then discarded (this fixes SPARK-16827).
+    // TODO: Instead, separate spill metrics should be stored and reported (tracked in SPARK-3577).
+    this.writeMetrics = new ShuffleWriteMetrics();
--- End diff --
so this isn't actually used anywhere right now? Is that what the TODO is about?
[GitHub] spark issue #15350: [SPARK-17778][Tests]Mock SparkContext to reduce memory u...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15350 I think that's OK. This is supposed to be a unit test for the BlockManager, not how BlockManager interacts with the rest of the system. LGTM
[GitHub] spark issue #15290: [SPARK-17715] [Scheduler] Make task launch logs DEBUG
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15290 Merging into master
[GitHub] spark issue #15290: [SPARK-17715] [Scheduler] Make task launch logs DEBUG
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15290 ok to test
[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15247#discussion_r81218774
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -109,4 +109,11 @@ private[history] abstract class ApplicationHistoryProvider {
   @throws(classOf[SparkException])
   def writeEventLogs(appId: String, attemptId: Option[String], zipStream: ZipOutputStream): Unit
+  /**
+   * Returns an ApplicationHistoryInfo for the appId.
+   *
+   * @return ApplicationHistoryInfo of one appId if exists.
--- End diff --
This is kind of verbose. I would just say `@return the [[ApplicationHistoryInfo]] for the appId if it exists`
[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15247#discussion_r81219143
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/ApplicationHistoryProvider.scala ---
@@ -109,4 +109,11 @@ private[history] abstract class ApplicationHistoryProvider {
   @throws(classOf[SparkException])
   def writeEventLogs(appId: String, attemptId: Option[String], zipStream: ZipOutputStream): Unit
+
+  /**
+   * Returns an ApplicationHistoryInfo for the appId.
+   *
+   * @return ApplicationHistoryInfo of one appId if exists.
--- End diff --
I'll fix this myself when I merge.
[GitHub] spark issue #15247: [SPARK-17672] Spark 2.0 history server web Ui takes too ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15247 LGTM, merging into master and 2.0, thanks.
[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15247#discussion_r81219049
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/ApiRootResource.scala ---
@@ -222,6 +222,7 @@ private[spark] object ApiRootResource {
 private[spark] trait UIRoot {
   def getSparkUI(appKey: String): Option[SparkUI]
   def getApplicationInfoList: Iterator[ApplicationInfo]
+  def getApplicationInfo(appId: String): Option[ApplicationInfo]
--- End diff --
also this
[GitHub] spark pull request #15247: [SPARK-17672] Spark 2.0 history server web Ui tak...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15247#discussion_r81218992
--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -182,6 +182,10 @@ class HistoryServer(
     getApplicationList().iterator.map(ApplicationsListResource.appHistoryInfoToPublicAppInfo)
   }

+  def getApplicationInfo(appId: String): Option[ApplicationInfo] = {
--- End diff --
I believe this is a public API, but it's a reasonable one.
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce global SQL conf
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15295 retest this please
[GitHub] spark issue #15221: [SPARK-17648][CORE] TaskScheduler really needs offers to...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15221 This looks reasonable. Merging into master. I will leave it out of branch-2.0 just in case.
[GitHub] spark pull request #15295: [SPARK-17720][SQL] introduce global SQL conf
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15295#discussion_r81150854
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -791,7 +791,7 @@ object SparkSession {
     // Get the session from current thread's active session.
     var session = activeThreadSession.get()
     if ((session ne null) && !session.sparkContext.isStopped) {
-      options.foreach { case (k, v) => session.conf.set(k, v) }
+      options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
--- End diff --
Is this change related?
[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce global SQL conf
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15295 LGTM. Pretty straightforward.
[GitHub] spark issue #15244: Update spark-standalone.md to fix link
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15244 Thanks, merged into master, 2.0, and 1.6.
[GitHub] spark pull request #15243: Fixing comment since Actor is not used anymore.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15243#discussion_r80478992
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala ---
@@ -21,7 +21,7 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.rpc._

 /**
- * Actor which connects to a worker process and terminates the JVM if the connection is severed.
+ * WorkerWatcher RpcEndpoint which connects to a worker process and terminates the JVM if the connection is severed.
--- End diff --
yeah, just say `Endpoint`
[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15137 Got it. I think back when I wrote that comment we didn't yet support cluster mode with Python. It's just a little confusing if you read that comment in isolation.
[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15137 I've merged it. One more thing, would you mind correcting the comment in `PythonRunner#formatPath`? Right now it says "we currently only support local python files", which is apparently not true!
[GitHub] spark issue #15137: [SPARK-17512][Core] Avoid formatting to python path for ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15137 Good catch. I'm merging this into master, 2.0 and 1.6. Thanks.
[GitHub] spark issue #15181: [SPARK-17623][CORE] Clarify type of TaskEndReason with a...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15181 Thanks for the clean up. I'm merging this into master. Because this patch touches multiple files in critical scheduler code, I'm hesitant to backport it.
[GitHub] spark pull request #15181: [SPARK-17623][CORE] Clarify type of TaskEndReason...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15181#discussion_r79939560
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -118,14 +118,14 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedul
   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
       serializedData: ByteBuffer) {
-    var reason : TaskEndReason = UnknownReason
+    var reason : TaskFailedReason = UnknownReason
     try {
       getTaskResultExecutor.execute(new Runnable {
         override def run(): Unit = Utils.logUncaughtExceptions {
           val loader = Utils.getContextOrSparkClassLoader
           try {
             if (serializedData != null && serializedData.limit() > 0) {
-              reason = serializer.get().deserialize[TaskEndReason](
+              reason = serializer.get().deserialize[TaskFailedReason](
--- End diff --
Great, it's always better to be more explicit.
[GitHub] spark pull request #15181: [SPARK-17623][CORE] Clarify type of TaskEndReason...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15181#discussion_r79894168
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -118,14 +118,14 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedul
   def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
       serializedData: ByteBuffer) {
-    var reason : TaskEndReason = UnknownReason
+    var reason : TaskFailedReason = UnknownReason
     try {
       getTaskResultExecutor.execute(new Runnable {
         override def run(): Unit = Utils.logUncaughtExceptions {
           val loader = Utils.getContextOrSparkClassLoader
           try {
             if (serializedData != null && serializedData.limit() > 0) {
-              reason = serializer.get().deserialize[TaskEndReason](
+              reason = serializer.get().deserialize[TaskFailedReason](
--- End diff --
Is this totally safe? Can you point me to the code where we serialize this?
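The explicitness being discussed can be sketched with a toy version of the hierarchy. The trait and object names mirror the diff, but the bodies are invented for illustration; requesting the narrower `TaskFailedReason` lets the compiler reject a `Success` value on the failed-task path.

```scala
// Toy model of the hierarchy in the diff above (bodies invented for the sketch).
sealed trait TaskEndReason
case object Success extends TaskEndReason
sealed trait TaskFailedReason extends TaskEndReason { def toErrorString: String }
case object UnknownReason extends TaskFailedReason {
  def toErrorString: String = "UnknownReason"
}

// Accepting only TaskFailedReason documents the invariant in the type:
// describeFailure(Success) would not even compile.
def describeFailure(reason: TaskFailedReason): String = reason.toErrorString
```

The same narrowing applies to the type parameter of a deserialize call: asking for the subtype states up front that only failure reasons are expected on this path.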
[GitHub] spark pull request #15159: [SPARK-17605][SPARK_SUBMIT] Add option spark.useP...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15159#discussion_r79867613
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -70,6 +70,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S
   var proxyUser: String = null
   var principal: String = null
   var keytab: String = null
+  var usePython: Boolean = false
--- End diff --
So now there's an `isPython` and a `usePython`? Can you just use the old one?
[GitHub] spark issue #15001: [SPARK-17438][WebUI] Show Application.executorLimit in t...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15001 LGTM, merging into master and 2.0, thanks @zsxwing!
[GitHub] spark pull request #15001: [SPARK-17438][WebUI] Show Application.executorLim...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/15001#discussion_r79449912
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/ui/ApplicationPage.scala ---
@@ -70,6 +70,16 @@ private[ui] class ApplicationPage(parent: MasterWebUI) extends WebUIPage("app")
             }
+
+            Executor Limit:
+            {
+              if (app.executorLimit == Int.MaxValue) "Unlimited" else app.executorLimit
--- End diff --
I think for now it's OK. Conceptually the number of executors is constrained by the lower of the two: the executor limit and the cores limit (when translated into executors).
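The "lower of the two" constraint described here can be sketched as follows; the function and parameter names are illustrative, not Spark's.

```scala
// Effective executor count is bounded by both the executor limit and the
// cores limit translated into whole executors (illustrative sketch only).
def effectiveExecutors(executorLimit: Int, coresLimit: Int, coresPerExecutor: Int): Int =
  math.min(executorLimit, coresLimit / coresPerExecutor)
```

With the unlimited sentinel `Int.MaxValue` as the executor limit, the cores limit alone decides the outcome, which is why displaying "Unlimited" for that sentinel is reasonable.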
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15133 Yeah, `SparkSession` will be the new thing moving forward. `SparkContext` is kind of just a legacy thing.
[GitHub] spark issue #15133: [SPARK-17578][Docs] Add spark.app.name default value for...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15133 We should probably just make it a random UUID in all cases to be consistent. I don't know if people check whether `spark.app.name` is set, so that might be a backward compatibility concern (though one that we kind of already broke with `SparkSession`).
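A consistent random default could look like the sketch below; `defaultAppName` is a hypothetical helper for illustration, not an existing Spark API.

```scala
import java.util.UUID

// Hypothetical helper: fall back to a random UUID when spark.app.name is unset.
def defaultAppName(configured: Option[String]): String =
  configured.getOrElse(UUID.randomUUID().toString)
```

A caller that probes whether the config was explicitly set would no longer be able to tell the difference, which is the backward-compatibility concern raised above.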
[GitHub] spark issue #14338: [SPARK-16701] Make parameters configurable in BlockManag...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/14338 I am opposed to this change because these are internal states that were never intended for the user to tweak. If there is a specific problem you're trying to fix then we should address that more generally instead. Please close this PR.
[GitHub] spark issue #14644: [MESOS] Enable GPU support with Mesos
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/14644 Tim, please file a JIRA!
[GitHub] spark issue #14765: [SPARK-15815] Keeping tell yarn the target executors in ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/14765 From the JIRA description it seems that this issue arises not only in the context of dynamic allocation. If that's the case then we should definitely not just arbitrarily remove code from `ExecutorAllocationManager`. Let's discuss a more general solution on the JIRA, but for now we should close this PR since it's neither sufficient nor correct.
[GitHub] spark issue #14862: [SPARK-17295][SQL] Create TestHiveSessionState use refle...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/14862 What exactly does this change buy us? It doesn't allow us to remove any of the inheritance code. `TestHiveSessionState` is not fundamentally different from `HiveSessionState` so I think it's actually less clean to introduce a new catalog implementation for this purpose. @jiangxb1987 would you mind closing this PR?
[GitHub] spark issue #15099: [SPARK-17541][SQL] fix some DDL bugs about table managem...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/15099 LGTM!
[GitHub] spark issue #13822: [SPARK-16115][SQL] Change output schema to be partition ...
Github user andrewor14 commented on the issue: https://github.com/apache/spark/pull/13822 By the way I'm not super active in this community anymore. If you want a quicker response you could try your luck by pinging @yhuai or @cloud-fan instead.
[GitHub] spark pull request #13822: [SPARK-16115][SQL] Change output schema to be par...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13822#discussion_r79247914
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -660,6 +662,10 @@ case class ShowPartitionsCommand(
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog

+    if (!catalog.tableExists(table)) {
+      throw new AnalysisException(s" Table does not exist")
--- End diff --
Can you include the table name in this message? Also please remove the random space in the beginning. Something like: `s"Table $table does not exist"`
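The suggested interpolation relies on the table identifier rendering itself into the message. A minimal sketch, with `TableIdentifier` stubbed for the example (the real class lives in Catalyst; its rendering is assumed here):

```scala
// Stub of the identifier type, just so the interpolation below is runnable.
case class TableIdentifier(table: String, database: Option[String]) {
  override def toString: String =
    database.map(db => s"`$db`.`$table`").getOrElse(s"`$table`")
}

// The suggested message: include the table name instead of emitting
// " Table does not exist" with a stray leading space.
def tableNotFoundMessage(table: TableIdentifier): String =
  s"Table $table does not exist"
```

Embedding `$table` means the error pinpoints which table lookup failed, which is what the existing test's `message.contains(...)` check can then assert on.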
[GitHub] spark pull request #13822: [SPARK-16115][SQL] Change output schema to be par...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13822#discussion_r79247651
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -595,6 +595,19 @@ class HiveDDLSuite
   }

+  test("test show partitions") {
--- End diff --
Can you call this something more specific, like `show partitions on non-existent table`?
[GitHub] spark pull request #13822: [SPARK-16115][SQL] Change output schema to be par...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13822#discussion_r79247588
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -595,6 +595,19 @@ class HiveDDLSuite
   }

+  test("test show partitions") {
+    val message = intercept[AnalysisException] {
+      sql("SHOW PARTITIONS default.nonexistentTable")
+    }.getMessage
+    assert(message.contains("Table does not exist"))
+
+    withTable("t1") {
+      sql("CREATE TABLE t1 (key STRING, value STRING) PARTITIONED BY (ds STRING)")
+      sql("ALTER TABLE t1 ADD PARTITION (ds = '1')")
+      assert(sql(" SHOW PARTITIONS t1").schema.getFieldIndex("partition") == Some(0))
--- End diff --
I'm not sure this is worth testing, since it's just Hive behavior.