[GitHub] spark issue #18350: [MINOR] Fix some typo of the document
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18350

Can one of the admins verify this patch?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18343#discussion_r122625325

--- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -175,6 +175,7 @@ class KryoSerializer(conf: SparkConf)
     kryo.register(None.getClass)
     kryo.register(Nil.getClass)
     kryo.register(Utils.classForName("scala.collection.immutable.$colon$colon"))
+    kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$"))
--- End diff --

why `Map$EmptyMap$`?
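A quick way to see where a name like `Map$EmptyMap$` comes from is to ask the runtime directly; this is a sketch using plain `getClass` (Spark's code goes through its `Utils.classForName` helper, and the exact names can vary by Scala version):

```scala
// Scala mangles nested singleton objects into JVM class names with '$':
// the singleton behind Map.empty compiles to scala.collection.immutable.Map$EmptyMap$,
// and the List cons cell :: compiles to scala.collection.immutable.$colon$colon.
// These are the strings that have to be handed to Kryo registration.
val emptyMapClassName = Map.empty[Int, Byte].getClass.getName
val consClassName = List(1).getClass.getName

println(emptyMapClassName) // scala.collection.immutable.Map$EmptyMap$ on Scala 2.12
println(consClassName)
```

Registering the mangled name is necessary because Kryo registration is keyed by the concrete runtime class, not the `Map` interface.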
[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18343#discussion_r122625202

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
     private[this] var avgSize: Long,
-    @transient private var hugeBlockSizes: Map[Int, Byte])
+    private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

oh seems it is now, then LGTM
[GitHub] spark issue #18350: [MINOR] Fix some typo of the document
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/18350

Hi @srowen, would you mind taking a look?
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18343

Merged build finished. Test PASSed.
[GitHub] spark pull request #18350: [MINOR] Fix some typo of the document
GitHub user ConeyLiu opened a pull request: https://github.com/apache/spark/pull/18350

[MINOR] Fix some typo of the document

## What changes were proposed in this pull request?

Fix some typo of the document.

## How was this patch tested?

Existing tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ConeyLiu/spark fixtypo

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18350.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18350

commit 32841d64dd0026b2bc3030196ada1e38f1551443
Author: Xianyang Liu
Date: 2017-06-18T13:18:38Z

    fix typo

commit b22254bec313d1800624bc3c536eaa4fd12298f3
Author: Xianyang Liu
Date: 2017-06-19T05:23:30Z

    Merge remote-tracking branch 'spark/master' into fixtypo

commit e7baf5489c1472c180f8ec7609ec370b0ed9dabe
Author: Xianyang Liu
Date: 2017-06-19T05:49:14Z

    typo fix
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18343

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78239/
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18343

**[Test build #78239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78239/testReport)** for PR 18343 at commit [`e2816ec`](https://github.com/apache/spark/commit/e2816eccec9875144e4edf5679cf6594c1ca3874).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17328

Merged build finished. Test PASSed.
[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17328

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78242/
[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17328

**[Test build #78242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78242/testReport)** for PR 17328 at commit [`021b551`](https://github.com/apache/spark/commit/021b5513b9dbea546bc577e2e1b939dc8ebe85aa).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18343#discussion_r122623575

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
     private[this] var avgSize: Long,
-    @transient private var hugeBlockSizes: Map[Int, Byte])
+    private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

if you can figure out a way to make it serializable with kryo and still keep the customized serialization logic for java serializer, I'm ok with it.
[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18343#discussion_r122623298

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
     private[this] var avgSize: Long,
-    @transient private var hugeBlockSizes: Map[Int, Byte])
+    private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

actually we can do better: use a bitmap to track which block id has size info, and a byte array to store these size data. so the format can be: `[num blocks] [bit map], [size array]`
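The proposed `[num blocks] [bit map] [size array]` layout could be sketched as follows. This is illustrative only, not Spark's actual implementation: the helper name and framing (length-prefixed bitmap) are assumptions, and `java.util.BitSet` stands in for the `RoaringBitmap` Spark uses.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.util.BitSet

// Sketch of the "[num blocks] [bit map] [size array]" layout proposed above.
def writeHugeBlockSizes(numBlocks: Int, hugeBlockSizes: Map[Int, Byte]): Array[Byte] = {
  // Bitmap marks which block ids carry a size entry.
  val bits = new BitSet(numBlocks)
  hugeBlockSizes.keys.foreach(i => bits.set(i))

  val bos = new ByteArrayOutputStream()
  val out = new DataOutputStream(bos)
  out.writeInt(numBlocks)            // [num blocks]
  val bitmapBytes = bits.toByteArray
  out.writeInt(bitmapBytes.length)   // length prefix so the reader can frame the bitmap
  out.write(bitmapBytes)             // [bit map]
  // [size array]: one byte per tracked block, in ascending block-id order
  hugeBlockSizes.toSeq.sortBy(_._1).foreach { case (_, size) => out.writeByte(size) }
  out.flush()
  bos.toByteArray
}
```

Compared with serializing a `Map[Int, Byte]` entry by entry, this stores one bit per block plus one byte per tracked block, with no per-entry key overhead.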
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18346

cc @cloud-fan @gatorsmile
[GitHub] spark issue #18269: [SPARK-21056][SQL] Use at most one spark job to list fil...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18269

let's wait @mallman 's response to make sure this patch does fix the problem
[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18269#discussion_r122622206

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
    * @return all children of path that match the specified filter.
    */
   private def listLeafFiles(
-      path: Path,
+      paths: Seq[Path],
       hadoopConf: Configuration,
       filter: PathFilter,
-      sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-    logTrace(s"Listing $path")
-    val fs = path.getFileSystem(hadoopConf)
-
-    // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
-    // Note that statuses only include FileStatus for the files and dirs directly under path,
-    // and does not include anything else recursively.
-    val statuses = try fs.listStatus(path) catch {
-      case _: FileNotFoundException =>
-        logWarning(s"The directory $path was not found. Was it deleted very recently?")
-        Array.empty[FileStatus]
-    }
-
-    val filteredStatuses = statuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+      sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+    logTrace(s"Listing ${paths.mkString(", ")}")
+    if (paths.isEmpty) {
+      Nil
+    } else {
+      val fs = paths.head.getFileSystem(hadoopConf)
+
+      // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
+      // Note that statuses only include FileStatus for the files and dirs directly under path,
+      // and does not include anything else recursively.
+      val filteredStatuses = paths.flatMap { path =>
+        try {
+          val fStatuses = fs.listStatus(path)
+          val filtered = fStatuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+          if (filtered.nonEmpty) {
+            Some(path -> filtered)
+          } else {
+            None
+          }
+        } catch {
+          case _: FileNotFoundException =>
+            logWarning(s"The directory $paths was not found. Was it deleted very recently?")
+            None
+        }
+      }
 
-    val allLeafStatuses = {
-      val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
-      val nestedFiles: Seq[FileStatus] = sessionOpt match {
-        case Some(session) =>
-          bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, session).flatMap(_._2)
-        case _ =>
-          dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, filter, sessionOpt))
+      val allLeafStatuses = {
+        val (dirs, topLevelFiles) = filteredStatuses.flatMap { case (path, fStatuses) =>
+          fStatuses.map { f => path -> f }
+        }.partition { case (_, fStatus) => fStatus.isDirectory }
+        val pathsToList = dirs.map { case (_, fStatus) => fStatus.getPath }
+        val nestedFiles = if (pathsToList.nonEmpty) {
+          sessionOpt match {
+            case Some(session) =>
+              bulkListLeafFiles(pathsToList, hadoopConf, filter, session)
+            case _ =>
+              listLeafFiles(pathsToList, hadoopConf, filter, sessionOpt)
+          }
+        } else Seq.empty[(Path, Seq[FileStatus])]
+        val allFiles = topLevelFiles.groupBy { case (path, _) => path }
--- End diff --

nit: `xxx.groupBy(_._1)`
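The suggested shorthand is behaviorally identical to the pattern-match form quoted in the diff; a minimal illustration with tuples standing in for `(Path, FileStatus)` pairs:

```scala
// Stand-ins: (path, file) tuples instead of (Path, FileStatus).
val topLevelFiles = Seq(("a", 1), ("a", 2), ("b", 3))

// Form in the diff, pattern-matching on the pair:
val verbose = topLevelFiles.groupBy { case (path, _) => path }

// Suggested shorthand using the tuple accessor:
val concise = topLevelFiles.groupBy(_._1)
```

`_._1` avoids the `case` boilerplate when only the key of the pair is needed.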
[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18269#discussion_r122622157

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
    * @return all children of path that match the specified filter.
    */
   private def listLeafFiles(
-      path: Path,
+      paths: Seq[Path],
       hadoopConf: Configuration,
       filter: PathFilter,
-      sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-    logTrace(s"Listing $path")
-    val fs = path.getFileSystem(hadoopConf)
-
-    // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
-    // Note that statuses only include FileStatus for the files and dirs directly under path,
-    // and does not include anything else recursively.
-    val statuses = try fs.listStatus(path) catch {
-      case _: FileNotFoundException =>
-        logWarning(s"The directory $path was not found. Was it deleted very recently?")
-        Array.empty[FileStatus]
-    }
-
-    val filteredStatuses = statuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+      sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+    logTrace(s"Listing ${paths.mkString(", ")}")
+    if (paths.isEmpty) {
+      Nil
+    } else {
+      val fs = paths.head.getFileSystem(hadoopConf)
+
+      // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
+      // Note that statuses only include FileStatus for the files and dirs directly under path,
+      // and does not include anything else recursively.
+      val filteredStatuses = paths.flatMap { path =>
+        try {
+          val fStatuses = fs.listStatus(path)
+          val filtered = fStatuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+          if (filtered.nonEmpty) {
+            Some(path -> filtered)
+          } else {
+            None
+          }
+        } catch {
+          case _: FileNotFoundException =>
+            logWarning(s"The directory $paths was not found. Was it deleted very recently?")
+            None
+        }
+      }
 
-    val allLeafStatuses = {
-      val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
-      val nestedFiles: Seq[FileStatus] = sessionOpt match {
-        case Some(session) =>
-          bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, session).flatMap(_._2)
-        case _ =>
-          dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, filter, sessionOpt))
+      val allLeafStatuses = {
+        val (dirs, topLevelFiles) = filteredStatuses.flatMap { case (path, fStatuses) =>
+          fStatuses.map { f => path -> f }
+        }.partition { case (_, fStatus) => fStatus.isDirectory }
+        val pathsToList = dirs.map { case (_, fStatus) => fStatus.getPath }
+        val nestedFiles = if (pathsToList.nonEmpty) {
--- End diff --

do we need this `if` check?
[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18269#discussion_r122622031

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
    * @return all children of path that match the specified filter.
    */
   private def listLeafFiles(
-      path: Path,
+      paths: Seq[Path],
       hadoopConf: Configuration,
       filter: PathFilter,
-      sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-    logTrace(s"Listing $path")
-    val fs = path.getFileSystem(hadoopConf)
-
-    // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
-    // Note that statuses only include FileStatus for the files and dirs directly under path,
-    // and does not include anything else recursively.
-    val statuses = try fs.listStatus(path) catch {
-      case _: FileNotFoundException =>
-        logWarning(s"The directory $path was not found. Was it deleted very recently?")
-        Array.empty[FileStatus]
-    }
-
-    val filteredStatuses = statuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+      sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+    logTrace(s"Listing ${paths.mkString(", ")}")
+    if (paths.isEmpty) {
+      Nil
+    } else {
+      val fs = paths.head.getFileSystem(hadoopConf)
+
+      // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
+      // Note that statuses only include FileStatus for the files and dirs directly under path,
+      // and does not include anything else recursively.
+      val filteredStatuses = paths.flatMap { path =>
+        try {
+          val fStatuses = fs.listStatus(path)
+          val filtered = fStatuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+          if (filtered.nonEmpty) {
+            Some(path -> filtered)
+          } else {
+            None
+          }
+        } catch {
+          case _: FileNotFoundException =>
+            logWarning(s"The directory $paths was not found. Was it deleted very recently?")
+            None
+        }
+      }
 
-    val allLeafStatuses = {
-      val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
-      val nestedFiles: Seq[FileStatus] = sessionOpt match {
-        case Some(session) =>
-          bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, session).flatMap(_._2)
-        case _ =>
-          dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, filter, sessionOpt))
+      val allLeafStatuses = {
+        val (dirs, topLevelFiles) = filteredStatuses.flatMap { case (path, fStatuses) =>
+          fStatuses.map { f => path -> f }
+        }.partition { case (_, fStatus) => fStatus.isDirectory }
--- End diff --

nit: `xxx.partition(_._2.isDirectory)`
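As with the `groupBy` nit, the shorthand produces exactly the same split; a minimal illustration, using a hypothetical `FStatus` stand-in for Hadoop's `FileStatus`:

```scala
// Minimal stand-in for Hadoop's FileStatus, with just the flag we need.
case class FStatus(isDirectory: Boolean)

val statuses = Seq(("p1", FStatus(true)), ("p2", FStatus(false)))

// Form in the diff:
val (dirs, files) = statuses.partition { case (_, fStatus) => fStatus.isDirectory }

// Suggested shorthand:
val (dirs2, files2) = statuses.partition(_._2.isDirectory)
```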
[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17471

I was just looking through PRs out of curiosity. Please let me leave a gentle ping @sitalkedia.
[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18269#discussion_r122621941

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
    * @return all children of path that match the specified filter.
    */
   private def listLeafFiles(
-      path: Path,
+      paths: Seq[Path],
       hadoopConf: Configuration,
       filter: PathFilter,
-      sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-    logTrace(s"Listing $path")
-    val fs = path.getFileSystem(hadoopConf)
-
-    // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
-    // Note that statuses only include FileStatus for the files and dirs directly under path,
-    // and does not include anything else recursively.
-    val statuses = try fs.listStatus(path) catch {
-      case _: FileNotFoundException =>
-        logWarning(s"The directory $path was not found. Was it deleted very recently?")
-        Array.empty[FileStatus]
-    }
-
-    val filteredStatuses = statuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+      sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+    logTrace(s"Listing ${paths.mkString(", ")}")
+    if (paths.isEmpty) {
+      Nil
+    } else {
+      val fs = paths.head.getFileSystem(hadoopConf)
+
+      // [SPARK-17599] Prevent InMemoryFileIndex from failing if path doesn't exist
+      // Note that statuses only include FileStatus for the files and dirs directly under path,
+      // and does not include anything else recursively.
+      val filteredStatuses = paths.flatMap { path =>
+        try {
+          val fStatuses = fs.listStatus(path)
+          val filtered = fStatuses.filterNot(status => shouldFilterOut(status.getPath.getName))
+          if (filtered.nonEmpty) {
--- End diff --

nit: `filtered.map(path -> _)`, so that we don't need the `if-else` here, and the `flatMap` [there](https://github.com/apache/spark/pull/18269/files?diff=split#diff-95dcbab8e4f960b101c8a6b7a05fdc2fR278)
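The suggestion replaces the `Option` plus `if-else` with `filtered.map(path -> _)`, so an empty listing simply contributes nothing to the `flatMap` and the later flattening step disappears. A minimal sketch with plain values standing in for paths and `FileStatus` entries:

```scala
// Stand-ins: strings for paths, ints for FileStatus entries.
val listings = Map("a" -> Seq(1, 2), "b" -> Seq.empty[Int])
val paths = Seq("a", "b")

// Form in the diff: wrap non-empty results in Option, needing a later flatten.
val optionForm: Seq[(String, Seq[Int])] = paths.flatMap { path =>
  val filtered = listings(path)
  if (filtered.nonEmpty) Some(path -> filtered) else None
}

// Suggested form: pair each entry directly; empty listings vanish in flatMap.
val directForm: Seq[(String, Int)] = paths.flatMap { path =>
  listings(path).map(path -> _)
}
```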
[GitHub] spark pull request #18303: [SPARK-19824][Core] Update JsonProtocol to keep c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18303
[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18092

**[Test build #78247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78247/testReport)** for PR 18092 at commit [`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).
[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...
Github user liyichao commented on a diff in the pull request: https://github.com/apache/spark/pull/18092#discussion_r122621671

--- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,61 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
+    val tryAgainMsg = "test_spark_20640_try_again"
+    // a server which delays response 50ms and must try twice for success.
+    def newShuffleServer(port: Int): (TransportServer, Int) = {
+      val attempts = new mutable.HashMap[String, Int]()
+      val handler = new NoOpRpcHandler {
+        override def receive(
+            client: TransportClient,
--- End diff --

Oh.
[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18343#discussion_r122621682

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
     private[this] var avgSize: Long,
-    @transient private var hugeBlockSizes: Map[Int, Byte])
+    private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

Sounds good to me. However, the customized serialization logic looks similar to kryo's default way to serialize a map.
[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18303

Thanks! Merging to master.
[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18303

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78235/
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18346 Merged build finished. Test PASSed.
[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18303 Merged build finished. Test PASSed.
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18346 Merged build finished. Test PASSed.
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78238/ Test PASSed.
[GitHub] spark issue #18290: [SPARK-20989][Core] Fail to start multiple workers on on...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18290 LGTM, only one question: which are we going to support, reusing the same shuffle service across workers, or allowing multiple shuffle services on one host?
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78237/ Test PASSed.
[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18303

**[Test build #78235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78235/testReport)** for PR 18303 at commit [`8c39912`](https://github.com/apache/spark/commit/8c399127741446b063c1b081593569bc76ad8fa8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18346

**[Test build #78238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78238/testReport)** for PR 18346 at commit [`c2783f4`](https://github.com/apache/spark/commit/c2783f4bd3f50a1e0583276155df8d40ec3c1d55).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait CodegenOnlyExpression extends Expression `
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18346

**[Test build #78237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78237/testReport)** for PR 18346 at commit [`4f412b7`](https://github.com/apache/spark/commit/4f412b7d6b15f563c67c7cf12392798b88146fcf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait CodegenOnlyExpression extends Expression `
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78236/ Test PASSed.
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18346 Merged build finished. Test PASSed.
[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18346

**[Test build #78236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78236/testReport)** for PR 18346 at commit [`eead5e1`](https://github.com/apache/spark/commit/eead5e106f13a593e64a1c1d440bd66e2fbac48b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait CodegenOnlyExpression extends Expression `
[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18092#discussion_r122620995

--- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,61 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations("item").isEmpty)
   }
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
+    val tryAgainMsg = "test_spark_20640_try_again"
+    // a server which delays response 50ms and must try twice for success.
+    def newShuffleServer(port: Int): (TransportServer, Int) = {
+      val attempts = new mutable.HashMap[String, Int]()
+      val handler = new NoOpRpcHandler {
+        override def receive(
+            client: TransportClient,
--- End diff --

I mean 4-space indentation, not aligning with `def`...
[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18343#discussion_r122620862

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
     private[this] var numNonEmptyBlocks: Int,
     private[this] var emptyBlocks: RoaringBitmap,
     private[this] var avgSize: Long,
-    @transient private var hugeBlockSizes: Map[Int, Byte])
+    private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

We do want to serialize `hugeBlockSizes`, but with customized logic; that's why we marked it `@transient`. I think the correct fix is to make this class implement `KryoSerializable`, and copy the customized serialization logic for `hugeBlockSizes` into the Kryo serialization hooks.
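The fix cloud-fan describes, overriding the serializer's read/write hooks so a field keeps its compact custom wire format instead of the default field-by-field encoding, can be illustrated with Python's pickle hooks as a loose analogy. This is not Spark or Kryo code; `CompressedSizes` is a made-up stand-in:

```python
import pickle


class CompressedSizes:
    """Made-up stand-in for a class whose map field needs custom serialization.

    Rather than letting pickle walk the dict field by field, the
    __getstate__/__setstate__ hooks emit a compact sorted pair list,
    mirroring how a KryoSerializable class overrides its write/read hooks.
    """

    def __init__(self, sizes):
        self.sizes = dict(sizes)  # block id -> compressed size

    def __getstate__(self):
        # custom serialization logic: flatten to a sorted list of pairs
        return sorted(self.sizes.items())

    def __setstate__(self, state):
        # custom deserialization logic: rebuild the dict from the pairs
        self.sizes = dict(state)


original = CompressedSizes({3: 7, 1: 5})
restored = pickle.loads(pickle.dumps(original))
```

The point of the pattern is that the round trip goes through the custom hooks no matter which code path triggers serialization, which is exactly what marking the field `@transient` alone does not guarantee under a different serializer.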
[GitHub] spark issue #17401: [SPARK-18364][YARN] Expose metrics for YarnShuffleServic...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17401 gentle ping @ash211. I just wonder if this is still active.
[GitHub] spark pull request #18324: [SPARK-21045][PYSPARK]Fixed executor blocked beca...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18324#discussion_r122620720

--- Diff: python/pyspark/worker.py ---
@@ -177,8 +180,11 @@ def process():
         process()
     except Exception:
         try:
+            exc_info = traceback.format_exc()
+            if isinstance(exc_info, unicode):
+                exc_info = exc_info.encode('utf-8')
--- End diff --

Yes, we should take a closer look. BTW, just note that they are a bit different, in the sense that this one needs to return bytes in Python 3 / string (bytes) in Python 2, whereas #17267 needs to produce string (unicode) in Python 3 / string (bytes) in Python 2.
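The bytes-vs-unicode requirement under discussion can be sketched outside Spark. The helper below is a hypothetical name, not pyspark code; it normalizes `traceback.format_exc()` to UTF-8 bytes, which is the shape this patch needs on the worker side:

```python
import traceback


def format_exc_bytes():
    """Return the current exception's traceback as UTF-8 bytes.

    On Python 3, traceback.format_exc() always returns str, so it is
    encoded unconditionally; on Python 2 it could already be bytes,
    which the isinstance check leaves untouched (the original patch
    checks for unicode instead, which is the Python 2 spelling of the
    same normalization).
    """
    exc_info = traceback.format_exc()
    if isinstance(exc_info, bytes):  # already encoded (Python 2 str)
        return exc_info
    return exc_info.encode('utf-8')


try:
    raise ValueError(u'boom \u00e9')  # non-ASCII message to exercise encoding
except ValueError:
    data = format_exc_bytes()
```

Either way, the invariant is that what gets written to the accumulator channel is bytes, never a unicode object that would blow up on a non-ASCII traceback.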
[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15417 Merged build finished. Test FAILed.
[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78243/ Test FAILed.
[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15417

**[Test build #78243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78243/testReport)** for PR 15417 at commit [`f255696`](https://github.com/apache/spark/commit/f25569613366970f98137f38aa3bb8bd7c97c538).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...
Github user liyichao commented on a diff in the pull request: https://github.com/apache/spark/pull/18092#discussion_r122620600

--- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations("item").isEmpty)
   }
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
+    val tryAgainMsg = "test_spark_20640_try_again"
+    // a server which delays response 50ms and must try twice for success.
+    def newShuffleServer(port: Int): (TransportServer, Int) = {
+      val attempts = new mutable.HashMap[String, Int]()
+      val handler = new NoOpRpcHandler {
+        override def receive(client: TransportClient, message: ByteBuffer,
--- End diff --

Updated. By the way, I am a little confused. First, when you insert a line break before, IntelliJ auto-indents like this:

```
override def receive(
    client: TransportClient
```

Second, in the same file, near line 1349, `fetchBlocks`'s indentation is like this:

```
override def fetchBlocks(
    host: String,
    port: Int,
    execId: String,
    blockIds: Array[String],
    listener: BlockFetchingListener,
    shuffleFiles: Array[File]): Unit = {
```
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17758 ok, I'll recheck.
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18343 @wangyum Can you also add a test for this?
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18323 **[Test build #78245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78245/testReport)** for PR 18323 at commit [`7407541`](https://github.com/apache/spark/commit/740754135db80ff0ab60952b38defae65c017065).
[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...
Github user zenglinxi0615 commented on a diff in the pull request: https://github.com/apache/spark/pull/14085#discussion_r122620464

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends RunnableCommand {
   override def run(sqlContext: SQLContext): Seq[Row] = {
     val hiveContext = sqlContext.asInstanceOf[HiveContext]
+    val recursive = sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

I was wondering if we could call `sparkSession.sparkContext.addFile(path, true)` in the AddFileCommand func, since recursively adding a directory is a general demand in ETL.
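What the recursive flag in the comment above has to enumerate can be shown independently of Spark. The sketch below is a hypothetical helper, not the `addFile` implementation; it collects every regular file under a path, the way a recursive add must before shipping files to executors:

```python
import os
import tempfile


def collect_files(path):
    """Return all regular files under path; a single file yields itself.

    Loosely mirrors what a recursive addFile(path, true) must enumerate.
    """
    if os.path.isfile(path):
        return [path]
    found = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            found.append(os.path.join(root, name))
    return sorted(found)


# build a tiny tree to walk: root/a.txt and root/sub/b.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(root, rel), 'w') as f:
        f.write('x')

files = collect_files(root)
```

With a flat (non-recursive) add, only the top-level entry would be registered and the nested `sub/b.txt` would be skipped, which is why the flag matters for directory inputs.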
[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18092 **[Test build #78246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78246/testReport)** for PR 18092 at commit [`c8e7c64`](https://github.com/apache/spark/commit/c8e7c64d7e599c3f6283f2390c1ea188e4ed899a).
[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18092 retest this please
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17758 shall we check for duplicated columns in the write path?
[GitHub] spark issue #17395: [SPARK-20065][SS][WIP] Avoid to output empty parquet fil...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17395 hmmm. @uncleGen, shall we close this for now? Reopening when it's ready would be welcome.
[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17328 +1 for this PR.
[GitHub] spark pull request #18324: [SPARK-21045][PYSPARK]Fixed executor blocked beca...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18324#discussion_r122620265

--- Diff: python/pyspark/worker.py ---
@@ -177,8 +180,11 @@ def process():
         process()
     except Exception:
         try:
+            exc_info = traceback.format_exc()
+            if isinstance(exc_info, unicode):
+                exc_info = exc_info.encode('utf-8')
--- End diff --

I guess we need to keep this and #17267 in step with each other to fix it correctly.
[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...
Github user liyichao commented on a diff in the pull request: https://github.com/apache/spark/pull/18092#discussion_r122620196

--- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
     assert(master.getLocations("item").isEmpty)
   }
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") {
+    val tryAgainMsg = "test_spark_20640_try_again"
+    // a server which delays response 50ms and must try twice for success.
+    def newShuffleServer(port: Int): (TransportServer, Int) = {
+      val attempts = new mutable.HashMap[String, Int]()
+      val handler = new NoOpRpcHandler {
+        override def receive(client: TransportClient, message: ByteBuffer,
+            callback: RpcResponseCallback): Unit = {
+          val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
+          msgObj match {
+            case exec: RegisterExecutor =>
+              Thread.sleep(50)
+              val attempt = attempts.getOrElse(exec.execId, 0) + 1
+              attempts(exec.execId) = attempt
+              if (attempt < 2) {
+                callback.onFailure(new Exception(tryAgainMsg))
+                return
+              }
+              callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
+          }
+        }
+      }
+
+      val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0)
+      val transCtx = new TransportContext(transConf, handler, true)
+      (transCtx.createServer(port, Seq.empty[TransportServerBootstrap].asJava), port)
+    }
+    val candidatePort = RandomUtils.nextInt(1024, 65536)
+    val (server, shufflePort) = Utils.startServiceOnPort(candidatePort,
--- End diff --

No, because `startServiceOnPort` will handle the conflicted-port case.
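liyichao's point is that `Utils.startServiceOnPort` retries on successive ports when a bind fails, so a random candidate port is safe even if it happens to be taken. A minimal Python sketch of that retry pattern (not the Spark implementation; the helper name and retry count are made up):

```python
import socket


def start_service_on_port(candidate_port, max_retries=16):
    """Bind candidate_port, falling back to successive ports on conflict.

    Loosely mirrors the retry behavior of Spark's Utils.startServiceOnPort:
    a bind failure is not fatal, it just advances to the next port.
    """
    port = candidate_port
    for offset in range(max_retries + 1):
        port = candidate_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            sock.listen(1)
            return sock, port
        except OSError:
            sock.close()  # port taken; try the next one
    raise OSError("no free port in range %d-%d" % (candidate_port, port))


# occupy one port, then show the helper skips past the conflict
blocker, taken = start_service_on_port(20640)
server, chosen = start_service_on_port(taken)
```

Because the second call starts at an already-bound port, it succeeds only by moving on to a later one, which is exactly why the test in the diff does not need to guarantee the random candidate is free.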
[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15417 Merged build finished. Test FAILed.
[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15417 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78241/ Test FAILed.
[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15417

**[Test build #78241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78241/testReport)** for PR 15417 at commit [`5436c38`](https://github.com/apache/spark/commit/5436c382e565e09754062795edea716fd0694cd3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter `
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17084 gentle ping @imatiach-msft .
[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18025 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78244/ Test PASSed.
[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18025 Merged build finished. Test PASSed.
[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18025 **[Test build #78244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78244/testReport)** for PR 18025 at commit [`6eae126`](https://github.com/apache/spark/commit/6eae126398e4229aa84130728792f407c67a75e6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18343 @wangyum Thanks for updating. Can you disable Kryo and try it again, so we can verify it?
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78240/ Test FAILed.
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17681 Merged build finished. Test FAILed.
[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16766#discussion_r122619579 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2603,12 +2603,27 @@ class Dataset[T] private[sql]( * current upstream partitions will be executed in parallel (per whatever * the current partitioning is). * + * A [[PartitionCoalescer]] can also be supplied allowing the behavior of the partitioning to be --- End diff -- It sounds like this trait cannot be rendered as-is in the generated Java docs. Simply wrapping it in `` `...` `` would be fine.
[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16766#discussion_r122619526 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2603,12 +2603,27 @@ class Dataset[T] private[sql]( * current upstream partitions will be executed in parallel (per whatever * the current partitioning is). * + * A [[PartitionCoalescer]] can also be supplied allowing the behavior of the partitioning to be + * customized similar to [[RDD.coalesce]]. --- End diff -- I think it should be `[[org.apache.spark.rdd.RDD##coalesce]]`.
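The `PartitionCoalescer` being documented in this diff is a pluggable strategy for grouping existing partitions into fewer partitions. As a language-neutral illustration of the idea, here is a hypothetical Java sketch; the interface and class names are illustrative only, and Spark's real trait (`org.apache.spark.rdd.PartitionCoalescer`) has a different signature:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pluggable coalescing strategy: a Coalescer decides
// how N input partitions are grouped into at most M output partitions.
interface Coalescer {
    // Returns groups of input-partition indices; each group becomes one output partition.
    List<List<Integer>> coalesce(int numInputPartitions, int maxOutputPartitions);
}

// One possible strategy: distribute input partitions round-robin over the groups.
class RoundRobinCoalescer implements Coalescer {
    @Override
    public List<List<Integer>> coalesce(int numInputPartitions, int maxOutputPartitions) {
        int groups = Math.min(numInputPartitions, maxOutputPartitions);
        List<List<Integer>> out = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            out.add(new ArrayList<>());
        }
        for (int p = 0; p < numInputPartitions; p++) {
            out.get(p % groups).add(p); // partition p lands in group p mod groups
        }
        return out;
    }
}
```

Supplying a different `Coalescer` implementation changes how data is grouped without changing the caller, which is the customization point the Scaladoc above is describing.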
[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17681 **[Test build #78240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78240/testReport)** for PR 17681 at commit [`f6898c4`](https://github.com/apache/spark/commit/f6898c44642420278a616fa65d1f9825fd762ee7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class DropFunctionEvent(database: String, name: String) extends FunctionEvent` * `case class AlterFunctionEvent(database: String, name: String) extends FunctionEvent`
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18343 Because we write/read `hugeBlockSizes` in `writeExternal`/`readExternal`, it seems to me that it is intended to be serialized. So I think removing `transient` should be ok. LGTM cc @cloud-fan
[GitHub] spark issue #18349: [SPARK-20927][SS] Change some operators in Dataset to no...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18349 Can one of the admins verify this patch?
[GitHub] spark issue #18349: [SPARK-20927][SS] Change some operators in Dataset to no...
Github user ZiyueHuang commented on the issue: https://github.com/apache/spark/pull/18349 @zsxwing Could you please review this PR?
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/18343 @viirya Yes, I'm using `org.apache.spark.serializer.KryoSerializer`; the [master branch](https://github.com/apache/spark/tree/ce49428ef7d640c1734e91ffcddc49dbc8547ba7) still has this issue. Error logs:
```
17/06/19 12:24:05 ERROR Utils: Exception encountered
java.lang.NullPointerException
	at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
	at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
	at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1306)
	at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:728)
	at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
	at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1340)
	at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:730)
	at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:171)
	at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:389)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/06/19 12:24:05 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1313)
	at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
	at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:728)
	at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
	at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1340)
	at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:730)
	at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:171)
	at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:389)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
	at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
	at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1306)
	... 17 more
```
[GitHub] spark pull request #18349: [SPARK-20927][SS] Change some operators in Datase...
GitHub user ZiyueHuang opened a pull request: https://github.com/apache/spark/pull/18349

[SPARK-20927][SS] Change some operators in Dataset to no-op for a streaming query.

## What changes were proposed in this pull request?

Change some operators (persist, unpersist, checkpoint) in Dataset to no-ops (do nothing but log a warning) for a streaming query.

## How was this patch tested?

```scala
val df = spark.readStream.json(...)
val dfCounts = df.persist().unpersist().checkpoint().groupBy().count()
val query = dfCounts.writeStream.outputMode("complete").format("console").start()
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ZiyueHuang/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18349.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18349

commit 8358f576bd255db458c630fc38537dd93c695246
Author: Ziyue Huang
Date: 2017-06-15T06:05:18Z

    [SPARK-20927][SS] Change some operators to no-op for a streaming query without throwing an exception.

commit 1713c568de223a78bc6db4455e8ea38c4f2d2267
Author: Ziyue Huang
Date: 2017-06-15T06:32:57Z

    Revert "[SPARK-20927][SS] Change some operators to no-op for a streaming query without throwing an exception."
    This reverts commit 8358f576bd255db458c630fc38537dd93c695246.

commit 372ada118883b7ac8924ff20bd76e4bd15b47d6f
Author: Ziyue Huang
Date: 2017-06-15T06:39:28Z

    [SPARK-20927][SS] Change some operators to no-op in streaming queries.

commit f68e6b738d9330409850556c40d626bfb17a561e
Author: Ziyue Huang
Date: 2017-06-19T03:10:43Z

    scalastyle fix

commit f127ada3281442fc1def31328deaab7463f88531
Author: Ziyue Huang
Date: 2017-06-19T03:50:34Z

    comment fix

commit bb1fc6f23d9a74f649aef7b6707a82f45df3eff3
Author: Ziyue Huang
Date: 2017-06-19T04:17:59Z

    Merge pull request #1 from ZiyueHuang/dev
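The change proposed in this PR (persist/unpersist/checkpoint becoming warn-and-return no-ops on a streaming Dataset) follows a common fluent-API pattern. A minimal Java sketch of that pattern; the class and field names are hypothetical, not Spark's actual `Dataset` code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the SPARK-20927 no-op pattern: on a streaming
// dataset, caching-related operators log a warning and return `this` instead
// of throwing, so a fluent chain like persist().unpersist().checkpoint()
// keeps working.
class FluentDataset {
    private final boolean isStreaming;
    final List<String> warnings = new ArrayList<>(); // stands in for a logger

    FluentDataset(boolean isStreaming) { this.isStreaming = isStreaming; }

    FluentDataset persist()    { return noOpIfStreaming("persist"); }
    FluentDataset unpersist()  { return noOpIfStreaming("unpersist"); }
    FluentDataset checkpoint() { return noOpIfStreaming("checkpoint"); }

    private FluentDataset noOpIfStreaming(String op) {
        if (isStreaming) {
            warnings.add(op + " is a no-op on a streaming Dataset");
            return this; // do nothing but warn
        }
        // (the real batch behavior is elided in this sketch)
        return this;
    }
}
```

With this shape, the chain from the PR's test snippet never throws on a streaming dataset; it only accumulates warnings.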
[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16347 gentle ping @junegunn on ^.
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18343 I think this should be addressed before 2.2. I have already asked other committers on the dev mailing list to take notice.
[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14957 @xuanyuanking, let's close this and help review #16578 if you agree on the comments above.
[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18343 @wangyum Are you using the Kryo serializer? I think that is why you hit this issue. When you use Kryo, the `readExternal` in `HighlyCompressedMapStatus` won't be used to deserialize the object on the driver side. Since `hugeBlockSizes` is a `transient` variable, it is null at that point. So when we try to serialize it again, `MapOutputTracker.serializeMapStatuses` directly calls `ObjectOutputStream.writeObject`, which invokes `HighlyCompressedMapStatus.writeExternal` and causes the NPE.
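The diagnosis above can be reproduced in miniature with plain `java.io`. This is a hypothetical sketch, not Spark's actual `HighlyCompressedMapStatus`: a `transient` field is written in `writeExternal`, but a serializer that bypasses the `Externalizable` hooks (as Kryo's default field-level serialization does) leaves the field null after a round trip, so a later Java re-serialization hits a `NullPointerException`.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.Map;

// Hypothetical demo of the SPARK-21133 failure mode.
class TransientNpeDemo implements Externalizable {
    // transient, yet written in writeExternal -- the mismatch at the heart of the bug
    private transient Map<Integer, Byte> hugeBlockSizes;

    public TransientNpeDemo() {} // required by Externalizable
    public TransientNpeDemo(Map<Integer, Byte> sizes) { this.hugeBlockSizes = sizes; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // NPE here when the field was never restored (e.g. after a Kryo-style copy
        // that skipped readExternal and left the transient field null)
        out.writeInt(hugeBlockSizes.size());
        for (Map.Entry<Integer, Byte> e : hugeBlockSizes.entrySet()) {
            out.writeInt(e.getKey());
            out.writeByte(e.getValue());
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // (restore logic elided in this sketch)
    }
}
```

An instance whose transient field was never restored throws from `writeExternal`, which is why either the field must lose `transient` (the fix in this PR) or `writeExternal` must tolerate null.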
[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 This is how the doc for `column_aggregate_functions` looks (only a snapshot of the main parts): ![image](https://user-images.githubusercontent.com/11082368/27269174-85df12fa-5469-11e7-872d-d740fd382294.png) ![image](https://user-images.githubusercontent.com/11082368/27269177-8b35a67e-5469-11e7-80ac-7c804c3728d2.png) ![image](https://user-images.githubusercontent.com/11082368/27269180-8eb8c7a4-5469-11e7-8c4a-1de037bf078d.png) ![image](https://user-images.githubusercontent.com/11082368/27269184-91e39cb0-5469-11e7-932c-5eab772ec845.png)
[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18303 LGTM
[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18092 LGTM except 2 minor comments
[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18092#discussion_r122617592 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala --- @@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE assert(master.getLocations("item").isEmpty) } + test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") { +val tryAgainMsg = "test_spark_20640_try_again" +// a server which delays response 50ms and must try twice for success. +def newShuffleServer(port: Int): (TransportServer, Int) = { + val attempts = new mutable.HashMap[String, Int]() + val handler = new NoOpRpcHandler { +override def receive(client: TransportClient, message: ByteBuffer, + callback: RpcResponseCallback): Unit = { + val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message) + msgObj match { +case exec: RegisterExecutor => + Thread.sleep(50) + val attempt = attempts.getOrElse(exec.execId, 0) + 1 + attempts(exec.execId) = attempt + if (attempt < 2) { +callback.onFailure(new Exception(tryAgainMsg)) +return + } + callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0))) + } +} + } + + val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", numUsableCores = 0) + val transCtx = new TransportContext(transConf, handler, true) + (transCtx.createServer(port, Seq.empty[TransportServerBootstrap].asJava), port) +} +val candidatePort = RandomUtils.nextInt(1024, 65536) +val (server, shufflePort) = Utils.startServiceOnPort(candidatePort, --- End diff -- will this be flaky? e.g. the port is occupied by other test suites
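The flakiness concern raised here is common in test suites that bind network ports. Two standard remedies can be sketched with plain `java.net` sockets standing in for Spark's `TransportServer` (the class below is illustrative; Spark's `Utils.startServiceOnPort` retries in roughly this fashion, but with its own signature):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of two ways to avoid a flaky hard-coded test port.
class PortUtil {
    // Preferred: bind port 0 and let the OS pick a free ephemeral port,
    // which sidesteps collisions with other suites entirely.
    static ServerSocket bindEphemeral() throws IOException {
        return new ServerSocket(0);
    }

    // Fallback retry pattern: try a candidate port, and on failure move on
    // to the next ports until one binds or attempts run out.
    static ServerSocket bindWithRetry(int candidate, int maxAttempts) throws IOException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return new ServerSocket(candidate + i);
            } catch (IOException e) {
                last = e; // port in use; try the next one
            }
        }
        throw last;
    }
}
```

With the ephemeral form, `socket.getLocalPort()` reports the port the OS assigned, so the test can hand that to the client side.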
[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18025#discussion_r122617531 --- Diff: R/pkg/R/stats.R --- @@ -52,22 +52,17 @@ setMethod("crosstab", collect(dataFrame(sct)) }) -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame. -#' #' @param colName1 the name of the first column #' @param colName2 the name of the second column -#' @return The covariance of the two columns. --- End diff -- OK. I added this back. The doc should be very clear even without this return value. Indeed, most functions do not document the return value in SparkR. See what it looks like in the image attached in the next comment.
[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18025 **[Test build #78244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78244/testReport)** for PR 18025 at commit [`6eae126`](https://github.com/apache/spark/commit/6eae126398e4229aa84130728792f407c67a75e6).
[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18092#discussion_r122617448 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala --- @@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE assert(master.getLocations("item").isEmpty) } + test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are working") { +val tryAgainMsg = "test_spark_20640_try_again" +// a server which delays response 50ms and must try twice for success. +def newShuffleServer(port: Int): (TransportServer, Int) = { + val attempts = new mutable.HashMap[String, Int]() + val handler = new NoOpRpcHandler { +override def receive(client: TransportClient, message: ByteBuffer, --- End diff -- nit:
```
def xxxI(
    para1: XXX
    para2: XXX): XXX
```
[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/13893 ya, this still exists. Let me find some time to resolve this.
[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18025#discussion_r122617405 --- Diff: R/pkg/R/stats.R --- @@ -52,22 +52,17 @@ setMethod("crosstab", collect(dataFrame(sct)) }) -#' Calculate the sample covariance of two numerical columns of a SparkDataFrame. -#' #' @param colName1 the name of the first column #' @param colName2 the name of the second column -#' @return The covariance of the two columns. #' #' @rdname cov -#' @name cov #' @aliases cov,SparkDataFrame-method #' @family stat functions #' @export #' @examples -#'\dontrun{ -#' df <- read.json("/path/to/file.json") -#' cov <- cov(df, "title", "gender") -#' } +#' +#' \dontrun{ --- End diff -- No. The newline should be between `@example` and `\dontrun` to separate multiple `dontrun`s. ![image](https://user-images.githubusercontent.com/11082368/27269043-73785762-5468-11e7-9a31-5cca104e005b.png)
[GitHub] spark issue #12257: [SPARK-14483][WEBUI] Display user name for each job and ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12257 gentle ping @sarutak on ^
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18347#discussion_r122617147 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -465,6 +465,8 @@ case class DataSource( providingClass.newInstance() match { case dataSource: CreatableRelationProvider => SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode) + case dataSource: ConsoleSinkProvider => +data.show(data.count().toInt, false) --- End diff -- `ConsoleSink` [has two options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L27-L30) that could be used here -- `numRows` and `truncate`.
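The suggestion above — honoring `numRows` and `truncate` instead of hard-coding `show(count, false)` — is an instance of the usual options-with-defaults pattern. A hypothetical Java sketch of that pattern; the option names mirror `ConsoleSink`'s, but the class itself and the defaults of `20` and `true` are assumptions for illustration, not Spark code:

```java
import java.util.Map;

// Hypothetical sketch: parse console-sink display options from a string map,
// falling back to assumed defaults when an option is absent.
class ConsoleOptions {
    final int numRows;       // how many rows to display
    final boolean truncate;  // whether to truncate wide cell values

    ConsoleOptions(Map<String, String> params) {
        this.numRows = Integer.parseInt(params.getOrDefault("numRows", "20"));
        this.truncate = Boolean.parseBoolean(params.getOrDefault("truncate", "true"));
    }
}
```

The write path would then call the equivalent of `data.show(options.numRows, options.truncate)`, so user-supplied sink options take effect instead of being silently ignored.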
[GitHub] spark issue #18296: [SPARK-21090][core]Optimize the unified memory manager c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18296 Actually this is a bug fix, and it's a small fix (without tests, only 3 lines), so backporting it to 2.2
[GitHub] spark issue #11887: [SPARK-13041][Mesos]add driver sandbox uri to the dispat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11887

@skonto, do you know (or have a rough guess) when we would be able to proceed with this? Closing this for now and reopening later might be an option.
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/18347#discussion_r122616876

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -465,6 +465,8 @@ case class DataSource(
     providingClass.newInstance() match {
       case dataSource: CreatableRelationProvider =>
         SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
+      case dataSource: ConsoleSinkProvider =>
--- End diff --

Underscore `dataSource` since it's not used.
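A minimal, self-contained illustration (plain Scala, not the Spark code itself) of the convention the comment refers to: when a pattern binding is never used in the case body, replace its name with `_`.

```scala
trait SinkProvider
class ConsoleProvider extends SinkProvider
class ParquetProvider extends SinkProvider

def label(p: SinkProvider): String = p match {
  // `_` instead of a name, because the matched value is not used in the body
  case _: ConsoleProvider => "console"
  case other              => other.getClass.getSimpleName
}

val consoleLabel = label(new ConsoleProvider)
```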
[GitHub] spark issue #11420: [SPARK-13493][SQL] Enable case sensitiveness in json sch...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11420

Let's close this if it is not in progress.
[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...
Github user actuaryzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/18025#discussion_r122616787

--- Diff: R/pkg/R/stats.R ---
@@ -52,22 +52,17 @@ setMethod("crosstab",
     collect(dataFrame(sct))
   })

-#' Calculate the sample covariance of two numerical columns of a SparkDataFrame.
--- End diff --

The method for SparkDataFrame is still there. I'm just removing redundant doc here.
[GitHub] spark pull request #18296: [SPARK-21090][core]Optimize the unified memory ma...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18296
[GitHub] spark issue #11205: [SPARK-11334][Core] Handle maximum task failure situatio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11205

Gentle ping @rustgi, have you had some time to confirm this patch? It sounds like the only thing we need here is that confirmation.
[GitHub] spark issue #18296: [SPARK-21090][core]Optimize the unified memory manager c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18296

LGTM, merging to master!
[GitHub] spark issue #10861: SPARK-12948. [SQL]. Consider reducing size of broadcasts...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/10861

Hi @rajeshbalamohan, I think this should at least be brought to a mergeable state, with the conflicts and style issues resolved. Would you be able to update this for now?
[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18334

+1 to providing a flag to automatically trigger the stats updates. We can set it to false by default so as not to surprise users.
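A sketch of what such an opt-in flag could look like; the conf key and helper below are illustrative assumptions, not an actual Spark configuration.

```scala
// Illustrative only: a boolean flag, false by default, gating automatic
// statistics updates after data-changing commands.
val AutoUpdateKey = "spark.sql.statistics.autoUpdate.enabled" // hypothetical key

def shouldAutoUpdateStats(conf: Map[String, String]): Boolean =
  conf.getOrElse(AutoUpdateKey, "false").toBoolean

val defaultBehavior = shouldAutoUpdateStats(Map.empty)              // stays off
val optedIn = shouldAutoUpdateStats(Map(AutoUpdateKey -> "true"))   // user opts in
```

Defaulting to `false` keeps existing workloads unchanged, which is the "not surprise users" point in the comment.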