[GitHub] spark issue #18350: [MINOR] Fix some typo of the document

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18350
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122625325
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -175,6 +175,7 @@ class KryoSerializer(conf: SparkConf)
 kryo.register(None.getClass)
 kryo.register(Nil.getClass)
 
 kryo.register(Utils.classForName("scala.collection.immutable.$colon$colon"))
+kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$"))
--- End diff --

why `Map$EmptyMap$`?
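For context: `Map.EmptyMap` is a singleton object nested inside `scala.collection.immutable.Map`, so the compiler emits it as the JVM class `scala.collection.immutable.Map$EmptyMap$`, and that mangled name is what a reflective lookup like `Utils.classForName` needs. A minimal sketch of the effect, assuming the standard `com.esotericsoftware.kryo` API:

```scala
import com.esotericsoftware.kryo.Kryo

object EmptyMapRegistration {
  def main(args: Array[String]): Unit = {
    // Map.empty returns the EmptyMap singleton; its runtime class name
    // shows the compiler's mangling of the nested object.
    println(Map.empty[Int, Byte].getClass.getName)
    // prints: scala.collection.immutable.Map$EmptyMap$

    val kryo = new Kryo()
    kryo.setRegistrationRequired(true)
    // Without this registration, serializing an empty Map fails whenever
    // class registration is required.
    kryo.register(Class.forName("scala.collection.immutable.Map$EmptyMap$"))
  }
}
```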





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122625202
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
+private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

Oh, it seems it is now; LGTM then.





[GitHub] spark issue #18350: [MINOR] Fix some typo of the document

2017-06-18 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/18350
  
Hi @srowen, would you mind taking a look?





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18343
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #18350: [MINOR] Fix some typo of the document

2017-06-18 Thread ConeyLiu
GitHub user ConeyLiu opened a pull request:

https://github.com/apache/spark/pull/18350

[MINOR] Fix some typo of the document

## What changes were proposed in this pull request?

Fix some typos in the document.

## How was this patch tested?

Existing tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ConeyLiu/spark fixtypo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18350.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18350


commit 32841d64dd0026b2bc3030196ada1e38f1551443
Author: Xianyang Liu 
Date:   2017-06-18T13:18:38Z

fix typo

commit b22254bec313d1800624bc3c536eaa4fd12298f3
Author: Xianyang Liu 
Date:   2017-06-19T05:23:30Z

Merge remote-tracking branch 'spark/master' into fixtypo

commit e7baf5489c1472c180f8ec7609ec370b0ed9dabe
Author: Xianyang Liu 
Date:   2017-06-19T05:49:14Z

typo fix







[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78239/
Test PASSed.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18343
  
**[Test build #78239 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78239/testReport)**
 for PR 18343 at commit 
[`e2816ec`](https://github.com/apache/spark/commit/e2816eccec9875144e4edf5679cf6594c1ca3874).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17328
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17328
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78242/
Test PASSed.





[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17328
  
**[Test build #78242 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78242/testReport)**
 for PR 17328 at commit 
[`021b551`](https://github.com/apache/spark/commit/021b5513b9dbea546bc577e2e1b939dc8ebe85aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122623575
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
+private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

If you can figure out a way to make it serializable with Kryo and still keep the customized serialization logic for the Java serializer, I'm OK with it.





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122623298
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
+private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

Actually we can do better: use a bitmap to track which block ids have size info, and a byte array to store the size data. So the format can be: `[num blocks] [bitmap] [size array]`.
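A minimal sketch of that layout as a hypothetical standalone codec, assuming `org.roaringbitmap.RoaringBitmap` (the same library already used for `emptyBlocks`):

```scala
import java.io.{ObjectInput, ObjectOutput}

import org.roaringbitmap.RoaringBitmap

object HugeBlockSizesCodec {
  // Layout: [num blocks] [bitmap of block ids that have size info] [size array]
  def write(out: ObjectOutput, sizes: Map[Int, Byte]): Unit = {
    out.writeInt(sizes.size)
    val ids = new RoaringBitmap()
    sizes.keys.foreach(k => ids.add(k))
    ids.serialize(out) // RoaringBitmap writes itself to any DataOutput
    // Walk the bitmap in ascending order so the read side can pair each
    // byte with its block id without storing the ids a second time.
    val it = ids.getIntIterator
    while (it.hasNext) out.writeByte(sizes(it.next()))
  }

  def read(in: ObjectInput): Map[Int, Byte] = {
    val count = in.readInt()
    val ids = new RoaringBitmap()
    ids.deserialize(in)
    val builder = Map.newBuilder[Int, Byte]
    val it = ids.getIntIterator
    var i = 0
    while (i < count) {
      builder += (it.next() -> in.readByte())
      i += 1
    }
    builder.result()
  }
}
```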





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18346
  
cc @cloud-fan @gatorsmile 





[GitHub] spark issue #18269: [SPARK-21056][SQL] Use at most one spark job to list fil...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18269
  
Let's wait for @mallman's response to make sure this patch does fix the problem.





[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18269#discussion_r122622206
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
* @return all children of path that match the specified filter.
*/
   private def listLeafFiles(
-  path: Path,
+  paths: Seq[Path],
   hadoopConf: Configuration,
   filter: PathFilter,
-  sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-logTrace(s"Listing $path")
-val fs = path.getFileSystem(hadoopConf)
-
-// [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
-// Note that statuses only include FileStatus for the files and dirs 
directly under path,
-// and does not include anything else recursively.
-val statuses = try fs.listStatus(path) catch {
-  case _: FileNotFoundException =>
-logWarning(s"The directory $path was not found. Was it deleted 
very recently?")
-Array.empty[FileStatus]
-}
-
-val filteredStatuses = statuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+logTrace(s"Listing ${paths.mkString(", ")}")
+if (paths.isEmpty) {
+  Nil
+} else {
+  val fs = paths.head.getFileSystem(hadoopConf)
+
+  // [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
+  // Note that statuses only include FileStatus for the files and dirs 
directly under path,
+  // and does not include anything else recursively.
+  val filteredStatuses = paths.flatMap { path =>
+try {
+  val fStatuses = fs.listStatus(path)
+  val filtered = fStatuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  if (filtered.nonEmpty) {
+Some(path -> filtered)
+  } else {
+None
+  }
+} catch {
+  case _: FileNotFoundException =>
+logWarning(s"The directory $paths was not found. Was it 
deleted very recently?")
+None
+}
+  }
 
-val allLeafStatuses = {
-  val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
-  val nestedFiles: Seq[FileStatus] = sessionOpt match {
-case Some(session) =>
-  bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, 
session).flatMap(_._2)
-case _ =>
-  dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, 
filter, sessionOpt))
+  val allLeafStatuses = {
+val (dirs, topLevelFiles) = filteredStatuses.flatMap { case (path, 
fStatuses) =>
+  fStatuses.map { f => path -> f }
+}.partition { case (_, fStatus) => fStatus.isDirectory }
+val pathsToList = dirs.map { case (_, fStatus) => fStatus.getPath }
+val nestedFiles = if (pathsToList.nonEmpty) {
+  sessionOpt match {
+case Some(session) =>
+  bulkListLeafFiles(pathsToList, hadoopConf, filter, session)
+case _ =>
+  listLeafFiles(pathsToList, hadoopConf, filter, sessionOpt)
+  }
+} else Seq.empty[(Path, Seq[FileStatus])]
+val allFiles = topLevelFiles.groupBy { case (path, _) => path }
--- End diff --

nit: `xxx.groupBy(_._1)`





[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18269#discussion_r122622157
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
* @return all children of path that match the specified filter.
*/
   private def listLeafFiles(
-  path: Path,
+  paths: Seq[Path],
   hadoopConf: Configuration,
   filter: PathFilter,
-  sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-logTrace(s"Listing $path")
-val fs = path.getFileSystem(hadoopConf)
-
-// [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
-// Note that statuses only include FileStatus for the files and dirs 
directly under path,
-// and does not include anything else recursively.
-val statuses = try fs.listStatus(path) catch {
-  case _: FileNotFoundException =>
-logWarning(s"The directory $path was not found. Was it deleted 
very recently?")
-Array.empty[FileStatus]
-}
-
-val filteredStatuses = statuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+logTrace(s"Listing ${paths.mkString(", ")}")
+if (paths.isEmpty) {
+  Nil
+} else {
+  val fs = paths.head.getFileSystem(hadoopConf)
+
+  // [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
+  // Note that statuses only include FileStatus for the files and dirs 
directly under path,
+  // and does not include anything else recursively.
+  val filteredStatuses = paths.flatMap { path =>
+try {
+  val fStatuses = fs.listStatus(path)
+  val filtered = fStatuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  if (filtered.nonEmpty) {
+Some(path -> filtered)
+  } else {
+None
+  }
+} catch {
+  case _: FileNotFoundException =>
+logWarning(s"The directory $paths was not found. Was it 
deleted very recently?")
+None
+}
+  }
 
-val allLeafStatuses = {
-  val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
-  val nestedFiles: Seq[FileStatus] = sessionOpt match {
-case Some(session) =>
-  bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, 
session).flatMap(_._2)
-case _ =>
-  dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, 
filter, sessionOpt))
+  val allLeafStatuses = {
+val (dirs, topLevelFiles) = filteredStatuses.flatMap { case (path, 
fStatuses) =>
+  fStatuses.map { f => path -> f }
+}.partition { case (_, fStatus) => fStatus.isDirectory }
+val pathsToList = dirs.map { case (_, fStatus) => fStatus.getPath }
+val nestedFiles = if (pathsToList.nonEmpty) {
--- End diff --

do we need this `if` check?





[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18269#discussion_r122622031
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
* @return all children of path that match the specified filter.
*/
   private def listLeafFiles(
-  path: Path,
+  paths: Seq[Path],
   hadoopConf: Configuration,
   filter: PathFilter,
-  sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-logTrace(s"Listing $path")
-val fs = path.getFileSystem(hadoopConf)
-
-// [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
-// Note that statuses only include FileStatus for the files and dirs 
directly under path,
-// and does not include anything else recursively.
-val statuses = try fs.listStatus(path) catch {
-  case _: FileNotFoundException =>
-logWarning(s"The directory $path was not found. Was it deleted 
very recently?")
-Array.empty[FileStatus]
-}
-
-val filteredStatuses = statuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+logTrace(s"Listing ${paths.mkString(", ")}")
+if (paths.isEmpty) {
+  Nil
+} else {
+  val fs = paths.head.getFileSystem(hadoopConf)
+
+  // [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
+  // Note that statuses only include FileStatus for the files and dirs 
directly under path,
+  // and does not include anything else recursively.
+  val filteredStatuses = paths.flatMap { path =>
+try {
+  val fStatuses = fs.listStatus(path)
+  val filtered = fStatuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  if (filtered.nonEmpty) {
+Some(path -> filtered)
+  } else {
+None
+  }
+} catch {
+  case _: FileNotFoundException =>
+logWarning(s"The directory $paths was not found. Was it 
deleted very recently?")
+None
+}
+  }
 
-val allLeafStatuses = {
-  val (dirs, topLevelFiles) = filteredStatuses.partition(_.isDirectory)
-  val nestedFiles: Seq[FileStatus] = sessionOpt match {
-case Some(session) =>
-  bulkListLeafFiles(dirs.map(_.getPath), hadoopConf, filter, 
session).flatMap(_._2)
-case _ =>
-  dirs.flatMap(dir => listLeafFiles(dir.getPath, hadoopConf, 
filter, sessionOpt))
+  val allLeafStatuses = {
+val (dirs, topLevelFiles) = filteredStatuses.flatMap { case (path, 
fStatuses) =>
+  fStatuses.map { f => path -> f }
+}.partition { case (_, fStatus) => fStatus.isDirectory }
--- End diff --

nit: `xxx.partition(_._2.isDirectory)`





[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17471
  
I was just looking through PRs out of curiosity. Please let me leave a gentle ping, @sitalkedia.





[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18269#discussion_r122621941
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
 ---
@@ -248,60 +245,94 @@ object InMemoryFileIndex extends Logging {
* @return all children of path that match the specified filter.
*/
   private def listLeafFiles(
-  path: Path,
+  paths: Seq[Path],
   hadoopConf: Configuration,
   filter: PathFilter,
-  sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-logTrace(s"Listing $path")
-val fs = path.getFileSystem(hadoopConf)
-
-// [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
-// Note that statuses only include FileStatus for the files and dirs 
directly under path,
-// and does not include anything else recursively.
-val statuses = try fs.listStatus(path) catch {
-  case _: FileNotFoundException =>
-logWarning(s"The directory $path was not found. Was it deleted 
very recently?")
-Array.empty[FileStatus]
-}
-
-val filteredStatuses = statuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+logTrace(s"Listing ${paths.mkString(", ")}")
+if (paths.isEmpty) {
+  Nil
+} else {
+  val fs = paths.head.getFileSystem(hadoopConf)
+
+  // [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
+  // Note that statuses only include FileStatus for the files and dirs 
directly under path,
+  // and does not include anything else recursively.
+  val filteredStatuses = paths.flatMap { path =>
+try {
+  val fStatuses = fs.listStatus(path)
+  val filtered = fStatuses.filterNot(status => 
shouldFilterOut(status.getPath.getName))
+  if (filtered.nonEmpty) {
--- End diff --

nit: `filtered.map(path -> _)`, so that we don't need the `if-else` here or the `flatMap` 
[there](https://github.com/apache/spark/pull/18269/files?diff=split#diff-95dcbab8e4f960b101c8a6b7a05fdc2fR278), as sketched below.
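A self-contained sketch of how the listing loop could look with that change; `shouldFilterOut` is replaced by a simple underscore-prefix check here to keep the example standalone:

```scala
import java.io.FileNotFoundException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

object ListChildrenSketch {
  // Stand-in for InMemoryFileIndex.shouldFilterOut.
  private def shouldFilterOut(name: String): Boolean = name.startsWith("_")

  // Emitting one (path, status) pair per filtered child makes the empty
  // case disappear: no if-else here and no flatMap on the caller side.
  def listChildren(paths: Seq[Path], hadoopConf: Configuration): Seq[(Path, FileStatus)] = {
    if (paths.isEmpty) {
      Nil
    } else {
      val fs = paths.head.getFileSystem(hadoopConf)
      paths.flatMap { path =>
        try {
          fs.listStatus(path)
            .filterNot(status => shouldFilterOut(status.getPath.getName))
            .map(path -> _)
            .toSeq
        } catch {
          case _: FileNotFoundException => Nil
        }
      }
    }
  }
}
```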





[GitHub] spark pull request #18303: [SPARK-19824][Core] Update JsonProtocol to keep c...

2017-06-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18303





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18092
  
**[Test build #78247 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78247/testReport)**
 for PR 18092 at commit 
[`d31d8da`](https://github.com/apache/spark/commit/d31d8da7952e1db527fa892087b2feb85799cae4).





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-18 Thread liyichao
Github user liyichao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122621671
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,61 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are 
working") {
+val tryAgainMsg = "test_spark_20640_try_again"
+// a server which delays response 50ms and must try twice for success.
+def newShuffleServer(port: Int): (TransportServer, Int) = {
+  val attempts = new mutable.HashMap[String, Int]()
+  val handler = new NoOpRpcHandler {
+override def receive(
+ client: TransportClient,
--- End diff --

Oh.





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122621682
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
+private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

Sounds good to me. However, the customized serialization logic looks similar to Kryo's default way to serialize a map.





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18303
  
Thanks! Merging to master.





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18303
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78235/
Test PASSed.





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18346
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18303
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18346
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78238/
Test PASSed.





[GitHub] spark issue #18290: [SPARK-20989][Core] Fail to start multiple workers on on...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18290
  
LGTM, only one question: are we going to support reusing the same shuffle service across workers, or allowing multiple shuffle services on one host?





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78237/
Test PASSed.





[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18303
  
**[Test build #78235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78235/testReport)**
 for PR 18303 at commit 
[`8c39912`](https://github.com/apache/spark/commit/8c399127741446b063c1b081593569bc76ad8fa8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18346
  
**[Test build #78238 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78238/testReport)**
 for PR 18346 at commit 
[`c2783f4`](https://github.com/apache/spark/commit/c2783f4bd3f50a1e0583276155df8d40ec3c1d55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CodegenOnlyExpression extends Expression `





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18346
  
**[Test build #78237 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78237/testReport)**
 for PR 18346 at commit 
[`4f412b7`](https://github.com/apache/spark/commit/4f412b7d6b15f563c67c7cf12392798b88146fcf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CodegenOnlyExpression extends Expression `





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18346
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78236/
Test PASSed.





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18346
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18346: [SPARK-21134][SQL] Don't collapse codegen-only expressio...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18346
  
**[Test build #78236 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78236/testReport)**
 for PR 18346 at commit 
[`eead5e1`](https://github.com/apache/spark/commit/eead5e106f13a593e64a1c1d440bd66e2fbac48b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CodegenOnlyExpression extends Expression `





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122620995
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,61 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are 
working") {
+val tryAgainMsg = "test_spark_20640_try_again"
+// a server which delays response 50ms and must try twice for success.
+def newShuffleServer(port: Int): (TransportServer, Int) = {
+  val attempts = new mutable.HashMap[String, Int]()
+  val handler = new NoOpRpcHandler {
+override def receive(
+ client: TransportClient,
--- End diff --

I mean 4-space indentation, not aligning with the `def`...





[GitHub] spark pull request #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18343#discussion_r122620862
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -141,7 +141,7 @@ private[spark] class HighlyCompressedMapStatus private (
 private[this] var numNonEmptyBlocks: Int,
 private[this] var emptyBlocks: RoaringBitmap,
 private[this] var avgSize: Long,
-@transient private var hugeBlockSizes: Map[Int, Byte])
+private[this] var hugeBlockSizes: Map[Int, Byte])
--- End diff --

We do want to serialize `hugeBlockSizes`, but with customized logic; that's why we marked it `@transient`.

I think the correct fix is to make this class implement `KryoSerializable`, and copy the customized serialization logic for `hugeBlockSizes` into the Kryo serialization hooks.
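A minimal sketch of that shape on a simplified stand-in class (not the real `HighlyCompressedMapStatus`), assuming the standard `com.esotericsoftware.kryo` API:

```scala
import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}

// Simplified stand-in: only the hugeBlockSizes field, written through
// Kryo's hooks with the same compact layout the Java-serializer path
// would use in writeExternal/readExternal.
class HugeBlocksHolder(@transient private var hugeBlockSizes: Map[Int, Byte])
  extends KryoSerializable {

  override def write(kryo: Kryo, output: Output): Unit = {
    output.writeInt(hugeBlockSizes.size)
    hugeBlockSizes.foreach { case (blockId, size) =>
      output.writeInt(blockId)
      output.writeByte(size)
    }
  }

  override def read(kryo: Kryo, input: Input): Unit = {
    val count = input.readInt()
    hugeBlockSizes = (0 until count).map { _ =>
      input.readInt() -> input.readByte()
    }.toMap
  }
}
```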





[GitHub] spark issue #17401: [SPARK-18364][YARN] Expose metrics for YarnShuffleServic...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17401
  
Gentle ping @ash211. I just wonder if this is still active.





[GitHub] spark pull request #18324: [SPARK-21045][PYSPARK]Fixed executor blocked beca...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18324#discussion_r122620720
  
--- Diff: python/pyspark/worker.py ---
@@ -177,8 +180,11 @@ def process():
 process()
 except Exception:
 try:
+exc_info = traceback.format_exc()
+if isinstance(exc_info, unicode):
+exc_info = exc_info.encode('utf-8')
--- End diff --

Yes, we should take a closer look. BTW, just note that they are a bit different, in the sense that this one needs to return bytes in Python 3 / string (bytes) in Python 2, whereas #17267 needs to produce string (unicode) in Python 3 / string (bytes) in Python 2.





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78243/
Test FAILed.





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15417
  
**[Test build #78243 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78243/testReport)**
 for PR 15417 at commit 
[`f255696`](https://github.com/apache/spark/commit/f25569613366970f98137f38aa3bb8bd7c97c538).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-18 Thread liyichao
Github user liyichao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122620600
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are 
working") {
+val tryAgainMsg = "test_spark_20640_try_again"
+// a server which delays response 50ms and must try twice for success.
+def newShuffleServer(port: Int): (TransportServer, Int) = {
+  val attempts = new mutable.HashMap[String, Int]()
+  val handler = new NoOpRpcHandler {
+override def receive(client: TransportClient, message: ByteBuffer,
--- End diff --

Updated. By the way, I am a little confused.

First, when you insert a line break before the parameters, IntelliJ auto-indents like this:

```
override def receive(
  client: TransportClient
```

Second, in the same file, near line 1349, the indentation of `fetchBlocks` is like this:

```
override def fetchBlocks(
host: String,
port: Int,
execId: String,
blockIds: Array[String],
listener: BlockFetchingListener,
shuffleFiles: Array[File]): Unit = {
```





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17758
  
ok, I'll recheck.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18343
  
@wangyum Can you also add a test for this?





[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18323
  
**[Test build #78245 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78245/testReport)**
 for PR 18323 at commit 
[`7407541`](https://github.com/apache/spark/commit/740754135db80ff0ab60952b38defae65c017065).





[GitHub] spark pull request #14085: [SPARK-16408][SQL] SparkSQL Added file get Except...

2017-06-18 Thread zenglinxi0615
Github user zenglinxi0615 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14085#discussion_r122620464
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -113,8 +113,9 @@ case class AddFile(path: String) extends 
RunnableCommand {
 
   override def run(sqlContext: SQLContext): Seq[Row] = {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
+val recursive = 
sqlContext.sparkContext.getConf.getBoolean("spark.input.dir.recursive", false)
--- End diff --

I was wondering if we could call `sparkSession.sparkContext.addFile(path, true)` in the AddFileCommand function, since recursive adds are a general demand in ETL.
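For reference, `SparkContext.addFile` does have a two-argument overload with a `recursive` flag; a minimal sketch with a hypothetical directory path:

```scala
import org.apache.spark.sql.SparkSession

object AddFileRecursiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-file-recursive")
      .getOrCreate()
    // recursive = true distributes the whole directory to executors,
    // which is what ADD FILE on a directory would need.
    spark.sparkContext.addFile("/tmp/etl-resources", recursive = true)
    spark.stop()
  }
}
```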





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18092
  
**[Test build #78246 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78246/testReport)**
 for PR 18092 at commit 
[`c8e7c64`](https://github.com/apache/spark/commit/c8e7c64d7e599c3f6283f2390c1ea188e4ed899a).





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18092
  
retest this please





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17758
  
shall we check for duplicated columns in the write path?
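For illustration, a minimal sketch of such a check; the helper name and message are assumptions, not existing Spark code:

```
// Reject duplicate column names (case-insensitively) before writing.
def checkDuplicateColumns(columnNames: Seq[String]): Unit = {
  val duplicates = columnNames.groupBy(_.toLowerCase)
    .collect { case (name, group) if group.size > 1 => name }
  require(duplicates.isEmpty,
    s"Found duplicate column(s) in the write path: ${duplicates.mkString(", ")}")
}
```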





[GitHub] spark issue #17395: [SPARK-20065][SS][WIP] Avoid to output empty parquet fil...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17395
  
Hmmm, @uncleGen, shall we close this for now? Reopening when it's ready would be welcome.





[GitHub] spark issue #17328: [SPARK-19975][Python][SQL] Add map_keys and map_values f...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17328
  
+1 for this PR.





[GitHub] spark pull request #18324: [SPARK-21045][PYSPARK]Fixed executor blocked beca...

2017-06-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18324#discussion_r122620265
  
--- Diff: python/pyspark/worker.py ---
@@ -177,8 +180,11 @@ def process():
 process()
 except Exception:
 try:
+exc_info = traceback.format_exc()
+if isinstance(exc_info, unicode):
+exc_info = exc_info.encode('utf-8')
--- End diff --

I guess this PR and #17267 need to coordinate with each other to fix this correctly.





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-18 Thread liyichao
Github user liyichao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122620196
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are 
working") {
+val tryAgainMsg = "test_spark_20640_try_again"
+// a server which delays response 50ms and must try twice for success.
+def newShuffleServer(port: Int): (TransportServer, Int) = {
+  val attempts = new mutable.HashMap[String, Int]()
+  val handler = new NoOpRpcHandler {
+override def receive(client: TransportClient, message: ByteBuffer,
+ callback: RpcResponseCallback): Unit = {
+  val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
+  msgObj match {
+case exec: RegisterExecutor =>
+  Thread.sleep(50)
+  val attempt = attempts.getOrElse(exec.execId, 0) + 1
+  attempts(exec.execId) = attempt
+  if (attempt < 2) {
+callback.onFailure(new Exception(tryAgainMsg))
+return
+  }
+  callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
+  }
+}
+  }
+
+  val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", 
numUsableCores = 0)
+  val transCtx = new TransportContext(transConf, handler, true)
+  (transCtx.createServer(port, 
Seq.empty[TransportServerBootstrap].asJava), port)
+}
+val candidatePort = RandomUtils.nextInt(1024, 65536)
+val (server, shufflePort) = Utils.startServiceOnPort(candidatePort,
--- End diff --

No, because `startServiceOnPort` will handle the conflicted port case.
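For reference, a simplified sketch of that retry behaviour; the real `Utils.startServiceOnPort` also takes the SparkConf and a service name, and logs every retry:

```
import java.net.BindException

// Keep trying successive ports until the service binds.
def startServiceOnPort[T](startPort: Int, startService: Int => (T, Int),
    maxRetries: Int = 16, attempt: Int = 0): (T, Int) = {
  // Stay inside the non-privileged range if we run past 65535.
  val tryPort = (startPort + attempt - 1024) % (65536 - 1024) + 1024
  try {
    startService(tryPort)
  } catch {
    case _: BindException if attempt < maxRetries =>
      // Port already in use, e.g. taken by another suite: try the next one.
      startServiceOnPort(startPort, startService, maxRetries, attempt + 1)
  }
}
```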





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15417
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78241/
Test FAILed.





[GitHub] spark issue #15417: [SPARK-17851][SQL][TESTS] Make sure all test sqls in cat...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15417
  
**[Test build #78241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78241/testReport)** for PR 15417 at commit [`5436c38`](https://github.com/apache/spark/commit/5436c382e565e09754062795edea716fd0694cd3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter `





[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17084
  
gentle ping @imatiach-msft .





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78244/
Test PASSed.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18025
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78244/testReport)** for PR 18025 at commit [`6eae126`](https://github.com/apache/spark/commit/6eae126398e4229aa84130728792f407c67a75e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18343
  
@wangyum Thanks for updating. Can you try disabling Kryo and running it again, so we can verify it?





[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17681
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78240/
Test FAILed.





[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17681
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r122619579
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2603,12 +2603,27 @@ class Dataset[T] private[sql](
* current upstream partitions will be executed in parallel (per whatever
* the current partitioning is).
*
+   * A [[PartitionCoalescer]] can also be supplied allowing the behavior 
of the partitioning to be
--- End diff --

Sounds like this trait can't be generated as-is in the Java docs. Simply wrapping it in `` `...` `` would be fine.





[GitHub] spark pull request #16766: [SPARK-19426][SQL] Custom coalesce for Dataset

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16766#discussion_r122619526
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2603,12 +2603,27 @@ class Dataset[T] private[sql](
* current upstream partitions will be executed in parallel (per whatever
* the current partitioning is).
*
+   * A [[PartitionCoalescer]] can also be supplied allowing the behavior 
of the partitioning to be
+   * customized similar to [[RDD.coalesce]].
--- End diff --

I think it should be `[[org.apache.spark.rdd.RDD##coalesce]]`.





[GitHub] spark issue #17681: [SPARK-20383][SQL] Supporting Create [temporary] Functio...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17681
  
**[Test build #78240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78240/testReport)** for PR 17681 at commit [`f6898c4`](https://github.com/apache/spark/commit/f6898c44642420278a616fa65d1f9825fd762ee7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class DropFunctionEvent(database: String, name: String) extends FunctionEvent`
  * `case class AlterFunctionEvent(database: String, name: String) extends FunctionEvent`





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18343
  
Because we write/read `hugeBlockSizes` in `writeExternal`/`readExternal`, 
it seems to me that it is intended to be serialized. So I think removing 
`transient` should be ok.

LGTM cc @cloud-fan 







[GitHub] spark issue #18349: [SPARK-20927][SS] Change some operators in Dataset to no...

2017-06-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18349
  
Can one of the admins verify this patch?





[GitHub] spark issue #18349: [SPARK-20927][SS] Change some operators in Dataset to no...

2017-06-18 Thread ZiyueHuang
Github user ZiyueHuang commented on the issue:

https://github.com/apache/spark/pull/18349
  
@zsxwing Could you please review this PR?





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/18343
  
@viirya Yes, I'm using `org.apache.spark.serializer.KryoSerializer`; the [master branch](https://github.com/apache/spark/tree/ce49428ef7d640c1734e91ffcddc49dbc8547ba7) still has this issue. Error logs:
```
17/06/19 12:24:05 ERROR Utils: Exception encountered
java.lang.NullPointerException
    at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
    at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
    at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1306)
    at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:728)
    at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
    at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1340)
    at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:730)
    at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:171)
    at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:389)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
17/06/19 12:24:05 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1313)
    at org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:728)
    at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
    at org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:727)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1340)
    at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:730)
    at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:171)
    at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:389)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
    at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
    at org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1306)
    ... 17 more
```



[GitHub] spark pull request #18349: [SPARK-20927][SS] Change some operators in Datase...

2017-06-18 Thread ZiyueHuang
GitHub user ZiyueHuang opened a pull request:

https://github.com/apache/spark/pull/18349

[SPARK-20927][SS] Change some operators in Dataset to no-op for a streaming query.

## What changes were proposed in this pull request?

Change some operators (persist, unpersist, checkpoint) in Dataset to no-ops (do nothing but log a warning) for a streaming query, as sketched below.
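A minimal sketch of the change for one operator; `isStreaming`, `logWarning`, and the cache-manager call already exist on `Dataset`, and the streaming guard is what this PR proposes:

```
def persist(): this.type = {
  if (isStreaming) {
    // No-op for streaming Datasets: warn instead of throwing.
    logWarning("persist() is a no-op on a streaming Dataset.")
  } else {
    sparkSession.sharedState.cacheManager.cacheQuery(this)
  }
  this
}
```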

## How was this patch tested?

```scala
df = spark.readStream.json(...)
val dfCounts = df.persist().unpersist().checkpoint().groupBy().count()
val query = 
dfCounts.writeStream.outputMode("complete").format("console").start()
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ZiyueHuang/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18349


commit 8358f576bd255db458c630fc38537dd93c695246
Author: Ziyue Huang 
Date:   2017-06-15T06:05:18Z

[SPARK-20927][SS] Change some operators to no-op for a streaming query 
without throwing an exception.

commit 1713c568de223a78bc6db4455e8ea38c4f2d2267
Author: ZiyueHuang 
Date:   2017-06-15T06:32:57Z

Revert "[SPARK-20927][SS] Change some operators to no-op for a streaming 
query without throwing an exception."

This reverts commit 8358f576bd255db458c630fc38537dd93c695246.

revert

commit 372ada118883b7ac8924ff20bd76e4bd15b47d6f
Author: ZiyueHuang 
Date:   2017-06-15T06:39:28Z

[SPARK-20927][SS] Change some operators to no-op in streaming queries.

commit f68e6b738d9330409850556c40d626bfb17a561e
Author: ZiyueHuang 
Date:   2017-06-19T03:10:43Z

scalastyle fix

commit f127ada3281442fc1def31328deaab7463f88531
Author: ZiyueHuang 
Date:   2017-06-19T03:50:34Z

comment fix

commit bb1fc6f23d9a74f649aef7b6707a82f45df3eff3
Author: Ziyue Huang 
Date:   2017-06-19T04:17:59Z

Merge pull request #1 from ZiyueHuang/dev

[SPARK-20927][SS] Change some operators to no-op in streaming queries.







[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16347
  
gentle ping @junegunn on ^.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18343
  
I think this should be addressed before 2.2. I have already asked other committers on the dev mailing list to take a look.





[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14957
  
@xuanyuanking, let's close this and help review #16578 if you agree on the 
comments above.





[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18343
  
@wangyum Are you using the Kryo serializer? I think that is why you hit this issue.

Once you use Kryo, the `readExternal` in `HighlyCompressedMapStatus` won't be used to deserialize the object on the driver side. As `hugeBlockSizes` is a `transient` variable, it is null at that point. So when we try to serialize it again, `MapOutputTracker.serializeMapStatuses` directly calls `ObjectOutputStream.writeObject`, which invokes `HighlyCompressedMapStatus.writeExternal` and causes the NPE.
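A minimal, self-contained illustration of that failure mode (a sketch, not Spark's actual class):

```
import java.io._

// A @transient field is skipped by Kryo's field serializer, so after a
// Kryo round-trip it is null; a later ObjectOutputStream.writeObject
// then reaches writeExternal and NPEs on the null field.
class Status(@transient var hugeBlockSizes: Map[Int, Byte]) extends Externalizable {
  def this() = this(null)  // no-arg constructor required by Externalizable
  override def writeExternal(out: ObjectOutput): Unit = {
    out.writeInt(hugeBlockSizes.size)  // NPE if the field was dropped in transit
    hugeBlockSizes.foreach { case (block, size) =>
      out.writeInt(block)
      out.writeByte(size)
    }
  }
  override def readExternal(in: ObjectInput): Unit = {
    hugeBlockSizes =
      (0 until in.readInt()).map(_ => (in.readInt(), in.readByte())).toMap
  }
}
```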







[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-18 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025
  
This is what the doc for `column_aggregate_functions` looks like (only snapshots of the main parts):


![image](https://user-images.githubusercontent.com/11082368/27269174-85df12fa-5469-11e7-872d-d740fd382294.png)

![image](https://user-images.githubusercontent.com/11082368/27269177-8b35a67e-5469-11e7-80ac-7c804c3728d2.png)

![image](https://user-images.githubusercontent.com/11082368/27269180-8eb8c7a4-5469-11e7-8c4a-1de037bf078d.png)

![image](https://user-images.githubusercontent.com/11082368/27269184-91e39cb0-5469-11e7-932c-5eab772ec845.png)






[GitHub] spark issue #18303: [SPARK-19824][Core] Update JsonProtocol to keep consiste...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18303
  
LGTM





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18092
  
LGTM except 2 minor comments





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122617592
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are 
working") {
+val tryAgainMsg = "test_spark_20640_try_again"
+// a server which delays response 50ms and must try twice for success.
+def newShuffleServer(port: Int): (TransportServer, Int) = {
+  val attempts = new mutable.HashMap[String, Int]()
+  val handler = new NoOpRpcHandler {
+override def receive(client: TransportClient, message: ByteBuffer,
+ callback: RpcResponseCallback): Unit = {
+  val msgObj = BlockTransferMessage.Decoder.fromByteBuffer(message)
+  msgObj match {
+case exec: RegisterExecutor =>
+  Thread.sleep(50)
+  val attempt = attempts.getOrElse(exec.execId, 0) + 1
+  attempts(exec.execId) = attempt
+  if (attempt < 2) {
+callback.onFailure(new Exception(tryAgainMsg))
+return
+  }
+  callback.onSuccess(ByteBuffer.wrap(new Array[Byte](0)))
+  }
+}
+  }
+
+  val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", 
numUsableCores = 0)
+  val transCtx = new TransportContext(transConf, handler, true)
+  (transCtx.createServer(port, 
Seq.empty[TransportServerBootstrap].asJava), port)
+}
+val candidatePort = RandomUtils.nextInt(1024, 65536)
+val (server, shufflePort) = Utils.startServiceOnPort(candidatePort,
--- End diff --

will this be flaky? e.g. the port is occupied by other test suites





[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

2017-06-18 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18025#discussion_r122617531
  
--- Diff: R/pkg/R/stats.R ---
@@ -52,22 +52,17 @@ setMethod("crosstab",
 collect(dataFrame(sct))
   })
 
-#' Calculate the sample covariance of two numerical columns of a 
SparkDataFrame.
-#'
 #' @param colName1 the name of the first column
 #' @param colName2 the name of the second column
-#' @return The covariance of the two columns.
--- End diff --

OK, I added this back. The doc should be very clear even without this return value; indeed, most functions do not document return values in SparkR. See what it looks like in the image attached in the next comment.





[GitHub] spark issue #18025: [SPARK-20889][SparkR] Grouped documentation for AGGREGAT...

2017-06-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #78244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78244/testReport)** for PR 18025 at commit [`6eae126`](https://github.com/apache/spark/commit/6eae126398e4229aa84130728792f407c67a75e6).





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r122617448
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ---
@@ -1281,6 +1286,59 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations("item").isEmpty)
   }
 
+  test("SPARK-20640: Shuffle registration timeout and maxAttempts conf are 
working") {
+val tryAgainMsg = "test_spark_20640_try_again"
+// a server which delays response 50ms and must try twice for success.
+def newShuffleServer(port: Int): (TransportServer, Int) = {
+  val attempts = new mutable.HashMap[String, Int]()
+  val handler = new NoOpRpcHandler {
+override def receive(client: TransportClient, message: ByteBuffer,
--- End diff --

nit:
```
def xxx(
    para1: XXX,
    para2: XXX): XXX
```





[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...

2017-06-18 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/13893
  
ya, this still exists. Let me find some time to resolve this.





[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

2017-06-18 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18025#discussion_r122617405
  
--- Diff: R/pkg/R/stats.R ---
@@ -52,22 +52,17 @@ setMethod("crosstab",
 collect(dataFrame(sct))
   })
 
-#' Calculate the sample covariance of two numerical columns of a 
SparkDataFrame.
-#'
 #' @param colName1 the name of the first column
 #' @param colName2 the name of the second column
-#' @return The covariance of the two columns.
 #'
 #' @rdname cov
-#' @name cov
 #' @aliases cov,SparkDataFrame-method
 #' @family stat functions
 #' @export
 #' @examples
-#'\dontrun{
-#' df <- read.json("/path/to/file.json")
-#' cov <- cov(df, "title", "gender")
-#' }
+#'
+#' \dontrun{
--- End diff --

No. The newline should be between `@examples` and `\dontrun` to separate multiple `\dontrun` blocks.

![image](https://user-images.githubusercontent.com/11082368/27269043-73785762-5468-11e7-9a31-5cca104e005b.png)






[GitHub] spark issue #12257: [SPARK-14483][WEBUI] Display user name for each job and ...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/12257
  
gentle ping @sarutak on ^





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122617147
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -465,6 +465,8 @@ case class DataSource(
 providingClass.newInstance() match {
   case dataSource: CreatableRelationProvider =>
 SaveIntoDataSourceCommand(data, dataSource, 
caseInsensitiveOptions, mode)
+  case dataSource: ConsoleSinkProvider =>
+data.show(data.count().toInt, false)
--- End diff --

`ConsoleSink`  [has two 
options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L27-L30)
 that could be used here -- `numRows` and `truncate`.
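For example, a hedged sketch of honoring those options at this call site; the default values here are assumptions:

```
// Respect the sink's numRows/truncate options instead of data.count().
val numRows = caseInsensitiveOptions.get("numRows").map(_.toInt).getOrElse(20)
val truncate = caseInsensitiveOptions.get("truncate").forall(_.toBoolean)
data.show(numRows, truncate)
```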





[GitHub] spark issue #18296: [SPARK-21090][core]Optimize the unified memory manager c...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18296
  
Actually this is a bug fix, and it's a small one (no tests, only 3 lines), so I'm backporting it to 2.2.





[GitHub] spark issue #11887: [SPARK-13041][Mesos]add driver sandbox uri to the dispat...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11887
  
@skonto, do you maybe know (or have a wild guess for) when we would be able to proceed with this? Closing this for now and reopening later might be an option.





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122616876
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -465,6 +465,8 @@ case class DataSource(
 providingClass.newInstance() match {
   case dataSource: CreatableRelationProvider =>
 SaveIntoDataSourceCommand(data, dataSource, 
caseInsensitiveOptions, mode)
+  case dataSource: ConsoleSinkProvider =>
--- End diff --

Underscore `dataSource` since it's not used.
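That is, something like:

```
case _: ConsoleSinkProvider =>
  data.show(data.count().toInt, false)
```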





[GitHub] spark issue #11420: [SPARK-13493][SQL] Enable case sensitiveness in json sch...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11420
  
Let's close this if it is not in progress.





[GitHub] spark pull request #18025: [SPARK-20889][SparkR] Grouped documentation for A...

2017-06-18 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18025#discussion_r122616787
  
--- Diff: R/pkg/R/stats.R ---
@@ -52,22 +52,17 @@ setMethod("crosstab",
 collect(dataFrame(sct))
   })
 
-#' Calculate the sample covariance of two numerical columns of a 
SparkDataFrame.
--- End diff --

The method for SparkDataFrame is still there. I'm just removing redundant 
doc here. 





[GitHub] spark pull request #18296: [SPARK-21090][core]Optimize the unified memory ma...

2017-06-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18296





[GitHub] spark issue #11205: [SPARK-11334][Core] Handle maximum task failure situatio...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11205
  
gentle ping @rustgi, have you had some time to confirm this patch? It sounds like the only thing we need here is that confirmation.





[GitHub] spark issue #18296: [SPARK-21090][core]Optimize the unified memory manager c...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18296
  
LGTM, merging to master!





[GitHub] spark issue #10861: SPARK-12948. [SQL]. Consider reducing size of broadcasts...

2017-06-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/10861
  
Hi @rajeshbalamohan, I think this should at least be brought to a mergeable state, with the conflicts and style issues resolved. Would you be able to update this for now?





[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18334
  
+1 to providing a flag to automatically trigger the stats updates. We can set it to false by default so as not to surprise users; a sketch of such a flag follows.
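For illustration, such a flag could look like the following SQLConf entry; the key name and wiring are assumptions here, not the merged change:

```
val AUTO_SIZE_UPDATE_ENABLED =
  buildConf("spark.sql.statistics.size.autoUpdate.enabled")
    .doc("When true, automatically update table statistics after " +
      "data-changing commands.")
    .booleanConf
    .createWithDefault(false)
```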




