[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82332285
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -17,47 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
 @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+this(parameters ++ Map("url" -> url, "dbtable" -> table))
+  }
+
+  val asProperties: Properties = {
--- End diff --

I think the function name needs an update. How about 
`asConnectionProperties`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15375: [SPARK-17790] Support for parallelizing R data.fr...

2016-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15375#discussion_r82331701
  
--- Diff: R/pkg/R/context.R ---
@@ -123,19 +126,48 @@ parallelize <- function(sc, coll, numSlices = 1) {
   if (numSlices > length(coll))
 numSlices <- length(coll)
 
+  sizeLimit <- as.numeric(sparkR.conf(
+  "spark.r.maxAllocationLimit",
+  toString(.Machine$integer.max / 2) # Default to a safe default: 200MB
+  ))
+  objectSize <- object.size(coll)
+
+  # For large objects we make sure the size of each slice is also smaller than sizeLimit
+  numSlices <- max(numSlices, ceiling(objectSize / sizeLimit))
+
   sliceLen <- ceiling(length(coll) / numSlices)
   slices <- split(coll, rep(1: (numSlices + 1), each = sliceLen)[1:length(coll)])
 
   # Serialize each slice: obtain a list of raws, or a list of lists (slices) of
   # 2-tuples of raws
   serializedSlices <- lapply(slices, serialize, connection = NULL)
 
-  jrdd <- callJStatic("org.apache.spark.api.r.RRDD",
-  "createRDDFromArray", sc, serializedSlices)
+  # The RPC backend cannot handle arguments larger than 2GB (INT_MAX)
+  # If serialized data is safely less than that threshold we send it over the RPC channel.
+  # Otherwise, we write it to a file and send the file name
+  if (objectSize < sizeLimit) {
+    jrdd <- callJStatic("org.apache.spark.api.r.RRDD", "createRDDFromArray", sc, serializedSlices)
+  } else {
+    fileName <- writeToTempFile(serializedSlices)
+    jrdd <- callJStatic(
+      "org.apache.spark.api.r.RRDD", "createRDDFromFile", sc, fileName, as.integer(numSlices))
+    file.remove(fileName)
--- End diff --

If the JVM call throws an exception, I don't think this line will execute, 
so the temp file would be left behind. Perhaps wrap this in `tryCatch`?
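
The cleanup concern is language-independent, so here is a minimal Python sketch of the pattern rather than the R code under review. `send_via_temp_file` and its `send` callback are hypothetical names standing in for the temp-file write and the JVM call; the point is only that the removal sits in a `finally` block and therefore runs even when the call raises.

```python
import os
import tempfile


def send_via_temp_file(payload, send):
    """Write payload to a temp file, pass its path to `send`, and always clean up.

    `send` is a hypothetical stand-in for the call that may throw.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        return send(path)
    finally:
        # Runs whether `send` returned normally or raised.
        if os.path.exists(path):
            os.remove(path)
```

In R the equivalent would presumably be `tryCatch(..., finally = ...)` or `on.exit(...)`; which one fits the PR is left to the author.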





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread weiqingy
Github user weiqingy commented on the issue:

https://github.com/apache/spark/pull/15246
  
Hi, @srowen all tests passed this time. Could you please review this PR 
again? Thanks. 





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15375
  
Odd, this is the error from appveyor:
```
ontext: Fail to set Spark caller context
java.lang.ClassNotFoundException: org.apache.hadoop.ipc.CallerContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.spark.util.CallerContext.setCurrentContext(Utils.scala:2485)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/10/07 03:45:58 INFO Executor: Finished task 1.
```





[GitHub] spark issue #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.dir is r...

2016-10-06 Thread koertkuipers
Github user koertkuipers commented on the issue:

https://github.com/apache/spark/pull/15382
  
I think the working dir makes more sense than the home dir. But could this catch 
people by surprise, because we now expect write permission in the working dir?





[GitHub] spark pull request #15375: [SPARK-17790] Support for parallelizing R data.fr...

2016-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/15375#discussion_r82331194
  
--- Diff: R/pkg/R/context.R ---
@@ -126,13 +126,13 @@ parallelize <- function(sc, coll, numSlices = 1) {
   if (numSlices > length(coll))
 numSlices <- length(coll)
 
-  sizeLimit <- .Machine$integer.max - 10240 # Safe margin below maximum allocation limit
+  sizeLimit <- as.numeric(
--- End diff --

shouldn't this be `as.integer(`?
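
For context on what `sizeLimit` feeds into, the slice-count adjustment quoted in the other review comment on this PR can be sketched in Python. This is an illustration of the arithmetic only, not the SparkR code; `adjust_num_slices` and its parameter names are hypothetical.

```python
import math


def adjust_num_slices(num_items, num_slices, object_size, size_limit):
    """Return (num_slices, slice_len) so that no slice exceeds size_limit.

    Mirrors the quoted logic: cap slices at the item count, then raise the
    count if the object is too large to fit in that many slices.
    """
    num_slices = min(num_slices, num_items)
    num_slices = max(num_slices, math.ceil(object_size / size_limit))
    slice_len = math.ceil(num_items / num_slices)
    return num_slices, slice_len
```

With 10 items, 1 requested slice, an object of 2500 bytes and a limit of 1000 bytes, the sketch yields 3 slices of at most 4 items each. Whether the limit is held as an integer or a double does not change this arithmetic, which is one way to read the question above.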





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82330973
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -17,47 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
 @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+this(parameters ++ Map("url" -> url, "dbtable" -> table))
--- End diff --

Change them to `JDBC_URL` and `JDBC_TABLE_NAME`?





[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...

2016-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13690#discussion_r82330771
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
@@ -791,4 +791,59 @@ test_that("spark.kstest", {
   expect_match(capture.output(stats)[1], "Kolmogorov-Smirnov test summary:")
 })
 
+test_that("spark.decisionTree Regression", {
+  data <- suppressWarnings(createDataFrame(longley))
+  model <- spark.decisionTree(data, Employed~., "regression", maxDepth = 5, maxBins = 16)
--- End diff --

could be more readable as `Employed ~ .` (with spaces)





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15246
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15246
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66482/
Test PASSed.





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15246
  
**[Test build #66482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66482/consoleFull)**
 for PR 15246 at commit 
[`1233aa2`](https://github.com/apache/spark/commit/1233aa25d751b94a610f6ac052411596cb0df10d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15361
  
@kxepal Sure, thanks for confirming!





[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15351
  
Merging to 2.0. @dongjoon-hyun can you close this?





[GitHub] spark pull request #13690: [SPARK-15767][R][ML] Decision Tree Regression wra...

2016-10-06 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13690#discussion_r82330422
  
--- Diff: R/pkg/R/mllib.R ---
@@ -117,7 +132,7 @@ NULL
 #' @export
 #' @seealso \link{spark.glm}, \link{glm},
 #' @seealso \link{spark.als}, \link{spark.gaussianMixture}, \link{spark.isoreg}, \link{spark.kmeans},
-#' @seealso \link{spark.mlp}, \link{spark.naiveBayes}, \link{spark.survreg}
+#' @seealso \link{spark.mlp}, \link{spark.naiveBayes}, \link{spark.survreg}, \link{spark.decisionTree}
--- End diff --

same here





[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...

2016-10-06 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15351
  
@dongjoon-hyun it LGTM. It is just a rather big patch to backport, for 
something that is not a bug fix. But I'll merge it.





[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-06 Thread kxepal
Github user kxepal commented on the issue:

https://github.com/apache/spark/pull/15361
  
@HyukjinKwon 
Oh, great news! It seems I backported this patch to 2.0.0 
incorrectly. Sorry for the false alarm; unfortunately, I wasn't able to test 
it with master.

I'll give it one more try today, but so far it looks like you solved the 
problem \o/ Thank you!









[GitHub] spark issue #13690: [SPARK-15767][R][ML] Decision Tree Regression wrapper in...

2016-10-06 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/13690
  
could you fix the test failure?
```
Duplicated \argument entries in documentation object 'spark.decisionTree':
  'newData' '...' 'object' '...' 'x'
```





[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15388
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66481/
Test PASSed.





[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15389
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15388
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15389
  
**[Test build #66485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66485/consoleFull)**
 for PR 15389 at commit 
[`be8c509`](https://github.com/apache/spark/commit/be8c509a14506817cce500e845064a2ca7edcc23).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329856
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 
   Property NameMeaning
   
 url
 
-  The JDBC URL to connect to.
+  The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
--- End diff --

Sure, that sounds more clean and correct.





[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15389
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66485/
Test FAILed.





[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15388
  
**[Test build #66481 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66481/consoleFull)**
 for PR 15388 at commit 
[`7e25355`](https://github.com/apache/spark/commit/7e2535554d5a0661490b74ff4422798d98063214).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning Result...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15389
  
**[Test build #66485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66485/consoleFull)**
 for PR 15389 at commit 
[`be8c509`](https://github.com/apache/spark/commit/be8c509a14506817cce500e845064a2ca7edcc23).





[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-06 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/15389

[SPARK-17817][PySpark] PySpark RDD Repartitioning Results in Highly Skewed 
Partition Sizes

## What changes were proposed in this pull request?

Quoted from JIRA description:

Calling repartition on a PySpark RDD to increase the number of partitions 
results in highly skewed partition sizes, with most having 0 rows. The 
repartition method should evenly spread out the rows across the partitions, and 
this behavior is correctly seen on the Scala side.

Please reference the following code for a reproducible example of this 
issue:


    num_partitions = 2
    a = sc.parallelize(range(int(1e6)), 2)  # start with 2 even partitions
    l = a.repartition(num_partitions).glom().map(len).collect()  # get length of each partition
    min(l), max(l), sum(l)/len(l), len(l)  # skewed!

In Scala's `repartition` code, elements are distributed evenly across the 
output partitions. However, an RDD coming from Python is serialized as a single 
chunk of binary data, so that distribution fails. We need to convert the Python 
RDD to Java objects before repartitioning.
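
The effect can be illustrated with a toy model in plain Python. This is not Spark code and makes simplifying assumptions: records are dealt round-robin starting at partition 0, and `partition_sizes` is a hypothetical helper. It only shows why distributing a few opaque batches leaves most partitions empty, while distributing individual elements does not.

```python
def partition_sizes(records, num_partitions):
    """Deal records round-robin and count the elements landing in each partition."""
    sizes = [0] * num_partitions
    for i, record in enumerate(records):
        sizes[i % num_partitions] += len(record)
    return sizes


elements = list(range(1000))

# Case 1: the shuffle only sees two large serialized batches.
batches = [elements[i:i + 500] for i in range(0, 1000, 500)]
# Case 2: the shuffle sees one record per element.
singles = [[x] for x in elements]

skewed = partition_sizes(batches, 10)  # two partitions get everything
even = partition_sizes(singles, 10)    # every partition gets the same share
```

In this model `skewed` has 500 elements in each of two partitions and none in the other eight, while `even` has 100 in each of the ten.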

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 pyspark-rdd-repartition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15389.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15389


commit be8c509a14506817cce500e845064a2ca7edcc23
Author: Liang-Chi Hsieh 
Date:   2016-10-07T04:59:37Z

Fix pyspark.rdd repartition.







[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329644
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 
   Property NameMeaning
   
 url
 
-  The JDBC URL to connect to.
+  The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
 
   
+
+  
+user
+
+  The user to connect as.
--- End diff --

Sorry, after rethinking it, I think we do not need `user` and `password` 
here.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329571
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 
   Property NameMeaning
   
 url
 
-  The JDBC URL to connect to.
+  The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
--- End diff --

How about this change?

_The source-specific connection properties may be specified in the URL. 
e.g., jdbc:postgresql://localhost/test?user=fred&password=secret_
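
To make "connection properties in the URL" concrete, here is a small Python sketch that splits such a URL into its base and its query parameters, assuming the example is meant to carry `user` and `password` as query parameters. It is an illustration only: in Spark the URL is handed to the JDBC driver, which does its own parsing, and `split_jdbc_url` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs


def split_jdbc_url(jdbc_url):
    """Split a JDBC URL into (base_url, properties) using generic URL parsing."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    parsed = urlparse(jdbc_url[len(prefix):])
    # parse_qs returns a list per key; keep the first value of each.
    props = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    base = prefix + parsed.scheme + "://" + parsed.netloc + parsed.path
    return base, props


base, props = split_jdbc_url(
    "jdbc:postgresql://localhost/test?user=fred&password=secret")
```

Here `base` is `jdbc:postgresql://localhost/test` and `props` holds the `user` and `password` entries, which is why separate `user` and `password` options can look redundant for drivers that accept them in the URL.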





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-06 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15218
  

Btw, taking a step back, I am not sure this will work as you expect it to.
Other than a few tasksets (those without locality information), the 
schedule is going to be highly biased towards the locality information supplied.

This typically means PROCESS_LOCAL (almost always) and then NODE_LOCAL, 
that is, an exact match on the executor or host, irrespective of the order in 
which we traverse the task list.

The shuffle of offers we do serves a specific set of purposes: spreading load 
when there is no locality information (not very common imo), or spreading it 
across the cluster when the locality information is of lower quality, such as 
from an InputFormat, or for shuffle when we are using heuristics which might 
not be optimal.

But since I have not looked at this in a while, +CC 
@kayousterhout, please take a look in case I am missing something.
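
A toy Python model of the argument, under heavy assumptions: this is not Spark's scheduler (no locality levels, no delay scheduling), each offer is one free slot on a host, and `assign` is a hypothetical function. It only illustrates that when tasks carry an exact locality preference, the order in which offers are presented does not change where those tasks land.

```python
def assign(tasks, offers):
    """Place tasks on offered hosts, honouring exact locality first.

    tasks:  dict of task id -> preferred host, or None for no preference
    offers: list of host names, one free slot each, in presentation order
    """
    assignment = {}
    pending = dict(tasks)
    # Pass 1: a host goes to a task that explicitly prefers it.
    for host in offers:
        for task_id, preferred in list(pending.items()):
            if preferred == host:
                assignment[task_id] = host
                del pending[task_id]
                break
    # Pass 2: remaining tasks take leftover hosts in offer order.
    used = set(assignment.values())
    leftover = [host for host in offers if host not in used]
    for task_id, host in zip(sorted(pending), leftover):
        assignment[task_id] = host
    return assignment
```

Only tasks without a preference are affected by how the offers are ordered or packed, which matches the caveat raised above.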





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329341
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path 
postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 
   <tr><th>Property Name</th><th>Meaning</th></tr>
   
 url
 
-  The JDBC URL to connect to.
+  The JDBC URL to connect to. It might contain user and password 
information. e.g., 
jdbc:postgresql://localhost/test?user=fred&password=secret
--- End diff --

Actually, this is not accurate. Let me think about it.





[GitHub] spark pull request #15364: [SPARK-17792][ML] L-BFGS solver for linear regres...

2016-10-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15364





[GitHub] spark issue #15364: [SPARK-17792][ML] L-BFGS solver for linear regression do...

2016-10-06 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15364
  
Merged into master, thanks!





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15375
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15375
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66472/
Test FAILed.





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15375
  
**[Test build #66472 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66472/consoleFull)**
 for PR 15375 at commit 
[`4aab6cf`](https://github.com/apache/spark/commit/4aab6cf4d6e2f05c1e893cbc6d05fcc1763ea0f4).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15381: [SPARK-17707] [WEBUI] Web UI prevents spark-submi...

2016-10-06 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/15381#discussion_r82326116
  
--- Diff: 
sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java
 ---
@@ -90,8 +95,21 @@ public void run() {
   Arrays.toString(sslContextFactory.getExcludeProtocols()));
 sslContextFactory.setKeyStorePath(keyStorePath);
 sslContextFactory.setKeyStorePassword(keyStorePassword);
-connector = new ServerConnector(httpServer, sslContextFactory);
+connectionFactories = AbstractConnectionFactory.getFactories(
--- End diff --

This will expose both http and https, and it's a behavior change. Right?





[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15354
  
**[Test build #66484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66484/consoleFull)**
 for PR 15354 at commit 
[`5f9fa29`](https://github.com/apache/spark/commit/5f9fa29a44b9f33cd90633e470d3dff2516499a9).





[GitHub] spark issue #14531: [SPARK-17353] [SPARK-16943] [SPARK-16942] [SQL] Fix mult...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14531
  
@sitalkedia Yeah, I saw it. Thank you for the investigation. Normally, we do 
not want to add many configuration flags, because it hurts usability. Let @rxin 
decide whether we should add another flag or not.





[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15367
  
No, if we backport this I would plan to continue to backport changes (that 
are safe) until the next release.  Either way this should not affect what goes 
into master.





[GitHub] spark pull request #15385: [DO NOT MERGE]Try to reproduce DirectKafkaStreamS...

2016-10-06 Thread zsxwing
Github user zsxwing closed the pull request at:

https://github.com/apache/spark/pull/15385





[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15367
  
Does backporting reduce the likelihood of change if user feedback indicates 
we got it wrong?

My technical concerns were largely addressed; that's my big remaining 
organizational concern.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66483/
Test FAILed.





[GitHub] spark issue #15385: [DO NOT MERGE]Try to reproduce DirectKafkaStreamSuite fa...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15385
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66483/consoleFull)**
 for PR 15307 at commit 
[`8537783`](https://github.com/apache/spark/commit/8537783abc495156d3f356e378d260c9222f2c46).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15385: [DO NOT MERGE]Try to reproduce DirectKafkaStreamSuite fa...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15385
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66470/
Test FAILed.





[GitHub] spark issue #15385: [DO NOT MERGE]Try to reproduce DirectKafkaStreamSuite fa...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15385
  
**[Test build #66470 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66470/consoleFull)**
 for PR 15385 at commit 
[`0fc2da9`](https://github.com/apache/spark/commit/0fc2da9e7d35f645d8564d85389ff74f264d3d00).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66480/
Test PASSed.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15366
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15366
  
**[Test build #66480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66480/consoleFull)**
 for PR 15366 at commit 
[`c1d2b2b`](https://github.com/apache/spark/commit/c1d2b2bd1e1a12791a180f1b753ca082c97df31c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15387
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66479/
Test PASSed.





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15387
  
**[Test build #66479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66479/consoleFull)**
 for PR 15387 at commit 
[`aca55de`](https://github.com/apache/spark/commit/aca55de0624f5634acb04f91636dce79af875fab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15387
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15387
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66477/
Test PASSed.





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15387
  
**[Test build #66477 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66477/consoleFull)**
 for PR 15387 at commit 
[`1fc5863`](https://github.com/apache/spark/commit/1fc5863db88cac9dfd0be09318c4ca8779a51682).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66483/consoleFull)**
 for PR 15307 at commit 
[`8537783`](https://github.com/apache/spark/commit/8537783abc495156d3f356e378d260c9222f2c46).





[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/15249
  
@mridulm we had considered that approach earlier on as well -- I don't 
think it works because you can also have resources which are not totally 
broken, but are flaky for a long period of time.  The simplest example is one bad 
disk out of many; some tasks may succeed though a bunch will fail.  I've seen 
users hit this.  But it could be even more nuanced, e.g. a bad sector, a flaky 
network connection, etc.

In those cases, it's intentional that in this implementation, one success 
does *not* un-blacklist anything.
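
As a rough illustration of that policy, the following Java sketch counts failures per executor and treats a success as a no-op. The class, method names, and threshold are hypothetical and are not taken from the PR. It is only meant to show why a flaky resource stays blacklisted even after an occasional success.

```java
import java.util.HashMap;
import java.util.Map;

class StickyBlacklistSketch {
    private final int maxFailures; // hypothetical threshold, not a Spark setting
    private final Map<String, Integer> failuresByExecutor = new HashMap<>();

    StickyBlacklistSketch(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    void recordFailure(String executor) {
        failuresByExecutor.merge(executor, 1, Integer::sum);
    }

    void recordSuccess(String executor) {
        // Deliberately does nothing: a single success on a flaky executor
        // must not reset its failure count.
    }

    boolean isBlacklisted(String executor) {
        return failuresByExecutor.getOrDefault(executor, 0) >= maxFailures;
    }

    public static void main(String[] args) {
        StickyBlacklistSketch blacklist = new StickyBlacklistSketch(2);
        blacklist.recordFailure("exec-1");
        blacklist.recordFailure("exec-1");
        blacklist.recordSuccess("exec-1"); // e.g. a task that avoided the bad disk
        System.out.println(blacklist.isBlacklisted("exec-1"));
        System.out.println(blacklist.isBlacklisted("exec-2"));
    }
}
```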





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15246
  
**[Test build #66482 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66482/consoleFull)**
 for PR 15246 at commit 
[`1233aa2`](https://github.com/apache/spark/commit/1233aa25d751b94a610f6ac052411596cb0df10d).





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66478/
Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66478 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66478/consoleFull)**
 for PR 15307 at commit 
[`10d1c24`](https://github.com/apache/spark/commit/10d1c243a71d464ada33db269a30ad0e4dff3ced).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15367
  
@marmbrus @zsxwing I agree it's experimental and we should have more 
flexibility here with backports. I also very much agree that structured 
streaming in its current state on 2.0 isn't usable - but I'm not super sure 
that backporting fixes is the best way to do this. Honestly I spend most of my 
time focused on Python & ML (and I've only really been looking at structured 
streaming with those two hats on).

I'm really cautious about the idea of a 2k+ line backport which hasn't even been 
released otherwise. I don't have any specific objections to the changes; it's 
just making me nervous. The fact that what's being backported seems to still be 
under development is also concerning, since doing this backport now puts us in a 
position of backporting more (not yet merged into mainline) fixes.

Of course - if the people with the most experience in this area all agree 
(and most of y'all [ @marmbrus @zsxwing @tdas but maybe missing @koeninger ] 
seem to already be on this PR so I'll leave you to it) that this backport is 
reasonable, that is great - it would probably be good to follow up on the 
original backport mailing list thread and update the wiki as well.





[GitHub] spark issue #15332: [SPARK-10364][SQL] Support Parquet logical type TIMESTAM...

2016-10-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15332
  
LGTM. Let's see if @davies @liancheng have other comments about this.





[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15388
  
**[Test build #66481 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66481/consoleFull)**
 for PR 15388 at commit 
[`7e25355`](https://github.com/apache/spark/commit/7e2535554d5a0661490b74ff4422798d98063214).





[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15388
  
cc @hvanhovell @cloud-fan 





[GitHub] spark pull request #15388: [SPARK-17821][SQL] Support And and Or in Expressi...

2016-10-06 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/15388

[SPARK-17821][SQL] Support And and Or in Expression Canonicalize

## What changes were proposed in this pull request?

Currently the `Canonicalize` object doesn't support `And` and `Or`. We should 
add this support so that we can compare the canonicalized form of predicates 
consistently.

## How was this patch tested?

Jenkins tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 canonicalize-and-or

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15388.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15388


commit 7e2535554d5a0661490b74ff4422798d98063214
Author: Liang-Chi Hsieh 
Date:   2016-10-07T02:54:34Z

Support And and Or in Canonicalize.







[GitHub] spark issue #15370: [SPARK-17417][Core] Fix # of partitions for Reliable RDD...

2016-10-06 Thread dhruve
Github user dhruve commented on the issue:

https://github.com/apache/spark/pull/15370
  

If we assume file name of the form "part-[0-9]+"
   * Case 1: *Entire RDD* => Verification of file name while reconstructing 
would be satisfied as we read all the checkpointed part files. 
   * Case 2: *Specific Partition* => While trying to reconstruct a specific 
partition, this information would be insufficient to locate the actual part 
file (see `getPreferredLocations`). Should the filename be 
hdfs:////.../part-1 or part-01 or part-...1?

Also, with the NumberFormat impl, files continue to be named with up to 5 
digits by default. Only when the index no longer fits in 5 digits does it use 
6 digits, 7 digits and so on. This takes care of the old format as well and 
handles the case of exceeding the current limit.
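
A small Python sketch of that naming scheme, for illustration only (the actual code uses Java's NumberFormat; the helper names `part_file_name` and `partition_index` are made up here). It shows that a fixed minimum width lets a partition index be mapped to exactly one file name, while a looser `part-[0-9]+` pattern can still be used to read the index back.

```python
import re


def part_file_name(index):
    """Zero-pad to at least 5 digits; larger indexes simply use more digits."""
    return "part-%05d" % index


PART_FILE = re.compile(r"^part-(\d+)$")


def partition_index(file_name):
    """Recover the partition index from a name of the form part-[0-9]+."""
    match = PART_FILE.match(file_name)
    if match is None:
        raise ValueError("not a part file: %r" % file_name)
    return int(match.group(1))


for i in (1, 99999, 100000):
    name = part_file_name(i)
    print(name, partition_index(name))
```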






[GitHub] spark pull request #15354: [SPARK-17764][SQL] Add `to_json` supporting to co...

2016-10-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15354#discussion_r82322230
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1729,6 +1729,29 @@ def from_json(col, schema, options={}):
 return Column(jc)
 
 
+@ignore_unicode_prefix
+@since(2.1)
+def to_json(col, options={}):
+"""
+Converts a column containing a [[StructType]] into a JSON string. 
Returns `null`,
+in the case of an unsupported type.
+
+:param col: struct column
+:param options: options to control converting. accepts the same 
options as the json datasource
+
+>>> from pyspark.sql import Row
+>>> from pyspark.sql.types import *
+>>> data = [(1, Row(name='Alice', age=2))]
+>>> df = spark.createDataFrame(data, ("key", "value"))
+>>> df.select(to_json(df.value).alias("json")).collect()
+[Row(json=u'{"age":2,"name":"Alice"}')]
+"""
+
+sc = SparkContext._active_spark_context
+jc = sc._jvm.functions.to_json(_to_java_column(col), options)
--- End diff --

Actually, never mind my original comment: the more I look at this file, the 
less consistent its patterns seem, and this same pattern is used elsewhere 
within the file.





[GitHub] spark pull request #15379: [SPARK-17805][PYSPARK] Fix in sqlContext.read.tex...

2016-10-06 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/15379#discussion_r82321597
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -289,8 +289,8 @@ def text(self, paths):
 [Row(value=u'hello'), Row(value=u'this')]
 """
 if isinstance(paths, basestring):
-path = [paths]
-return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+paths = [paths]
+return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
--- End diff --

So I agree keeping `path` here kind of makes sense.

It's unfortunate we didn't catch the difference in the named parameter 
between these reader functions back during 2.0. At this point, changing the 
named parameter from `paths` to `path` needs a bit of care in case people are 
using named params (if we did that, we would need to add a version-changed 
note). We could also have it transitionally take kwargs and work with either 
name for a version (while updating the pydoc, of course).
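
A minimal sketch of what such a transitional signature could look like, assuming a standalone function rather than the real `DataFrameReader.text` (the body, the error messages and the deprecation wording are all hypothetical):

```python
import warnings


def text(path=None, **kwargs):
    """Hypothetical reader function that accepts the new keyword `path`
    and, for one transitional version, the old keyword `paths`."""
    if "paths" in kwargs:
        if path is not None:
            raise TypeError("pass either 'path' or 'paths', not both")
        warnings.warn("'paths' is deprecated, use 'path'", DeprecationWarning)
        path = kwargs.pop("paths")
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    if path is None:
        raise TypeError("missing required argument 'path'")
    if isinstance(path, str):
        path = [path]
    # The real method would hand the list to the JVM reader; the sketch
    # just returns the normalized list of paths.
    return list(path)


print(text("a.txt"))
print(text(paths=["a.txt", "b.txt"]))
```

Callers using either keyword keep working for one release, and the warning gives them time to migrate before the old name is removed.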





[GitHub] spark pull request #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15246#discussion_r82321891
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -66,13 +67,14 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils 
with TestHiveSingleton {
   import spark.implicits._
 
   test("script") {
+val scriptFilePath = getPath("test_script.sh")
 if (testCommandAvailable("bash") && testCommandAvailable("echo | sed")) {
   val df = Seq(("x1", "y1", "z1"), ("x2", "y2", "z2")).toDF("c1", "c2", "c3")
   df.createOrReplaceTempView("script_table")
   val query1 = sql(
-"""
+s"""
--- End diff --

Yes. Good catch. There are some odd corner cases for `s""" """`, but it 
should be OK here.





[GitHub] spark issue #15386: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4...

2016-10-06 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/15386
  
Thanks for working on this. The Python lint script found a style problem 
(`PEP8 checks failed. ./python/pyspark/sql/tests.py:1709:54: E231 missing 
whitespace after ','`). If you want to test the style locally first, you can 
use `./dev/lint-python`.





[GitHub] spark pull request #15218: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-06 Thread zhzhan
Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15218#discussion_r82321008
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala 
---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.PriorityQueue
+import scala.util.Random
+
+import org.apache.spark.SparkConf
+
+case class OfferState(workOffer: WorkerOffer, var cores: Int) {
+  // Build a list of tasks to assign to each worker.
+  val tasks = new ArrayBuffer[TaskDescription](cores)
+}
+
+abstract class TaskAssigner(conf: SparkConf) {
+  var offer: Seq[OfferState] = _
+  val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1)
+
+  // The final assigned offer returned to TaskScheduler.
+  def tasks(): Seq[ArrayBuffer[TaskDescription]] = offer.map(_.tasks)
+
+  // construct the assigner by the workoffer.
+  def construct(workOffer: Seq[WorkerOffer]): Unit = {
+offer = workOffer.map(o => OfferState(o, o.cores))
+  }
+
+  // Invoked in each round of Taskset assignment to initialize the 
internal structure.
+  def init(): Unit
+
+  // Indicating whether there is offer available to be used by one round 
of Taskset assignment.
+  def hasNext(): Boolean
+
+  // Next available offer returned to one round of Taskset assignment.
+  def getNext(): OfferState
+
+  // Called by the TaskScheduler to indicate whether the current offer is 
accepted
+  // In order to decide whether the current is valid for the next offering.
+  def taskAssigned(assigned: Boolean): Unit
+
+  // Release internally maintained resources. Subclass is responsible to
+  // release its own private resources.
+  def reset: Unit = {
+offer = null
+  }
+}
+
+class RoundRobinAssigner(conf: SparkConf) extends TaskAssigner(conf) {
+  var i = 0
+  override def construct(workOffer: Seq[WorkerOffer]): Unit = {
+offer = Random.shuffle(workOffer.map(o => OfferState(o, o.cores)))
+  }
+  override def init(): Unit = {
+i = 0
+  }
+  override def hasNext: Boolean = {
+i < offer.size
+  }
+  override def getNext(): OfferState = {
+offer(i)
+  }
+  override def taskAssigned(assigned: Boolean): Unit = {
+i += 1
+  }
+  override def reset: Unit = {
+super.reset
+i = 0
+  }
+}
+
+class BalancedAssigner(conf: SparkConf) extends TaskAssigner(conf) {
--- End diff --

@mridulm Thanks for the comments, but I am lost here. My understanding is 
that, ordering-wise, x is equal to y if x.cores == y.cores. This ordering is 
used by the priority queue to construct its data structure. The following is 
an example from trait Ordering: personA will be equal to personB if they are 
the same age. Am I missing anything?

    import scala.util.Sorting

    case class Person(name:String, age:Int)
    val people = Array(Person("bob", 30), Person("ann", 32), Person("carl", 19))

    // sort by age
    object AgeOrdering extends Ordering[Person] {
      def compare(a:Person, b:Person) = a.age compare b.age
    }
    Sorting.quickSort(people)(AgeOrdering)
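
The distinction being discussed (equal under the ordering versus equal as values) can be sketched in Python as well. This is an illustration under assumptions, not the PR's Scala code: the `Offer` tuple and `compare_by_cores` helper are hypothetical, and `heapq` stands in for Scala's `PriorityQueue`.

```python
import heapq
from collections import namedtuple

Offer = namedtuple("Offer", ["host", "cores"])  # hypothetical stand-in


def compare_by_cores(x, y):
    """Mimics Ordering.compare: negative, zero or positive."""
    return (x.cores > y.cores) - (x.cores < y.cores)


a = Offer("host-a", 4)
b = Offer("host-b", 4)
c = Offer("host-c", 8)

print(compare_by_cores(a, b) == 0)  # True: equal in the ordering
print(a == b)                       # False: still distinct values

# Max-heap on cores; the counter breaks ties so offers are never compared.
heap = [(-o.cores, i, o) for i, o in enumerate([a, b, c])]
heapq.heapify(heap)
order = [heapq.heappop(heap)[2].host for _ in range(len(heap))]
print(order)  # ['host-c', 'host-a', 'host-b']
```

Two offers with the same number of cores tie in the queue, but neither replaces the other; both remain in the heap as separate entries.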





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15375
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15375
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66467/
Test FAILed.





[GitHub] spark issue #15375: [SPARK-17790] Support for parallelizing R data.frame lar...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15375
  
**[Test build #66467 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66467/consoleFull)**
 for PR 15375 at commit 
[`8e065c1`](https://github.com/apache/spark/commit/8e065c100389bd5e89f02ffb43319bb2089a44c5).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15365: [SPARK-17157][SPARKR]: Add multiclass logistic regressio...

2016-10-06 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15365
  
@felixcheung I fixed the CRAN errors. It is ready for review now. Thanks!





[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11601
  
Merged build finished. Test PASSed.





[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11601
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66476/
Test PASSed.





[GitHub] spark issue #11601: [SPARK-13568] [ML] Create feature transformer to impute ...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11601
  
**[Test build #66476 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66476/consoleFull)**
 for PR 11601 at commit 
[`91d4cee`](https://github.com/apache/spark/commit/91d4cee75a150ad2335dba0838c47cb4f0505ad8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15329: [SPARK-17763][SQL] JacksonParser silently parses null as...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15329
  
Hi @yhuai and @cloud-fan, I recall that earlier changes to this code were 
reviewed by both of you. Would you mind reviewing this, please?





[GitHub] spark pull request #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15246#discussion_r82317624
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.hive.execution
 
+import java.io.File
--- End diff --

No. Will delete it.





[GitHub] spark pull request #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-10-06 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15246#discussion_r82317563
  
--- Diff: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ---
@@ -41,6 +43,15 @@ abstract class SparkFunSuite
 }
   }
 
+  // helper function
+  protected final def getFile(file: String): File = {
--- End diff --

@srowen `getTestResourceFile` and `getTestResourcePath` look better. Thanks.

The `URL` class doesn't have a method like `getCanonicalFile`. It has 
`getFile` only.

Also, I tested `Paths.get(... toURI).toFile`. The only difference I noticed 
is that it keeps spaces as they are, whereas `getFile(file).getCanonicalPath` 
converts spaces to "%20". I suppose they are both OK.
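
The same escaped-versus-decoded difference can be reproduced with Python's standard library. This is only an analogy to the Java behaviour described above, not the Java API itself, and the resource URL is a made-up example.

```python
from urllib.parse import unquote, urlparse

url = "file:///tmp/my%20dir/test_script.sh"  # hypothetical resource URL

raw_path = urlparse(url).path      # percent-escapes are kept as-is
decoded_path = unquote(raw_path)   # percent-escapes are decoded

print(raw_path)      # /tmp/my%20dir/test_script.sh
print(decoded_path)  # /tmp/my dir/test_script.sh
```

Only the decoded form names a file that exists on disk when the directory really contains a space, which is why the choice can matter for test resources.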





[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15249
  

Thinking about this more, and based on what @squito mentioned, I was 
considering the following:

We are primarily dealing with executors or nodes which are 'bad', as opposed 
to recoverable failures due to resource contention, prevention of the 
degenerate corner cases which the existing blacklist is for, etc. Given that:

Can we assume that a successful task execution on a node implies a healthy 
node?
What about at the executor level?

The proposal is to keep the PR as is for the most part, but:
- Clear nodeToExecsWithFailures when a task on a node succeeds. Same for 
nodeToBlacklistedTaskIndexes.
- Not sure if we want to reset execToFailures for an executor (not clearing 
it would imply we are handling the resource starvation case implicitly, imo).
- If possible, allow speculative tasks to be scheduled on blacklisted 
nodes/executors, provided countTowardsTaskFailures can be overridden to false 
in those cases (if not, ignore this, since it would add to the number of 
failures per app).
 
The rationale behind this is that successful tasks indicate past failures 
were not indicative of bad nodes/executors, but rather transient failures. And 
speculative tasks also sort of work as probe tasks to determine if the 
node/executor has recovered and is healthy.
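
The bookkeeping proposed above can be sketched roughly as follows. This is a simplified Python illustration, not the PR's Scala code: the class name is made up, the field names only mirror the ones mentioned in the comment, and the real thresholds for blacklisting a node are omitted (here a single failure marks the task index on the node).

```python
from collections import defaultdict


class TaskSetBlacklistSketch:
    """Hypothetical, simplified bookkeeping for one task set."""

    def __init__(self):
        # executor -> indexes of tasks that failed on it
        self.exec_to_failures = defaultdict(set)
        # node -> executors on that node that have seen failures
        self.node_to_execs_with_failures = defaultdict(set)
        # node -> task indexes that should avoid that node
        self.node_to_blacklisted_task_indexes = defaultdict(set)

    def task_failed(self, node, executor, task_index):
        self.exec_to_failures[executor].add(task_index)
        self.node_to_execs_with_failures[node].add(executor)
        self.node_to_blacklisted_task_indexes[node].add(task_index)

    def task_succeeded(self, node, executor, task_index):
        # A success is treated as evidence that the node is healthy,
        # so the node-level records are cleared.
        self.node_to_execs_with_failures.pop(node, None)
        self.node_to_blacklisted_task_indexes.pop(node, None)
        # exec_to_failures is deliberately left untouched, matching the
        # open question in the proposal.

    def is_node_suspect(self, node):
        return bool(self.node_to_execs_with_failures.get(node))


tracker = TaskSetBlacklistSketch()
tracker.task_failed("node-1", "exec-1", 0)
print(tracker.is_node_suspect("node-1"))   # True
tracker.task_succeeded("node-1", "exec-2", 1)
print(tracker.is_node_suspect("node-1"))   # False
print(dict(tracker.exec_to_failures))      # {'exec-1': {0}}
```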

I hope I am not missing anything - any thoughts @squito, @kayousterhout, 
@tgravescs ?





[GitHub] spark issue #15361: [SPARK-17765][SQL] Support for writing out user-defined ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15361
  
Hi @kxepal , I just tested (copied and pasted) the codes below:

```scala
import org.apache.spark.sql.SparkSession
import spark.implicits._

val spark = SparkSession.builder().appName("Spark Hive Example").enableHiveSupport().getOrCreate()
val sv = org.apache.spark.mllib.linalg.Vectors.sparse(7, Array(0, 42), Array(-127, 128))
val df = Seq(("thing", sv)).toDF("thing", "vector")
df.write.format("orc").save("/tmp/thing.orc")
```

and it seems fine with the current master branch. Do you mind if I try to 
verify this again once this is, hopefully, backported to branch-2.0?





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15366
  
**[Test build #66480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66480/consoleFull)**
 for PR 15366 at commit 
[`c1d2b2b`](https://github.com/apache/spark/commit/c1d2b2bd1e1a12791a180f1b753ca082c97df31c).





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15387
  
**[Test build #66479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66479/consoleFull)**
 for PR 15387 at commit 
[`aca55de`](https://github.com/apache/spark/commit/aca55de0624f5634acb04f91636dce79af875fab).





[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15355
  
@zsxwing good eye, thanks. It's not that auto.offset.reset.earliest 
doesn't work; it's that there's a potential race condition in which poll gets 
called twice, slowly enough for the consumer position to be modified before 
the topicpartitions are paused.

https://github.com/apache/spark/pull/15387

should address that.

It's something that whoever works on the duplicated equivalent code in the 
structured streaming module is going to have to address, also.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/15366
  
Jenkins, retest this please





[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15387
  
**[Test build #66477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66477/consoleFull)**
 for PR 15387 at commit 
[`1fc5863`](https://github.com/apache/spark/commit/1fc5863db88cac9dfd0be09318c4ca8779a51682).





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66478/consoleFull)**
 for PR 15307 at commit 
[`10d1c24`](https://github.com/apache/spark/commit/10d1c243a71d464ada33db269a30ad0e4dff3ced).





[GitHub] spark issue #15379: [SPARK-17805][PYSPARK] Fix in sqlContext.read.text when ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15379
  
+1 for this PR.





[GitHub] spark pull request #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race co...

2016-10-06 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/15387

[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice

## What changes were proposed in this pull request?

Kafka consumers can't subscribe or maintain a heartbeat without polling, but
polling ordinarily consumes messages and adjusts the consumer's position. We
don't want this on the driver, so we poll with a timeout of 0 and pause all
topicpartitions.

Some consumer strategies that seek to particular positions have to poll
first, but they weren't pausing immediately thereafter. Thus, there was a race
condition where the second poll() in the DStream start method might actually
adjust the consumer position.

Eliminated (or at least drastically reduced the chance of) the race
condition by pausing in the relevant consumer strategies, and added an
assertion on startup that no messages were consumed.
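
To make the ordering concrete, here is a minimal sketch of the poll/seek/pause sequence described above. It is only a deterministic simulation: `FakeConsumer`, `strategy_on_start` and `driver_start` are hypothetical names invented for illustration and are not the Kafka client or Spark API. In the real code the race depends on timing; here the unpaused case always shows the unwanted position change.

```python
class FakeConsumer:
    """Hypothetical in-memory stand-in for a consumer (not the real Kafka API)."""

    def __init__(self, records):
        self.records = list(records)
        self.position = 0
        self.paused = False

    def poll(self, timeout_ms=0):
        # A paused consumer can still poll (e.g. for heartbeats), but it
        # returns no records and its position stays where it was.
        if self.paused:
            return []
        batch = self.records[self.position:]
        self.position = len(self.records)
        return batch

    def seek(self, offset):
        self.position = offset

    def pause(self):
        self.paused = True


def strategy_on_start(consumer, offset, pause_after_seek):
    # The strategy has to poll once before it is allowed to seek.
    consumer.poll(0)
    consumer.seek(offset)
    # The fix sketched here: pause immediately, before anyone polls again.
    if pause_after_seek:
        consumer.pause()
    return consumer


def driver_start(consumer):
    # Second poll, analogous to the one in the DStream start method.
    consumed = consumer.poll(0)
    consumer.pause()
    return consumed
```

Without the pause, `driver_start` consumes records and moves the position past the offset the strategy chose; with it, the second poll returns nothing, which is the condition the startup assertion checks.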

## How was this patch tested?

I reliably reproduced the intermittent test failure by inserting a
Thread.sleep directly before returning from SubscribePattern. The suggested
fix eliminated the failure.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 SPARK-17782

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15387


commit 1fc5863db88cac9dfd0be09318c4ca8779a51682
Author: cody koeninger 
Date:   2016-10-07T01:08:01Z

[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll being 
called twice and moving position







[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66469/
Test FAILed.





[GitHub] spark pull request #15379: [SPARK-17805][PYSPARK] Fix in sqlContext.read.tex...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15379#discussion_r82315908
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -289,8 +289,8 @@ def text(self, paths):
 [Row(value=u'hello'), Row(value=u'this')]
 """
 if isinstance(paths, basestring):
-path = [paths]
-return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+paths = [paths]
+return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
--- End diff --

This is super minor, but I think it'd be nicer to match the variable
name to `path`, if this makes sense. `parquet` takes non-keyword arguments,
so `paths` seems right there, but the others seem to take a single argument.
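
For context, the bug the diff above appears to address can be sketched in isolation. This is an assumption based on the diff and the PR title, since the original indentation is lost: `path` seems to have been bound only in the string branch, so passing a list left it undefined. The function names below are hypothetical, and `str` stands in for Python 2's `basestring`.

```python
def buggy_normalize_paths(paths):
    # Mirrors the pre-fix shape: `path` is only bound when `paths` is a string.
    if isinstance(paths, str):
        path = [paths]
    return path  # raises UnboundLocalError when `paths` is already a list


def normalize_paths(paths):
    """Return a list of paths whether the caller passed one string or a list."""
    # Rebinding the same name keeps both branches on a single variable.
    if isinstance(paths, str):
        paths = [paths]
    return list(paths)
```

Whichever name is chosen, `path` or `paths`, the point is that the wrapped value and the value handed on are the same variable.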





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15366
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15366: [SPARK-17793] [Web UI] Sorting on the description on the...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15366
  
**[Test build #66469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66469/consoleFull)** for PR 15366 at commit [`c1d2b2b`](https://github.com/apache/spark/commit/c1d2b2bd1e1a12791a180f1b753ca082c97df31c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66475/consoleFull)** for PR 15307 at commit [`2918525`](https://github.com/apache/spark/commit/29185254d325834c40bd63a543317950b2794b30).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66475/
Test FAILed.




