[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r234432181
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -610,3 +616,57 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data A SparkDataFrame.
+#' @param k The number of clusters to create.
+#' @param initMode Param for the initialization algorithm.
+#' @param maxIter Param for maximum number of iterations.
+#' @param srcCol Param for the name of the input column for source vertex IDs.
+#' @param dstCol Name of the input column for destination vertex IDs.
+#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'   list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+  signature(data = "SparkDataFrame"),
+  function(data, k = 2L, initMode = "random", maxIter = 20L, srcCol = "src",
+dstCol = "dst", weightCol = NULL) {
--- End diff --

I think we try to avoid srcCol and dstCol in R (I think there are other R ml
APIs like that)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r234432019
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -610,3 +616,57 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data A SparkDataFrame.
+#' @param k The number of clusters to create.
+#' @param initMode Param for the initialization algorithm.
+#' @param maxIter Param for maximum number of iterations.
+#' @param srcCol Param for the name of the input column for source vertex IDs.
+#' @param dstCol Name of the input column for destination vertex IDs.
+#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'   list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+  signature(data = "SparkDataFrame"),
+  function(data, k = 2L, initMode = "random", maxIter = 20L, srcCol = "src",
--- End diff --

Set valid values for initMode and check for them, e.g.
https://github.com/apache/spark/pull/23072/files#diff-d9f92e07db6424e2527a7f9d7caa9013R355

and use `match.arg(initMode)`


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r234432049
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -968,6 +970,17 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ Power Iteration Clustering
+
+Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
+
+```{r}
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+  list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+  list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+head(spark.assignClusters(df, initMode="degree", weightCol="weight"))
--- End diff --

spacing: `initMode = "degree", weightCol = "weight"`


---




[GitHub] spark pull request #23073: [SPARK-26104] expose pci info to task scheduler

2018-11-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23073#discussion_r234431864
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorData.scala ---
@@ -27,12 +27,14 @@ import org.apache.spark.rpc.{RpcAddress, RpcEndpointRef}
  * @param executorHost The hostname that this executor is running on
  * @param freeCores  The current number of cores available for work on the executor
  * @param totalCores The total number of cores available to the executor
+ * @param pcis The external devices avaliable to the executor
--- End diff --

available


---




[GitHub] spark issue #23073: [SPARK-26104] expose pci info to task scheduler

2018-11-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/23073
  
please put ^ comment into PR description (because comment is not included 
in commit message once the PR is merged)


---




[GitHub] spark pull request #23042: [SPARK-26070][SQL] add rule for implicit type coe...

2018-11-17 Thread uzadude
Github user uzadude commented on a diff in the pull request:

https://github.com/apache/spark/pull/23042#discussion_r234431689
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -138,6 +138,11 @@ object TypeCoercion {
 case (DateType, TimestampType)
   => if (conf.compareDateTimestampInTimestamp) Some(TimestampType) else Some(StringType)
 
+// to support a popular use case of tables using Decimal(X, 0) for long IDs instead of strings
+// see SPARK-26070 for more details
+case (n: DecimalType, s: StringType) if n.scale == 0 => Some(DecimalType(n.precision, n.scale))
--- End diff --

I personally agree with @cloud-fan that there are a few types that are
"definitely safe". Since users are not always responsible for their input
tables, I believe convenience is more important than schema definitions. Also,
even count() returns a bigint, so without implicit coercion you would have to
write filters like 'count(*)>100L', which would be a huge regression.
I believe that the "definitely safe" list is very short and we should use
it. @mgaido91, in your examples I do agree that Double to Decimal is not safe,
and neither is String to almost anything.
The trivially safe pairs are something like (Long, Int), (Int, Double),
(Decimal, Decimal), which can be expanded to a common precision and scale, and
maybe (Date, Timestamp).
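
One way to picture the short "definitely safe" list is an explicit allow-list of type pairs, with every other pair left uncoerced. The sketch below is not Spark's `TypeCoercion` code: the `SimpleType` hierarchy and `safeCommonType` are hypothetical names made up for illustration, the pairs are the ones named in this comment, and the decimal widening ignores any maximum-precision cap.

```scala
object SafeCoercionSketch {
  // Hypothetical, simplified stand-ins for SQL data types (not Spark classes).
  sealed trait SimpleType
  case object IntT extends SimpleType
  case object LongT extends SimpleType
  case object DoubleT extends SimpleType
  case object StringT extends SimpleType
  case object DateT extends SimpleType
  case object TimestampT extends SimpleType
  final case class DecimalT(precision: Int, scale: Int) extends SimpleType

  // Returns a common type only for pairs on the allow-list.
  // None means "do not coerce implicitly".
  def safeCommonType(a: SimpleType, b: SimpleType): Option[SimpleType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    case (IntT, DoubleT) | (DoubleT, IntT) => Some(DoubleT)
    // Listed only as "maybe" in the comment above.
    case (DateT, TimestampT) | (TimestampT, DateT) => Some(TimestampT)
    case (DecimalT(p1, s1), DecimalT(p2, s2)) =>
      // Keep enough integer digits and enough fractional digits for both sides.
      val scale = math.max(s1, s2)
      val intDigits = math.max(p1 - s1, p2 - s2)
      Some(DecimalT(intDigits + scale, scale))
    case _ => None
  }
}
```

With this shape, adding or removing a pair is a one-line change, which keeps the debate about what counts as "safe" visible in a single place.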


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-17 Thread DaveDeCaprio
Github user DaveDeCaprio commented on the issue:

https://github.com/apache/spark/pull/23076
  
This contribution is my original work and I license the work to the project 
under the project’s open source license.


---




[GitHub] spark issue #23076: [SPARK-26103][SQL] Added maxDepth to limit the length of...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23076
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23076: [SPARK-26103][SQL] Added maxDepth to limit the le...

2018-11-17 Thread DaveDeCaprio
GitHub user DaveDeCaprio opened a pull request:

https://github.com/apache/spark/pull/23076

[SPARK-26103][SQL] Added maxDepth to limit the length of text plans

Nested query plans can get extremely large (hundreds of megabytes).

## What changes were proposed in this pull request?

The PR puts in a limit on the nesting depth of trees to be printed when 
writing a plan string.  
* The default limit is 15, which allows for reasonably nested plans.  
* A new configuration parameter called spark.debug.maxToStringTreeDepth was 
added to control the depth.
* When plans are truncated, "..." is printed to indicate that tree elements 
were removed.
* A warning is printed out the first time a truncated plan is displayed.  
The warning explains what happened and how to adjust the limit.
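
The truncation described in the list above can be sketched as follows. This is an illustration only, not the code in this PR: `Node` and `treeString` here are hypothetical minimal versions, and the one-time warning is not modeled.

```scala
object TreeDepthSketch {
  // Hypothetical minimal plan node; a real query plan node carries much more.
  final case class Node(name: String, children: List[Node] = Nil)

  // Renders one node per line, indented by depth. Children of a node that sits
  // at maxDepth are not printed; a single "..." line marks the cut.
  def treeString(root: Node, maxDepth: Int): String = {
    val out = new StringBuilder
    def render(node: Node, depth: Int): Unit = {
      out.append("  " * depth).append(node.name).append('\n')
      if (depth < maxDepth) {
        node.children.foreach(child => render(child, depth + 1))
      } else if (node.children.nonEmpty) {
        out.append("  " * (depth + 1)).append("...\n")
      }
    }
    render(root, 0)
    out.toString
  }
}
```

For a chain `a -> b -> c -> d`, `treeString(root, 1)` prints `a`, `b`, and one `...` line, while a limit of 15 (the default named above) prints all four nodes.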

## How was this patch tested?

A new unit test in QueryExecutionSuite which creates a highly nested plan 
and then ensures that the printed plan is not too long.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DaveDeCaprio/spark max-log-tree-depth

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23076


commit adc4f8efd4b51b77d3600bcba8f331e92f7ea3c6
Author: Dave DeCaprio 
Date:   2018-11-18T06:29:16Z

Added maxDepth to treeString which limits the depth of the printed string.

commit 3a9743fbc89358055c37cc45437f191fc5f15957
Author: Dave DeCaprio 
Date:   2018-11-18T06:34:42Z

Added maxDepth to treeString which limits the depth of the printed string.




---




[GitHub] spark issue #23069: [SPARK-26026][BUILD] Published Scaladoc jars missing fro...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23069
  
**[Test build #4432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4432/testReport)** for PR 23069 at commit [`25a311b`](https://github.com/apache/spark/commit/25a311beb9da709b61931dca12a7c443f43efa65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23065
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98972/
Test FAILed.


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23065
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23065
  
**[Test build #98972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98972/testReport)** for PR 23065 at commit [`4665696`](https://github.com/apache/spark/commit/4665696f2b28e56b2aa15a2e1b85ce3ff11b3178).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/23054
  
Ok. I will add a flag. Thanks @rxin 


---




[GitHub] spark issue #23075: [SPARK-26084][SQL] Fixes unresolved AggregateExpression....

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23075
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23075: [SPARK-26084][SQL] Fixes unresolved AggregateExpr...

2018-11-17 Thread ssimeonov
GitHub user ssimeonov opened a pull request:

https://github.com/apache/spark/pull/23075

[SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

## What changes were proposed in this pull request?

This PR fixes an exception in `AggregateExpression.references` called on 
unresolved expressions. It implements the solution proposed in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084), a minor 
refactoring that removes the unnecessary dependence on `AttributeSet.toSeq`, 
which requires expression IDs and, therefore, can only execute successfully for 
resolved expressions.

The refactored implementation is both simpler and faster, eliminating the 
conversion of a `Set` to a
`Seq` and back to `Set`.
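
The difference between the two styles can be illustrated with plain Scala collections. This is only a sketch under assumptions: `Expr`, `referencesViaSeq`, and `referencesDirect` are hypothetical names, ordinary `Set[String]` stands in for `AttributeSet`, and the expression-ID requirement of `AttributeSet.toSeq` is not modeled.

```scala
object ReferencesSketch {
  // Hypothetical expression: a name plus the attribute names it references.
  final case class Expr(name: String, refs: Set[String])

  // Round-trip style: each Set becomes a Seq, the Seqs are concatenated,
  // and the result is converted back into a Set.
  def referencesViaSeq(children: Seq[Expr]): Set[String] =
    children.flatMap(child => child.refs.toSeq).toSet

  // Direct style: union the Sets with no intermediate Seq.
  def referencesDirect(children: Seq[Expr]): Set[String] =
    children.foldLeft(Set.empty[String])((acc, child) => acc ++ child.refs)
}
```

Both functions return the same set; the direct version simply never calls `toSeq`, so it cannot fail on whatever `toSeq` would need.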

## How was this patch tested?

Local tests pass. I added no new tests as (a) the new behavior has no 
failing case and (b) this is a simple refactoring.

@hvanhovell 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/swoop-inc/spark ss_SPARK-26084

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23075.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23075


commit 178f0a5dff9f7eb8887ed711727b2f83af40ae8a
Author: Simeon Simeonov 
Date:   2018-11-18T01:05:07Z

[SPARK-26084][SQL] Fixes unresolved AggregateExpression.references exception

Implements the solution proposed in 
[SPARK-26084](https://issues.apache.org/jira/browse/SPARK-26084),
a minor refactoring that removes the unnecessary dependence on 
`AttributeSet.toSeq`,
which requires expression IDs and, therefore, can only execute successfully 
for resolved expressions.

The refactored implementation is both simpler and faster, eliminating the 
conversion of a `Set` to a
`Seq` and back to `Set`.

I added no new tests as (a) the new behavior has no failing case and (b) 
this is a simple refactoring.




---




[GitHub] spark issue #23069: [SPARK-26026][BUILD] Published Scaladoc jars missing fro...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23069
  
**[Test build #4432 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4432/testReport)** for PR 23069 at commit [`25a311b`](https://github.com/apache/spark/commit/25a311beb9da709b61931dca12a7c443f43efa65).


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23065
  
**[Test build #98972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98972/testReport)** for PR 23065 at commit [`4665696`](https://github.com/apache/spark/commit/4665696f2b28e56b2aa15a2e1b85ce3ff11b3178).


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23065
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5115/
Test PASSed.


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23065
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23074: [SPARK-19798] Refresh table does not have effect on othe...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23074
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #23074: [SPARK-19798] Refresh table does not have effect ...

2018-11-17 Thread gbloisi
GitHub user gbloisi opened a pull request:

https://github.com/apache/spark/pull/23074

[SPARK-19798] Refresh table does not have effect on other sessions than the 
issuing one

## What changes were proposed in this pull request?
The refresh table command has no effect on sessions other than the one that
issued it.

Move table relation cache from session catalog to session shared state so 
that different sessions can synchronize when a table is modified and refreshed.
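
The idea of moving the cache from per-session state into shared state can be sketched as below. This is not the code in this PR: `SharedState`, `Session`, `lookup`, and `refreshTable` are hypothetical simplified names, and cached relations are represented by plain strings.

```scala
import scala.collection.concurrent.TrieMap

object SharedCacheSketch {
  // Hypothetical state shared by every session; it owns the relation cache.
  final class SharedState {
    val tableRelationCache: TrieMap[String, String] = TrieMap.empty[String, String]
  }

  // Hypothetical session with no private relation cache: lookups and
  // refreshes both go through the shared state.
  final class Session(shared: SharedState) {
    // Returns the cached relation, loading it only if it is absent.
    def lookup(table: String)(load: String => String): String =
      shared.tableRelationCache.getOrElseUpdate(table, load(table))

    // Drops the cached entry so that every session reloads it on next use.
    def refreshTable(table: String): Unit = {
      shared.tableRelationCache.remove(table)
      ()
    }
  }
}
```

If each `Session` held its own map instead, a `refreshTable` call in one session would leave the stale entry in every other session, which is the behavior this PR reports.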

## How was this patch tested?
New test in HiveMetadataCacheSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gbloisi/spark shared-session-cache

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23074.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23074


commit bdd677c94c4e198d1d012c3c66a06ba791dc95bb
Author: Giambattista Bloisi 
Date:   2018-11-17T22:35:19Z

Refresh table command do not have effect on other sessions than the issuing 
one.
Move table relation cache from session catalog to session sharedstate so 
that different sessions can synchronize when refresh table command is issued.
New test in HiveMetadataCacheSuite demonstrates the need.




---




[GitHub] spark issue #23073: [SPARK-26104] expose pci info to task scheduler

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23073
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23072
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23073: [SPARK-26104] expose pci info to task scheduler

2018-11-17 Thread chenqin
GitHub user chenqin opened a pull request:

https://github.com/apache/spark/pull/23073

[SPARK-26104] expose pci info to task scheduler

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenqin/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23073


commit 096ce4c1d85a9fad9a5601bd438f9bee86cad2c1
Author: Chen Qin 
Date:   2018-11-17T22:29:37Z

expose pci info to task scheduler




---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23072
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98971/
Test PASSed.


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23072
  
**[Test build #98971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98971/testReport)** for PR 23072 at commit [`9e2b0f9`](https://github.com/apache/spark/commit/9e2b0f9ffe0866fa328bc677500e4f3a49ff384b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23071
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23071
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98970/
Test PASSed.


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23071
  
**[Test build #98970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98970/testReport)** for PR 23071 at commit [`3884aa3`](https://github.com/apache/spark/commit/3884aa39824914d1f710589b8c1a691780b04cc8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23072
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23072
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5114/
Test PASSed.


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23072
  
**[Test build #98971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98971/testReport)** for PR 23072 at commit [`9e2b0f9`](https://github.com/apache/spark/commit/9e2b0f9ffe0866fa328bc677500e4f3a49ff384b).


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-11-17 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/23072

[SPARK-19827][R]spark.ml R API for PIC

## What changes were proposed in this pull request?

Add PowerIterationClustering (PIC) in R

## How was this patch tested?

Added a test case


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-19827

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23072


commit 9e2b0f9ffe0866fa328bc677500e4f3a49ff384b
Author: Huaxin Gao 
Date:   2018-11-17T21:25:46Z

[SPARK-19827][R]spark.ml R API for PIC




---




[GitHub] spark issue #23066: [SPARK-26043][CORE] Make SparkHadoopUtil private to Spar...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23066
  
**[Test build #4431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4431/testReport)** for PR 23066 at commit [`5a79b2e`](https://github.com/apache/spark/commit/5a79b2e73a658b5fffd6b605b109b63cd1c887e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23071
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23057
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JS...

2018-11-17 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/23071

[SPARK-26102][SQL][TEST] Extracting common CSV/JSON functions tests

## What changes were proposed in this pull request?

Extracted common tests from `CsvFunctionsSuite` and `JsonFunctionsSuite` to 
the `FunctionsTests` trait. 

## How was this patch tested?

By running `CsvFunctionsSuite` and `JsonFunctionsSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 common-functions-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23071


commit 4b5b7fb7801d8d72be4db3781d8ca439b6eb102d
Author: Maxim Gekk 
Date:   2018-11-17T16:26:29Z

Extract common test to FunctionsTests

commit 4b417ac974f67e6d61086a3c41abbc25854e150c
Author: Maxim Gekk 
Date:   2018-11-17T17:33:56Z

Extracted additional tests 1

commit 974aa8da4c7399205306baf2b34d1e8cb37d75c2
Author: Maxim Gekk 
Date:   2018-11-17T17:34:36Z

Removing unused imports

commit 3884aa39824914d1f710589b8c1a691780b04cc8
Author: Maxim Gekk 
Date:   2018-11-17T18:18:02Z

Extracting the rest tests from CsvFunctionsSuite




---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23071
  
@dongjoon-hyun May I ask you to review the PR?


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23071
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5113/
Test PASSed.


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23071
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23057
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98969/
Test PASSed.


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23057
  
**[Test build #98969 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98969/testReport)**
 for PR 23057 at commit 
[`86106fa`](https://github.com/apache/spark/commit/86106fadcaed6c1a4768138b3d72e8c892b7cd7f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23071: [SPARK-26102][SQL][TEST] Extracting common CSV/JSON func...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23071
  
**[Test build #98970 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98970/testReport)**
 for PR 23071 at commit 
[`3884aa3`](https://github.com/apache/spark/commit/3884aa39824914d1f710589b8c1a691780b04cc8).


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23038
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23038
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98966/
Test PASSed.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23038
  
**[Test build #98966 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98966/testReport)**
 for PR 23038 at commit 
[`ad30c36`](https://github.com/apache/spark/commit/ad30c36f63a0b7f14b69d1699291ed9cec591af6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-17 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/23054
  
We should add a “legacy” flag in case somebody’s workload gets broken 
by this. We can remove the legacy flag in a future release.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23038
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98967/
Test PASSed.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23038
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23038
  
**[Test build #98967 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98967/testReport)**
 for PR 23038 at commit 
[`dca941d`](https://github.com/apache/spark/commit/dca941d316543526ea429c2b6a993c2252d09fd6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23038
  
It is a random failure.


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/23070
  
Looks good to me. I or someone else should take a closer look, though.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-17 Thread adoron
Github user adoron commented on the issue:

https://github.com/apache/spark/pull/23043
  
@cloud-fan changing writeDouble/writeFloat in UnsafeWriter does indeed do the 
trick!
I'll fix the PR. I was thinking about making the change in 
`Platform::putDouble` so that all accesses are affected, in UnsafeRow as well 
as in UnsafeWriter.
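
For readers following along, here is a minimal standalone sketch of the normalization being discussed. It is plain Scala with no Spark dependency, and `MinusZeroSketch`, `normalizeDouble` and `normalizeFloat` are hypothetical names, not the actual `UnsafeWriter` or `Platform` code. It only illustrates that `0.0` and `-0.0` compare equal numerically yet have different bit patterns, and that mapping `-0.0` to `0.0` before a write makes the stored bits identical.

```scala
object MinusZeroSketch {
  // Hypothetical helpers (not Spark APIs): map -0.0 to 0.0, leave everything else alone.
  // -0.0 == 0.0 is true under IEEE 754, so the comparison catches both zeros;
  // NaN == 0.0 is false, so NaN passes through unchanged.
  def normalizeDouble(d: Double): Double = if (d == 0.0d) 0.0d else d
  def normalizeFloat(f: Float): Float = if (f == 0.0f) 0.0f else f

  def main(args: Array[String]): Unit = {
    val posBits = java.lang.Double.doubleToLongBits(0.0d)
    val negBits = java.lang.Double.doubleToLongBits(-0.0d)

    println(0.0d == -0.0d)      // true: numerically equal
    println(posBits == negBits) // false: the sign bit differs

    // After normalization the bit patterns agree, so a byte-wise comparison
    // of the written values would treat them as the same value.
    val normalizedBits = java.lang.Double.doubleToLongBits(normalizeDouble(-0.0d))
    println(normalizedBits == posBits) // true
  }
}
```

Applying the same mapping at a lower level, as suggested above for `Platform::putDouble`, would presumably cover more write paths with a single change, at the cost of touching a more widely used method.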


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23038
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98968/
Test FAILed.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23038
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23038
  
**[Test build #98968 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98968/testReport)**
 for PR 23038 at commit 
[`a21bc0c`](https://github.com/apache/spark/commit/a21bc0c3a24a468bf8147c7ee6f7ef12e384c454).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23070
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23070
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98965/
Test PASSed.


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23070
  
**[Test build #98965 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98965/testReport)**
 for PR 23070 at commit 
[`bd2debc`](https://github.com/apache/spark/commit/bd2debcc2237ad178ef00b762bcdc80b63d1ecb7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23069: [SPARK-26026][BUILD] Published Scaladoc jars missing fro...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23069
  
**[Test build #4430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4430/testReport)**
 for PR 23069 at commit 
[`25a311b`](https://github.com/apache/spark/commit/25a311beb9da709b61931dca12a7c443f43efa65).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...

2018-11-17 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23016
  
Thank you @srowen 


---




[GitHub] spark pull request #23057: [SPARK-26078][SQL] Dedup self-join attributes on ...

2018-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23057#discussion_r234412635
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -119,7 +139,7 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   // (A.A1 = B.B1 OR ISNULL(A.A1 = B.B1)) AND (B.B2 = A.A2) AND B.B3 > 1
   val finalJoinCond = (nullAwareJoinConds ++ conditions).reduceLeft(And)
   // Deduplicate conflicting attributes if any.
-  dedupJoin(Join(outerPlan, sub, LeftAnti, Option(finalJoinCond)))
+  dedupJoin(Join(outerPlan, newSub, LeftAnti, Option(finalJoinCond)))
 case (p, predicate) =>
   val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
   Project(p.output, Filter(newCond.get, inputPlan))
--- End diff --

Can you try this test case?

```scala
val df1 = spark.sql(
"""
  |SELECT id,num,source FROM (
  |  SELECT id, num, 'a' as source FROM a
  |  UNION ALL
  |  SELECT id, num, 'b' as source FROM b
  |) AS c WHERE c.id IN (SELECT id FROM b WHERE num = 2) OR
  |c.id IN (SELECT id FROM b WHERE num = 3)
""".stripMargin)
```


---




[GitHub] spark issue #23065: [SPARK-26090][CORE][SQL][ML] Resolve most miscellaneous ...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23065
  
**[Test build #4429 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4429/testReport)**
 for PR 23065 at commit 
[`e2e375b`](https://github.com/apache/spark/commit/e2e375b592ccbbf2e468736fb2ee00b33787c58e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

2018-11-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22986


---




[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

2018-11-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22986#discussion_r234412241
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
 private[ml] trait DecisionTreeClassifierParams
   extends DecisionTreeParams with TreeClassifierParams
 
-/**
- * Parameters for Decision Tree-based regression algorithms.
- */
-private[ml] trait TreeRegressorParams extends Params {
--- End diff --

I see. I am not sure how to verify that.


---




[GitHub] spark pull request #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr'...

2018-11-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23016


---




[GitHub] spark pull request #22986: [SPARK-25959][ML] GBTClassifier picks wrong impur...

2018-11-17 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22986#discussion_r234412163
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ---
@@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams {
 private[ml] trait DecisionTreeClassifierParams
   extends DecisionTreeParams with TreeClassifierParams
 
-/**
- * Parameters for Decision Tree-based regression algorithms.
- */
-private[ml] trait TreeRegressorParams extends Params {
--- End diff --

That's true, but I don't know whether we can back-port it because of the 
binary incompatibility, internal as it may be. If that's not an issue, then 
yes, it can be back-ported.


---




[GitHub] spark issue #22986: [SPARK-25959][ML] GBTClassifier picks wrong impurity sta...

2018-11-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22986
  
Merged to master


---




[GitHub] spark issue #23016: [SPARK-26006][mllib] unpersist 'dataInternalRepr' in the...

2018-11-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23016
  
Merged to master


---




[GitHub] spark issue #23066: [SPARK-26043][CORE] Make SparkHadoopUtil private to Spar...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23066
  
**[Test build #4431 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4431/testReport)**
 for PR 23066 at commit 
[`5a79b2e`](https://github.com/apache/spark/commit/5a79b2e73a658b5fffd6b605b109b63cd1c887e2).


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23070
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98963/
Test PASSed.


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23070
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23070: [SPARK-26099][SQL] Verification of the corrupt column in...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23070
  
**[Test build #98963 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98963/testReport)**
 for PR 23070 at commit 
[`bd2debc`](https://github.com/apache/spark/commit/bd2debcc2237ad178ef00b762bcdc80b63d1ecb7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23055
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23055
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98962/
Test PASSed.


---




[GitHub] spark issue #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pyspark.me...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23055
  
**[Test build #98962 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98962/testReport)**
 for PR 23055 at commit 
[`52a91cc`](https://github.com/apache/spark/commit/52a91cc887462227caf65eb85c0f01d5e8fd0485).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23057
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5112/
Test PASSed.


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23057
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subq...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23057
  
**[Test build #98969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98969/testReport)**
 for PR 23057 at commit 
[`86106fa`](https://github.com/apache/spark/commit/86106fadcaed6c1a4768138b3d72e8c892b7cd7f).


---




[GitHub] spark pull request #23057: [SPARK-26078][SQL] Dedup self-join attributes on ...

2018-11-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23057#discussion_r234410124
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -119,7 +139,7 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   // (A.A1 = B.B1 OR ISNULL(A.A1 = B.B1)) AND (B.B2 = A.A2) AND B.B3 > 1
   val finalJoinCond = (nullAwareJoinConds ++ conditions).reduceLeft(And)
   // Deduplicate conflicting attributes if any.
-  dedupJoin(Join(outerPlan, sub, LeftAnti, Option(finalJoinCond)))
+  dedupJoin(Join(outerPlan, newSub, LeftAnti, Option(finalJoinCond)))
 case (p, predicate) =>
   val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
   Project(p.output, Filter(newCond.get, inputPlan))
--- End diff --

Hmm... `rewriteExistentialExpr` operates on the result of the `foldLeft`, so 
every `InSubquery` there has already been transformed using 
`dedupSubqueryOnSelfJoin`, right? So I don't think it is needed.


---




[GitHub] spark issue #23066: [SPARK-26043][CORE] Make SparkHadoopUtil private to Spar...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23066
  
**[Test build #4428 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4428/testReport)**
 for PR 23066 at commit 
[`5a79b2e`](https://github.com/apache/spark/commit/5a79b2e73a658b5fffd6b605b109b63cd1c887e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job pa...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job pa...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98960/
Test PASSed.


---




[GitHub] spark pull request #23057: [SPARK-26078][SQL] Dedup self-join attributes on ...

2018-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23057#discussion_r234409196
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -119,7 +139,7 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   // (A.A1 = B.B1 OR ISNULL(A.A1 = B.B1)) AND (B.B2 = A.A2) AND B.B3 > 1
   val finalJoinCond = (nullAwareJoinConds ++ conditions).reduceLeft(And)
   // Deduplicate conflicting attributes if any.
-  dedupJoin(Join(outerPlan, sub, LeftAnti, Option(finalJoinCond)))
+  dedupJoin(Join(outerPlan, newSub, LeftAnti, Option(finalJoinCond)))
 case (p, predicate) =>
   val (newCond, inputPlan) = rewriteExistentialExpr(Seq(predicate), p)
   Project(p.output, Filter(newCond.get, inputPlan))
--- End diff --

In `rewriteExistentialExpr`, there is similar logic for `InSubquery`. 
Should we also apply `dedupSubqueryOnSelfJoin` there?


---




[GitHub] spark issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job pa...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23068
  
**[Test build #98960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98960/testReport)**
 for PR 23068 at commit 
[`e7c2ebb`](https://github.com/apache/spark/commit/e7c2ebbda949918034cb9cb92ac6ef30af17d943).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23057: [SPARK-26078][SQL] Dedup self-join attributes on ...

2018-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23057#discussion_r234409212
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -1280,4 +1281,34 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
   assert(subqueries.length == 1)
 }
   }
+
+  test("SPARK-26078: deduplicate fake self joins for IN subqueries") {
+withTempView("a", "b") {
+  val a = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row("a", 2), Row("b", 1))),
+StructType(Seq(StructField("id", StringType), StructField("num", IntegerType
+  val b = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row("a", 2), Row("b", 1))),
+StructType(Seq(StructField("id", StringType), StructField("num", IntegerType
--- End diff --

The two schemas are the same. Can we define the schema just once?


---




[GitHub] spark pull request #23057: [SPARK-26078][SQL] Dedup self-join attributes on ...

2018-11-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23057#discussion_r234409158
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
 ---
@@ -70,6 +67,27 @@ object RewritePredicateSubquery extends 
Rule[LogicalPlan] with PredicateHelper {
 case _ => joinPlan
   }
 
+  private def rewriteDedupPlan(plan: LogicalPlan, rewrites: AttributeMap[Alias]): LogicalPlan = {
+    val aliasedExpressions = plan.output.map { ref =>
+      rewrites.getOrElse(ref, ref)
+    }
+    Project(aliasedExpressions, plan)
+  }
+
+  private def dedupSubqueryOnSelfJoin(values: Seq[Expression], sub: LogicalPlan): LogicalPlan = {
+    val leftRefs = AttributeSet.fromAttributeSets(values.map(_.references))
+    val rightRefs = AttributeSet(sub.output)
--- End diff --

Isn't this just `sub.outputSet`?
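
A small sketch of the equivalence this comment relies on, assuming that a logical plan's `outputSet` is the `AttributeSet` built from its `output`. The object name `OutputSetSketch` and the two-column `LocalRelation` are hypothetical stand-ins for the `sub` plan in the diff, and these are internal Catalyst APIs that may differ between Spark versions.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, AttributeSet}
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
import org.apache.spark.sql.types.{IntegerType, StringType}

// Hypothetical standalone object, for illustration only.
object OutputSetSketch {
  // Stand-in for the `sub` plan: a relation with two output attributes.
  def samplePlan(): LogicalPlan = {
    val id = AttributeReference("id", StringType)()
    val num = AttributeReference("num", IntegerType)()
    LocalRelation(id, num)
  }

  def main(args: Array[String]): Unit = {
    val sub = samplePlan()
    // Under the stated assumption, both expressions describe the same attributes.
    assert(sub.outputSet == AttributeSet(sub.output))
  }
}
```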


---




[GitHub] spark issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job pa...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98959/
Test PASSed.


---




[GitHub] spark issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job pa...

2018-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23068
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23068: [SPARK-26098][WebUI] Show associated SQL query in Job pa...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23068
  
**[Test build #98959 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98959/testReport)** for PR 23068 at commit [`e7c2ebb`](https://github.com/apache/spark/commit/e7c2ebbda949918034cb9cb92ac6ef30af17d943).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23038: [SPARK-25451][SPARK-26100][CORE]Aggregated metrics table...

2018-11-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23038
  
**[Test build #98968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98968/testReport)** for PR 23038 at commit [`a21bc0c`](https://github.com/apache/spark/commit/a21bc0c3a24a468bf8147c7ee6f7ef12e384c454).


---



