[GitHub] spark pull request #18836: Update SortMergeJoinExec.scala
Github user BoleynSu commented on a diff in the pull request: https://github.com/apache/spark/pull/18836#discussion_r131316095

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ---
@@ -82,7 +82,7 @@ case class SortMergeJoinExec(
   override def outputOrdering: Seq[SortOrder] = joinType match {
     // For inner join, orders of both sides keys should be kept.
-    case Inner =>
+    case _: InnerLike =>
--- End diff --

I think we can get a SortMergeJoin plan with Cross, e.g. `select distinct a.i + 1, a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2`.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
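A minimal, self-contained sketch of why this one-character-class change matters. This is not Catalyst itself, just an assumed simplified mirror of its `JoinType` hierarchy, in which both `Inner` and `Cross` extend `InnerLike`; with the old `case Inner` pattern, a SortMergeJoin produced for a CROSS JOIN (as in the query above) would not report its output ordering:

```scala
// Simplified stand-in for Catalyst's join-type hierarchy (assumption:
// Inner and Cross both extend the InnerLike trait, as in Spark SQL).
sealed trait JoinType
sealed trait InnerLike extends JoinType
case object Inner extends InnerLike
case object Cross extends InnerLike
case object LeftOuter extends JoinType

// Before the patch this matched only Inner; with `_: InnerLike` the
// ordering of both sides' keys is also kept for cross joins.
def keepsOutputOrdering(joinType: JoinType): Boolean = joinType match {
  case _: InnerLike => true
  case _            => false
}
```

A type pattern (`_: InnerLike`) matches every subtype, so it covers any future `InnerLike` variants as well, unlike an exhaustive list of case objects.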
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131315665

--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
@@ -157,12 +168,8 @@ private[hive] object SparkSQLCLIDriver extends Logging {
     // Execute -i init files (always in silent mode)
     cli.processInitFiles(sessionState)
-    // Respect the configurations set by --hiveconf from the command line
-    // (based on Hive's CliDriver).
-    val it = sessionState.getOverriddenConfigurations.entrySet().iterator()
-    while (it.hasNext) {
-      val kv = it.next()
-      SparkSQLEnv.sqlContext.setConf(kv.getKey, kv.getValue)
+    newHiveConf.foreach{ kv =>
--- End diff --

`foreach{` -> `foreach {`
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r131314418

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -404,6 +404,13 @@ private[spark] object HiveUtils extends Logging {
     propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
     propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
+    // Copy any "spark.hadoop.foo=bar" system properties into conf as "foo=bar"
--- End diff --

@yaooqinn Please follow what @tejasapatil said and create a util function. In addition, `newTemporaryConfiguration` is being used for `SparkSQLCLIDriver`, and thus, please update the function description of `newTemporaryConfiguration`.
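A hedged sketch of the kind of util function being requested here (the object and method names are hypothetical, not the helper actually added in the PR): copy every `spark.hadoop.foo=bar` property into a Hadoop-style conf map as `foo=bar` by stripping the prefix:

```scala
object SparkHadoopConfUtils {
  private val Prefix = "spark.hadoop."

  // Keep only keys starting with "spark.hadoop." and strip that prefix,
  // so "spark.hadoop.fs.defaultFS" becomes the Hadoop key "fs.defaultFS".
  // Keys without the prefix are ignored rather than passed through.
  def extractHadoopProps(props: Map[String, String]): Map[String, String] =
    props.collect {
      case (key, value) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix) -> value
    }
}
```

Centralizing this in one helper keeps `HiveUtils.newTemporaryConfiguration` and `SparkSQLCLIDriver` from duplicating the prefix-stripping logic, which is the point of the review comment.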
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user nchammas commented on the issue: https://github.com/apache/spark/pull/18820

> I don't think we should allow user to change field nullability while doing replace.

Why not? As long as we correctly update the schema from non-nullable to nullable, it seems OK to me. What would we be protecting against by disallowing this?
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18820

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80229/ Test PASSed.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18820

Merged build finished. Test PASSed.
[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user raajay commented on the issue: https://github.com/apache/spark/pull/18690

I understand. My previous comment was just a clarification to your question: "I'm not sure how does this code work in your changes?". I will close this PR. The JIRA is already closed.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18820

**[Test build #80229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80229/testReport)** for PR 18820 at commit [`dfbcaf3`](https://github.com/apache/spark/commit/dfbcaf3e47126eaf6fd3a0276054cad01dbff71a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18690: [SPARK-21334][CORE] Add metrics reporting service...
Github user raajay closed the pull request at: https://github.com/apache/spark/pull/18690
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742

**[Test build #80234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80234/testReport)** for PR 18742 at commit [`8a3c6d6`](https://github.com/apache/spark/commit/8a3c6d6fdceeb9f11454a96d9302bc17ca1cf1e9).
[GitHub] spark pull request #18831: [SPARK-21622][ML][SparkR] Support offset in Spark...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18831#discussion_r131313635

--- Diff: R/pkg/tests/fulltests/test_mllib_regression.R ---
@@ -173,6 +173,14 @@ test_that("spark.glm summary", {
   expect_equal(stats$df.residual, rStats$df.residual)
   expect_equal(stats$aic, rStats$aic)
+  # Test spark.glm works with offset
+  training <- suppressWarnings(createDataFrame(iris))
+  stats <- summary(spark.glm(training, Sepal_Width ~ Sepal_Length + Species,
+                             family = poisson(), offsetCol = "Petal_Length"))
+  rStats <- suppressWarnings(summary(glm(Sepal.Width ~ Sepal.Length + Species,
+                             data = iris, family = poisson(), offset = iris$Petal.Length)))
--- End diff --

that's interesting - perhaps we should take `col` in addition to `col name` too
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829

I understand what you mean. These metrics URLs do not need to be displayed in the Web UI. Important metrics, such as 'aliveWorkers', could instead be surfaced in a header of the Web UI.
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18829

I'm not personally saying these metrics need to be in the Web UI. I'm just saying that if you think they're important enough to surface this way, then they should be important enough to display there as well.
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829

I accept your comments, thank you. Adding these important metrics to the Web UI is not a small amount of work, but I will try to do it.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131311220

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

We can move those test queries to a test suite like `DataFrameAggregateSuite`.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131311047

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Only methods like `Dataset.show` cause re-entrance of analyzed plans; `collect` won't. I guess the queries here are evaluated with `collect`.

    scala> sql("select 4, b, sum(b) from data group by 1, 2").show
    org.apache.spark.sql.AnalysisException: GROUP BY position 4 is not in select list (valid range is [1, 3]); line 1 pos 7

    scala> sql("select 4, b, sum(b) from data group by 1, 2").collect
    res2: Array[org.apache.spark.sql.Row] = Array([4,3,3], [4,4,4], [4,2,2])
[GitHub] spark issue #18813: [SPARK-21567][SQL] Dataset should work with type alias
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18813

ping @cloud-fan Do you have time to look at this? Thanks.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309985

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Do those queries actually test this bug? No matter whether I use `transform` or `resolveOperators`, they generate the same results.
[GitHub] spark issue #18840: [SPARK-21565] Propagate metadata in attribute replacemen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18840

**[Test build #80233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80233/testReport)** for PR 18840 at commit [`e54d812`](https://github.com/apache/spark/commit/e54d81200569c2260f0995b2f91aa9829dc10ad7).
[GitHub] spark pull request #18840: [SPARK-21565] Propagate metadata in attribute rep...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18840#discussion_r131309601

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala ---
@@ -391,6 +391,30 @@ class EventTimeWatermarkSuite extends StreamTest with BeforeAndAfter with Matche
     checkDataset[Long](df, 1L to 100L: _*)
   }

+  test("SPARK-21565: watermark operator accepts attributes from replacement") {
+    withTempDir { dir =>
+      dir.delete()
+
+      val df = Seq(("a", 100.0, new java.sql.Timestamp(100L)))
+        .toDF("symbol", "price", "eventTime")
+      df.write.json(dir.getCanonicalPath)
+
+      val input = spark.readStream.schema(df.schema)
+        .json(dir.getCanonicalPath)
+
+      val groupEvents = input
+        .withWatermark("eventTime", "2 seconds")
+        .groupBy("symbol", "eventTime")
+        .agg(count("price") as 'count)
+        .select("symbol", "eventTime", "count")
+      val q = groupEvents.writeStream
+        .outputMode("append")
+        .format("console")
+        .start()
+      q.processAllAvailable()
--- End diff --

nit: `q.processAllAvailable()` ->

    try {
      q.processAllAvailable()
    } finally {
      q.stop()
    }
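The nit above is the standard try/finally resource pattern: the query must be stopped even if `processAllAvailable()` throws. A standalone sketch with a hypothetical stand-in for the query handle (the real code would use Spark's `StreamingQuery`):

```scala
// Hypothetical stand-in for a streaming query handle, used only to
// demonstrate that the finally block runs cleanup on failure.
final class FakeQuery {
  var stopped = false
  def processAllAvailable(): Unit = throw new RuntimeException("boom")
  def stop(): Unit = stopped = true
}

// Run the query and always stop it, returning any failure instead of
// leaking a running query out of the test.
def runAndAlwaysStop(q: FakeQuery): Option[Throwable] =
  try {
    q.processAllAvailable()
    None
  } catch {
    case e: Throwable => Some(e)
  } finally {
    q.stop()
  }
```

Without the finally block, a failing assertion or processing error would leave the streaming query running, which can make later tests in the suite flaky.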
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309398

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Do those queries actually test this bug? No matter whether I use `transform` or `resolveOperators`, they generate the same results.
[GitHub] spark issue #18840: [SPARK-21565] Propagate metadata in attribute replacemen...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18840

ok to test
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309163

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
--- End diff --

OK, thanks
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131309076

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Also, it fixes this query: `select 3 as c, 4 as d, sum(b) from data group by c, d;`
[GitHub] spark issue #18764: [SPARK-21306][ML] For branch 2.0, OneVsRest should suppo...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18764

There are test failures in pyspark.ml.tests with Python 2.6, but I don't have that environment.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80230/ Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742

Merged build finished. Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742

**[Test build #80230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80230/testReport)** for PR 18742 at commit [`cd22c59`](https://github.com/apache/spark/commit/cd22c59ed6e9b11ca9690f11a20febda02e9c896).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r131308903

--- Diff: python/pyspark/sql/tests.py ---
@@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self):
         pdf = df.toPandas()
         self.spark.conf.set("spark.sql.execution.arrow.enable", "true")
         pdf_arrow = df.toPandas()
+        # need to remove timezone for comparison
+        pdf_arrow["7_timestamp_t"] = \
+            pdf_arrow["7_timestamp_t"].apply(lambda ts: ts.tz_localize(None))
--- End diff --

I talked to @gatorsmile and he suggested that we should have another configuration to control whether `df.toPandas()` handles timezones; the default behavior would be to not handle them, the same as the current behavior. I'll submit a PR doing this for the "without-Arrow" version asap, and I'd like you to follow that behavior.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131308502

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
--- End diff --

Could you move this line to just before line 68?
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131308421

--- Diff: sql/core/src/test/resources/sql-tests/inputs/group-by-ordinal.sql ---
@@ -52,8 +52,19 @@ select count(a), a from (select 1 as a) tmp group by 2 having a > 0;
 -- mixed cases: group-by ordinals and aliases
 select a, a AS k, count(b) from data group by k, 1;

--- turn of group by ordinal
+-- turn off group by ordinal
 set spark.sql.groupByOrdinal=false;

 -- can now group by negative literal
 select sum(b) from data group by -1;
+
+-- SPARK-21580 ints in aggregation expressions are taken as group-by ordinal
+select 4, b from data group by 1, 2;
+
+set spark.sql.groupByOrdinal=true;
+
+select 4, b from data group by 1, 2;
+
+select 3, 4, sum(b) from data group by 1, 2;
--- End diff --

Except this line, does this PR fix any other queries?
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Merged build finished. Test PASSed.
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18668 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80227/ Test PASSed.
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18668 **[Test build #80227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80227/testReport)** for PR 18668 at commit [`5043eb6`](https://github.com/apache/spark/commit/5043eb69b41d1d0263e8814da27a934491bc936c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18840: [SPARK-21565] Propagate metadata in attribute replacemen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18840 Can one of the admins verify this patch?
[GitHub] spark pull request #18840: [SPARK-21565] Propagate metadata in attribute rep...
GitHub user joseph-torres opened a pull request: https://github.com/apache/spark/pull/18840 [SPARK-21565] Propagate metadata in attribute replacement. ## What changes were proposed in this pull request? Propagate metadata in attribute replacement during streaming execution. This is necessary for EventTimeWatermarks consuming replaced attributes. ## How was this patch tested? New unit test, which was verified to fail before the fix. You can merge this pull request into a Git repository by running: $ git pull https://github.com/joseph-torres/spark SPARK-21565 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18840.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18840 commit e54d81200569c2260f0995b2f91aa9829dc10ad7 Author: Jose Torres Date: 2017-08-04T03:52:57Z Propagate metadata in attribute replacement.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18779 **[Test build #80232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80232/testReport)** for PR 18779 at commit [`bfab7e5`](https://github.com/apache/spark/commit/bfab7e55609ed9c98534c23d41ff881a6eed84d5).
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18779 LGTM
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/18779 OK, thanks @viirya
[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18839 **[Test build #80231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80231/testReport)** for PR 18839 at commit [`23efcc6`](https://github.com/apache/spark/commit/23efcc687aabf8bebcf3d1a2647b60ca059f9d55).
[GitHub] spark issue #18839: [SPARK-21634][SQL] Change OneRowRelation from a case obj...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18839 Some tests on the string form of the plan might fail. Let's see ...
[GitHub] spark pull request #18839: [SPARK-21634][SQL] Change OneRowRelation from a c...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18839 [SPARK-21634][SQL] Change OneRowRelation from a case object to case class ## What changes were proposed in this pull request? OneRowRelation is the only plan that is a case object, which causes some issues with makeCopy using a 0-arg constructor. This patch changes it from a case object to a case class. This blocks SPARK-21619. ## How was this patch tested? Should be covered by existing test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-21634 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18839.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18839 commit 23efcc687aabf8bebcf3d1a2647b60ca059f9d55 Author: Reynold Xin Date: 2017-08-04T03:39:34Z [SPARK-21634][SQL] Change OneRowRelation from a case object to case class
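The makeCopy issue described above stems from nodes being rebuilt reflectively through a constructor: a Scala `case object` compiles to a class that exposes no public constructor, while a 0-arg `case class` keeps one. A minimal sketch of that difference (toy stand-in types, not Spark's actual `TreeNode` or `OneRowRelation`):

```scala
// Toy sketch (not Spark's TreeNode): a makeCopy-style reflective rebuild
// needs a public constructor. A 0-arg case class has one; a case object
// does not, which is the kind of breakage the PR avoids.
case class RelationClass()   // hypothetical stand-in for the new OneRowRelation()
case object RelationObject   // hypothetical stand-in for the old OneRowRelation

// Count the public constructors visible to reflection.
def publicCtors(x: AnyRef): Int = x.getClass.getConstructors.length

// Rebuild a node via its first public constructor, as a makeCopy-style
// helper would; this throws if no public constructor exists.
def makeCopy(x: AnyRef): AnyRef = {
  val ctors = x.getClass.getConstructors
  require(ctors.nonEmpty, s"no public constructor for ${x.getClass.getName}")
  ctors.head.newInstance().asInstanceOf[AnyRef]
}
```

Calling `makeCopy(RelationClass())` succeeds because the case class exposes its primary constructor, whereas the singleton object offers nothing reflection can invoke.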
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18779 @10110346 Since using `resolveOperators` solves the whole bug, let's do that and simplify the change. Sorry for the confusion.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131305760 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala --- @@ -1,54 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin -import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.IntegerType - -/** - * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression. - */ -class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] { - private def isIntLiteral(e: Expression) = e match { -case Literal(_, IntegerType) => true -case _ => false - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { --- End diff -- Aha, sorry, I made a mistake here in inspecting Analyzer... Yeah, the whole bug is due to re-entrance of an analyzed plan. 
Currently we can solve it by using `resolveOperators`.
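The `resolveOperators` fix discussed here works because that traversal skips operators already marked as analyzed, whereas a plain `transform` re-applies the rule when the analyzer re-enters an already-analyzed plan. A toy Scala sketch of the difference (hypothetical one-node "plan", not Spark's actual `TreeNode`/`AnalysisHelper`):

```scala
// Toy sketch (hypothetical types): a rule applied with a plain transform
// fires again on analyzer re-entrance; a resolveOperators-style traversal
// marks nodes analyzed and skips them on the second pass.
final case class Plan(value: Int, analyzed: Boolean = false)

// The "rule": rewrites the node once; applying it a second time
// corrupts the plan, which models the re-entrance bug.
def rule(p: Plan): Plan = p.copy(value = p.value + 1)

// transform-style application: always fires, even on analyzed plans.
def transformApply(p: Plan): Plan = rule(p)

// resolveOperators-style application: skip already-analyzed nodes,
// making the rule idempotent across repeated analysis.
def resolveApply(p: Plan): Plan =
  if (p.analyzed) p else rule(p).copy(analyzed = true)
```

Running the transform-style version twice applies the rewrite twice, while the resolveOperators-style version leaves the plan unchanged on the second pass — which, per the discussion, is why switching the rule to `resolveOperators` fixes the re-entrance bug.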
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18829 I think if we really want these metrics in the UI we should look at adding them to the UI in some way, rather than as a link to a JSON dump. I am not a fan of JSON dumps as part of a UI in general; I think they defeat the purpose of creating a UI.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80230/testReport)** for PR 18742 at commit [`cd22c59`](https://github.com/apache/spark/commit/cd22c59ed6e9b11ca9690f11a20febda02e9c896).
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80226/ Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18742 Merged build finished. Test PASSed.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80226/testReport)** for PR 18742 at commit [`bb3d998`](https://github.com/apache/spark/commit/bb3d9983bd9ee6cc758950aa84a530184db7b68d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18820 **[Test build #80229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80229/testReport)** for PR 18820 at commit [`dfbcaf3`](https://github.com/apache/spark/commit/dfbcaf3e47126eaf6fd3a0276054cad01dbff71a).
[GitHub] spark issue #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to replace ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18820 ok to test
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80228/ Test PASSed.
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829 The HBase Web UI has metrics, so the Spark Web UI should also have this feature. This is just my opinion. ![20](https://user-images.githubusercontent.com/26266482/28951812-e27abd9c-78ff-11e7-8213-c57d74716806.png) ![21](https://user-images.githubusercontent.com/26266482/28951819-e8edd394-78ff-11e7-99f9-d8e93873470a.png) ![23](https://user-images.githubusercontent.com/26266482/28951824-f0cfdbd4-78ff-11e7-879e-5667e051f0fe.png)
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18746 Merged build finished. Test PASSed.
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18746 **[Test build #80228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80228/testReport)** for PR 18746 at commit [`a30ae39`](https://github.com/apache/spark/commit/a30ae399923f2d4cef2cc00d88cae85fa5a2dde7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18779 Do we need to backport this to branch-2.2? I think the opinion depends on that decision. If not, I'm with your suggestion (keep this issue as a blocker for branch-2.3).
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Merged build finished. Test FAILed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17980 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80224/ Test FAILed.
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17980 **[Test build #80224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80224/testReport)** for PR 17980 at commit [`5db6acf`](https://github.com/apache/spark/commit/5db6acf2cde02217c283103ebcbfd2630338852c). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18832 Thanks @sethah. I strongly think we should update the comment or just delete it, as the current PR does. Another reason: there are three kinds of features: categorical, ordered categorical, and continuous. Only the first iteration over categorical features needs parentStats; the other two don't. The comment makes it sound as if the first iteration over every feature type needs parentStats.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131300093 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala --- @@ -1,54 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.analysis - -import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder} -import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort} -import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin -import org.apache.spark.sql.internal.SQLConf -import org.apache.spark.sql.types.IntegerType - -/** - * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression. 
- */ -class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] { - private def isIntLiteral(e: Expression) = e match { -case Literal(_, IntegerType) => true -case _ => false - } - - def apply(plan: LogicalPlan): LogicalPlan = plan transform { --- End diff -- I run `SubstituteUnresolvedOrdinals` with `fixedPoint`.
[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80222/ Test PASSed.
[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18395 Merged build finished. Test PASSed.
[GitHub] spark issue #18746: [SPARK-21633][ML][Python] UnaryTransformer in Python
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18746 **[Test build #80228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80228/testReport)** for PR 18746 at commit [`a30ae39`](https://github.com/apache/spark/commit/a30ae399923f2d4cef2cc00d88cae85fa5a2dde7).
[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18395 **[Test build #80222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80222/testReport)** for PR 18395 at commit [`76cb291`](https://github.com/apache/spark/commit/76cb291d21ead292e5dda2d20a206492e23a3120). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80221/
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131299271
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
@10110346 May I ask which one you run `SubstituteUnresolvedOrdinals` with: `Once` or `fixedPoint`?
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.
[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18815 OK. So the master, worker, and executor logs can all be displayed in the web UI?
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131298714
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
For this change (`transform` -> `resolveOperators`): it fixes the whole bug; I have tested it.
@gatorsmile @viirya
[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18668 **[Test build #80227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80227/testReport)** for PR 18668 at commit [`5043eb6`](https://github.com/apache/spark/commit/5043eb69b41d1d0263e8814da27a934491bc936c).
[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18690 So I think if you want to connect your custom sink to the Spark metrics system, you should at least follow what Spark and the codahale metrics library do. Adding a feature to Spark that works only for your own sink does not seem reasonable, unless it is a general requirement.
[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18742 **[Test build #80226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80226/testReport)** for PR 18742 at commit [`bb3d998`](https://github.com/apache/spark/commit/bb3d9983bd9ee6cc758950aa84a530184db7b68d).
[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/18815 OK, then I'm really confused: if the logs we're talking about can already be viewed in the UI, why do we need to display their location on the system?
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18460 Merged build finished. Test PASSed.
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80223/
[GitHub] spark issue #18838: [SPARK-21632] There is no need to make attempts for crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18838 Can one of the admins verify this patch?
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18460 **[Test build #80223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80223/testReport)** for PR 18460 at commit [`3e1e81d`](https://github.com/apache/spark/commit/3e1e81d4412672cd479cd07aa786dea6328789aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18838: [SPARK-21632] There is no need to make attempts f...
GitHub user liu-zhaokun opened a pull request: https://github.com/apache/spark/pull/18838

[SPARK-21632] There is no need to make attempts for createDirectory if the dir had existed

[https://issues.apache.org/jira/browse/SPARK-21632](https://issues.apache.org/jira/browse/SPARK-21632)
There is no need to keep retrying `createDirectory` if the directory already exists, so I think we should log it and break out of the loop.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liu-zhaokun/spark master08040833

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18838.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18838

commit e3e36a1b920890f2e1873790775d4327cf9ee4fa
Author: liuzhaokun
Date: 2017-08-04T01:39:51Z

    [SPARK-21632] There is no need to make attempts for createDirectory if the dir had existed
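The behavior the PR proposes, noticing that a candidate directory already exists and moving on instead of treating it as one of the limited failure attempts, can be sketched as follows. Spark's real implementation is `Utils.createDirectory` in Scala; the Python function and names below are a hypothetical illustration only:

```python
import os
import uuid


def create_directory(root, max_attempts=10):
    """Create a fresh scratch directory under `root`.

    If a candidate path already exists, log it and pick a new name
    rather than counting the collision against the attempt budget
    as a hard failure.
    """
    for _ in range(max_attempts):
        candidate = os.path.join(root, "spark-%s" % uuid.uuid4())
        if os.path.exists(candidate):
            # Directory already exists: note it and retry with a new name.
            print("Candidate %s already exists, choosing a new name" % candidate)
            continue
        os.makedirs(candidate)
        return candidate
    raise IOError("Failed to create a directory under %s after %d attempts"
                  % (root, max_attempts))
```

With random UUID suffixes a collision is vanishingly unlikely, which is why the review discussion centers on whether the existence check deserves a log-and-continue rather than a failed attempt.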
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18829 I tend to agree with @ajbozarth: since we already have APIs to access the metrics dump in JSON format, this does not look necessary. Also, directly displaying such a JSON dump on the UI without formatting seems hard to use or check. AFAIK, users usually feed this to a monitoring system for display rather than checking it directly on the web UI.
[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18779 Well, maybe we should revisit this after #17770 gets merged. Because after that, we won't go through analyzed plans anymore. At that time, we can simply solve all the issues by making `SubstituteUnresolvedOrdinals` run with `Once`. What do you think @gatorsmile @10110346 @maropu?
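For readers unfamiliar with Catalyst's rule-execution strategies, the `Once` vs `FixedPoint` distinction under discussion can be sketched with a toy rule. Catalyst itself is Scala; this Python sketch and its names are illustrative assumptions, not Spark's API:

```python
def apply_once(rule, plan):
    # `Once` strategy: the rule batch runs in a single pass.
    return rule(plan)


def apply_fixed_point(rule, plan, max_iterations=100):
    # `FixedPoint` strategy: re-run the rule until the plan stops
    # changing, or the iteration budget is exhausted.
    for _ in range(max_iterations):
        new_plan = rule(plan)
        if new_plan == plan:
            break
        plan = new_plan
    return plan


def substitute_ordinals(plan):
    # Toy stand-in for SubstituteUnresolvedOrdinals: rewrite integer
    # literal nodes in a grouping list into "ordinal" marker nodes.
    return tuple(("ordinal", node[1]) if node[0] == "int" else node
                 for node in plan)
```

For an idempotent rule like this toy one, `Once` and `FixedPoint` agree; the thread above is precisely about rules where re-running over an already-analyzed plan is not safe.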
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131294567
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
Because it keeps the redundant Aliases at the top level. That may not be what we want.
[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18815 Why can the executor log be displayed in the web UI? I think the master, worker, and executor logs are the same in this regard; they should all be displayable in the web UI. ![image](https://user-images.githubusercontent.com/26266482/28950185-5fada808-78f4-11e7-9b42-e84dbed1f809.png)
[GitHub] spark pull request #18746: [ML][Python] UnaryTransformer in Python
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18746#discussion_r131293810
--- Diff: python/pyspark/ml/tests.py ---
@@ -1957,6 +1988,40 @@ def test_chisquaretest(self):
         self.assertTrue(all(field in fieldNames for field in expectedFields))
 
+class UnaryTransformerTests(SparkSessionTestCase):
+
+    def test_unary_transformer_validate_input_type(self):
+        shiftVal = 3
+        transformer = MockUnaryTransformer(shiftVal=shiftVal)\
+            .setInputCol("input").setOutputCol("output")
+
+        # should not raise any errors
+        transformer.validateInputType(DoubleType())
+
+        with self.assertRaises(TypeError):
+            # passing the wrong input type should raise an error
+            transformer.validateInputType(IntegerType())
+
+    def test_unary_transformer_transform(self):
+        shiftVal = 3
+        transformer = MockUnaryTransformer(shiftVal=shiftVal)\
+            .setInputCol("input").setOutputCol("output")
+
+        df = self.spark.range(0, 10).toDF('input')
+        df = df.withColumn("input", df.input.cast(dataType="double"))
+
+        transformed_df = transformer.transform(df)
+        inputCol = transformed_df.select("input").collect()
--- End diff --
Do this instead:
```
results = transformed_df.select("input", "output").collect()
for res in results:
    self.assertEqual(res.input + shiftVal, res.output)
```
[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18829 The JSON metrics are very complete; they contain a lot of information the UI currently cannot show, and that information is also very important to application developers. As I understand it, Spark application developers observe cluster information in three ways: the web UI, logs, and metrics. But we are used to viewing information through the masterIP:8080 UI, rather than using the API or logging on to the server to view logs. The API is not very easy to remember.
[GitHub] spark issue #18746: [ML][Python] UnaryTransformer in Python
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18746 @ajaysaini725 Is there a JIRA for this PR? Please tag this PR in the title.
[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/18779#discussion_r131292299
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala ---
@@ -1,54 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.analysis
-
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal, SortOrder}
-import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Sort}
-import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.IntegerType
-
-/**
- * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- */
-class SubstituteUnresolvedOrdinals(conf: SQLConf) extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
-    case Literal(_, IntegerType) => true
-    case _ => false
-  }
-
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
--- End diff --
Changing `Aggregate(grouping.map(trimAliases), cleanedAggs, child)` to `Aggregate(grouping.map(trimNonTopLevelAliases), cleanedAggs, child)` in `CleanupAliases` looks like it can fix the whole bug.
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131284503
--- Diff: python/pyspark/ml/tests.py ---
@@ -1957,6 +1964,46 @@ def test_chisquaretest(self):
         self.assertTrue(all(field in fieldNames for field in expectedFields))
 
+class DefaultReadWriteTests(SparkSessionTestCase):
+
+    def test_default_read_write(self):
+        temp_path = tempfile.mkdtemp()
+
+        lr = LogisticRegression()
+        lr.setMaxIter(50)
+        lr.setThreshold(.75)
+        writer = DefaultParamsWriter(lr)
+
+        savePath = temp_path + "/lr"
+        writer.saveImpl(savePath)
+
+        reader = DefaultParamsReadable.read()
+        lr2 = reader.load(savePath)
+
+        self.assertEqual(lr.uid, lr2.uid)
+        self.assertEqual(lr.extractParamMap(), lr2.extractParamMap())
+
+    def test_default_read_write_with_overwrite(self):
--- End diff --
Since these tests are almost the same, can you combine them to reduce code duplication?
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131285744
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -471,3 +471,24 @@
     List((instance.uid, instance)) ++ subStageMaps
   }
 }
+
+private[ml] class FileSystemOverwrite extends Logging {
+
+  def handleOverwrite(path: String, shouldOverwrite: Boolean, sc: SparkContext): Unit = {
--- End diff --
This is the same now as the code in MLWriter.save, so please use this within MLWriter.save to eliminate the duplicated code.
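The deduplication being requested, a single overwrite check shared by every save path instead of two copies of the same logic, can be sketched as follows. The class and helper names here are hypothetical, not PySpark's or MLlib's actual API:

```python
import os
import shutil


def _handle_overwrite(path, should_overwrite):
    # The one place that implements the "path already exists" policy,
    # so MLWriter.save-style entry points don't each duplicate it.
    if os.path.exists(path):
        if not should_overwrite:
            raise IOError("Path %s already exists. Call overwrite() first." % path)
        shutil.rmtree(path)


class SimpleWriter:
    def __init__(self):
        self._shouldOverwrite = False

    def overwrite(self):
        # Fluent toggle, mirroring MLWriter.overwrite().
        self._shouldOverwrite = True
        return self

    def save(self, path):
        _handle_overwrite(path, self._shouldOverwrite)
        os.makedirs(path)  # stand-in for actually serializing metadata
```

Keeping the policy in one helper means a behavior change (say, a better error message) happens in exactly one place, which is the point of the review comment.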
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131286314
--- Diff: python/pyspark/ml/util.py ---
@@ -61,20 +66,74 @@ def _randomUID(cls):
 
 @inherit_doc
-class MLWriter(object):
+class BaseReadWrite(object):
+    """
+    Base class for MLWriter and MLReader. Stores information about the SparkContext
+    and SparkSession.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self):
+        self._sparkSession = None
+
+    def context(self, sqlContext):
--- End diff --
Leaving this is OK if you remove it from the subclasses, for the sake of code simplification.
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18742#discussion_r131287820
--- Diff: python/pyspark/ml/util.py ---
@@ -237,6 +300,13 @@ def _load_java_obj(cls, clazz):
             java_obj = getattr(java_obj, name)
         return java_obj
 
+    @classmethod
+    def _load_given_name(cls, java_class):
--- End diff --
This can be static since you don't use `cls`.
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288629

--- Diff: python/pyspark/ml/util.py ---
@@ -237,6 +300,13 @@ def _load_java_obj(cls, clazz):
             java_obj = getattr(java_obj, name)
         return java_obj
 
+    @classmethod
+    def _load_given_name(cls, java_class):
--- End diff --

Please also stick with camelCase for methods since that's what pyspark does. (here and elsewhere)
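Taken together, the two comments above amount to the before/after sketch below. The method body is illustrative only (not the PR's actual implementation); the point is the `@staticmethod` decorator replacing the unused `cls` and the camelCase rename.

```python
class ReaderBefore(object):
    # As submitted: a @classmethod whose `cls` argument is never used,
    # with a snake_case name.
    @classmethod
    def _load_given_name(cls, java_class):
        return java_class.split(".")[-1]


class ReaderAfter(object):
    # After review: @staticmethod (no unused `cls` parameter) and a
    # camelCase name, matching pyspark's method-naming convention.
    @staticmethod
    def _loadGivenName(java_class):
        return java_class.split(".")[-1]


print(ReaderAfter._loadGivenName("org.apache.spark.ml.Pipeline"))  # → Pipeline
```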
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288896

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
--- End diff --

add leading underscore to make this private
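The requested rename is purely a naming-convention change; this sketch reduces the class and method body to the minimum needed to show it (the `_save_metadata` body here is illustrative, not the PR's):

```python
class DefaultParamsWriterSketch(object):
    # The public entry point keeps its name...
    def saveImpl(self, path):
        return DefaultParamsWriterSketch._save_metadata(path)

    # ...while the helper gains a leading underscore, marking it as an
    # internal API that code outside the module should not call directly.
    @staticmethod
    def _save_metadata(path):
        return "%s/metadata" % path


print(DefaultParamsWriterSketch().saveImpl("/models/lr"))  # → /models/lr/metadata
```

Python does not enforce privacy; the underscore is a convention that documentation generators and linters honor, which is what the reviewer is asking for.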
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288360

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
+        uid = instance.uid
+        cls = instance.__module__ + '.' + instance.__class__.__name__
+        params = instance.extractParamMap()
+        jsonParams = {}
+        if paramMap is not None:
+            for p in paramMap:
+                jsonParams[p.name] = paramMap[p]
+        else:
+            for p in params:
+                jsonParams[p.name] = params[p]
+        basicMetadata = {"class": cls, "timestamp": long(round(time.time() * 1000)),
+                         "sparkVersion": sc.version, "uid": uid, "paramMap": jsonParams}
+        if extraMetadata is not None:
+            basicMetadata.update(extraMetadata)
+        return json.dumps(basicMetadata, separators=[',', ':'])
+
+
+@inherit_doc
+class DefaultParamsReadable(MLReadable):
+    """
+    Class for making simple Params types readable. Assumes that all parameters
--- End diff --

Could you please merge the Scala doc into this to make this more detailed?
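For readers following the diff above, this is a self-contained sketch of the metadata JSON that `get_metadata_to_save` assembles (class, millisecond timestamp, Spark version, uid, flat param map). The class name, uid, Spark version, and param values below are illustrative, not output from a real session:

```python
import json
import time


def metadata_to_json(cls_name, uid, spark_version, json_params, extra=None):
    # Mirrors the structure built in get_metadata_to_save: fixed keys plus
    # any caller-supplied extra metadata merged in on top.
    basic = {"class": cls_name,
             "timestamp": int(round(time.time() * 1000)),
             "sparkVersion": spark_version,
             "uid": uid,
             "paramMap": json_params}
    if extra is not None:
        basic.update(extra)
    return json.dumps(basic, separators=(',', ':'))


doc = json.loads(metadata_to_json("pyspark.ml.feature.Binarizer",
                                  "Binarizer_abc123", "2.3.0",
                                  {"threshold": 0.5}))
print(sorted(doc))  # → ['class', 'paramMap', 'sparkVersion', 'timestamp', 'uid']
```

Note every param value must itself be JSON-serializable, which is exactly the assumption the `DefaultParamsWritable` docstring states.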
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131287028

--- Diff: python/pyspark/ml/util.py ---
@@ -61,20 +66,74 @@ def _randomUID(cls):
 
 @inherit_doc
-class MLWriter(object):
+class BaseReadWrite(object):
+    """
+    Base class for MLWriter and MLReader. Stores information about the SparkContext
+    and SparkSession.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self):
+        self._sparkSession = None
+
+    def context(self, sqlContext):
+        """
+        Sets the Spark SQLContext to use for saving/loading.
+
+        .. note:: Deprecated in 2.1 and will be removed in 3.0, use session instead.
+        """
+        raise NotImplementedError("MLWriter is not yet implemented for type: %s" % type(self))
+
+    def session(self, sparkSession):
+        """
+        Sets the Spark Session to use for saving/loading.
+        """
+        self._sparkSession = sparkSession
+        return self
+
+    def sparkSession(self):
+        if self._sparkSession is None:
+            self._sparkSession = SparkSession.builder.getOrCreate()
+        return self._sparkSession
+
+    @property
+    def sc(self):
+        return self.sparkSession().sparkContext
+
+
+@inherit_doc
+class MLWriter(BaseReadWrite):
     """
     Utility class that can save ML instances.
 
     .. versionadded:: 2.0.0
     """
 
+    def __init__(self):
+        super(MLWriter, self).__init__()
+        self.shouldOverwrite = False
+
+    def _handleOverwrite(self, path):
+        from pyspark.ml.wrapper import JavaWrapper
+
+        _java_obj = JavaWrapper._new_java_obj("org.apache.spark.ml.util.FileSystemOverwrite")
+        wrapper = JavaWrapper(_java_obj)
+        wrapper._call_java("handleOverwrite", path, True, self.sc._jsc.sc())
+
     def save(self, path):
         """Save the ML instance to the input path."""
+        if self.shouldOverwrite:
+            self._handleOverwrite(path)
+        self.saveImpl(path)
+
+    def saveImpl(self, path):
--- End diff --

Add doc from Scala
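The `save()`/`saveImpl()` split in the diff above is a template-method pattern: the base class owns the overwrite policy, subclasses fill in only the actual write. A minimal standalone sketch, using `shutil.rmtree` as a stand-in for the JVM-side `FileSystemOverwrite` handling that the real code reaches through `JavaWrapper`:

```python
import os
import shutil
import tempfile


class SketchWriter(object):
    def __init__(self):
        self.shouldOverwrite = False

    def overwrite(self):
        self.shouldOverwrite = True
        return self

    def save(self, path):
        # save() owns the overwrite policy; subclasses implement saveImpl().
        if self.shouldOverwrite and os.path.exists(path):
            shutil.rmtree(path)  # stand-in for FileSystemOverwrite.handleOverwrite
        self.saveImpl(path)

    def saveImpl(self, path):
        raise NotImplementedError("saveImpl is not implemented for type: %s" % type(self))


class DirWriter(SketchWriter):
    def saveImpl(self, path):
        os.makedirs(path)


target = os.path.join(tempfile.mkdtemp(), "model")
DirWriter().save(target)              # first save creates the directory
DirWriter().overwrite().save(target)  # second save succeeds only via overwrite()
print(os.path.isdir(target))  # → True
```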
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288786

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
--- End diff --

path is not used
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288351

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
--- End diff --

Could you please merge the Scala doc into this to make this more detailed?
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131290424

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
+        uid = instance.uid
+        cls = instance.__module__ + '.' + instance.__class__.__name__
+        params = instance.extractParamMap()
+        jsonParams = {}
+        if paramMap is not None:
+            for p in paramMap:
+                jsonParams[p.name] = paramMap[p]
+        else:
+            for p in params:
+                jsonParams[p.name] = params[p]
+        basicMetadata = {"class": cls, "timestamp": long(round(time.time() * 1000)),
+                         "sparkVersion": sc.version, "uid": uid, "paramMap": jsonParams}
+        if extraMetadata is not None:
+            basicMetadata.update(extraMetadata)
+        return json.dumps(basicMetadata, separators=[',', ':'])
+
+
+@inherit_doc
+class DefaultParamsReadable(MLReadable):
+    """
+    Class for making simple Params types readable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    @classmethod
+    def read(cls):
+        """Returns a DefaultParamsReader instance for this class."""
+        return DefaultParamsReader(cls)
+
+
+@inherit_doc
+class DefaultParamsReader(MLReader):
+    """
+    Class for reading Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, cls):
+        super(DefaultParamsReader, self).__init__()
+        self.cls = cls
+
+    @staticmethod
+    def __get_class(clazz):
+        """
+        Loads Python class from its name.
+        """
+        parts = clazz.split('.')
+        module = ".".join(parts[:-1])
+        m = __import__(module)
+        for comp in parts[1:]:
+            m = getattr(m, comp)
+        return m
+
+    def load(self, path):
+        metadata = DefaultParamsReader.loadMetadata(path, self.sc)
+        py_type = DefaultParamsReader.__get_class(metadata['class'])
+        instance = py_type()
+        instance._resetUid(metadata['uid'])
+        DefaultParamsReader.getAndSetParams(instance, metadata)
+        return instance
+
+    @staticmethod
+    def loadMetadata(path, sc, expectedClassName=""):
--- End diff --

These static methods can be private (add leading underscores)
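The `__get_class` helper quoted in the diff above is a small, general recipe worth seeing on its own: `__import__("a.b.c")` returns the top-level package, so the remaining dotted parts are walked with `getattr`. A standalone version using only the standard library:

```python
def get_class(clazz):
    # __import__("a.b.c") returns the top-level package "a"; walk the
    # remaining dotted parts with getattr to reach the final attribute.
    parts = clazz.split('.')
    module = ".".join(parts[:-1])
    m = __import__(module)
    for comp in parts[1:]:
        m = getattr(m, comp)
    return m


print(get_class("collections.OrderedDict").__name__)  # → OrderedDict
```

This also explains why the loader works for nested attributes such as `"os.path.join"`: everything after the first component is attribute traversal, not import machinery.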
[GitHub] spark pull request #18742: [Spark-21542][ML][Python]Python persistence helpe...
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/18742#discussion_r131288910

--- Diff: python/pyspark/ml/util.py ---
@@ -283,3 +353,143 @@ def numFeatures(self):
         Returns the number of features the model was trained on. If unknown, returns -1
         """
         return self._call_java("numFeatures")
+
+
+@inherit_doc
+class DefaultParamsWritable(MLWritable):
+    """
+    Class for making simple Params types writable. Assumes that all parameters
+    are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def write(self):
+        """Returns a DefaultParamsWriter instance for this class."""
+        if isinstance(self, Params):
+            return DefaultParamsWriter(self)
+        else:
+            raise TypeError("Cannot use DefautParamsWritable with type %s because it does not " +
+                            " extend Params.", type(self))
+
+
+@inherit_doc
+class DefaultParamsWriter(MLWriter):
+    """
+    Class for writing Estimators and Transformers whose parameters are JSON-serializable.
+
+    .. versionadded:: 2.3.0
+    """
+
+    def __init__(self, instance):
+        super(DefaultParamsWriter, self).__init__()
+        self.instance = instance
+
+    def saveImpl(self, path):
+        DefaultParamsWriter.save_metadata(self.instance, path, self.sc)
+
+    @staticmethod
+    def save_metadata(instance, path, sc, extraMetadata=None, paramMap=None):
+        metadataPath = os.path.join(path, "metadata")
+        metadataJson = DefaultParamsWriter.get_metadata_to_save(instance,
+                                                                metadataPath,
+                                                                sc,
+                                                                extraMetadata,
+                                                                paramMap)
+        sc.parallelize([metadataJson], 1).saveAsTextFile(metadataPath)
+
+    @staticmethod
+    def get_metadata_to_save(instance, path, sc, extraMetadata=None, paramMap=None):
--- End diff --

add leading underscore to make this private