[GitHub] spark pull request #20183: [SPARK-22986][Core] Fix/cache broadcast values

2018-01-10 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20183#discussion_r160881323
  
--- Diff: 
core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala ---
@@ -153,6 +153,40 @@ class BroadcastSuite extends SparkFunSuite with 
LocalSparkContext with Encryptio
 assert(broadcast.value.sum === 10)
   }
 
+  test("One broadcast value instance per executor") {
+val conf = new SparkConf()
+  .setMaster("local[10]")
--- End diff --

nit: normally `local[4]` is sufficient.
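
For reference, a minimal sketch of the suggested setup (only the master string changes; the rest of the test is assumed unchanged):

```
import org.apache.spark.SparkConf

// Four local threads are normally enough to exercise the broadcast path
// while keeping the test lightweight.
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("broadcast-test") // hypothetical app name, for illustration only
```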


---




[GitHub] spark issue #20183: [SPARK-22986][Core] Fix/cache broadcast values

2018-01-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20183
  
Please also update the PR title and description. Thanks!


---




[GitHub] spark issue #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK/OS)

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20230
  
**[Test build #85953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85953/testReport)**
 for PR 20230 at commit 
[`cc3321c`](https://github.com/apache/spark/commit/cc3321c20fd0dc2ef75a8740b5c0292beef98beb).


---




[GitHub] spark pull request #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20230#discussion_r160880463
  
--- Diff: external/docker/spark-test/base/Dockerfile ---
@@ -15,14 +15,14 @@
 # limitations under the License.
 #
 
-FROM ubuntu:precise
+FROM ubuntu:xenial
 
 # Upgrade package index
-# install a few other useful packages plus Open Jdk 7
+# install a few other useful packages plus Open Jdk 8
 # Remove unneeded /var/lib/apt/lists/* after install to reduce the
 # docker image size (by ~30MB)
 RUN apt-get update && \
-apt-get install -y less openjdk-7-jre-headless net-tools vim-tiny sudo 
openssh-server && \
+apt-get install -y less openjdk-8-jre-headless iproute2 vim-tiny sudo 
openssh-server && \
--- End diff --

This is required to use the [ip](https://github.com/apache/spark/blob/master/external/docker/spark-test/master/default_cmd#L20) command in `xenial`.


---




[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

2018-01-10 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20153#discussion_r160880016
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala
 ---
@@ -37,40 +35,58 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
 fullOutput: Seq[AttributeReference],
-@transient reader: DataSourceV2Reader) extends LeafExecNode with 
DataSourceReaderHolder {
+@transient reader: DataSourceV2Reader)
+  extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {
 
   override def canEqual(other: Any): Boolean = 
other.isInstanceOf[DataSourceV2ScanExec]
 
-  override def references: AttributeSet = AttributeSet.empty
-
-  override lazy val metrics = Map(
-"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of 
output rows"))
-
-  override protected def doExecute(): RDD[InternalRow] = {
-val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
-  case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
-  case _ =>
-reader.createReadTasks().asScala.map {
-  new RowToUnsafeRowReadTask(_, reader.readSchema()): 
ReadTask[UnsafeRow]
-}.asJava
-}
+  override def producedAttributes: AttributeSet = AttributeSet(fullOutput)
+
+  private lazy val inputRDD: RDD[InternalRow] = reader match {
+case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
+  assert(!reader.isInstanceOf[ContinuousReader],
+"continuous stream reader does not support columnar read yet.")
+  new DataSourceRDD(sparkContext, 
r.createBatchReadTasks()).asInstanceOf[RDD[InternalRow]]
+
+case _ =>
+  val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
+case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
+case _ =>
+  reader.createReadTasks().asScala.map {
+new RowToUnsafeRowReadTask(_, reader.readSchema()): 
ReadTask[UnsafeRow]
+  }.asJava
+  }
+
+  reader match {
--- End diff --

This looks a bit messy; can we move `readTasks` out as a lazy val? Then we may have:
```
private lazy val readTasks = ..
private lazy val inputRDD: RDD[InternalRow] = reader match {
case r: SupportsScanColumnarBatch if r.enableBatchRead() => ..
case _: ContinuousReader => ..
case _ => ..
}
```
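
Filled in from the diff above, that might look like the following sketch (not the final implementation; the continuous and fallback branches are assumptions, since the original diff is truncated there):

```
// Pull the row-based task construction out so the match on `reader` stays flat.
private lazy val readTasks: java.util.List[ReadTask[UnsafeRow]] = reader match {
  case r: SupportsScanUnsafeRow => r.createUnsafeRowReadTasks()
  case _ =>
    reader.createReadTasks().asScala.map {
      new RowToUnsafeRowReadTask(_, reader.readSchema()): ReadTask[UnsafeRow]
    }.asJava
}

private lazy val inputRDD: RDD[InternalRow] = reader match {
  case r: SupportsScanColumnarBatch if r.enableBatchRead() =>
    new DataSourceRDD(sparkContext, r.createBatchReadTasks())
      .asInstanceOf[RDD[InternalRow]]
  case _: ContinuousReader =>
    ??? // continuous path elided; not shown in the original diff
  case _ =>
    new DataSourceRDD(sparkContext, readTasks).asInstanceOf[RDD[InternalRow]]
}
```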


---




[GitHub] spark pull request #20153: [SPARK-22392][SQL] data source v2 columnar batch ...

2018-01-10 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20153#discussion_r160877601
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader;
+
+import java.util.List;
+
+import org.apache.spark.annotation.InterfaceStability;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.vectorized.ColumnarBatch;
+
+/**
+ * A mix-in interface for {@link DataSourceV2Reader}. Data source readers 
can implement this
+ * interface to output {@link ColumnarBatch} and make the scan faster.
+ */
+@InterfaceStability.Evolving
+public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
+  @Override
+  default List createReadTasks() {
+throw new IllegalStateException(
+  "createReadTasks should not be called with 
SupportsScanColumnarBatch.");
--- End diff --

`createReadTasks not supported by default within SupportsScanColumnarBatch.`, since we allow users to fall back to the normal read path.


---




[GitHub] spark pull request #20230: [SPARK-23038][TEST] Update docker/spark-test (JDK...

2018-01-10 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20230

[SPARK-23038][TEST] Update docker/spark-test (JDK/OS)

## What changes were proposed in this pull request?

This PR aims to update the followings in `docker/spark-test`.

- JDK7 -> JDK8
Spark 2.2+ supports JDK8 only.

- Ubuntu 12.04.5 LTS (precise) -> Ubuntu 16.04.3 LTS (xenial)
The end of life of `precise` was April 28, 2017.

## How was this patch tested?

Manual.

* Master
```
$ cd external/docker
$ ./build
$ export SPARK_HOME=...
$ docker run -v $SPARK_HOME:/opt/spark spark-test-master
CONTAINER_IP=172.17.0.3
...
18/01/11 06:50:25 INFO MasterWebUI: Bound MasterWebUI to 172.17.0.3, and 
started at http://172.17.0.3:8080
18/01/11 06:50:25 INFO Utils: Successfully started service on port 6066.
18/01/11 06:50:25 INFO StandaloneRestServer: Started REST server for 
submitting applications on port 6066
18/01/11 06:50:25 INFO Master: I have been elected leader! New state: ALIVE
```

* Slave
```
$ docker run -v $SPARK_HOME:/opt/spark spark-test-worker 
spark://172.17.0.3:7077
CONTAINER_IP=172.17.0.4
...
18/01/11 06:51:54 INFO Worker: Successfully registered with master 
spark://172.17.0.3:7077
```

After the slave starts, the master will show
```
18/01/11 06:51:54 INFO Master: Registering worker 172.17.0.4: with 4 
cores, 1024.0 MB RAM
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23038

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20230


commit cc3321c20fd0dc2ef75a8740b5c0292beef98beb
Author: Dongjoon Hyun 
Date:   2018-01-11T07:18:48Z

[SPARK-23038][TEST] Update docker/spark-test (JDK/OS)




---




[GitHub] spark pull request #20199: [Spark-22967][TESTS]Fix VersionSuite's unit tests...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20199#discussion_r160879315
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -842,6 +842,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
 }
 
 test(s"$version: SPARK-17920: Insert into/overwrite avro table") {
+  assume(!(Utils.isWindows && version == "0.12"))
--- End diff --

Nice. Yeah, let's merge this one after adding a comment saying it's skipped because it fails under this condition on Windows.

Then, let's see if we can fix the root cause and re-enable this test in that PR if everything goes well.
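
For example, the guard with the suggested comment might read (a sketch; only the comment is new relative to the diff):

```
// SPARK-17920 is skipped on Windows with Hive 0.12 because it currently
// fails in that environment; re-enable once the root cause is fixed.
assume(!(Utils.isWindows && version == "0.12"))
```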


---




[GitHub] spark pull request #20229: [SPARK-23037][ML] Update RFormula to use VectorSi...

2018-01-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20229#discussion_r160878813
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -199,6 +199,7 @@ class RFormula @Since("1.5.0") (@Since("1.5.0") 
override val uid: String)
 val parsedFormula = RFormulaParser.parse($(formula))
 val resolvedFormula = parsedFormula.resolve(dataset.schema)
 val encoderStages = ArrayBuffer[PipelineStage]()
+val oneHotEncodeStages = ArrayBuffer[(String, String)]()
--- End diff --

`oneHotEncodeColumns`


---




[GitHub] spark pull request #20229: [SPARK-23037][ML] Update RFormula to use VectorSi...

2018-01-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20229#discussion_r160878716
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -228,22 +229,33 @@ class RFormula @Since("1.5.0") (@Since("1.5.0") 
override val uid: String)
 // Then we handle one-hot encoding and interactions between terms.
 var keepReferenceCategory = false
 val encodedTerms = resolvedFormula.terms.map {
-  case Seq(term) if dataset.schema(term).dataType == StringType =>
-val encodedCol = tmpColumn("onehot")
-var encoder = new OneHotEncoder()
-  .setInputCol(indexed(term))
-  .setOutputCol(encodedCol)
-// Formula w/o intercept, one of the categories in the first 
category feature is
-// being used as reference category, we will not drop any category 
for that feature.
-if (!hasIntercept && !keepReferenceCategory) {
-  encoder = encoder.setDropLast(false)
-  keepReferenceCategory = true
-}
-encoderStages += encoder
-prefixesToRewrite(encodedCol + "_") = term + "_"
-encodedCol
   case Seq(term) =>
-term
+dataset.schema(term).dataType match {
+  case _: StringType =>
+val encodedCol = tmpColumn("onehot")
+// Formula w/o intercept, one of the categories in the first 
category feature is
+// being used as reference category, we will not drop any 
category for that feature.
+if (!hasIntercept && !keepReferenceCategory) {
+  keepReferenceCategory = true
+  encoderStages += new OneHotEncoderEstimator(uid)
--- End diff --

Oh, I see. There is one `OneHotEncoderEstimator` for `dropLast=false`.


---




[GitHub] spark issue #20229: [SPARK-23037][ML] Update RFormula to use VectorSizeHint ...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20229
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85950/
Test PASSed.


---




[GitHub] spark issue #20229: [SPARK-23037][ML] Update RFormula to use VectorSizeHint ...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20229
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20229: [SPARK-23037][ML] Update RFormula to use VectorSizeHint ...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20229
  
**[Test build #85950 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85950/testReport)**
 for PR 20229 at commit 
[`7bad275`](https://github.com/apache/spark/commit/7bad275cd995d22d70a42a5b9932073bc3a5d1e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20229: [SPARK-23037][ML] Update RFormula to use VectorSi...

2018-01-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20229#discussion_r160877856
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala 
---
@@ -228,22 +229,33 @@ class RFormula @Since("1.5.0") (@Since("1.5.0") 
override val uid: String)
 // Then we handle one-hot encoding and interactions between terms.
 var keepReferenceCategory = false
 val encodedTerms = resolvedFormula.terms.map {
-  case Seq(term) if dataset.schema(term).dataType == StringType =>
-val encodedCol = tmpColumn("onehot")
-var encoder = new OneHotEncoder()
-  .setInputCol(indexed(term))
-  .setOutputCol(encodedCol)
-// Formula w/o intercept, one of the categories in the first 
category feature is
-// being used as reference category, we will not drop any category 
for that feature.
-if (!hasIntercept && !keepReferenceCategory) {
-  encoder = encoder.setDropLast(false)
-  keepReferenceCategory = true
-}
-encoderStages += encoder
-prefixesToRewrite(encodedCol + "_") = term + "_"
-encodedCol
   case Seq(term) =>
-term
+dataset.schema(term).dataType match {
+  case _: StringType =>
+val encodedCol = tmpColumn("onehot")
+// Formula w/o intercept, one of the categories in the first 
category feature is
+// being used as reference category, we will not drop any 
category for that feature.
+if (!hasIntercept && !keepReferenceCategory) {
+  keepReferenceCategory = true
+  encoderStages += new OneHotEncoderEstimator(uid)
--- End diff --

Since `OneHotEncoderEstimator` can handle multiple columns now, can't we just use one `OneHotEncoderEstimator` to fit all terms at once?
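
For illustration, a minimal sketch of the multi-column usage (the column names here are hypothetical):

```
import org.apache.spark.ml.feature.OneHotEncoderEstimator

// One estimator stage covering all indexed string-term columns at once,
// instead of one OneHotEncoderEstimator per term.
val encoder = new OneHotEncoderEstimator()
  .setInputCols(Array("term1_idx", "term2_idx"))
  .setOutputCols(Array("term1_onehot", "term2_onehot"))
```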


---




[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85947/
Test PASSed.


---




[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19872
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #85947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85947/testReport)**
 for PR 19872 at commit 
[`cb36227`](https://github.com/apache/spark/commit/cb362274711c1b26ed19e87aa15bc8c64668eae6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20229: [SPARK-23037][ML] Update RFormula to use VectorSizeHint ...

2018-01-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20229
  
Actually, I think this can be separated into two different PRs: one for `OneHotEncoderEstimator` and one for `VectorSizeHint`.


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160877051
  
--- Diff: python/pyspark/sql/context.py ---
@@ -578,6 +606,9 @@ def __init__(self, sqlContext):
 def register(self, name, f, returnType=StringType()):
 return self.sqlContext.registerFunction(name, f, returnType)
 
+def registerUDF(self, name, f):
--- End diff --

Yup, +1, like https://github.com/apache/spark/pull/20217/files/f25669a4b6c2298359df1b9083037468652cd141#r160861434, but how about checking and doing this in one batch? It seems we should fix the doctests too.


---




[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20222
  
**[Test build #85952 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85952/testReport)**
 for PR 20222 at commit 
[`5eded03`](https://github.com/apache/spark/commit/5eded033a0b352e7a799c7890131d8075475c8ff).


---




[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20072
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85949/
Test FAILed.


---




[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20072
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20072
  
**[Test build #85949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85949/testReport)**
 for PR 20072 at commit 
[`5230081`](https://github.com/apache/spark/commit/523008192cb1476a68cf3808f46574598fcc6d2d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160876638
  
--- Diff: python/pyspark/sql/context.py ---
@@ -203,18 +203,46 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
 >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+return self.sparkSession.catalog.registerFunction(name, f, 
returnType)
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
--- End diff --

Also, it seems we need to fix the examples in this case too: `spark` -> `sqlContext`.


---




[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20222
  
Thanks for your review! 


---




[GitHub] spark issue #20229: [SPARK-23037][ML] Update RFormula to use VectorSizeHint ...

2018-01-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20229
  
We need to get it to kick off the R tests - could you touch one of the files under R/?
Also, please update the PR title to include [SPARKR].


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160873545
  
--- Diff: python/pyspark/sql/context.py ---
@@ -203,18 +203,46 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
 >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+return self.sparkSession.catalog.registerFunction(name, f, 
returnType)
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
+:param f: a wrapped/native UserDefinedFunction. The UDF can be 
either row-at-a-time or
+  scalar vectorized. Grouped vectorized UDFs are not 
supported.
+:return: a wrapped :class:`UserDefinedFunction`
+
+>>> from pyspark.sql.types import IntegerType
+>>> from pyspark.sql.functions import udf
+>>> slen = udf(lambda s: len(s), IntegerType())
+>>> _ = sqlContext.udf.registerUDF("slen", slen)
+>>> sqlContext.sql("SELECT slen('test')").collect()
+[Row(slen(test)=4)]
 
 >>> import random
 >>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType, StringType
+>>> from pyspark.sql.types import IntegerType
 >>> random_udf = udf(lambda: random.randint(0, 100), 
IntegerType()).asNondeterministic()
->>> newRandom_udf = sqlContext.registerFunction("random_udf", 
random_udf, StringType())
+>>> newRandom_udf = sqlContext.registerUDF("random_udf", 
random_udf)
 >>> sqlContext.sql("SELECT random_udf()").collect()  # doctest: 
+SKIP
-[Row(random_udf()=u'82')]
+[Row(random_udf()=82)]
 >>> sqlContext.range(1).select(newRandom_udf()).collect()  # 
doctest: +SKIP
-[Row(random_udf()=u'62')]
+[Row(random_udf()=62)]
+
+>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+>>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
+... def add_one(x):
+... return x + 1
+...
+>>> _ = sqlContext.udf.registerUDF("add_one", add_one)  # doctest: 
+SKIP
+>>> sqlContext.sql("SELECT add_one(id) FROM range(10)").collect()  
# doctest: +SKIP
--- End diff --

ditto.


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160872774
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
+raise ValueError("Please use registerUDF for registering UDF. 
The expected function of "
+ "registerFunction is a Python function 
(including lambda function)")
+udf = UserDefinedFunction(f, returnType=returnType, name=name,
+  evalType=PythonEvalType.SQL_BATCHED_UDF)
+self._jsparkSession.udf().registerPython(name, udf._judf)
+return udf._wrapped()
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
+:param f: a wrapped/native UserDefinedFunction. The UDF can be 
either row-at-a-time or
+  scalar vectorized. Grouped vectorized UDFs are not 
supported.
+:return: a wrapped :class:`UserDefinedFunction`
+
+>>> from pyspark.sql.types import IntegerType
+>>> from pyspark.sql.functions import udf
+>>> slen = udf(lambda s: len(s), IntegerType())
+>>> _ = spark.udf.registerUDF("slen", slen)
+>>> spark.sql("SELECT slen('test')").collect()
+[Row(slen(test)=4)]
 
 >>> import random
 >>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType, StringType
+>>> from pyspark.sql.types import IntegerType
 >>> random_udf = udf(lambda: random.randint(0, 100), 
IntegerType()).asNondeterministic()
->>> newRandom_udf = spark.catalog.registerFunction("random_udf", 
random_udf, StringType())
+>>> newRandom_udf = spark.catalog.registerUDF("random_udf", 
random_udf)
 >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=u'82')]
+[Row(random_udf()=82)]
 >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: 
+SKIP
-[Row(random_udf()=u'62')]
+[Row(random_udf()=62)]
+
+>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+>>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
+... def add_one(x):
+... return x + 1
+...
+>>> _ = spark.udf.registerUDF("add_one", add_one)  # doctest: +SKIP
+>>> spark.sql("SELECT add_one(id) FROM range(10)").collect()  # 
doctest: +SKIP
--- End diff --

Although this will be skipped, we should show the result like 
`[Row(add_one(id)=1), Row(add_one(id)=2), ...]`.


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160873927
  
--- Diff: python/pyspark/sql/context.py ---
@@ -578,6 +606,9 @@ def __init__(self, sqlContext):
 def register(self, name, f, returnType=StringType()):
 return self.sqlContext.registerFunction(name, f, returnType)
 
+def registerUDF(self, name, f):
--- End diff --

Maybe we can add a doc here via `registerUDF.__doc__ = SQLContext.registerUDF.__doc__`, similar to the doc for `register`.
(We should do it for `registerJavaFunction` and `registerJavaUDAF`, too?)


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160874070
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile ---
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark 
distribution following
-# the instructions in 
http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images 
from a Spark
-# distribution, the docker build command should be invoked from the top 
level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-executor:latest -f 
kubernetes/dockerfiles/executor/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
-env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > 
/tmp/java_opts.txt && \
-readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
-if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then 
SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
-if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then 
SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
--- End diff --

To clarify, by that I mean we no longer have the ability to customize different classpaths for the executor and the driver.
For reference, see `spark.driver.extraClassPath` vs `spark.executor.extraClassPath`.
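
For illustration, a sketch of setting the two independently (the paths are hypothetical):

```
import org.apache.spark.SparkConf

// Driver and executor classpaths are separate knobs outside the image.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/opt/extra/driver-jars/*")
  .set("spark.executor.extraClassPath", "/opt/extra/executor-jars/*")
```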


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160874300
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# echo commands to the terminal output
+set -ex
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+uidentry=$(getent passwd $myuid)
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+if [ -w /etc/passwd ] ; then
+echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" 
>> /etc/passwd
+else
+echo "Container ENTRYPOINT failed to add passwd entry for 
anonymous UID"
+fi
+fi
+
+SPARK_K8S_CMD="$1"
+if [ -z "$SPARK_K8S_CMD" ]; then
+  echo "No command to execute has been provided." 1>&2
--- End diff --

We can revisit this when we have proper client support.
Overriding the entrypoint won't do if I want everything else set (e.g. SPARK_CLASSPATH).


---




[GitHub] spark issue #20195: [SPARK-22972][SQL] Couldn't find corresponding Hive SerD...

2018-01-10 Thread xubo245
Github user xubo245 commented on the issue:

https://github.com/apache/spark/pull/20195
  
ok @dongjoon-hyun 


---




[GitHub] spark pull request #20195: [SPARK-22972][SQL] Couldn't find corresponding Hi...

2018-01-10 Thread xubo245
Github user xubo245 closed the pull request at:

https://github.com/apache/spark/pull/20195


---




[GitHub] spark issue #20229: Update RFormula to use VectorSizeHint & OneHotEncoderEst...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20229
  
**[Test build #85950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85950/testReport)**
 for PR 20229 at commit 
[`7bad275`](https://github.com/apache/spark/commit/7bad275cd995d22d70a42a5b9932073bc3a5d1e5).


---




[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20214
  
**[Test build #85951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85951/testReport)**
 for PR 20214 at commit 
[`9cf9954`](https://github.com/apache/spark/commit/9cf995461990e46c007405344481cd802a0d6501).


---




[GitHub] spark issue #20206: [SPARK-19256][SQL] Remove ordering enforcement from `Fil...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20206
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85946/
Test PASSed.


---




[GitHub] spark issue #20206: [SPARK-19256][SQL] Remove ordering enforcement from `Fil...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20206
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20206: [SPARK-19256][SQL] Remove ordering enforcement from `Fil...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20206
  
**[Test build #85946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85946/testReport)**
 for PR 20206 at commit 
[`8c91ff9`](https://github.com/apache/spark/commit/8c91ff9909fecaa36a79397e36edf64d83adac6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20229: Update RFormula to use VectorSizeHint & OneHotEnc...

2018-01-10 Thread MrBago
GitHub user MrBago opened a pull request:

https://github.com/apache/spark/pull/20229

Update RFormula to use VectorSizeHint & OneHotEncoderEstimator.

## What changes were proposed in this pull request?

RFormula should use VectorSizeHint & OneHotEncoderEstimator in its pipeline 
to avoid using the deprecated OneHotEncoder & to ensure the model produced can 
be used in streaming.

## How was this patch tested?

Unit tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MrBago/spark rFormula

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20229


commit 7bad275cd995d22d70a42a5b9932073bc3a5d1e5
Author: Bago Amirbekian 
Date:   2018-01-11T05:45:27Z

Update RFormula to use VectorSizeHint & OneHotEncoderEstimator.




---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85944/
Test PASSed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20087
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20087
  
**[Test build #85944 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85944/testReport)**
 for PR 20087 at commit 
[`4b89b44`](https://github.com/apache/spark/commit/4b89b44b2b4a9a1ffee01da13a3e55cdd3aa260d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20214
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20214
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85948/
Test FAILed.


---




[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20214
  
**[Test build #85948 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85948/testReport)**
 for PR 20214 at commit 
[`cbccb1b`](https://github.com/apache/spark/commit/cbccb1b3a4f4220730c8e6260dda0e552c6c044b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20194: [SPARK-22999][SQL]'show databases like command' c...

2018-01-10 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/20194#discussion_r160869162
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -141,7 +141,7 @@ statement
 (LIKE? pattern=STRING)?
#showTables
 | SHOW TABLE EXTENDED ((FROM | IN) db=identifier)?
 LIKE pattern=STRING partitionSpec? 
#showTable
-| SHOW DATABASES (LIKE pattern=STRING)?
#showDatabases
+| SHOW DATABASES (LIKE? pattern=STRING)?
#showDatabases
--- End diff --

No, I just saw that the `LIKE` in `SHOW TABLES LIKE` can be omitted, so I think the `LIKE` in `SHOW DATABASES LIKE` can also be made optional, e.g. `SHOW DATABASES 'test*'` would behave like `SHOW DATABASES LIKE 'test*'`. With it optional, the operation is more convenient.


---




[GitHub] spark issue #20195: [SPARK-22972] Couldn't find corresponding Hive SerDe for...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20195
  
@xubo245, since it's merged, could you close your PR now?
For PRs against old branches like this, we need to close them manually.


---




[GitHub] spark issue #20100: [SPARK-22913][SQL] Improved Hive Partition Pruning

2018-01-10 Thread ameent
Github user ameent commented on the issue:

https://github.com/apache/spark/pull/20100
  
@srowen can you help find someone to review this?


---




[GitHub] spark issue #20100: [SPARK-22913][SQL] Improved Hive Partition Pruning

2018-01-10 Thread ameent
Github user ameent commented on the issue:

https://github.com/apache/spark/pull/20100
  
Any updates on this?


---




[GitHub] spark pull request #20194: [SPARK-22999][SQL]'show databases like command' c...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20194#discussion_r160867737
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -141,7 +141,7 @@ statement
 (LIKE? pattern=STRING)?
#showTables
 | SHOW TABLE EXTENDED ((FROM | IN) db=identifier)?
 LIKE pattern=STRING partitionSpec? 
#showTable
-| SHOW DATABASES (LIKE pattern=STRING)?
#showDatabases
+| SHOW DATABASES (LIKE? pattern=STRING)?
#showDatabases
--- End diff --

@guoxiaolongzte, MySQL also doesn't work like that. Do you have any reference for that syntax?


---




[GitHub] spark pull request #20227: [SPARK-23035] Fix warning: TEMPORARY TABLE ... US...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20227#discussion_r160867160
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -814,7 +814,7 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 withTempView("tab1") {
   sql(
 """
-  |CREATE TEMPORARY TABLE tab1
--- End diff --

For this one, I think we should keep the test coverage because we still support this legacy syntax.
Could you revert this kind of change?


---




[GitHub] spark pull request #20227: [SPARK-23035] Fix warning: TEMPORARY TABLE ... US...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20227#discussion_r160867069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala
 ---
@@ -33,6 +33,9 @@ class TableAlreadyExistsException(db: String, table: 
String)
 class TempTableAlreadyExistsException(table: String)
   extends AnalysisException(s"Temporary table '$table' already exists")
 
+class TempViewAlreadyExistsException(table: String)
+  extends AnalysisException(s"Temporary view '$table' already exists")
--- End diff --

+1 for adding this.


---




[GitHub] spark issue #20228: [SPARK-23036] Add withGlobalTempView for testing and cor...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20228
  
BTW, please put '[SQL]' into your title.
Then your PR will be listed under the SQL category here:
- https://spark-prs.appspot.com/open-prs


---




[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20226
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85945/
Test PASSed.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20226
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20226
  
**[Test build #85945 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85945/testReport)**
 for PR 20226 at commit 
[`6271804`](https://github.com/apache/spark/commit/62718040b1403dd796315bded9e47983f0ba9d6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20228: [SPARK-23036] Add withGlobalTempView for testing ...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20228#discussion_r160866670
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/GlobalTempViewSuite.scala
 ---
@@ -140,8 +140,8 @@ class GlobalTempViewSuite extends QueryTest with 
SharedSQLContext {
 
   
assert(spark.catalog.listTables(globalTempDB).collect().toSeq.map(_.name) == 
Seq("v1", "v2"))
 } finally {
-  spark.catalog.dropTempView("v1")
-  spark.catalog.dropGlobalTempView("v2")
+  spark.catalog.dropGlobalTempView("v1")
+  spark.catalog.dropTempView("v2")
--- End diff --

Hi, @xubo245 .
Could you split the bug and new improvement into two separate PR?
Then, your PR will get reviews more easily.


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160864450
  
--- Diff: python/pyspark/sql/context.py ---
@@ -203,18 +203,46 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
 >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+return self.sparkSession.catalog.registerFunction(name, f, 
returnType)
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
--- End diff --

Ok. Sounds good to me.


---




[GitHub] spark pull request #20211: [SPARK-23011][PYTHON][SQL] Prepend missing groupi...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20211#discussion_r160864143
  
--- Diff: python/pyspark/sql/group.py ---
@@ -233,6 +233,27 @@ def apply(self, udf):
 |  2| 1.1094003924504583|
 +---+---+
 
+Notes on grouping column:
--- End diff --

Yeah. To be honest, I don't think there is a behavior that is both simple and works well with all use cases. It's probably a matter of leaning towards simpler behavior that doesn't work well in some cases, or towards somewhat "magic" behavior. I don't think there is an obvious answer here.

Another option is to always prepend the grouping columns; if users want to return grouping columns in the UDF output, they can do a `drop` after `groupby apply`:

```
@pandas_udf('id int, v double', GROUP_MAP)
def foo(pdf):
    return pdf.assign(v=pdf.v + 1)

df.groupby('id').apply(foo).drop(df.id)
```

I don't think it's too annoying to add a `drop` after `apply`, and it works well with the linear regression case. This is also a pretty straightforward, non-magical behavior.

What do you all think?



---




[GitHub] spark pull request #20151: [SPARK-22959][PYTHON] Configuration to select the...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20151#discussion_r160864049
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -34,17 +34,39 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 
   import PythonWorkerFactory._
 
-  // Because forking processes from Java is expensive, we prefer to launch 
a single Python daemon
-  // (pyspark/daemon.py) and tell it to fork new workers for our tasks. 
This daemon currently
-  // only works on UNIX-based systems now because it uses signals for 
child management, so we can
-  // also fall back to launching workers (pyspark/worker.py) directly.
+  // Because forking processes from Java is expensive, we prefer to launch 
a single Python daemon,
+  // pyspark/daemon.py (by default) and tell it to fork new workers for 
our tasks. This daemon
+  // currently only works on UNIX-based systems now because it uses 
signals for child management,
+  // so we can also fall back to launching workers, pyspark/worker.py (by 
default) directly.
   val useDaemon = {
 val useDaemonEnabled = 
SparkEnv.get.conf.getBoolean("spark.python.use.daemon", true)
 
 // This flag is ignored on Windows as it's unable to fork.
 !System.getProperty("os.name").startsWith("Windows") && 
useDaemonEnabled
   }
 
+  // WARN: Both configurations, 'spark.python.daemon.module' and 
'spark.python.worker.module' are
+  // for very advanced users and they are experimental. This should be 
considered
+  // as expert-only option, and shouldn't be used before knowing what it 
means exactly.
+
+  // This configuration indicates the module to run the daemon to execute 
its Python workers.
+  val daemonModule = 
SparkEnv.get.conf.getOption("spark.python.daemon.module").map { value =>
--- End diff --

Hm, actually we could also check if it's an empty string. But I wrote "shouldn't be used before knowing what it means exactly." above, so I think it's fine.
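
For reference, a sketch of what such an empty-string check could look like (the fallback default module name is an assumption here):

```
// Reject blank values for this expert-only option instead of silently
// trying to launch a nonexistent module.
val daemonModule =
  SparkEnv.get.conf.getOption("spark.python.daemon.module").map { value =>
    require(value.trim.nonEmpty,
      "spark.python.daemon.module must not be an empty string.")
    value
  }.getOrElse("pyspark.daemon") // assumed default
```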


---




[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20072
  
**[Test build #85949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85949/testReport)**
 for PR 20072 at commit 
[`5230081`](https://github.com/apache/spark/commit/523008192cb1476a68cf3808f46574598fcc6d2d).


---




[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160863048
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4085,33 +4091,50 @@ def 
test_vectorized_udf_timestamps_respect_session_timezone(self):
 finally:
 self.spark.conf.set("spark.sql.session.timeZone", orig_tz)
 
-def test_nondeterministic_udf(self):
+def test_nondeterministic_vectorized_udf(self):
 # Test that nondeterministic UDFs are evaluated only once in 
chained UDF evaluations
 from pyspark.sql.functions import udf, pandas_udf, col
 
 @pandas_udf('double')
 def plus_ten(v):
 return v + 10
-random_udf = self.random_udf
+random_udf = self.nondeterministic_vectorized_udf
 
 df = self.spark.range(10).withColumn('rand', random_udf(col('id')))
 result1 = df.withColumn('plus_ten(rand)', 
plus_ten(df['rand'])).toPandas()
 
 self.assertEqual(random_udf.deterministic, False)
 self.assertTrue(result1['plus_ten(rand)'].equals(result1['rand'] + 
10))
 
-def test_nondeterministic_udf_in_aggregate(self):
+def test_nondeterministic_vectorized_udf_in_aggregate(self):
 from pyspark.sql.functions import pandas_udf, sum
 
 df = self.spark.range(10)
-random_udf = self.random_udf
+random_udf = self.nondeterministic_vectorized_udf
 
 with QuietTest(self.sc):
 with self.assertRaisesRegexp(AnalysisException, 
'nondeterministic'):
 df.groupby(df.id).agg(sum(random_udf(df.id))).collect()
 with self.assertRaisesRegexp(AnalysisException, 
'nondeterministic'):
 df.agg(sum(random_udf(df.id))).collect()
 
+def test_register_vectorized_udf_basic(self):
+from pyspark.rdd import PythonEvalType
+from pyspark.sql.functions import pandas_udf, col, expr
+df = self.spark.range(10).select(
+col('id').cast('int').alias('a'),
+col('id').cast('int').alias('b'))
+originalAdd = pandas_udf(lambda x, y: x + y, IntegerType())
--- End diff --

`originalAdd` -> `original_add`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160863023
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4085,33 +4091,50 @@ def 
test_vectorized_udf_timestamps_respect_session_timezone(self):
 finally:
 self.spark.conf.set("spark.sql.session.timeZone", orig_tz)
 
-def test_nondeterministic_udf(self):
+def test_nondeterministic_vectorized_udf(self):
 # Test that nondeterministic UDFs are evaluated only once in 
chained UDF evaluations
 from pyspark.sql.functions import udf, pandas_udf, col
 
 @pandas_udf('double')
 def plus_ten(v):
 return v + 10
-random_udf = self.random_udf
+random_udf = self.nondeterministic_vectorized_udf
 
 df = self.spark.range(10).withColumn('rand', random_udf(col('id')))
 result1 = df.withColumn('plus_ten(rand)', 
plus_ten(df['rand'])).toPandas()
 
 self.assertEqual(random_udf.deterministic, False)
 self.assertTrue(result1['plus_ten(rand)'].equals(result1['rand'] + 
10))
 
-def test_nondeterministic_udf_in_aggregate(self):
+def test_nondeterministic_vectorized_udf_in_aggregate(self):
 from pyspark.sql.functions import pandas_udf, sum
 
 df = self.spark.range(10)
-random_udf = self.random_udf
+random_udf = self.nondeterministic_vectorized_udf
 
 with QuietTest(self.sc):
 with self.assertRaisesRegexp(AnalysisException, 
'nondeterministic'):
 df.groupby(df.id).agg(sum(random_udf(df.id))).collect()
 with self.assertRaisesRegexp(AnalysisException, 
'nondeterministic'):
 df.agg(sum(random_udf(df.id))).collect()
 
+def test_register_vectorized_udf_basic(self):
+from pyspark.rdd import PythonEvalType
+from pyspark.sql.functions import pandas_udf, col, expr
+df = self.spark.range(10).select(
+col('id').cast('int').alias('a'),
+col('id').cast('int').alias('b'))
+originalAdd = pandas_udf(lambda x, y: x + y, IntegerType())
+self.assertEqual(originalAdd.deterministic, True)
+self.assertEqual(originalAdd.evalType, 
PythonEvalType.SQL_PANDAS_SCALAR_UDF)
+newAdd = self.spark.catalog.registerUDF("add1", originalAdd)
--- End diff --

`newAdd` -> `new_add`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160862588
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
--- End diff --

BTW, wouldn't it break the compatibility?
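
For context, this is the kind of call that used to work and would now raise 
with this check — a sketch assuming an active SparkSession `spark` (compare 
the `registerFunction("random_udf", random_udf, ...)` doctest being removed 
elsewhere in this PR):

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

slen = udf(lambda s: len(s), IntegerType())

# Before this change, passing an existing UDF object to registerFunction
# worked; with the hasattr check above it raises ValueError instead.
spark.catalog.registerFunction("slen_sql", slen, IntegerType())
```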


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20225
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20225
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85939/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20211: [SPARK-23011][PYTHON][SQL] Prepend missing groupi...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20211#discussion_r160862182
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3995,23 +3995,49 @@ def test_coerce(self):
 self.assertFramesEqual(expected, result)
 
 def test_complex_groupby(self):
+import pandas as pd
 from pyspark.sql.functions import pandas_udf, col, PandasUDFType
 df = self.data
+pdf = df.toPandas()
 
 @pandas_udf(
-'id long, v int, norm double',
+'v int, v2 double',
 PandasUDFType.GROUP_MAP
 )
-def normalize(pdf):
+def foo(pdf):
 v = pdf.v
-return pdf.assign(norm=(v - v.mean()) / v.std())
-
-result = df.groupby(col('id') % 2 == 
0).apply(normalize).sort('id', 'v').toPandas()
-pdf = df.toPandas()
-expected = pdf.groupby(pdf['id'] % 2 == 0).apply(normalize.func)
-expected = expected.sort_values(['id', 'v']).reset_index(drop=True)
-expected = expected.assign(norm=expected.norm.astype('float64'))
-self.assertFramesEqual(expected, result)
+return pd.DataFrame({'v': v + 1, 'v2': v - v.mean()})[:]
--- End diff --

This is just to simplify the test - pandas has very complicated behavior when 
it comes to the index of the return value when using `groupby().apply()`.

If interested, take a look at 
http://nbviewer.jupyter.org/gist/mbirdi/05f8a83d340476e5f03a
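
Concretely, a small standalone illustration (not from this PR) of how the 
result index shifts depending on what the applied function returns:

```python
import pandas as pd

pdf = pd.DataFrame({'id': [0, 0, 1, 1], 'v': [1.0, 2.0, 3.0, 4.0]})

# Returning the (mutated) input frame vs. building a brand-new frame can
# yield differently shaped indexes, and the exact result also varies
# across pandas versions - which is why the test sidesteps it.
r1 = pdf.groupby('id').apply(lambda g: g.assign(norm=g.v - g.v.mean()))
r2 = pdf.groupby('id').apply(lambda g: pd.DataFrame({'v2': g.v - g.v.mean()})[:])

print(r1.index)
print(r2.index)
```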


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20225
  
**[Test build #85939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85939/testReport)**
 for PR 20225 at commit 
[`1bf613f`](https://github.com/apache/spark/commit/1bf613f2162ad07289d99c7ef3cbd0a7e2b73558).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160862123
  
--- Diff: python/pyspark/sql/context.py ---
@@ -203,18 +203,46 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
 >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+return self.sparkSession.catalog.registerFunction(name, f, 
returnType)
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
--- End diff --

I think that's ideally right, but let's keep it consistent with the 
convention used here. Let's discuss it separately and do it in a batch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20204: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage genera...

2018-01-10 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/20204
  
> @icexelloss, for #20204 (comment), yes, that's the way I usually use too. 
My worry is though I wonder if this is a proper official way to do it because I 
have been thinking this way is rather meant to be internal.

Sounds good to me. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160861750
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
--- End diff --

Yea, but the wrapped UDF is also callable .. unfortunately :(.

I suggested this way for this reason:

in `udf.py`

```diff
+wrapper._unwrapped = lambda: self
 return wrapper
```

and then

```
if hasattr(f, "_unwrapped"):
f = f._unwrapped()
if isinstance(f, UserDefinedFunction):
...
else:
...
```

but it was not a strong opinion.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/20226
  
@dongjoon-hyun : I have updated the PR description


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160861434
  
--- Diff: python/pyspark/sql/context.py ---
@@ -203,18 +203,46 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = sqlContext.udf.register("stringLengthInt", lambda x: 
len(x), IntegerType())
 >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+return self.sparkSession.catalog.registerFunction(name, f, 
returnType)
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
--- End diff --

I prefer not to duplicate the docstring. Maybe we can put the docstring in 
the user-facing API (I think it's the SQLContext one?) and reference it from 
the other one.

Or, maybe we can do something like this if we want both docstrings:

```
@ignore_unicode_prefix
@since(2.3)
def registerUDF(self, name, f):
   return self.sparkSession.catalog.registerUDF(name, f)
registerUDF.__doc__ = pyspark.sql.catalog.Catalog.registerUDF.__doc__
```
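
A tiny runnable demo of that `__doc__`-forwarding pattern (toy classes, not 
Spark's actual ones):

```python
class Catalog(object):
    def registerUDF(self, name, f):
        """Registers a UDF so it can be used in SQL statements."""
        pass

class SQLContext(object):
    def registerUDF(self, name, f):
        return self.sparkSession.catalog.registerUDF(name, f)
    # Reuse the canonical docstring instead of duplicating it.
    registerUDF.__doc__ = Catalog.registerUDF.__doc__

print(SQLContext.registerUDF.__doc__)  # prints the Catalog docstring
```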


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20204: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20204#discussion_r160860487
  
--- Diff: python/test_coverage/sitecustomize.py ---
@@ -0,0 +1,19 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import coverage
+coverage.process_startup()
--- End diff --

Yup, will add some comments.
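
For reference, a commented version of what that hook does, per coverage.py's 
documented subprocess-measurement setup (requires the `coverage` package):

```python
# Python imports sitecustomize automatically at interpreter startup when it
# is on sys.path, so every forked PySpark worker process runs this hook.
import coverage

# No-op unless the COVERAGE_PROCESS_START environment variable points at a
# coverage rc file; when it does, measurement starts in this process too.
coverage.process_startup()
```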


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20211: [SPARK-23011][PYTHON][SQL] Prepend missing groupi...

2018-01-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20211#discussion_r160860320
  
--- Diff: python/pyspark/sql/group.py ---
@@ -233,6 +233,27 @@ def apply(self, udf):
 |  2| 1.1094003924504583|
 +---+---+
 
+Notes on grouping column:
--- End diff --

Yup, I saw the use case described in the JIRA and I get that this specific 
case can be simplified; however, I am not sure it's straightforward to end 
users.

For example, if I use `pandas_udf` I would simply expect the return schema to 
match what's described in `returnType`. `pandas_udf` already needs some 
background, and I think we should keep it as simple as we can.

Guaranteeing the grouping columns might be convenient in some cases, but it 
is also a kind of magic happening inside.

I would prefer to let the UDF specify the grouping columns to make this more 
straightforward, e.g. as sketched below ..
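
A minimal sketch of that explicit style, assuming PySpark with Arrow support, 
an active SparkSession, and a DataFrame `df` with columns `id` and `v`:

```python
from pyspark.sql.functions import pandas_udf, PandasUDFType

# The return schema lists the grouping column 'id' explicitly, so the
# output matches the declared returnType with no implicit prepending.
@pandas_udf('id long, v int, norm double', PandasUDFType.GROUP_MAP)
def normalize(pdf):
    v = pdf.v
    return pdf.assign(norm=(v - v.mean()) / v.std())

# df.groupby('id').apply(normalize).show()
```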


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19885: [SPARK-22587] Spark job fails if fs.defaultFS and...

2018-01-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19885


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160859938
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -4147,6 +4170,21 @@ def test_simple(self):
 expected = 
df.toPandas().groupby('id').apply(foo_udf.func).reset_index(drop=True)
 self.assertFramesEqual(expected, result)
 
+def test_registerGroupMapUDF(self):
--- End diff --

nit: test_register_group_map_udf


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
Let me merge to master and branch 2.3. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160858810
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
--- End diff --

It might be better to check `f` is (1) callable (2) not a UDF object
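
Roughly this (a sketch with an assumed helper name, not the PR's code):

```python
def _check_plain_function(f):
    # registerFunction should get a plain Python callable (including a
    # lambda), not something that is already a wrapped/native UDF.
    if not callable(f):
        raise TypeError("f must be a Python function")
    if hasattr(f, 'asNondeterministic'):  # present on both UDF flavors
        raise ValueError("f is already a UDF; use registerUDF instead")
```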


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160858570
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
+raise ValueError("Please use registerUDF for registering UDF. 
The expected function of "
+ "registerFunction is a Python function 
(including lambda function)")
+udf = UserDefinedFunction(f, returnType=returnType, name=name,
+  evalType=PythonEvalType.SQL_BATCHED_UDF)
+self._jsparkSession.udf().registerPython(name, udf._judf)
+return udf._wrapped()
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
+:param f: a wrapped/native UserDefinedFunction. The UDF can be 
either row-at-a-time or
+  scalar vectorized. Grouped vectorized UDFs are not 
supported.
+:return: a wrapped :class:`UserDefinedFunction`
+
+>>> from pyspark.sql.types import IntegerType
+>>> from pyspark.sql.functions import udf
+>>> slen = udf(lambda s: len(s), IntegerType())
+>>> _ = spark.udf.registerUDF("slen", slen)
+>>> spark.sql("SELECT slen('test')").collect()
+[Row(slen(test)=4)]
 
 >>> import random
 >>> from pyspark.sql.functions import udf
->>> from pyspark.sql.types import IntegerType, StringType
+>>> from pyspark.sql.types import IntegerType
 >>> random_udf = udf(lambda: random.randint(0, 100), 
IntegerType()).asNondeterministic()
->>> newRandom_udf = spark.catalog.registerFunction("random_udf", 
random_udf, StringType())
+>>> newRandom_udf = spark.catalog.registerUDF("random_udf", 
random_udf)
 >>> spark.sql("SELECT random_udf()").collect()  # doctest: +SKIP
-[Row(random_udf()=u'82')]
+[Row(random_udf()=82)]
 >>> spark.range(1).select(newRandom_udf()).collect()  # doctest: 
+SKIP
-[Row(random_udf()=u'62')]
+[Row(random_udf()=62)]
+
+>>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+>>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
+... def add_one(x):
+... return x + 1
+...
+>>> _ = spark.udf.registerUDF("add_one", add_one)  # doctest: +SKIP
+>>> spark.sql("SELECT add_one(id) FROM range(10)").collect()  # 
doctest: +SKIP
 """
 
 # This is to check whether the input function is a wrapped/native 
UserDefinedFunction
 if hasattr(f, 'asNondeterministic'):
-udf = UserDefinedFunction(f.func, returnType=returnType, 
name=name,
-  
evalType=PythonEvalType.SQL_BATCHED_UDF,
+if f.evalType not in [PythonEvalType.SQL_BATCHED_UDF,
+  PythonEvalType.SQL_PANDAS_SCALAR_UDF]:
+raise ValueError(
+"Invalid f: f must be either SQL_BATCHED_UDF or 
SQL_PANDAS_SCALAR_UDF")
+udf = UserDefinedFunction(f.func, returnType=f.returnType, 
name=name,
+  evalType=f.evalType,
   deterministic=f.deterministic)
 else:
-udf = UserDefinedFunction(f, returnType=returnType, name=name,
-  
evalType=PythonEvalType.SQL_BATCHED_UDF)
+raise TypeError("Please use registerFunction for registering a 
Python function "
--- End diff --

+1. I think the user might not necessarily pass a Python function here. 
Maybe an error message like "Invalid UDF: f must be an object returned by 
`udf` or `pandas_udf`"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160858423
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
+raise ValueError("Please use registerUDF for registering UDF. 
The expected function of "
+ "registerFunction is a Python function 
(including lambda function)")
+udf = UserDefinedFunction(f, returnType=returnType, name=name,
+  evalType=PythonEvalType.SQL_BATCHED_UDF)
+self._jsparkSession.udf().registerPython(name, udf._judf)
+return udf._wrapped()
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
+:param f: a wrapped/native UserDefinedFunction. The UDF can be 
either row-at-a-time or
+  scalar vectorized. Grouped vectorized UDFs are not 
supported.
--- End diff --

I would probably say something like 
"The UDF can be returned by either `udf` or `pandas_udf` with type `SCALAR`" 
to be more specific.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20224
  
Do we always need to turn this on? It seems this is debug info for developers?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85935/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20222
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160858034
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
+raise ValueError("Please use registerUDF for registering UDF. 
The expected function of "
+ "registerFunction is a Python function 
(including lambda function)")
+udf = UserDefinedFunction(f, returnType=returnType, name=name,
+  evalType=PythonEvalType.SQL_BATCHED_UDF)
+self._jsparkSession.udf().registerPython(name, udf._judf)
+return udf._wrapped()
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
--- End diff --

Maybe something like "name of the UDF in SQL statement" to be more specific.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20222
  
**[Test build #85935 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85935/testReport)**
 for PR 20222 at commit 
[`9107f9f`](https://github.com/apache/spark/commit/9107f9f2d71610888fc4c0beac70a4a87c8348d9).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20228: [SPARK-23036] Add withGlobalTempView for testing and cor...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20228
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160857927
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
+raise ValueError("Please use registerUDF for registering UDF. 
The expected function of "
+ "registerFunction is a Python function 
(including lambda function)")
+udf = UserDefinedFunction(f, returnType=returnType, name=name,
+  evalType=PythonEvalType.SQL_BATCHED_UDF)
+self._jsparkSession.udf().registerPython(name, udf._judf)
+return udf._wrapped()
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
+:param f: a wrapped/native UserDefinedFunction. The UDF can be 
either row-at-a-time or
--- End diff --

I don't have a strong opinion on how to describe them, but I think it's 
easier for users to understand if the description is consistent.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20217: [SPARK-23026] [PySpark] Add RegisterUDF to PySpar...

2018-01-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/20217#discussion_r160857817
  
--- Diff: python/pyspark/sql/catalog.py ---
@@ -255,26 +255,67 @@ def registerFunction(self, name, f, 
returnType=StringType()):
 >>> _ = spark.udf.register("stringLengthInt", len, IntegerType())
 >>> spark.sql("SELECT stringLengthInt('test')").collect()
 [Row(stringLengthInt(test)=4)]
+"""
+
+# This is to check whether the input function is a wrapped/native 
UserDefinedFunction
+if hasattr(f, 'asNondeterministic'):
+raise ValueError("Please use registerUDF for registering UDF. 
The expected function of "
+ "registerFunction is a Python function 
(including lambda function)")
+udf = UserDefinedFunction(f, returnType=returnType, name=name,
+  evalType=PythonEvalType.SQL_BATCHED_UDF)
+self._jsparkSession.udf().registerPython(name, udf._judf)
+return udf._wrapped()
+
+@ignore_unicode_prefix
+@since(2.3)
+def registerUDF(self, name, f):
+"""Registers a :class:`UserDefinedFunction`. The registered UDF 
can be used in SQL
+statement.
+
+:param name: name of the UDF
+:param f: a wrapped/native UserDefinedFunction. The UDF can be 
either row-at-a-time or
--- End diff --

We should probably standardize how we describe the wrapped function object 
returned by `udf` and `pandas_udf` in param docstrings. Currently it's 
described differently:

https://github.com/apache/spark/blob/master/python/pyspark/sql/group.py#L215


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20228: [SPARK-23036] Add withGlobalTempView for testing ...

2018-01-10 Thread xubo245
GitHub user xubo245 opened a pull request:

https://github.com/apache/spark/pull/20228

[SPARK-23036] Add withGlobalTempView for testing and correct some improper 
view-related method usage




## What changes were proposed in this pull request?

Add withGlobalTempView for creating global temp views, like withTempView and 
withView, and correct some improper usage.

## How was this patch tested?

No new tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xubo245/spark DropTempView

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20228


commit fffd109e8c084f9a4d63840bf761364f1ede5dc9
Author: xubo245 <601450868@...>
Date:   2018-01-11T03:25:17Z

[SPARK-23036] Add withGlobalTempView for testing and correct some improper 
with view related method usage

Add withGlobalTempView when create global temp view, like withTempView and 
withView.
And correct some improper usage.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20226
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20226
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85943/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20226: [SPARK-23034][SQL][UI] Display tablename for `HiveTableS...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20226
  
**[Test build #85943 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85943/testReport)**
 for PR 20226 at commit 
[`6271804`](https://github.com/apache/spark/commit/62718040b1403dd796315bded9e47983f0ba9d6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20223
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20223
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85937/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20223
  
**[Test build #85937 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85937/testReport)**
 for PR 20223 at commit 
[`2018e29`](https://github.com/apache/spark/commit/2018e295f516d51e8c1661d3b77c31a029fdb006).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20214
  
**[Test build #85948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85948/testReport)**
 for PR 20214 at commit 
[`cbccb1b`](https://github.com/apache/spark/commit/cbccb1b3a4f4220730c8e6260dda0e552c6c044b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20214
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20214: [SPARK-23023][SQL] Cast field data to strings in showStr...

2018-01-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20214
  
`org.apache.spark.sql.streaming.StreamingOuterJoinSuite` is flaky? (The 
failure seems unrelated to this PR.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregation func...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19872
  
**[Test build #85947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85947/testReport)**
 for PR 19872 at commit 
[`cb36227`](https://github.com/apache/spark/commit/cb362274711c1b26ed19e87aa15bc8c64668eae6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


