[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-28 Thread advancedxy
Github user advancedxy commented on the issue:

https://github.com/apache/spark/pull/20449
  
> I'm not sure, let's just try it :)

All right, I finally tracked down why it's hanging on Jenkins.
The global semaphores used by `interruptible iterator of shuffle reader` 
are interfered with by other tasks.

Please check the latest change, @cloud-fan 
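
To illustrate the kind of interference described above, here is a minimal Python sketch (not the Scala test from the PR). It assumes the hang comes from another task consuming a permit on a semaphore that is shared across tests; the names `shared_sem` and `own_sem` are hypothetical.

```python
import threading

# Hypothetical semaphore shared by every task in the process.
shared_sem = threading.Semaphore(0)

shared_sem.release()   # our task signals that it reached the sync point
shared_sem.acquire()   # an unrelated task takes the permit first
# Our test now waits for a permit that is gone. Without the timeout this
# call would block forever, which is what a hang looks like.
stolen = not shared_sem.acquire(timeout=0.05)

# A semaphore owned by a single test cannot be drained by other tasks.
own_sem = threading.Semaphore(0)
own_sem.release()
isolated = own_sem.acquire(timeout=0.05)

print(stolen, isolated)
```

Giving each test its own semaphore removes the shared state, so the outcome no longer depends on what else is running on the same executor.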



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20449
  
**[Test build #87822 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87822/testReport)**
 for PR 20449 at commit 
[`a3d8ad5`](https://github.com/apache/spark/commit/a3d8ad56f0709c343e508c8b636083243f9ffdd2).


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20681
  
**[Test build #87821 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87821/testReport)**
 for PR 20681 at commit 
[`6a962e9`](https://github.com/apache/spark/commit/6a962e900a2b9de2e434f2a6ec1eb256ea87a774).


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1180/
Test PASSed.


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20472
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87820/
Test FAILed.


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20472
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20472
  
**[Test build #87820 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87820/testReport)**
 for PR 20472 at commit 
[`51900da`](https://github.com/apache/spark/commit/51900da3266a9025ace567e3cbd5bf2b26051651).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle perform...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20472
  
**[Test build #87820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87820/testReport)**
 for PR 20472 at commit 
[`51900da`](https://github.com/apache/spark/commit/51900da3266a9025ace567e3cbd5bf2b26051651).


---




[GitHub] spark issue #5785: [SPARK-7250][MLLIB] Added computeInverse to RowMatrix.sca...

2018-02-28 Thread kingsaction
Github user kingsaction commented on the issue:

https://github.com/apache/spark/pull/5785
  
@srowen How can I add a function that inverts a matrix?


---




[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-02-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Hi, @gatorsmile , @HyukjinKwon , @cloud-fan .
Since 2.3 is officially announced, I'm pinging you guys again. :)
Please let me know if there is something for me to do here.


---




[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20382
  
**[Test build #87819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87819/testReport)**
 for PR 20382 at commit 
[`1073be4`](https://github.com/apache/spark/commit/1073be420b2cc5fd099929fc0215bf8c1be4b6e0).


---




[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1179/
Test PASSed.


---




[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20382
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87814/
Test FAILed.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19788
  
**[Test build #87814 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87814/testReport)**
 for PR 19788 at commit 
[`fc0fe77`](https://github.com/apache/spark/commit/fc0fe77cc4f1222ffd8a4a492e623ce43fd1f28c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19788
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87812/
Test PASSed.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19788
  
**[Test build #87812 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87812/testReport)**
 for PR 19788 at commit 
[`c133776`](https://github.com/apache/spark/commit/c13377601da21368955335eb9f10e72c4ac18738).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

2018-02-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20576


---




[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20576
  
thanks, merging to master!


---




[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...

2018-02-28 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/20685
  
cc @cloud-fan @jiangxb1987 
Could you please take a look?


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20670
  
LGTM


---




[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...

2018-02-28 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20382#discussion_r171469866
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala ---
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.sources
+
+import java.io.IOException
+import java.net.InetSocketAddress
+import java.nio.ByteBuffer
+import java.nio.channels.ServerSocketChannel
+import java.sql.Timestamp
+import java.util.Optional
+import java.util.concurrent.LinkedBlockingQueue
+
+import scala.collection.JavaConverters._
+
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.execution.datasources.DataSource
+import org.apache.spark.sql.execution.streaming._
+import org.apache.spark.sql.sources.v2.{DataSourceOptions, MicroBatchReadSupport}
+import org.apache.spark.sql.sources.v2.reader.streaming.{MicroBatchReader, Offset}
+import org.apache.spark.sql.streaming.StreamTest
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}
+
+class TextSocketStreamSuite extends StreamTest with SharedSQLContext with BeforeAndAfterEach {
+
+  override def afterEach() {
+    sqlContext.streams.active.foreach(_.stop())
+    if (serverThread != null) {
+      serverThread.interrupt()
+      serverThread.join()
+      serverThread = null
+    }
+    if (batchReader != null) {
+      batchReader.stop()
+      batchReader = null
+    }
+  }
+
+  private var serverThread: ServerThread = null
+  private var batchReader: MicroBatchReader = null
+
+  case class AddSocketData(data: String*) extends AddData {
+    override def addData(query: Option[StreamExecution]): (BaseStreamingSource, Offset) = {
+      require(
+        query.nonEmpty,
+        "Cannot add data when there is no query for finding the active socket source")
+
+      val sources = query.get.logicalPlan.collect {
+        case StreamingExecutionRelation(source: TextSocketMicroBatchReader, _) => source
+      }
+      if (sources.isEmpty) {
+        throw new Exception(
+          "Could not find socket source in the StreamExecution logical plan to add data to")
+      } else if (sources.size > 1) {
+        throw new Exception(
+          "Could not select the socket source in the StreamExecution logical plan as there" +
+            "are multiple socket sources:\n\t" + sources.mkString("\n\t"))
+      }
+      val socketSource = sources.head
+
+      assert(serverThread != null && serverThread.port != 0)
+      val currOffset = socketSource.currentOffset
+      data.foreach(serverThread.enqueue)
+
+      val newOffset = LongOffset(currOffset.offset + data.size)
+      (socketSource, newOffset)
+    }
+
+    override def toString: String = s"AddSocketData(data = $data)"
+  }
+
+  test("backward compatibility with old path") {
+    DataSource.lookupDataSource("org.apache.spark.sql.execution.streaming.TextSocketSourceProvider",
+      spark.sqlContext.conf).newInstance() match {
+      case ds: MicroBatchReadSupport =>
+        assert(ds.isInstanceOf[TextSocketSourceProvider])
+      case _ =>
+        throw new IllegalStateException("Could not find socket source")
+    }
+  }
+
+  test("basic usage") {
+    serverThread = new ServerThread()
+    serverThread.start()
+
+    withSQLConf("spark.sql.streaming.unsupportedOperationCheck" -> "false") {
+      val ref = spark
+      import ref.implicits._
+
+      val socket = spark
+        .readStream
+        .format("socket")
+        .options(Map("host" -> "localhost", "port" ->

[GitHub] spark pull request #20295: [SPARK-23011] Support alternative function form w...

2018-02-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20295#discussion_r171469275
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None, functionType=None):
|  2| 1.1094003924504583|
+---+---+
 
+   Alternatively, the user can define a function that takes two arguments.
+   In this case, the grouping key will be passed as the first argument and the data will
+   be passed as the second argument. The grouping key will be passed as a tuple of numpy
+   data types, e.g., `numpy.int32` and `numpy.float64`. The data will still be passed in
+   as a `pandas.DataFrame` containing all columns from the original Spark DataFrame.
+   This is useful when the user doesn't want to hardcode grouping key in the function.
+
+   >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+   >>> df = spark.createDataFrame(
+   ...     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
+   ...     ("id", "v"))  # doctest: +SKIP
+   >>> @pandas_udf("id long, v double", PandasUDFType.GROUP_MAP)  # doctest: +SKIP
+   ... def mean_udf(key, pdf):
+   ...     # key is a tuple of one numpy.int64, which is the value
+   ...     # of 'id' for the current group
+   ...     return pd.DataFrame([key + (pdf.v.mean(),)])
+   >>> df.groupby('id').apply(mean_udf).show()  #doctest: +SKIP
--- End diff --

I think it's because we haven't yet found a minimal fix to enable the doctests 
only when PyArrow and Pandas are installed.

Maybe we can try to conditionally drop the doctests right before we run 
`doctest.testmod` below, but as far as I know that would be a new approach for Spark.

I will probably look into it separately soon.
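
A rough sketch of that idea using only the standard library: clear the docstrings of functions whose examples need an optional package, then call `doctest.testmod`. The helper names, the `demo` module, and the package name are hypothetical, and none of this is taken from PySpark itself.

```python
import doctest
import importlib.util
import types


def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def drop_doctests_if_missing(module, func_names, required_modules):
    """Clear docstrings of the named functions when a dependency is absent.

    Returns the list of missing modules so the caller can log what was skipped.
    """
    missing = [name for name in required_modules if not is_installed(name)]
    if missing:
        for func_name in func_names:
            getattr(module, func_name).__doc__ = None
    return missing


# Hypothetical stand-in for a module whose doctest needs an optional package.
demo = types.ModuleType("demo")
exec(
    "def needs_optional():\n"
    "    '''\n"
    "    >>> import package_that_is_not_installed_xyz\n"
    "    '''\n",
    demo.__dict__,
)

missing = drop_doctests_if_missing(
    demo, ["needs_optional"], ["package_that_is_not_installed_xyz"])
# With the docstring cleared, testmod finds no examples to run.
results = doctest.testmod(demo)
```

The trade-off is that the examples silently stop being tested on machines without the optional packages, so logging the returned `missing` list is probably worthwhile.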


---




[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20689
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20689
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87811/
Test PASSed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1178/
Test PASSed.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20670
  
**[Test build #87817 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87817/testReport)**
 for PR 20670 at commit 
[`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20689
  
**[Test build #87811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87811/testReport)**
 for PR 20689 at commit 
[`4bf17a7`](https://github.com/apache/spark/commit/4bf17a738de1b705ee673b8e889394ccbe972f47).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87818 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87818/testReport)**
 for PR 20647 at commit 
[`6fe7681`](https://github.com/apache/spark/commit/6fe76817032cb9b6bac47f14b79d7a4041e286dd).


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread KaiXinXiaoLei
Github user KaiXinXiaoLei commented on the issue:

https://github.com/apache/spark/pull/20670
  
@gatorsmile Thanks.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1177/
Test PASSed.


---




[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-02-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20647#discussion_r171468670
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StringFormat.scala ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.commons.lang3.StringUtils
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2.DataSourceV2
+import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.util.Utils
+
+/**
+ * A trait that can be used by data source v2 related query plans(both logical and physical), to
+ * provide a string format of the data source information for explain.
+ */
+trait DataSourceV2StringFormat {
+
+  /**
+   * The instance of this data source implementation. Note that we only consider its class in
+   * equals/hashCode, not the instance itself.
+   */
+  def source: DataSourceV2
+
+  /**
+   * The output of the data source reader, w.r.t. column pruning.
+   */
+  def output: Seq[Attribute]
+
+  /**
+   * The options for this data source reader.
+   */
+  def options: Map[String, String]
+
+  /**
+   * The created data source reader. Here we use it to get the filters that has been pushed down
+   * so far, itself doesn't take part in the equals/hashCode.
+   */
+  def reader: DataSourceReader
+
+  private lazy val filters = reader match {
+    case s: SupportsPushDownCatalystFilters => s.pushedCatalystFilters().toSet
+    case s: SupportsPushDownFilters => s.pushedFilters().toSet
+    case _ => Set.empty
+  }
+
+  private def sourceName: String = source match {
+    case registered: DataSourceRegister => registered.shortName()
+    case _ => source.getClass.getSimpleName.stripSuffix("$")
+  }
+
+  def metadataString: String = {
+    val entries = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
+
+    if (filters.nonEmpty) {
+      entries += "Pushed Filters" -> filters.mkString("[", ", ", "]")
+    }
+
+    // TODO: we should only display some standard options like path, table, etc.
--- End diff --

For a follow-up, there are two proposals:
1. Define some standard options and display only those, if they are 
specified.
2. Create a new mix-in interface to allow data source implementations to 
decide which options they want to show during explain.
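
The two proposals could be sketched roughly as follows. This is a language-neutral illustration written in Python, not Spark's Scala API; `STANDARD_OPTIONS`, `ExplainOptionsMixin`, `JdbcLikeSource`, and the option keys are all hypothetical names chosen for the example.

```python
# Proposal 1: a fixed allow-list of "standard" options (hypothetical keys).
STANDARD_OPTIONS = ("path", "table")


def standard_entries(options):
    """Keep only allow-listed options that the user actually specified."""
    return {k: v for k, v in options.items() if k in STANDARD_OPTIONS}


# Proposal 2: a mix-in that lets the source decide what explain shows.
class ExplainOptionsMixin:
    def explain_options(self, options):
        raise NotImplementedError


class JdbcLikeSource(ExplainOptionsMixin):
    """Hypothetical source that exposes its own pair of options."""

    def explain_options(self, options):
        return {k: v for k, v in options.items() if k in ("url", "dbtable")}


def options_for_explain(source, options):
    """Prefer the source's own choice, else fall back to the allow-list."""
    if isinstance(source, ExplainOptionsMixin):
        return source.explain_options(options)
    return standard_entries(options)


options = {"path": "/data/t", "password": "secret",
           "url": "jdbc:x", "dbtable": "t"}
plain = options_for_explain(object(), options)
custom = options_for_explain(JdbcLikeSource(), options)
```

Either way, options such as `password` never reach the explain output unless something explicitly opts them in.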


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20576
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20576
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87808/
Test PASSed.


---




[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20690
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20576
  
**[Test build #87808 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)**
 for PR 20576 at commit 
[`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87809/
Test PASSed.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87816/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20690
  
**[Test build #87809 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87809/testReport)**
 for PR 20690 at commit 
[`f7efb22`](https://github.com/apache/spark/commit/f7efb22ddea3dc8eeccc833086d5a82cbce7e530).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class RequestExecutors(appId: String, requestedTotal: Int,`


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20695
  
**[Test build #87816 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87816/testReport)**
 for PR 20695 at commit 
[`b3e9ddd`](https://github.com/apache/spark/commit/b3e9dddc5eff082a892d109ad959369d5f5510a9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20685
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20685
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87807/
Test PASSed.


---




[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20685
  
**[Test build #87807 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87807/testReport)**
 for PR 20685 at commit 
[`110c851`](https://github.com/apache/spark/commit/110c8510dcc6c2abaf4ca416b95854daf129b0a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

2018-02-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20692
  
We should clearly define when and where we need to display attribute data 
types. I think leaf nodes and some nodes that produce new data, like `Generate`, 
are good places. We may also need to introduce a debug mode for explain. 
Personally, most of the time I only focus on the shape of the query plan, not 
each attribute. The data type info is only needed when doing some deep 
debugging.

also cc @rdblue
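
A minimal sketch of that rule, in Python for brevity rather than Catalyst's Scala: data types are printed only for leaf nodes and for nodes flagged as producing new data, unless a debug flag is set. `PlanNode`, `produces_new_data`, and `render` are hypothetical names for illustration, not Spark APIs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical stand-in for a query plan node; not Spark's TreeNode API.
@dataclass
class PlanNode:
    name: str
    output: List[Tuple[str, str]]            # (attribute name, data type)
    children: List["PlanNode"] = field(default_factory=list)
    produces_new_data: bool = False          # e.g. a Generate-like node


def render(node: PlanNode, debug: bool = False, depth: int = 0) -> List[str]:
    """Return one line per node, showing types only where they add information."""
    # Leaves and data-producing nodes always show types; debug mode shows them everywhere.
    show_types = debug or not node.children or node.produces_new_data
    attrs = ", ".join(f"{n}: {t}" if show_types else n for n, t in node.output)
    prefix = "" if depth == 0 else "   " * (depth - 1) + "+- "
    lines = [f"{prefix}{node.name} [{attrs}]"]
    for child in node.children:
        lines.extend(render(child, debug, depth + 1))
    return lines


leaf = PlanNode("LocalRelation", [("col1", "string"), ("col2", "int")])
plan = PlanNode("Project", [("col1", "string"), ("col2", "int")], [leaf])
print("\n".join(render(plan)))
```

With `debug=True` the same call prints the types on every node, which is roughly what a debug mode for explain could toggle.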


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-02-28 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20681
  
looks like test failures are related?


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87810/
Test PASSed.


---




[GitHub] spark pull request #20295: [SPARK-23011] Support alternative function form w...

2018-02-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20295#discussion_r171466325
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1725,6 +1737,29 @@ def _get_local_timezone():
     return os.environ.get('TZ', 'dateutil/:')
 
 
+def _check_series_localize_timestamps(s, timezone):
+    """
+    Convert timezone aware timestamps to timezone-naive in the specified timezone or local timezone.
+
+    If the input series is not a timestamp series, then the same series is returned. If the input
+    series is a timestamp series, then a converted series is returned.
+
+    :param s: pandas.Series
+    :param timezone: the timezone to convert. if None then use local timezone
+    :return pandas.Series that have been converted to tz-naive
+    """
+    from pyspark.sql.utils import require_minimum_pandas_version
+    require_minimum_pandas_version()
+
+    from pandas.api.types import is_datetime64tz_dtype
--- End diff --

do we have tests for these in tests.py?


---




[GitHub] spark pull request #20295: [SPARK-23011] Support alternative function form w...

2018-02-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20295#discussion_r171465908
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2253,6 +2253,30 @@ def pandas_udf(f=None, returnType=None, functionType=None):
    |  2| 1.1094003924504583|
    +---+-------------------+
 
+   Alternatively, the user can define a function that takes two arguments.
+   In this case, the grouping key will be passed as the first argument and the data will
+   be passed as the second argument. The grouping key will be passed as a tuple of numpy
+   data types, e.g., `numpy.int32` and `numpy.float64`. The data will still be passed in
+   as a `pandas.DataFrame` containing all columns from the original Spark DataFrame.
+   This is useful when the user doesn't want to hardcode grouping key in the function.
+
+   >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+   >>> df = spark.createDataFrame(
+   ...     [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
+   ...     ("id", "v"))  # doctest: +SKIP
+   >>> @pandas_udf("id long, v double", PandasUDFType.GROUP_MAP)  # doctest: +SKIP
+   ... def mean_udf(key, pdf):
+   ...     # key is a tuple of one numpy.int64, which is the value
+   ...     # of 'id' for the current group
+   ...     return pd.DataFrame([key + (pdf.v.mean(),)])
+   >>> df.groupby('id').apply(mean_udf).show()  #doctest: +SKIP
--- End diff --

why skip all of these btw? why not run them so they can be tested?
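
For reference, a small sketch of what `# doctest: +SKIP` does, using only the standard-library `doctest` module. `mean_of` and `run_docstring_examples` are made-up names for illustration; this is not how Spark's test scripts invoke doctests.

```python
import doctest


def mean_of(values):
    """Return the arithmetic mean of a non-empty list.

    >>> mean_of([1.0, 2.0, 3.0])
    2.0
    >>> mean_of([])  # doctest: +SKIP
    0.0
    """
    return sum(values) / len(values)


def run_docstring_examples(func):
    """Run the examples in func's docstring and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed


print(run_docstring_examples(mean_of))
```

The second example would raise `ZeroDivisionError` if it ran. Because it is marked `+SKIP`, the runner never evaluates it, which is also why skipped examples provide no test coverage.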


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #87810 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87810/testReport)**
 for PR 20043 at commit 
[`37ae9b0`](https://github.com/apache/spark/commit/37ae9b0e217de323dbc73c9e1247ebe9bf2c278c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20695
  
**[Test build #87816 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87816/testReport)**
 for PR 20695 at commit 
[`b3e9ddd`](https://github.com/apache/spark/commit/b3e9dddc5eff082a892d109ad959369d5f5510a9).


---




[GitHub] spark issue #20695: [SPARK-21741][ML][PySpark] Python API for DataFrame-base...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1176/
Test PASSed.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20682
  
**[Test build #87815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87815/testReport)**
 for PR 20682 at commit 
[`3d28bbf`](https://github.com/apache/spark/commit/3d28bbf9f218ce50ab08fb3e9e62ed9e2fc2307b).


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-02-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20682
  
retest this please.


---




[GitHub] spark pull request #20692: [SPARK-23531][SQL] Show attribute type in explain

2018-02-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20692#discussion_r171464091
  
--- Diff: sql/core/src/test/resources/sql-tests/results/inline-table.sql.out ---
@@ -166,18 +166,18 @@ struct
 
 == Analyzed Logical Plan ==
 col1: string, col2: int, col1: string, col2: int
-Project [col1#x, col2#x, col1#x, col2#x]
+Project [col1#x: string, col2#x: int, col1#x: string, col2#x: int]
 +- Join Cross
-   :- LocalRelation [col1#x, col2#x]
-   +- LocalRelation [col1#x, col2#x]
+   :- LocalRelation [col1#x: string, col2#x: int]
+   +- LocalRelation [col1#x: string, col2#x: int]
--- End diff --

Repeatedly showing the data types for the same attributes may not be useful. 
It seems too verbose. A big query plan will be filled with redundant info.


---




[GitHub] spark issue #20697: Initial checkin of k8s integration tests.

2018-02-28 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/20697
  
@ssuchter the jira ticket for this is 
https://issues.apache.org/jira/browse/SPARK-23010.


---




[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

2018-02-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20670#discussion_r171463022
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._
 
 trait QueryPlanConstraints { self: LogicalPlan =>
 
+  /**
+   * An [[ExpressionSet]] that contains an additional set of constraints, such as equality
+   * constraints and `isNotNull` constraints, etc.
+   */
+  lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
--- End diff --

We still need `if (conf.constraintPropagationEnabled)`
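
A rough sketch of the guard being asked for, in Python rather than Scala and with toy string constraints. It assumes that when propagation is disabled the set should simply be empty; `Conf`, `Plan`, and the inference rule are hypothetical stand-ins, not the Catalyst classes.

```python
from functools import cached_property
from typing import FrozenSet, Iterable


class Conf:
    """Hypothetical config holder standing in for the SQL conf."""

    def __init__(self, constraint_propagation_enabled: bool = True):
        self.constraint_propagation_enabled = constraint_propagation_enabled


class Plan:
    """Toy plan node whose constraints are plain strings such as 'a > 1'."""

    def __init__(self, conf: Conf, valid_constraints: Iterable[str]):
        self.conf = conf
        self.valid_constraints = frozenset(valid_constraints)

    def _infer_is_not_null(self) -> FrozenSet[str]:
        # Toy inference: a comparison on column 'a' implies 'isnotnull(a)'.
        return frozenset(f"isnotnull({c.split()[0]})" for c in self.valid_constraints)

    @cached_property
    def all_constraints(self) -> FrozenSet[str]:
        # Computed once on first access, similar in spirit to a Scala lazy val.
        if not self.conf.constraint_propagation_enabled:
            return frozenset()  # assumption: disabled means no constraints at all
        return self.valid_constraints | self._infer_is_not_null()


print(sorted(Plan(Conf(True), ["a > 1"]).all_constraints))
print(sorted(Plan(Conf(False), ["a > 1"]).all_constraints))
```

Keeping the check inside the lazily computed value means the inference work is skipped entirely when the flag is off, instead of being computed and then discarded.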


---




[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

2018-02-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20670#discussion_r171462811
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._
 
 trait QueryPlanConstraints { self: LogicalPlan =>
 
+  /**
+   * An [[ExpressionSet]] that contains an additional set of constraints, such as equality
+   * constraints and `isNotNull` constraints, etc.
+   */
+  lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
+.union(inferAdditionalConstraints(validConstraints))
+.union(constructIsNotNullConstraints(validConstraints)))
--- End diff --

Nit: indents


---




[GitHub] spark pull request #20696: [SPARK-23525] [SQL] Support ALTER TABLE CHANGE CO...

2018-02-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20696#discussion_r171462718
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -314,8 +314,8 @@ case class AlterTableChangeColumnCommand(
     val resolver = sparkSession.sessionState.conf.resolver
     DDLUtils.verifyAlterTableType(catalog, table, isView = false)
 
-    // Find the origin column from schema by column name.
-    val originColumn = findColumnByName(table.schema, columnName, resolver)
+    // Find the origin column from dataSchema by column name.
+    val originColumn = findColumnByName(table.dataSchema, columnName, resolver)
--- End diff --

Do we have a negative test case to cover that?


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87804/
Test PASSed.


---




[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20670
  
**[Test build #87804 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87804/testReport)**
 for PR 20670 at commit 
[`023f2f7`](https://github.com/apache/spark/commit/023f2f709db484d82cde22b00db0bad33ac72279).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19788
  
**[Test build #87814 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87814/testReport)**
 for PR 19788 at commit 
[`fc0fe77`](https://github.com/apache/spark/commit/fc0fe77cc4f1222ffd8a4a492e623ce43fd1f28c).


---




[GitHub] spark issue #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml.featur...

2018-02-28 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20686
  
Thanks! I will help review it later.


---




[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

2018-02-28 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20576#discussion_r171452995
  
--- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala ---
@@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers {
       mapSideCombine = false
     )))
 
+    // We support serialized shuffle if we do not need to do map-side aggregation
+    assert(canUseSerializedShuffle(shuffleDep(
+      partitioner = new HashPartitioner(2),
+      serializer = kryo,
+      keyOrdering = None,
+      aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])),
+      mapSideCombine = false
--- End diff --

You can see this code: `def groupByKey(partitioner: Partitioner): RDD[(K, 
Iterable[V])]`
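
To make the `groupByKey` case concrete: it defines an aggregator (it needs one to build the per-key collections) but turns map-side combining off, because pre-grouping on the map side only builds lists and does not shrink what is shuffled. Below is a toy Python model of that trade-off; `combine_by_key` and its parameters are illustrative names, not Spark's implementation.

```python
def combine_by_key(partitions, create, merge_value, merge_combiners, map_side_combine):
    """Toy shuffle. Returns (result dict, list of (key, payload, is_combiner) records shuffled)."""
    shuffled = []
    for part in partitions:
        if map_side_combine:
            # Pre-aggregate within the partition before "sending" anything.
            local = {}
            for k, v in part:
                local[k] = merge_value(local[k], v) if k in local else create(v)
            shuffled.extend((k, c, True) for k, c in local.items())
        else:
            # Send raw values; aggregation happens only on the reduce side.
            shuffled.extend((k, v, False) for k, v in part)

    result = {}
    for k, x, is_combiner in shuffled:
        if is_combiner:
            result[k] = merge_combiners(result[k], x) if k in result else x
        else:
            result[k] = merge_value(result[k], x) if k in result else create(x)
    return result, shuffled


parts = [[("a", 1), ("a", 2), ("b", 3)], [("a", 4)]]

# groupByKey-style aggregator: combiners are lists of values.
group = dict(create=lambda v: [v],
             merge_value=lambda c, v: c + [v],
             merge_combiners=lambda x, y: x + y)
grouped, raw = combine_by_key(parts, map_side_combine=False, **group)
_, pre = combine_by_key(parts, map_side_combine=True, **group)
print(grouped, len(raw), sum(len(c) for _, c, _ in pre))

# Sum-style aggregator: map-side combining does reduce the record count.
total = dict(create=lambda v: v,
             merge_value=lambda c, v: c + v,
             merge_combiners=lambda x, y: x + y)
print(len(combine_by_key(parts, map_side_combine=True, **total)[1]))
```

For the list-building aggregator the number of values crossing the shuffle is the same either way, while for the sum it drops from four records to three, which is why an aggregator with `mapSideCombine = false` is a legitimate combination.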


---




[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20449
  
**[Test build #87813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87813/testReport)**
 for PR 20449 at commit 
[`756e0b7`](https://github.com/apache/spark/commit/756e0b7336fff3c72eca70c2ab489600211b9253).


---




[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...

2018-02-28 Thread drboyer
Github user drboyer commented on the issue:

https://github.com/apache/spark/pull/20658
  
Thanks @holdenk, I can open a separate JIRA about the missing field in 
`Function` if it seems worth fixing. It wasn't critical for me, I just happened 
to notice while doing some testing so I included it in my initial commit.

I hadn't added more complex docstrings just because these seemed like pretty 
simple methods with straightforward parameters. Happy to add :param: and 
:return: annotations if desired, but should we add these to some of the other 
catalog methods as well if we're adding it to these new ones (thinking 
especially of the `list*()` methods)?


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19788
  
**[Test build #87812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87812/testReport)**
 for PR 19788 at commit 
[`c133776`](https://github.com/apache/spark/commit/c13377601da21368955335eb9f10e72c4ac18738).


---




[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20689
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20689
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1175/
Test PASSed.


---




[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20689
  
**[Test build #87811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87811/testReport)**
 for PR 20689 at commit 
[`4bf17a7`](https://github.com/apache/spark/commit/4bf17a738de1b705ee673b8e889394ccbe972f47).


---




[GitHub] spark pull request #20576: [SPARK-23389][CORE]When the shuffle dependency sp...

2018-02-28 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20576#discussion_r171451103
  
--- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleManagerSuite.scala ---
@@ -85,6 +85,14 @@ class SortShuffleManagerSuite extends SparkFunSuite with Matchers {
       mapSideCombine = false
     )))
 
+    // We support serialized shuffle if we do not need to do map-side aggregation
+    assert(canUseSerializedShuffle(shuffleDep(
+      partitioner = new HashPartitioner(2),
+      serializer = kryo,
+      keyOrdering = None,
+      aggregator = Some(mock(classOf[Aggregator[Any, Any, Any]])),
+      mapSideCombine = false
--- End diff --

Under what scenario will `mapSideCombine` be `false` while an `aggregator` 
is set?


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1174/
Test PASSed.


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20043
  
**[Test build #87810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87810/testReport)**
 for PR 20043 at commit 
[`37ae9b0`](https://github.com/apache/spark/commit/37ae9b0e217de323dbc73c9e1247ebe9bf2c278c).


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20043
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...

2018-02-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20043#discussion_r171447686
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExprValue.scala ---
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.codegen
+
+import scala.language.implicitConversions
+
+import org.apache.spark.sql.types.DataType
+
+// An abstraction that represents the evaluation result of [[ExprCode]].
+abstract class ExprValue {
+
+  val javaType: ExprType
+
+  // Whether we can directly access the evaluation value anywhere.
+  // For example, a variable created outside a method can not be accessed inside the method.
+  // For such cases, we may need to pass the evaluation as parameter.
+  val canDirectAccess: Boolean
+}
+
+object ExprValue {
+  implicit def exprValueToString(exprValue: ExprValue): String = exprValue.toString
+}
+
+// A literal evaluation of [[ExprCode]].
+class LiteralValue(val value: String, val javaType: ExprType) extends ExprValue {
+  override def toString: String = value
+  override val canDirectAccess: Boolean = true
+}
+
+object LiteralValue {
+  def apply(value: String, javaType: ExprType): LiteralValue = new LiteralValue(value, javaType)
+  def unapply(literal: LiteralValue): Option[(String, ExprType)] =
+    Some((literal.value, literal.javaType))
+}
+
+// A variable evaluation of [[ExprCode]].
+case class VariableValue(
+    val variableName: String,
+    val javaType: ExprType,
+    val canDirectAccess: Boolean = false) extends ExprValue {
--- End diff --

I want to give it a bit of flexibility for something like a static variable.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87806/
Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20698
  
**[Test build #87806 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87806/testReport)**
 for PR 20698 at commit 
[`5f066a0`](https://github.com/apache/spark/commit/5f066a058f685a394397244cb46b022483f7e892).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1173/
Test PASSed.


---




[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20690
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20690
  
**[Test build #87809 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87809/testReport)**
 for PR 20690 at commit 
[`f7efb22`](https://github.com/apache/spark/commit/f7efb22ddea3dc8eeccc833086d5a82cbce7e530).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...

2018-02-28 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/20690
  
retest this please


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20698
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87805/
Test PASSed.


---




[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20698
  
**[Test build #87805 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87805/testReport)**
 for PR 20698 at commit 
[`ebb9b51`](https://github.com/apache/spark/commit/ebb9b51c51a4411811a7e0e09fff8f8608faa017).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20382
  
Sure, I will do it today.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20576
  
**[Test build #87808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87808/testReport)** for PR 20576 at commit [`e409c4f`](https://github.com/apache/spark/commit/e409c4fecc6c80ed33b6dd8d3ac69bf7edbe0cb2).


---




[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20576
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20576: [SPARK-23389][CORE]When the shuffle dependency specifies...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20576
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1172/


---




[GitHub] spark pull request #20043: [SPARK-22856][SQL] Add wrappers for codegen outpu...

2018-02-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20043#discussion_r171444357
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -31,7 +31,7 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.ScalaReflection.universe.TermName
 import org.apache.spark.sql.catalyst.encoders.RowEncoder
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.catalyst.expressions.codegen._
--- End diff --

An explicit import would have to list too many classes: `CodegenContext`, `ExprCode`, `ExprValue`, `GlobalValue`, `FalseLiteral`, `VariableValue`.


---




[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...

2018-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20685
  
**[Test build #87807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87807/testReport)** for PR 20685 at commit [`110c851`](https://github.com/apache/spark/commit/110c8510dcc6c2abaf4ca416b95854daf129b0a5).


---




[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...

2018-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20685
  
Merged build finished. Test PASSed.


---



