[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213158056 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext { checkShowString(ds, expected) } + + test("SPARK-24442 Show should follow spark.show.default.number.of.rows") { +withSQLConf("spark.sql.show.defaultNumRows" -> "100") { + val ds = (1 to 1000).toDS().as[Int].show --- End diff -- I think it's ok to check the output number of rows in show. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22162 ya, sure.
[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213157406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, vertical)) // scalastyle:on println + /** + * Returns the default number of rows to show when the show function is called without + * a user specified max number of rows. + * @since 2.3.0 + */ + private def numberOfRowsToShow(): Int = { +this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt --- End diff -- +1
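The diff above reads the `spark.sql.show.defaultNumRows` key with a fallback of `"20"`. As a minimal, Spark-free sketch of that get-with-default pattern (a plain `Map` stands in for `SparkSession.conf`; the method name mirrors the diff but everything else is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfDemo {
    // Hypothetical stand-in for SparkSession.conf: string keys and values.
    static int numberOfRowsToShow(Map<String, String> conf) {
        // Fall back to "20", matching the default used in the diff above.
        return Integer.parseInt(conf.getOrDefault("spark.sql.show.defaultNumRows", "20"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(numberOfRowsToShow(conf));   // no override set
        conf.put("spark.sql.show.defaultNumRows", "100");
        System.out.println(numberOfRowsToShow(conf));   // overridden
    }
}
```

Storing the default as a string and parsing on read keeps the config store uniformly string-typed, which is the trade-off the diff makes as well.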
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22162 We should wait for @AndrewKL for a few days?
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test FAILed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95310/ Test FAILed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95310/testReport)** for PR 21546 at commit [`2fe46f8`](https://github.com/apache/spark/commit/2fe46f82dc38af972bc0974aca1fd846bcb483e5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/7#discussion_r213154483 --- Diff: sql/core/src/test/resources/sql-tests/inputs/string-functions.sql --- @@ -5,6 +5,10 @@ select format_string(); -- A pipe operator for string concatenation select 'a' || 'b' || 'c'; +-- split function +select split('aa1cc2ee', '[1-9]+', 2); +select split('aa1cc2ee', '[1-9]+'); + --- End diff -- Can you move these tests to the end of this file to reduce unnecessary changes in the golden file?
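The limit semantics the SQL tests above exercise match Java's regex `String.split`: with a limit of 2 the pattern is applied at most once, so the remainder is kept intact in the last element. A quick plain-Java illustration of those two cases (this shows Java's split semantics, which the SQL examples mirror, not Spark's implementation itself):

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // Limit 2: split at the first match only; "cc2ee" stays whole.
        System.out.println(Arrays.toString("aa1cc2ee".split("[1-9]+", 2)));
        // No limit: every match of the pattern splits the string.
        System.out.println(Arrays.toString("aa1cc2ee".split("[1-9]+")));
    }
}
```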
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22198 **[Test build #95322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95322/testReport)** for PR 22198 at commit [`83387f6`](https://github.com/apache/spark/commit/83387f6f3b86532a79e83e8483c5e4683ff8beac).
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Merged build finished. Test PASSed.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2597/ Test PASSed.
[GitHub] spark issue #22162: [spark-24442][SQL] Added parameters to control the defau...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22162 I have enough bandwidth to take it, too. Is it ok to take it over? @mgaido91, are you not working on this now?
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21976 Merged build finished. Test FAILed.
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21976 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95307/ Test FAILed.
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213150133 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) + // One thread will handle loading all of the plugins on this executor --- End diff -- That does make sense. While I did say "aside from semantics", semantics is a good reason to include it, especially since it'll be harder to get plugin writers to adopt an `init` function later. I'll make the other changes and make sure the tests still pass. If anyone feels strongly (or even weakly) one way over another, I don't think there's much harm in either approach.
[GitHub] spark issue #21976: [SPARK-24909][core] Always unregister pending partition ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21976 **[Test build #95307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95307/testReport)** for PR 21976 at commit [`e384245`](https://github.com/apache/spark/commit/e384245f7b0c6c43e6e0e0f7b73528b5c355e2f1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22238: [SPARK-25245][DOCS][SS] Explain regarding limiting modif...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22238 **[Test build #95321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95321/testReport)** for PR 22238 at commit [`138cc63`](https://github.com/apache/spark/commit/138cc63e639b60fb7e803097654816ad6c19c95f).
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22149 **[Test build #95320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95320/testReport)** for PR 22149 at commit [`412497f`](https://github.com/apache/spark/commit/412497f2ad615e5aeecb91e7fd5053864a00be37).
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/22210 LGTM! Good catches
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22149 ok to test
[GitHub] spark issue #22149: [SPARK-25158][SQL]Executor accidentally exit because Scr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22149 Is it possible to add a test case?
[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/22010 I did a quick micro-benchmark on this and got: > scala> :paste > // Entering paste mode (ctrl-D to finish) > > import scala.collection.{mutable, Map} > def removeDuplicatesInPartition(itr: Iterator[Int]): Iterator[Int] = { > val set = new mutable.HashSet[Int]() > itr.filter(set.add(_)) > } > > def time[R](block: => R): (Long, R) = { > val t0 = System.nanoTime() > val result = block// call-by-name > val t1 = System.nanoTime() > println("Elapsed time: " + (t1 - t0) + "ns") > (t1, result) > } > > val count = 100 > val inputData = sc.parallelize(1.to(count)).cache() > inputData.count() > > val o1 = time(inputData.distinct().count()) > val n1 = time(inputData.mapPartitions(removeDuplicatesInPartition).count()) > val n2 = time(inputData.mapPartitions(removeDuplicatesInPartition).count()) > val o2 = time(inputData.distinct().count()) > val n3 = time(inputData.mapPartitions(removeDuplicatesInPartition).count()) > > > // Exiting paste mode, now interpreting. > > Elapsed time: 2464151504ns > Elapsed time: 219130154ns > Elapsed time: 133545428ns > Elapsed time: 927133584ns > Elapsed time: 242432642ns > import scala.collection.{mutable, Map} > removeDuplicatesInPartition: (itr: Iterator[Int])Iterator[Int] > time: [R](block: => R)(Long, R) > count: Int = 100 > inputData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[19] at parallelize at :47 > o1: (Long, Long) = (437102431151279,100) > n1: (Long, Long) = (437102654798968,100) > n2: (Long, Long) = (437102792389328,100) > o2: (Long, Long) = (437103724196085,100) > n3: (Long, Long) = (437103971061275,100) >
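The `removeDuplicatesInPartition` trick in the benchmark above relies on `HashSet.add` returning false for elements already seen, so the set doubles as a stateful filter over one partition's iterator; no shuffle is needed when a known partitioner guarantees each key lives in exactly one partition. The same idea in Java (one `List` standing in for one partition; only valid for sequential traversal, since the predicate is stateful):

```java
import java.util.*;
import java.util.stream.Collectors;

public class DedupDemo {
    // Per-partition dedup: HashSet.add returns false for repeats, so it
    // acts as a filter while preserving first-occurrence order.
    static List<Integer> removeDuplicatesInPartition(List<Integer> partition) {
        Set<Integer> seen = new HashSet<>();
        return partition.stream().filter(seen::add).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicatesInPartition(Arrays.asList(1, 2, 2, 3, 1, 4)));
    }
}
```

This is why the benchmark's `mapPartitions` variant beats `distinct()`: it skips the shuffle entirely when the partitioner already guarantees key locality.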
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22209 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95305/ Test FAILed.
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22209 Merged build finished. Test FAILed.
[GitHub] spark issue #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22211 Thanks! Merged to 2.1
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22209 **[Test build #95305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95305/testReport)** for PR 22209 at commit [`0552af0`](https://github.com/apache/spark/commit/0552af0abb484c1b9129a0091b2057e06d5ab4ac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22209#discussion_r213143932 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/UISeleniumSuite.scala --- @@ -77,7 +77,14 @@ class UISeleniumSuite inputStream.foreachRDD { rdd => rdd.foreach(_ => {}) try { -rdd.foreach(_ => throw new RuntimeException("Oops")) +rdd.foreach(_ => { --- End diff -- Since you're touching this: `.foreach { _ =>`
[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the aggregated stage me...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22209#discussion_r213143804 --- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala --- @@ -350,11 +350,22 @@ private[spark] class AppStatusListener( val e = it.next() if (job.stageIds.contains(e.getKey()._1)) { val stage = e.getValue() - stage.status = v1.StageStatus.SKIPPED - job.skippedStages += stage.info.stageId - job.skippedTasks += stage.info.numTasks - it.remove() - update(stage, now) + // Only update the stage if it has not finished already + if (v1.StageStatus.ACTIVE.equals(stage.status) || --- End diff -- So I went back and took a closer look and I think this isn't entirely correct (and wasn't entirely correct before either). If I remember the semantics correctly, the stage should be skipped if it is part of the job's stages, and is in the pending state when the job finishes. If it's in the active state, it should not be marked as skipped. If you do that, the update to the skipped tasks (in L358) will most certainly be wrong. So if the state is still active here, it means some event was missed. The best we can do in that case, I think, is remove it from the live stages list and update the pool data, and that's it. On a related note, if the "onStageSubmitted" event is missed, the stage will remain in the "pending" state even if tasks start on it. Perhaps that could also be added to the "onTaskStart" handler, just to be sure the stage is marked as active.
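The rule described in the review comment above can be condensed into a tiny state function: on job end, only a still-pending stage becomes SKIPPED; an active stage means an event was missed and must not be counted as skipped. A hypothetical Java sketch of just that decision (the enum mirrors Spark's `v1.StageStatus` names, but this is not the listener's actual code):

```java
public class StageStatusDemo {
    enum StageStatus { PENDING, ACTIVE, COMPLETE, SKIPPED, FAILED }

    // On job end: PENDING stages of the job are the skipped ones; any
    // other state (including ACTIVE, which implies a missed event) is
    // left untouched rather than force-marked SKIPPED.
    static StageStatus onJobEnd(StageStatus current) {
        return current == StageStatus.PENDING ? StageStatus.SKIPPED : current;
    }

    public static void main(String[] args) {
        System.out.println(onJobEnd(StageStatus.PENDING));
        System.out.println(onJobEnd(StageStatus.ACTIVE));
    }
}
```

Counting `skippedTasks` only for stages taking the PENDING→SKIPPED transition is what keeps the aggregated metrics consistent.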
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Merged build finished. Test PASSed.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95319/ Test PASSed.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22042 **[Test build #95319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95319/testReport)** for PR 22042 at commit [`ea804cf`](https://github.com/apache/spark/commit/ea804cfe840196519cc9444be9bedf03d10aa11a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213142394 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) + // One thread will handle loading all of the plugins on this executor --- End diff -- I guess it could be in the constructor; `Utils.loadExtensions` already provides a `SparkConf` to the constructor if one accepts it, which was the only thing I could think of. I generally dislike plugin APIs that encourage initialization in the constructor, but here, other than maybe potentially some benefit for testing, I'm not seeing a lot of differences in not having the init method after all...
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/22188 @gatorsmile Thanks much!
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r213140764 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -130,6 +130,16 @@ private[spark] class Executor( private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) + // One thread will handle loading all of the plugins on this executor --- End diff -- Aside from semantics, would an `init` method be necessary instead of having the initialization logic be in the plugin's constructor? Since the class loader is going to call the constructor immediately, I figure having an `init` function would only really make a difference if we want to load the plugins right here, and then call `init` at a later point in the executor's creation. I can't think of any particular reason why we'd want to do that, unless there are specific executor structures that we want created prior to plugin initialization (although in that case we could also just move the plugin initialization further down)
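The constructor-vs-`init` question above is about who controls when setup runs: with a separate hook, the host can construct all plugins first and invoke initialization later, once the executor is ready. A minimal Java sketch of that shape (interface and loader names are hypothetical stand-ins for the proposed API and `Utils.loadExtensions`, not Spark's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class PluginDemo {
    // Hypothetical plugin contract: a no-arg constructor plus a
    // separate init() hook that the host calls when it chooses.
    interface ExecutorPlugin { default void init() {} }

    static class CountingPlugin implements ExecutorPlugin {
        static int initialized = 0;
        @Override public void init() { initialized++; }
    }

    // Stand-in for Utils.loadExtensions: instantiate each named class
    // reflectively, then run the init hook after construction.
    static List<ExecutorPlugin> loadPlugins(List<String> classNames) throws Exception {
        List<ExecutorPlugin> plugins = new ArrayList<>();
        for (String name : classNames) {
            ExecutorPlugin p = (ExecutorPlugin)
                Class.forName(name).getDeclaredConstructor().newInstance();
            p.init(); // host-controlled: could equally be deferred
            plugins.add(p);
        }
        return plugins;
    }

    public static void main(String[] args) throws Exception {
        loadPlugins(List.of("PluginDemo$CountingPlugin"));
        System.out.println(CountingPlugin.initialized);
    }
}
```

Decoupling construction from initialization also makes testing easier: a test can construct the plugin without triggering its side effects.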
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95303/ Test PASSed.
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22247 Merged build finished. Test PASSed.
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22247 **[Test build #95303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95303/testReport)** for PR 22247 at commit [`c232ec6`](https://github.com/apache/spark/commit/c232ec63f80eea05d3756feb22e53aa5a1e67d93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213138024 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand( val value = ExternalCatalogUtils.unescapePathName(ps(1)) if (resolver(columnName, partitionNames.head)) { scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value), -partitionNames.drop(1), threshold, resolver) +partitionNames.drop(1), threshold, resolver, listFilesInParallel = false) --- End diff -- Does it mean there is no available thread in the given thread pool when a program tries to execute a new `Future`?
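The failure mode asked about above is the classic one: a recursive scan submits nested `Future`s to a fixed-size pool, every pool thread ends up blocked waiting on a child task, and no thread is left to run the children. The `listFilesInParallel = false` change in the diff sidesteps it by parallelizing only the top level and recursing sequentially below. A self-contained Java sketch of that fix pattern (a hypothetical map of directory children stands in for the filesystem):

```java
import java.util.*;
import java.util.concurrent.*;

public class ScanDemo {
    // Hypothetical directory tree: parent -> children.
    static Map<String, List<String>> children = new HashMap<>();

    // Sequential recursion: nested levels never submit to the pool,
    // so a fixed-size pool cannot starve on its own sub-tasks.
    static int scanSequential(String dir) {
        int count = 1; // count this directory
        for (String child : children.getOrDefault(dir, List.of()))
            count += scanSequential(child);
        return count;
    }

    public static void main(String[] args) throws Exception {
        children.put("root", List.of("a", "b"));
        children.put("a", List.of("a1", "a2"));
        children.put("b", List.of("b1"));

        // Parallelism only at the top level: one task per top-level subtree.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<Integer>> futures = new ArrayList<>();
        for (String child : children.get("root"))
            futures.add(pool.submit(() -> scanSequential(child)));
        int total = 1; // the root itself
        for (Future<Integer> f : futures) total += f.get();
        pool.shutdown();
        System.out.println(total);
    }
}
```

Scala's `par` collections avoid the same trap differently, via a work-stealing `ForkJoinPool` that can run a blocked task's children on the waiting thread.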
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22188 Normally, we do not backport such improvement PRs. However, the risk of this PR is pretty small. I think it is fine. Let me do this.
[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213137139 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand( val value = ExternalCatalogUtils.unescapePathName(ps(1)) if (resolver(columnName, partitionNames.head)) { scanPartitions(spark, fs, filter, st.getPath, spec ++ Map(partitionNames.head -> value), -partitionNames.drop(1), threshold, resolver) +partitionNames.drop(1), threshold, resolver, listFilesInParallel = false) --- End diff -- @MaxGekk could you revert to use Scala `par`?
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22042 **[Test build #95319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95319/testReport)** for PR 22042 at commit [`ea804cf`](https://github.com/apache/spark/commit/ea804cfe840196519cc9444be9bedf03d10aa11a).
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22246 Merged build finished. Test FAILed.
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22246 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95304/ Test FAILed.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2596/ Test PASSed.
[GitHub] spark issue #22042: [SPARK-25005][SS]Support non-consecutive offsets for Kaf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22042 Merged build finished. Test PASSed.
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/22188 @gatorsmile >Why 2.2 only? Only that I forgot that master is already on 2.4. We should do 2.3 as well, but I haven't tested it yet. Do I need to do anything on my end to get it into 2.2, and once I test, into 2.3?
[GitHub] spark issue #22246: [WIP] [SPARK-25235] [SHELL] Merge the REPL code in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22246 **[Test build #95304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95304/testReport)** for PR 22246 at commit [`6203f83`](https://github.com/apache/spark/commit/6203f83008950a811b33bba97b99540716d27833). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22183: [SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22183 For Hive tables, column resolution is always case insensitive. However, when `spark.sql.hive.convertMetastoreParquet` is true, users might face inconsistent behaviors when they use the native parquet reader to resolve columns in the case sensitive mode. We still introduce behavior changes. Better error messages sound good enough, instead of disabling `spark.sql.hive.convertMetastoreParquet` when the mode is case sensitive.
[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r213135626 --- Diff: docs/sql-programming-guide.md --- @@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`. - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation. +## Upgrading From Spark SQL 2.3.1 to 2.3.2 and above + + - In version 2.3.1 and earlier, when reading from a Parquet table, Spark always returns null for any column whose column names in Hive metastore schema and Parquet schema are in different letter cases, no matter whether `spark.sql.caseSensitive` is set to true or false. Since 2.3.2, when `spark.sql.caseSensitive` is set to false, Spark does case insensitive column name resolution between Hive metastore schema and Parquet schema, so even column names are in different letter cases, Spark returns corresponding column values. An exception is thrown if there is ambiguity, i.e. more than one Parquet column is matched. --- End diff -- For Hive tables, column resolution is always case insensitive. However, when `spark.sql.hive.convertMetastoreParquet` is true, users might face inconsistent behaviors when they use the native parquet reader to resolve columns in the case sensitive mode. We still introduce behavior changes. Better error messages sound good enough, instead of disabling `spark.sql.hive.convertMetastoreParquet` when the mode is case sensitive. cc @cloud-fan
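The migration note above describes case-insensitive resolution between the Hive metastore schema and the Parquet schema, with an exception on ambiguity. The core of that behavior fits in a few lines of Java (a hypothetical illustration, not Spark's reader code; a `List` of Parquet column names stands in for the file schema):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ResolveDemo {
    // Case-insensitive lookup of a Hive column among Parquet columns:
    // return the match, null if absent, and fail if more than one
    // Parquet column matches ignoring case.
    static String resolve(List<String> parquetColumns, String hiveName) {
        List<String> matches = parquetColumns.stream()
            .filter(c -> c.equalsIgnoreCase(hiveName))
            .collect(Collectors.toList());
        if (matches.size() > 1)
            throw new RuntimeException("Ambiguous column: " + hiveName);
        return matches.isEmpty() ? null : matches.get(0);
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of("ID", "name"), "id")); // matched despite case
        System.out.println(resolve(List.of("a", "b"), "c"));      // absent column
    }
}
```

The pre-2.3.2 behavior corresponds to an exact (case-sensitive) comparison here, which is why differently-cased columns silently resolved to null.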
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17280 **[Test build #95318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95318/testReport)** for PR 17280 at commit [`733c7ff`](https://github.com/apache/spark/commit/733c7ff70c46f0c54cdf520b44645544b810e04e).
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2595/ Test PASSed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22188 @bersprockets The risk is pretty small I think. I am fine to backport it to the previous versions. Why 2.2 only? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22193: [SPARK-25186][SQL] Remove v2 save mode.
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22193 @HyukjinKwon, those changes probably don't need to be in this PR, but this is just a demonstration that we can remove `SaveMode` without changing test cases. The larger issue is that this doesn't correctly use CTAS or RTAS plans. Instead, it does things like directly deleting data.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22104 **[Test build #95317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95317/testReport)** for PR 22104 at commit [`2325a4f`](https://github.com/apache/spark/commit/2325a4f18a2bc6cc95d96bc5ac6790749b3e927e).
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2594/ Test PASSed.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Merged build finished. Test PASSed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95316/ Test FAILed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17280 **[Test build #95316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95316/testReport)** for PR 17280 at commit [`9e2854a`](https://github.com/apache/spark/commit/9e2854a9764b7f7a007d38c3ab89f2e228c0675e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Merged build finished. Test FAILed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17280 **[Test build #95316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95316/testReport)** for PR 17280 at commit [`9e2854a`](https://github.com/apache/spark/commit/9e2854a9764b7f7a007d38c3ab89f2e228c0675e).
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2593/ Test PASSed.
[GitHub] spark issue #17280: [SPARK-19939] [ML] Add support for association rules in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17280 Merged build finished. Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22208 **[Test build #95315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95315/testReport)** for PR 22208 at commit [`a8a5976`](https://github.com/apache/spark/commit/a8a59760228d4fac54175caeffdfe07faf26a184).
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213129120 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Gotchas** + +- For structured streaming, modifying "spark.sql.shuffle.partitions" is restricted once you run the query. + - This is because state is partitioned via key, hence the number of partitions for state should be unchanged. + - If you want to run fewer tasks for stateful operations, `coalesce` would help with avoiding unnecessary repartitioning. Please note that it will also affect downstream operators. --- End diff -- It just means that the number of partitions in stateful operations' output will be the same as the parameter passed to `coalesce`, and the number of partitions will be kept unless another shuffle happens. It is implicitly the same as `spark.sql.shuffle.partitions`, whose default value is 200. I'll add the code, but I'm not sure we need to have the code per language like Scala / Java / Python tabs since they will be the same.
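The point being made about `coalesce` — it reduces the partition count, and that count persists downstream until another shuffle — can be simulated with a toy sketch. Round-robin grouping is used here only for determinism; Spark's `coalesce` actually groups parent partitions with locality awareness:

```python
def coalesce(partitions, n):
    """Merge existing partitions into n buckets without a full shuffle.

    Records keep their partition-local order; no repartitioning by key
    happens, which is why this is cheaper than a shuffle.
    """
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)
    return merged
```

Every operator reading the result sees `n` partitions, which is the "affects downstream operators" caveat in the proposed doc text.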
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2592/ Test PASSed.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22208 Merged build finished. Test PASSed.
[GitHub] spark issue #22212: [SPARK-25220] Seperate kubernetes node selector config b...
Github user erikerlandson commented on the issue: https://github.com/apache/spark/pull/22212 I agree there's an argument for keeping this, but an alternative would be to leave the original for backward compatibility, deprecate it, and recommend people make use of custom pod templates (#22146)
[GitHub] spark pull request #22212: [SPARK-25220] Seperate kubernetes node selector c...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/22212#discussion_r213127037 --- Diff: docs/running-on-kubernetes.md --- @@ -663,11 +663,21 @@ specific to Spark on Kubernetes. - spark.kubernetes.node.selector.[labelKey] + spark.kubernetes.driver.selector.[labelKey] --- End diff -- agreed we should keep it, but recommend annotating it as deprecated
[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/2 @xuanyuanking, while this does remove the hack, it doesn't address the underlying problem. The problem is that there is a single RDD, which may contain InternalRow or may contain ColumnarBatch. Generated code knows how to differentiate between the two and use the RDD contents correctly. While this is an improvement because it uses the actual type of records in the RDD, the work that needs to be done is to update the columnar case so that it does return an `RDD[InternalRow]` for anyone that accesses data using that RDD, and then update the generated code to detect a data source RDD and access the underlying `RDD[ColumnarBatch]`. Here's some pseudo-code to demonstrate what I mean. The current code does something like this with a cast. Your change wouldn't fix the need to cast to `RDD[ColumnarBatch]`:

```scala
def doExecute(rdd: DataSourceRDD[InternalRow]) { // with your change, DataSourceRDD[_]
  if (rdd.isColumnar) {
    doExecuteColumnarBatch(rdd.asInstanceOf[RDD[ColumnarBatch]])
  } else {
    doExecuteRows(rdd)
  }
}
```

I think that should be changed to something like this which is type safe:

```scala
def doExecute(rdd: DataSourceRDD[InternalRow]) {
  if (rdd.isColumnar) {
    doExecuteColumnarBatch(rdd.getColumnBatchRDD)
  } else {
    doExecuteRows(rdd)
  }
}
```
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213126158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- Ok, tomorrow I will create a Jira and start working on it. Thanks for your comments! :)
[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213124828 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1349,6 +1353,12 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] { case Limit(IntegerLiteral(limit), LocalRelation(output, data, isStreaming)) => LocalRelation(output, data.take(limit), isStreaming) + +case Filter(condition, LocalRelation(output, data, isStreaming)) +if !hasUnevaluableExpr(condition) => --- End diff -- I suppose it is fine in this case. The only thing is that it violates the contract of the optimizer: it should not change the results of a query.
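The rule under discussion collapses `Filter(LocalRelation)` by evaluating the predicate eagerly at optimization time, producing a smaller `LocalRelation`. A toy Python analogue of that rewrite (the plan classes here are stand-ins, not Catalyst's):

```python
class LocalRelation:
    """Leaf plan node holding its rows in memory."""
    def __init__(self, output, data):
        self.output, self.data = output, data

class Filter:
    """Plan node filtering its child with a predicate (a callable here)."""
    def __init__(self, condition, child):
        self.condition, self.child = condition, child

def convert_to_local_relation(plan):
    """Fold Filter over LocalRelation into a smaller LocalRelation.

    The predicate must be evaluable without running the query — the analogue
    of the rule's guard against unevaluable expressions.
    """
    if isinstance(plan, Filter) and isinstance(plan.child, LocalRelation):
        child = plan.child
        return LocalRelation(child.output,
                             [row for row in child.data if plan.condition(row)])
    return plan  # no match: leave the plan unchanged
```

This also illustrates hvanhovell's point: because the predicate actually runs during optimization, the rule must be careful not to change query results (or raise errors the query itself would not).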
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22236 **[Test build #95314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95314/testReport)** for PR 22236 at commit [`88eb571`](https://github.com/apache/spark/commit/88eb571b732d42138b029ead106f4c8718e1e220).
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Merged build finished. Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2591/ Test PASSed.
[GitHub] spark issue #22205: [SPARK-25212][SQL] Support Filter in ConvertToLocalRelat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22205 Yes. Disable this rule for testing only.
[GitHub] spark pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding limitin...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22238#discussion_r213123711 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -2812,6 +2812,12 @@ See [Input Sources](#input-sources) and [Output Sinks](#output-sinks) sections f # Additional Information +**Gotchas** --- End diff -- I was going to add the explanation to `doc()` of `spark.sql.shuffle.partitions`, but it looks like what we explain in `doc()` would not be published automatically. (Please correct me if I'm missing something here.) SQLConf is not even exposed to scaladoc. That's why I'm adding this to the structured streaming guide doc. Actually, I think most end users only look at this doc for structured streaming, and we can't (and shouldn't) expect end users to read the source code to find it. I also didn't notice that `spark.sql.shuffle.partitions` is explained in `sql-programming-guide.md`, but I think we need to explain all configs here if they work differently from batch queries — `spark.sql.shuffle.partitions` is such a case. Btw, `Gotchas` looks funny though. Maybe having a section would be better, like `## Other Configuration Options` in `sql-programming-guide.md`?
[GitHub] spark issue #22205: [SPARK-25212][SQL] Support Filter in ConvertToLocalRelat...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/22205 @gatorsmile what are you afraid of exactly? We could check which tests are affected. Also do you want to disable this for testing only?
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21977 **[Test build #95313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95313/testReport)** for PR 21977 at commit [`0b275cf`](https://github.com/apache/spark/commit/0b275cfea7d83cdf61802da30c4a7604be8900e4).
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2590/ Test PASSed.
[GitHub] spark issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory li...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21977 Merged build finished. Test PASSed.
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213122284 --- Diff: docs/configuration.md --- @@ -179,6 +179,15 @@ of the most common options to set are: (e.g. 2g, 8g). + + spark.executor.pyspark.memory + Not set + +The amount of memory to be allocated to PySpark in each executor, in MiB +unless otherwise specified. If set, PySpark memory for an executor will be +limited to this amount. If not set, Spark will not limit Python's memory use. --- End diff -- I've added "and it is up to the application to avoid exceeding the overhead memory space shared with other non-JVM processes."
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213121794 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- I like this idea too
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r213121178 --- Diff: docs/configuration.md --- @@ -179,6 +179,15 @@ of the most common options to set are: (e.g. 2g, 8g). + + spark.executor.pyspark.memory + Not set + +The amount of memory to be allocated to PySpark in each executor, in MiB --- End diff -- I've added "When PySpark is run in YARN, this memory is added to executor resource requests."
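As the doc text under review describes, the proposed setting sits alongside the JVM executor memory; a hypothetical `spark-defaults.conf` fragment (the values are illustrative only):

```properties
# JVM heap for each executor
spark.executor.memory          4g
# Cap for the Python worker processes of each executor; per the review
# discussion, on YARN this amount is added to the executor resource request
spark.executor.pyspark.memory  1g
```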
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22104 **[Test build #95312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95312/testReport)** for PR 22104 at commit [`3f0a97a`](https://github.com/apache/spark/commit/3f0a97a89b39d2ad57c587e49bb07203a670faba).
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2589/ Test PASSed.
[GitHub] spark pull request #22206: [SPARK-25213][PYTHON] Add project to v2 scans bef...
Github user rdblue closed the pull request at: https://github.com/apache/spark/pull/22206
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22249#discussion_r213120096 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2459,6 +2459,26 @@ object functions { StringTrimLeft(e.expr, Literal(trimString)) } + /** +* Extracts a part from a URL. +* +* @group string_funcs +* @since 2.4.0 +*/ + def parse_url(url: Column, partToExtract: String): Column = withExpr { --- End diff -- @TomaszGaweda This sounds like a good idea by returning a handler for built-in functions. cc @rxin
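For readers unfamiliar with the SQL expression being wrapped here: `parse_url(url, partToExtract)` extracts a named component from a URL (parts such as `HOST`, `PATH`, `QUERY`). A rough Python analogue using `urllib.parse` — semantics simplified, not the exact Spark/Hive implementation:

```python
from urllib.parse import urlparse, parse_qs

def parse_url(url, part_to_extract, key=None):
    """Approximate the SQL parse_url expression for a few common parts."""
    parsed = urlparse(url)
    parts = {
        "PROTOCOL": parsed.scheme,
        "HOST": parsed.hostname,
        "PATH": parsed.path,
        "QUERY": parsed.query,
        "REF": parsed.fragment,
    }
    # The three-argument form pulls one key out of the query string.
    if part_to_extract == "QUERY" and key is not None:
        values = parse_qs(parsed.query).get(key)
        return values[0] if values else None
    return parts.get(part_to_extract)
```

For example, `parse_url("http://spark.apache.org/path?query=1", "HOST")` yields `spark.apache.org`, which matches the documented behavior of the SQL function.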
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21546 Hey @HyukjinKwon , after going through the previous benchmarks, it seems out-of-order batches had more of an effect on performance than I thought with `toPandas`. The current revision of this PR (which buffers out-of-order batches in the driver JVM) has about a 1.06x - 1.09x speedup, which seems a bit underwhelming after getting ~1.25x when sending out-of-order batches. I still want to try to verify the old numbers and will hopefully get to that tomorrow.
[GitHub] spark issue #22206: [SPARK-25213][PYTHON] Add project to v2 scans before pyt...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/22206 @HyukjinKwon and @viirya, thank you for looking at this commit, but I like @cloud-fan's approach to fixing this in #22244 better than this work-around. I'm going to close this in favor of that approach, although if we need a quick fix I can pick this back up.
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22104 Build finished. Test PASSed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Merged build finished. Test FAILed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22236 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95294/ Test FAILed.
[GitHub] spark issue #22236: [SPARK-10697][ML] Add lift to Association rules
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22236 **[Test build #95294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95294/testReport)** for PR 22236 at commit [`957a6a2`](https://github.com/apache/spark/commit/957a6a2cf0e05f01c2c2d602944b8da8cfb1b426). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22188: [SPARK-25164][SQL] Avoid rebuilding column and path list...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/22188 @cloud-fan @gatorsmile Should we merge this also onto 2.2? It was a clean cherry-pick for me (from master to branch-2.2), and I ran the top and bottom tests (6000 columns, 1 million rows, 67 32M files, and 60 columns, 100 million rows, 67 32M files) from the PR description and got the same results.
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21638 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95295/ Test PASSed.
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21638 Merged build finished. Test PASSed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/7 **[Test build #95311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95311/testReport)** for PR 7 at commit [`4e10733`](https://github.com/apache/spark/commit/4e107337a47ce590c703b757b0a44d60d6b862e1).
[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21638 **[Test build #95295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95295/testReport)** for PR 21638 at commit [`5e46efb`](https://github.com/apache/spark/commit/5e46efb5f5ce86297c4aeb23bf934fd9942de3de). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22205: [SPARK-25212][SQL] Support Filter in ConvertToLocalRelat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22205 It would be safer to turn off this rule, since it will skip the actual query execution. Normally, the tests are introduced for testing end-to-end scenarios instead of applying this rule.