[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22071
  
**[Test build #4241 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4241/testReport)**
 for PR 22071 at commit 
[`b4ca224`](https://github.com/apache/spark/commit/b4ca224095cb7fda6822c431465bfb7f48a4bb2d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94573/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #94573 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94573/testReport)**
 for PR 21732 at commit 
[`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `   * For example, we build an encoder for `case class Data(a: Int, b: 
String)` and the real type`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22047
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22047
  
**[Test build #94586 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94586/testReport)**
 for PR 22047 at commit 
[`af4d901`](https://github.com/apache/spark/commit/af4d9011adb290e4300efce936d25b4de4ec5cd5).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22047
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94586/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22047
  
**[Test build #94586 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94586/testReport)**
 for PR 22047 at commit 
[`af4d901`](https://github.com/apache/spark/commit/af4d9011adb290e4300efce936d25b4de4ec5cd5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22047
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22047
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2059/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21746: [SPARK-24699] [SS]Make watermarks work with Trigger.Once...

2018-08-10 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/21746
  
@c-horn it's in 2.4.0. I just fixed the ticket.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22071
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94572/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21746: [SPARK-24699] [SS]Make watermarks work with Trigger.Once...

2018-08-10 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21746
  
@tdas this will not be included in `2.4.0`? as indicated 
[SPARK-24699](https://issues.apache.org/jira/browse/SPARK-24699)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22071
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22063
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94571/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22071
  
**[Test build #94572 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94572/testReport)**
 for PR 22071 at commit 
[`b4ca224`](https://github.com/apache/spark/commit/b4ca224095cb7fda6822c431465bfb7f48a4bb2d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22063
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22063
  
**[Test build #94571 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94571/testReport)**
 for PR 22063 at commit 
[`a803869`](https://github.com/apache/spark/commit/a803869775250f366ef357ccf06fed397a3f1cfd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21819
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21819
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94582/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22074: [SPARK-25089][R] removing lintr checks for 2.0

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22074
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2058/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22074: [SPARK-25089][R] removing lintr checks for 2.0

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22074
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21819
  
**[Test build #94582 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94582/testReport)**
 for PR 21819 at commit 
[`7129c3f`](https://github.com/apache/spark/commit/7129c3fbcd73d99ad80111b55625b9b5d46333a0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/21698
  
yeah, you'd have to sort the entire record.  I think originally it didn't 
seem like that would work, because you don't know that `T` is sortable for 
`RDD[T]`.  But after a sort, you've got bytes, so you can at least sort the 
serialized bytes of `T`.  Note that you don't want to do the entire 
repartitioning based on that, since you won't get a good distribution for 
skewed data.  But it can give you a deterministic order as an *additional* 
sort, just within one partition.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22017
  
**[Test build #94585 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94585/testReport)**
 for PR 22017 at commit 
[`595161f`](https://github.com/apache/spark/commit/595161fefbf55711b76530a9e53aff73491febd6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22074: [R] removing lintr checks for 2.0

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22074
  
**[Test build #94584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94584/consoleFull)**
 for PR 22074 at commit 
[`b204a88`](https://github.com/apache/spark/commit/b204a88f9d0116a384682422f82f1a55be32443b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22074: removing lintr checks for 2.0

2018-08-10 Thread shaneknapp
GitHub user shaneknapp opened a pull request:

https://github.com/apache/spark/pull/22074

removing lintr checks for 2.0

## What changes were proposed in this pull request?

since 2.0 will be EOLed some time in the not too distant future, and we'll 
be moving the builds from centos to ubuntu, i think it's fine to disable R 
linting rather than going down the rabbit hole of trying to fix this stuff.

## How was this patch tested?

the build system will test this



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shaneknapp/spark removing-lintr-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22074.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22074


commit b204a88f9d0116a384682422f82f1a55be32443b
Author: shane knapp 
Date:   2018-08-10T20:30:25Z

removing lintr checks for 2.0




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread mn-mikke
Github user mn-mikke commented on the issue:

https://github.com/apache/spark/pull/22017
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21698
  
thinking about the sorting thing again but I don't think it works unless 
you sort both the keys and the values themselves. 

for instance lets say we have a groupby key which generates:
B = k1, { (a11, a12, a13), (a21, a22, a23), (a11, a12, a13)}
Then we do a distinct on it, you could get:
C =((a11, a12, a13) ,(a21, a22, a23))

But say you get fetch failures and you refetch the data.  Since the reducer 
can fetch it in any order during the shuffle,  B could be 
B = k1, { (a21, a22, a23), (a11, a12, a13), (a11, a12, a13)}
and distinct you end up with 
 C =((a21, a22, a23), (a11, a12, a13))

If you were to just sort that C could be in a different place and then 
round robin the output, the above example of C would go to different partitions.

I'm not sure of a way to get around this without using the hash partitioner 
but still looking into it.  I need to look at the details about how the 
dataframe  pr solved this as well as I'm curious how it works with above 
example. 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22009
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94574/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22009
  
**[Test build #94574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94574/testReport)**
 for PR 22009 at commit 
[`f4f85a8`](https://github.com/apache/spark/commit/f4f85a833ef319a6860134e12655574aca081ed6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21980
  
Thanks @zsxwing 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21977#discussion_r209374683
  
--- Diff: python/pyspark/worker.py ---
@@ -259,6 +260,26 @@ def main(infile, outfile):
  "PYSPARK_DRIVER_PYTHON are correctly set.") %
 ("%d.%d" % sys.version_info[:2], version))
 
+# set up memory limits
+memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', 
"-1"))
+total_memory = resource.RLIMIT_AS
+try:
+(total_memory_limit, max_total_memory) = 
resource.getrlimit(total_memory)
+msg = "Current mem: {0} of max 
{1}\n".format(total_memory_limit, max_total_memory)
+print(msg, file=sys.stderr)
+
+if memory_limit_mb > 0 and total_memory_limit == 
resource.RLIM_INFINITY:
+# convert to bytes
+total_memory_limit = memory_limit_mb * 1024 * 1024
+
+msg = "Setting mem to {0} of max 
{1}\n".format(total_memory_limit, max_total_memory)
+print(msg, file=sys.stderr)
+resource.setrlimit(total_memory, (total_memory_limit, 
total_memory_limit))
--- End diff --

Here the hard limit is intended to be `total_memory_limit` or it should be 
`max_total_memory`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22017
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22073: [R] removing lintr checks for 2.1

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22073
  
**[Test build #94583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94583/testReport)**
 for PR 22073 at commit 
[`f2974fb`](https://github.com/apache/spark/commit/f2974fbaf518f9e5350324ea0bf32c2fcea6f9b3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22017
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94570/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22073: [R] removing lintr checks for 2.1

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22073
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2057/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22073: [R] removing lintr checks for 2.1

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22073
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22017
  
**[Test build #94570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94570/testReport)**
 for PR 22017 at commit 
[`595161f`](https://github.com/apache/spark/commit/595161fefbf55711b76530a9e53aff73491febd6).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22073: removing lintr checks for 2.1

2018-08-10 Thread shaneknapp
GitHub user shaneknapp opened a pull request:

https://github.com/apache/spark/pull/22073

removing lintr checks for 2.1

## What changes were proposed in this pull request?

since 2.1 will be EOLed some time in the not too distant future, and we'll 
be moving the builds from centos to ubuntu, i think it's fine to disable R 
linting rather than going down the rabbit hole of trying to fix this stuff.

## How was this patch tested?

the build system will test this

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shaneknapp/spark removing-lintr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22073.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22073


commit f2974fbaf518f9e5350324ea0bf32c2fcea6f9b3
Author: shane knapp 
Date:   2018-08-10T20:12:35Z

removing lintr checks for 2.1




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22062: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22062#discussion_r209372979
  
--- Diff: 
core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle.sort
+
+import java.lang.{Long => JLong}
+
+import org.mockito.Mockito.when
+import org.scalatest.mockito.MockitoSugar
+
+import org.apache.spark._
+import org.apache.spark.executor.{ShuffleWriteMetrics, TaskMetrics}
+import org.apache.spark.memory._
+import org.apache.spark.unsafe.Platform
+
+class ShuffleExternalSorterSuite extends SparkFunSuite with 
LocalSparkContext with MockitoSugar {
+
+  test("nested spill should be no-op") {
+val conf = new SparkConf()
+  .setMaster("local[1]")
+  .setAppName("ShuffleExternalSorterSuite")
+  .set("spark.testing", "true")
+  .set("spark.testing.memory", "1600")
+  .set("spark.memory.fraction", "1")
+sc = new SparkContext(conf)
+
+val memoryManager = UnifiedMemoryManager(conf, 1)
+
+var shouldAllocate = false
+
+// Mock `TaskMemoryManager` to allocate free memory when 
`shouldAllocate` is true.
+// This will trigger a nested spill and expose issues if we don't 
handle this case properly.
+val taskMemoryManager = new TaskMemoryManager(memoryManager, 0) {
+  override def acquireExecutionMemory(required: Long, consumer: 
MemoryConsumer): Long = {
+// ExecutionMemoryPool.acquireMemory will wait until there are 400 
bytes for a task to use.
+// So we leave 400 bytes for the task.
+if (shouldAllocate &&
+  memoryManager.maxHeapMemory - memoryManager.executionMemoryUsed 
> 400) {
+  val acquireExecutionMemoryMethod =
+memoryManager.getClass.getMethods.filter(_.getName == 
"acquireExecutionMemory").head
+  acquireExecutionMemoryMethod.invoke(
+memoryManager,
+JLong.valueOf(
+  memoryManager.maxHeapMemory - 
memoryManager.executionMemoryUsed - 400),
+JLong.valueOf(1L), // taskAttemptId
+MemoryMode.ON_HEAP
+  ).asInstanceOf[java.lang.Long]
+}
+super.acquireExecutionMemory(required, consumer)
+  }
+}
+val taskContext = mock[TaskContext]
--- End diff --

lol


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...

2018-08-10 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/21980
  
LGTM2


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaConsumer ...

2018-08-10 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/18143
  
@ScrapCodes sorry for the delay. I think @tdas has fixed the issue. Please 
close the PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21634: [SPARK-24648][SQL] SqlMetrics should be threadsaf...

2018-08-10 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/21634#discussion_r209371636
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -504,4 +504,38 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
   test("writing data out metrics with dynamic partition: parquet") {
 testMetricsDynamicPartition("parquet", "parquet", "t1")
   }
+
+  test("writing metrics from single thread") {
+val nAdds = 10
+val acc = new SQLMetric("test", -10)
+assert(acc.isZero())
+acc.set(0)
+for (i <- 1 to nAdds) acc.add(1)
+assert(!acc.isZero())
+assert(nAdds === acc.value)
+acc.reset()
+assert(acc.isZero())
+  }
+
+  test("writing metrics from multiple threads") {
--- End diff --

> Do you mean it's a one-writer, multi-reader scene?

Yes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-10 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21819
  
@HeartSaVioR @HyukjinKwon @jose-torres @tdas  would you mind taking a look?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21819
  
**[Test build #94582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94582/testReport)**
 for PR 21819 at commit 
[`7129c3f`](https://github.com/apache/spark/commit/7129c3fbcd73d99ad80111b55625b9b5d46333a0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.

2018-08-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21977
  
Why not using `resource.RLIMIT_RSS`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21864: [SPARK-24908][R][style] removing spaces to make lintr ha...

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21864
  
thanks @srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21870: Branch 2.3

2018-08-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21870
  
Close this @lovezeropython 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22070
  
**[Test build #4240 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4240/testReport)**
 for PR 22070 at commit 
[`9e95df2`](https://github.com/apache/spark/commit/9e95df24206bbcc51ae09bd488d72a2bcf84ee7b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21864: [SPARK-24908][R][style] removing spaces to make lintr ha...

2018-08-10 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21864
  
Also backported to 2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-10 Thread rezasafi
Github user rezasafi commented on the issue:

https://github.com/apache/spark/pull/22072
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/21977#discussion_r209364194
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
 ---
@@ -137,13 +135,12 @@ case class AggregateInPandasExec(
 
   val columnarBatchIter = new ArrowPythonRunner(
 pyFuncs,
-bufferSize,
-reuseWorker,
 PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF,
 argOffsets,
 aggInputSchema,
 sessionLocalTimeZone,
-pythonRunnerConf).compute(projectedRowIter, context.partitionId(), 
context)
+pythonRunnerConf,
+sparkContext.conf).compute(projectedRowIter, 
context.partitionId(), context)
--- End diff --

The conf is accessible through org.apache.spark.SparkEnv.get.conf on the 
executors AFAIK


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21939
  
green!


https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/8/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21537
  
Thanks @mgaido91 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21939
  
alright, builds are now passing.  it failed the last one on the junit 
publish, and since we're not running java/scala unittests, i have since removed 
that block. 

should be green in ~20.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-10 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21537
  
LGTM apart from this 
[comment](https://github.com/apache/spark/pull/21537#discussion_r209353035)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22067
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94568/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22067
  
**[Test build #94568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94568/testReport)**
 for PR 22067 at commit 
[`b799e92`](https://github.com/apache/spark/commit/b799e925cbd1b859204491eace7e64142b75727e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22011
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94565/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22011
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22011
  
**[Test build #94565 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94565/testReport)**
 for PR 22011 at commit 
[`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22072
  
**[Test build #94581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94581/testReport)**
 for PR 22072 at commit 
[`1a6452e`](https://github.com/apache/spark/commit/1a6452ef0939c09c09801cff78b0214d7979bf6d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2056/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22072
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22072
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2055/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22001
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94566/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22001
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22001
  
**[Test build #94566 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94566/testReport)**
 for PR 22001 at commit 
[`8de1a4b`](https://github.com/apache/spark/commit/8de1a4b0523bc459f66973cd92b7648e2609a002).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22072: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/22072

[SPARK-25081][Core]Nested spill in ShuffleExternalSorter should not access 
released memory page (branch-2.2)

## What changes were proposed in this pull request?

Backport https://github.com/apache/spark/pull/22062 to branch-2.2.

## How was this patch tested?

Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-25081-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22072


commit 1a6452ef0939c09c09801cff78b0214d7979bf6d
Author: Shixiong Zhu 
Date:   2018-08-10T17:53:44Z

Nested spill in ShuffleExternalSorter should not access released memory page

This issue is pretty similar to 
[SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907).

"allocateArray" in 
[ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99)
 may trigger a spill and cause ShuffleInMemorySorter access the released 
`array`. Another task may get the same memory page from the pool. This will 
cause two tasks access the same memory page. When a task reads memory written 
by another task, many types of failures may happen. Here are some examples I  
have seen:

- JVM crash. (This is easy to reproduce in a unit test as we fill newly 
allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points 
to an invalid memory address)
- java.lang.IllegalArgumentException: Comparison method violates its 
general contract!
- java.lang.NullPointerException at 
org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
- java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size 
-536870912 because the size after growing exceeds size limitation 2147483632

This PR resets states in `ShuffleInMemorySorter.reset` before calling 
`allocateArray` to fix the issue.

The new unit test will make JVM crash without the fix.

Closes #22062 from zsxwing/SPARK-25081.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22037
  
**[Test build #94579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94579/testReport)**
 for PR 22037 at commit 
[`0384a1a`](https://github.com/apache/spark/commit/0384a1af69573af317f9e644bcf04e12bf38f1f3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22007
  
If you find this is unrelated, you could trigger another test here 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22007
  
**[Test build #94580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94580/testReport)**
 for PR 22007 at commit 
[`316b9ad`](https://github.com/apache/spark/commit/316b9adc3be3e2d12ce5c092421901929d5455d4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22007
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22068
  
Yea, if there are more instances found, we better fix them together while 
we are here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21939
  
`./build/mvn -DskipTests -Phadoop2.6 -Pyarn -Phive -Phive-thriftserver 
clean package` FTW.

(i do know that the -Phadoop2.6 is superfluous, but at this point...)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22069
  
Yea if there are more instances found, we better fix them together


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21939
  
ok think i got it...  :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...

2018-08-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21537#discussion_r209353035
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: 
DataType, timeZoneId: Option[String
   private[this] def castToIntervalCode(from: DataType): CastFunction = 
from match {
 case StringType =>
   (c, evPrim, evNull) =>
-s"""$evPrim = CalendarInterval.fromString($c.toString());
+code"""$evPrim = CalendarInterval.fromString($c.toString());
if(${evPrim} == null) {
  ${evNull} = true;
}
  """.stripMargin
 
   }
 
-  private[this] def decimalToTimestampCode(d: String): String =
-s"($d.toBigDecimal().bigDecimal().multiply(new 
java.math.BigDecimal(100L))).longValue()"
-  private[this] def longToTimeStampCode(l: String): String = s"$l * 
100L"
-  private[this] def timestampToIntegerCode(ts: String): String =
-s"java.lang.Math.floor((double) $ts / 100L)"
-  private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 
100.0"
+  private[this] def decimalToTimestampCode(d: ExprValue): Block = {
+val block = code"new java.math.BigDecimal(100L)"
--- End diff --

maybe a `JavaCode.expression` then for this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22010#discussion_r209351478
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag](
* Return a new RDD containing the distinct elements in this RDD.
*/
   def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): 
RDD[T] = withScope {
-map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
+// If the data is already approriately partitioned with a known 
partitioner we can work locally.
+def removeDuplicatesInPartition(itr: Iterator[T]): Iterator[T] = {
+  val set = new mutable.HashSet[T]()
+  itr.filter(set.add(_))
--- End diff --

not a big deal, but despite this is really compact and elegant, it adds to 
the set also the elements which are already there and it is not needed. We can 
probably check if the key is there and add it only in that case, probably it is 
a bit faster.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22069
  
**[Test build #4239 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4239/testReport)**
 for PR 22069 at commit 
[`8520df8`](https://github.com/apache/spark/commit/8520df899a3364f2bb41d4155d2bed9e68772a07).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21939
  
making build config changes...  forgot to add `-Pyarn` to the build target, 
as well as making sure the correct python env is selected before running python 
tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22066
  
> @viirya is that effort going on? I can help with the work if you want. 
Thanks.

@mgaido91 Yeah, I'm still working on it. One of the PRs #21537 is still 
waiting for review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22001
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22001
  
**[Test build #94578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94578/testReport)**
 for PR 22001 at commit 
[`458c78f`](https://github.com/apache/spark/commit/458c78fb076f642f5eee24a7a0911f3822254084).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22001
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94578/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
Thank you! @hvanhovell 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21939
  
there's a bunch of stuff in the unittest logs that i could use some extra 
eyes on:

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/3/artifact/target/unit-tests.log

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/3/artifact/work/app-20180810110830-/0/target/unit-tests.log


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread Fokko
Github user Fokko commented on the issue:

https://github.com/apache/spark/pull/22007
  
I don't really understand the error, can a Spark expert elaborate what's 
going on here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-08-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22019
  
SGTM too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22001
  
**[Test build #94578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94578/testReport)**
 for PR 22001 at commit 
[`458c78f`](https://github.com/apache/spark/commit/458c78fb076f642f5eee24a7a0911f3822254084).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22001
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22001
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2054/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22066
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94564/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22066
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22066
  
**[Test build #94564 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94564/testReport)**
 for PR 22066 at commit 
[`5da6d4d`](https://github.com/apache/spark/commit/5da6d4d437c6eb2bdf7a64c031b7c9281a5a8b83).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22010
  
**[Test build #94577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94577/testReport)**
 for PR 22010 at commit 
[`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2   3   4   5   6   >