[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22010 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22010 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 hmm first build failed but i don't quite know what's up: `skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'` and: ``` (py3k)

[GitHub] spark issue #22062: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-10 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22062 I also merged to branch-2.3.

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-10 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r209340344 --- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala --- @@ -95,6 +95,18 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext {

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-10 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r209340321 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark pull request #22062: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22062

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-10 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r209338809 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 master branch, i assume

[GitHub] spark issue #22062: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-10 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22062 Thanks. Merging to master. I will try to merge to old branches and report back.

[GitHub] spark pull request #22062: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/22062#discussion_r209337943 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #22062: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/22062#discussion_r209338026 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22007 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94563/ Test FAILed.

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22007 Merged build finished. Test FAILed.

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22053 Merged build finished. Test FAILed.

[GitHub] spark pull request #22062: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/22062#discussion_r209337484 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -94,12 +94,20 @@ public int numRecords() { }

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94560/ Test FAILed.

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22007 **[Test build #94563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94563/testReport)** for PR 22007 at commit

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209330276 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -91,6 +91,13 @@ private[spark] class Client(

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209329689 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22053 **[Test build #94560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94560/testReport)** for PR 22053 at commit

[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...

2018-08-10 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209331503 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -137,13 +135,12 @@ case class

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Merged build finished. Test FAILed.

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94562/ Test FAILed.

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20838 Looking at Jenkins, though, I can see some things which don't appear to be timeout failures, e.g. > Traceback (most recent call last): > File

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21939 Yup, that would run all the pyarrow tests

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22067 **[Test build #94562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94562/testReport)** for PR 22067 at commit

[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...

2018-08-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14083#discussion_r209336169 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -138,6 +140,88 @@ package object expressions {

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21698 > IIUC streaming query always need to specify a checkpoint location? You can use a batch query to read and write Kafka :) My point is if the input and output data sources are not

[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable

2018-08-10 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21933 Thanks for detailed analysis @vincent-grosbois . I agree with everything, but as you noted you won't hit this particular issue anymore with `ChunkedByteBufferFileRegion`. Is there another

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20838 **[Test build #94576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94576/testReport)** for PR 20838 at commit

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 @BryanCutler responding to the wall 'o text: 1) we are not moving to py3.5 until after 2.4 is cut 2) my 'testing' server has 3.5 and 0.10.0 installed, so i can create a job that

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21698 Ok, the proposal @squito had to sort on the binary/serialized data seems like at least a good short-term solution. Any sorting is definitely going to add overhead but at least its

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20838 So if the issue was a timeout last time, we can ask "Jenkins retest this please" and see if it now works. If that isn't working for you another way we can trigger Jenkins retest is to merge in

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 >> @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? > I have the hack I used originally, but I haven't tried

[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22011 **[Test build #94575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94575/testReport)** for PR 22011 at commit

[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22011 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22011 Merged build finished. Test PASSed.

[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....

2018-08-10 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21939 Wow, thanks @shaneknapp for helping to get this worked out! I think your plan to move to Python 3.5 sounds great, but it does make me a bit nervous making a change like this at a critical

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @LantaoJin I realized the initial approach had some issues, so I marked it as WIP to refine it and add a test. It is different from your original implementation, so I would like to use this one.

[GitHub] spark pull request #14083: [SPARK-16406][SQL] Improve performance of Logical...

2018-08-10 Thread heuermh
Github user heuermh commented on a diff in the pull request: https://github.com/apache/spark/pull/14083#discussion_r209327944 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -138,6 +140,88 @@ package object expressions {

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread yucai
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan Jira and 1st is from this one. It is critical to our 2.3 migration.

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94574/testReport)** for PR 22009 at commit

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test PASSed.

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21698 > What happened to the other pr proposal of just using the hashPartitioner? It's not able to handle the use case when we use `repartition()` to shuffle skewed data, so we give up on

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22066 This PR looks identical to #22067, which one is the first PR?

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #94573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94573/testReport)** for PR 21732 at commit

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22071 **[Test build #94572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94572/testReport)** for PR 22071 at commit

[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22071 Merged build finished. Test PASSed.

[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22071 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Serv...

2018-08-10 Thread squito
GitHub user squito opened a pull request: https://github.com/apache/spark/pull/22071 [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs & defaults. ## What changes were proposed in this pull request? (a) disabled rest submission server by default in standalone mode

[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22068 **[Test build #4237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4237/testReport)** for PR 22068 at commit

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #94571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94571/testReport)** for PR 22063 at commit

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test PASSed.

[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94558/ Test FAILed.

[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Merged build finished. Test FAILed.

[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22037 **[Test build #94558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94558/testReport)** for PR 22037 at commit

[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94559/ Test FAILed.

[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21369 Merged build finished. Test FAILed.

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21698 Still catching up on this and trying to understand all the cases. What happened to the other pr proposal of just using the hashPartitioner? Did we give up on that because of the skewed

[GitHub] spark issue #21369: [SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spi...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21369 **[Test build #94559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94559/testReport)** for PR 21369 at commit

[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22064 Merged to master/2.3/2.2

[GitHub] spark pull request #22064: [MINOR][BUILD] Add ECCN notice required by http:/...

2018-08-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22064

[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-10 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 > What if the user doesn't provide a distributed file system path? E.g., you can read from Kafka and write them back to Kafka and such workloads don't need a distributed file system in standalone

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test FAILed.

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94569/ Test FAILed.

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #94569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94569/testReport)** for PR 22063 at commit

[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22064 I'm going to ignore the failures as they can't be related

[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r209311502 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -231,6 +231,15 @@ object TypeCoercion {

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209310622 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + *

[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22064 **[Test build #4236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4236/testReport)** for PR 22064 at commit

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread cclauss
Github user cclauss commented on the issue: https://github.com/apache/spark/pull/20838 jenkins, retest this please

[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-08-10 Thread cclauss
Github user cclauss commented on the issue: https://github.com/apache/spark/pull/20838 @HyukjinKwon Your advice on next steps?

[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r209308055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -231,6 +231,15 @@ object TypeCoercion {

[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r209307767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -442,3 +442,186 @@ case class

[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22017 **[Test build #94570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94570/testReport)** for PR 22017 at commit

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #94569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94569/testReport)** for PR 22063 at commit

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test PASSed.

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22066 Merged build finished. Test FAILed.

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22066 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94557/ Test FAILed.

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22066 **[Test build #94557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94557/testReport)** for PR 22066 at commit

[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22070 LGTM

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209304798 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22067 **[Test build #94568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94568/testReport)** for PR 22067 at commit

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209301920 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49 @@ +/* + *

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209301833 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -27,10 +27,10 @@ @InterfaceStability.Evolving

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22067 retest this please

[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22069 Do we need to update the following similar examples, too? Column.scala ``` * {{{ * // Example: encoding gender string column into integer. * * // Scala:

[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 retest this please.

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209294774 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala --- @@ -453,4 +453,8 @@

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209276818 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209277357 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-10 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22001#discussion_r209274833 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -929,6 +955,28 @@ class DAGScheduler( // HadoopRDD whose
