[GitHub] spark issue #21511: [SPARK-24491][Kubernetes] Configuration support for requ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21511 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 @SparkQA test this please
[GitHub] spark pull request #22992: [SPARK-24229] Update to Apache Thrift 0.10.0
Github user Fokko closed the pull request at: https://github.com/apache/spark/pull/22992
[GitHub] spark issue #22992: [SPARK-24229] Update to Apache Thrift 0.10.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22992 @mingwandroid . If you are worried about real issues, could you lend us your hand, please? Reopening the issue with a valid reproducible case is always welcome. The Apache Spark community seriously cares about correct CVE reports and provides backports. - http://spark.apache.org/security.html Flagging real risks is the only way to make people happy. We should not alarm people for the wrong reasons. Apache Spark issues and commits are precious resources. Not only you, but all downstream projects are affected, so we try our best to deliver only correct patches. If we repeatedly cry `Wolf, Wolf` over incorrect situations, the credibility of Apache Spark's security alerts will go down gradually (and eventually seriously), and nobody will believe Spark's security alerts in the future.
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22899 Merged build finished. Test PASSed.
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22899 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4908/ Test PASSed.
[GitHub] spark issue #22899: [SPARK-25573] Combine resolveExpression and resolve in t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22899 **[Test build #98674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98674/testReport)** for PR 22899 at commit [`3a32007`](https://github.com/apache/spark/commit/3a320075e2749e5ff21fc6fef616406fd8756cc9).
[GitHub] spark pull request #22255: [SPARK-25102][Spark Core] Write Spark version inf...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22255
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22932
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22932 Thank you so much!
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232444034

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)

   private val recordWriter = {
-    new OrcOutputFormat[OrcStruct]() {
+    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
       override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
         new Path(path)
       }
-    }.getRecordWriter(context)
+    }
+    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+    val writer = OrcFile.createWriter(filename, options)
+    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
--- End diff --

Right. To avoid reflection, this was the only way.
[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22998 Can one of the admins verify this patch?
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22932 LGTM. Thanks! Merged to master.
[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22998 Can one of the admins verify this patch?
[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/22998 cc @mgaido91, @dongjoon-hyun, @cloud-fan, @kiszk
[GitHub] spark issue #22998: [SPARK-26001][SQL]Reduce memory copy when writing decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22998 Can one of the admins verify this patch?
[GitHub] spark pull request #22998: [SPARK-26001][SQL]Reduce memory copy when writing...
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/22998 [SPARK-26001][SQL] Reduce memory copy when writing decimals

## What changes were proposed in this pull request?

This PR fixes two things:
- When writing non-null decimals, we do not need to zero out all 16 allocated bytes. If the number of bytes needed for the decimal is greater than 8, there is no need to zero bytes 0 through 7: the first 8 bytes are overwritten when the decimal itself is written.
- When writing null decimals, we do not need to zero out all 16 allocated bytes either. BitSetMethods.set marks the field as null and the decimal's length is set to 0, so the 16-byte slot is never read back when the decimal is fetched; this is safe.

## How was this patch tested?

Existing test cases.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/heary-cao/spark writeDecimal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22998.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22998 commit bab69d426578a009ce7796e14757c6ae79d57f28 Author: caoxuewen Date: 2018-11-10T06:31:52Z Reduce memory copy when writing decimal
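The first optimization in this PR can be illustrated with a small sketch (plain Python with made-up helper names, not Spark's actual UnsafeRowWriter code): when a decimal needs more than 8 bytes of its fixed 16-byte slot, zeroing only the upper half before the write produces the same final bytes as zeroing the whole slot, because the write itself always covers at least the first 8 bytes.

```python
# Illustration only: a fixed 16-byte slot holds a decimal's unscaled bytes.
# If the value needs more than 8 bytes, the write overwrites the first 8
# bytes anyway, so only the upper half must be pre-zeroed to avoid leaking
# stale data from a reused buffer.
def write_decimal(slot: bytearray, value_bytes: bytes, zero_all: bool) -> bytearray:
    assert len(slot) == 16 and 8 < len(value_bytes) <= 16
    if zero_all:
        slot[0:16] = bytes(16)   # old behaviour: clear the whole slot
    else:
        slot[8:16] = bytes(8)    # optimized: clear only the upper 8 bytes
    slot[0:len(value_bytes)] = value_bytes  # the write covers bytes 0..len-1
    return slot

stale = bytes(b"\xff" * 16)  # a reused slot full of leftover garbage
value = (12345678901234567890123).to_bytes(10, "big")  # needs 10 bytes
full = write_decimal(bytearray(stale), value, zero_all=True)
half = write_decimal(bytearray(stale), value, zero_all=False)
assert full == half  # both strategies leave identical slot contents
```

The null case is analogous: since readers consult the null bit and the zero length first, the slot's contents are never observed and need not be cleared at all.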
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232443802

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +37,17 @@ private[orc] class OrcOutputWriter(
   private[this] val serializer = new OrcSerializer(dataSchema)

   private val recordWriter = {
-    new OrcOutputFormat[OrcStruct]() {
+    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
       override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
         new Path(path)
       }
-    }.getRecordWriter(context)
+    }
+    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+    val writer = OrcFile.createWriter(filename, options)
+    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
--- End diff --

This is basically copied from getRecordWriter
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22954 Hey guys thanks for reviewing! Will address them soon.
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r232443266

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala ---
@@ -68,57 +68,50 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR
     genComparisons(ctx, ordering)
   }

+  /**
+   * Creates the variables for ordering based on the given order.
+   */
+  private def createOrderKeys(
+      ctx: CodegenContext,
+      row: String,
+      ordering: Seq[SortOrder]): Seq[ExprCode] = {
+    ctx.INPUT_ROW = row
+    // to use INPUT_ROW we must make sure currentVars is null
+    ctx.currentVars = null
+    ordering.map(_.child.genCode(ctx))
+  }
+
   /**
    * Generates the code for ordering based on the given order.
    */
   def genComparisons(ctx: CodegenContext, ordering: Seq[SortOrder]): String = {
     val oldInputRow = ctx.INPUT_ROW
     val oldCurrentVars = ctx.currentVars
-    val inputRow = "i"
-    ctx.INPUT_ROW = inputRow
-    // to use INPUT_ROW we must make sure currentVars is null
-    ctx.currentVars = null
-
-    val comparisons = ordering.map { order =>
-      val eval = order.child.genCode(ctx)
-      val asc = order.isAscending
-      val isNullA = ctx.freshName("isNullA")
-      val primitiveA = ctx.freshName("primitiveA")
-      val isNullB = ctx.freshName("isNullB")
-      val primitiveB = ctx.freshName("primitiveB")
+    val rowAKeys = createOrderKeys(ctx, "a", ordering)
+    val rowBKeys = createOrderKeys(ctx, "b", ordering)
+    val comparisons = rowAKeys.zip(rowBKeys).zipWithIndex.map { case ((l, r), i) =>
+      val dt = ordering(i).child.dataType
+      val asc = ordering(i).isAscending
+      val nullOrdering = ordering(i).nullOrdering
       s"""
-         ${ctx.INPUT_ROW} = a;
-         boolean $isNullA;
-         ${CodeGenerator.javaType(order.child.dataType)} $primitiveA;
-         {
-           ${eval.code}
-           $isNullA = ${eval.isNull};
-           $primitiveA = ${eval.value};
-         }
-         ${ctx.INPUT_ROW} = b;
-         boolean $isNullB;
-         ${CodeGenerator.javaType(order.child.dataType)} $primitiveB;
-         {
-           ${eval.code}
-           $isNullB = ${eval.isNull};
-           $primitiveB = ${eval.value};
-         }
-         if ($isNullA && $isNullB) {
+         ${l.code}
--- End diff --

Would you update this to use `|` and `.stripMargin`?
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r232443230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -133,7 +126,6 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR returnType = "int", makeSplitFunction = { body => s""" --- End diff -- Would you update this to use `|` and `.stripMargin`?
[GitHub] spark pull request #22976: [SPARK-25974][SQL]Optimizes Generates bytecode fo...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22976#discussion_r232443205 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateOrdering.scala --- @@ -154,7 +146,6 @@ object GenerateOrdering extends CodeGenerator[Seq[SortOrder], Ordering[InternalR // make sure INPUT_ROW is declared even if splitExpressions // returns an inlined block s""" --- End diff -- Can we use just `code`?
[GitHub] spark issue #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22993 **[Test build #4422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4422/testReport)** for PR 22993 at commit [`f137de7`](https://github.com/apache/spark/commit/f137de748e092315cc11e66deaafbcb469dd5764). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22993: [SPARK-24421][BUILD][CORE] Accessing sun.misc.Cleaner in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22993 **[Test build #4422 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4422/testReport)** for PR 22993 at commit [`f137de7`](https://github.com/apache/spark/commit/f137de748e092315cc11e66deaafbcb469dd5764).
[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22893 There's no merge conflict right now. You can just update the file and push the commit to your branch. If there were a merge conflict, you'd just rebase on apache/master, resolve the conflict, and force-push the branch.
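The rebase workflow described here can be sketched as a self-contained demo (all repository, branch, and remote names below are invented for illustration; in a real PR you would fetch the `apache` remote and force-push to your own fork):

```shell
# Demo: an "upstream" repo, a PR branch that falls behind master,
# then a rebase of the branch onto the latest upstream master.
set -e
tmp=$(mktemp -d) && cd "$tmp"

git init -q upstream && cd upstream
git symbolic-ref HEAD refs/heads/master        # make the initial branch 'master'
git config user.email dev@example.com && git config user.name dev
echo base > shared.txt && git add shared.txt && git commit -qm "base"

cd "$tmp" && git clone -q upstream work && cd work   # stands in for your fork's clone
git config user.email dev@example.com && git config user.name dev
git checkout -qb my-feature                          # the PR branch
echo feature > feature.txt && git add feature.txt && git commit -qm "feature"

cd "$tmp/upstream"                                   # meanwhile, master moves on
echo update >> shared.txt && git commit -qam "update"

cd "$tmp/work"
git fetch -q origin                # like fetching apache/spark in a real PR
git rebase -q origin/master        # replay the PR commits on top of latest master
# after a real conflict: fix the files, git add them, git rebase --continue,
# then: git push --force-with-lease origin my-feature
grep -q update shared.txt && echo "rebased onto latest master"
```

Because the rebase rewrites the branch's commits, the final push must be forced; `--force-with-lease` is the safer variant since it refuses to clobber commits you have not seen.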
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98672/ Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #98672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98672/testReport)** for PR 21732 at commit [`2d2057b`](https://github.com/apache/spark/commit/2d2057b4f2dbb541b4f2573944318f7a874fac3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22759: [MINOR][SQL][DOC] Correct parquet nullability doc...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22759#discussion_r232441937 --- Diff: docs/sql-programming-guide.md --- @@ -706,7 +706,7 @@ data across a fixed number of buckets and can be used when a number of unique va [Parquet](http://parquet.io) is a columnar format that is supported by many other data processing systems. Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema -of the original data. When writing Parquet files, all columns are automatically converted to be nullable for +of the original data. When reading Parquet files, all columns are automatically converted to be nullable for --- End diff -- This file has been reorganized. Could you merge the latest master?
[GitHub] spark issue #22759: [MINOR][SQL][DOC] Correct parquet nullability documentat...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22759 LGTM. Could you do us a favor and add test cases ensuring that the generated Parquet files have the correct nullability values?
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98671/ Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98671/testReport)** for PR 22932 at commit [`04457be`](https://github.com/apache/spark/commit/04457be5bc8e6023a9b9c2e71f9a123869465cbd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Merged build finished. Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98673/ Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22994 **[Test build #98673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98673/testReport)** for PR 22994 at commit [`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22992: [SPARK-24229] Update to Apache Thrift 0.10.0
Github user mingwandroid commented on the issue: https://github.com/apache/spark/pull/22992 Can you not just update this version so that people who care about CVE scan results can still use Apache Spark without worrying?
[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...
Github user KyleLi1985 commented on the issue: https://github.com/apache/spark/pull/22893 It seems the related file spark/python/pyspark/ml/clustering.py has been changed in recent days. My local latest commit is still "bfe60fc on 30 Jul". Do I need to re-fork Spark and open another pull request, or is there another way?
[GitHub] spark issue #22855: [SPARK-25839] [Core] Implement use of KryoPool in KryoSe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22855 **[Test build #4421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4421/testReport)** for PR 22855 at commit [`3bfc4eb`](https://github.com/apache/spark/commit/3bfc4ebbf214b6b0fadbaa10aa832303a59de97d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22275 LGTM, the current change looks clearer. Thanks @BryanCutler
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22994 **[Test build #98673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98673/testReport)** for PR 22994 at commit [`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e).
[GitHub] spark issue #22992: [SPARK-24229] Update to Apache Thrift 0.10.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22992 Please provide a test case or reproducible step for the issue. Otherwise, please close this PR.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4907/ Test PASSed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Merged build finished. Test PASSed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/22994 test this please
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98669/ Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Merged build finished. Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22994 **[Test build #98669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98669/testReport)** for PR 22994 at commit [`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/22996 @holdenk Yes, it is. I will include the examples in ml-clustering.md.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #98672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98672/testReport)** for PR 21732 at commit [`2d2057b`](https://github.com/apache/spark/commit/2d2057b4f2dbb541b4f2573944318f7a874fac3d).
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4906/ Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98663/ Test PASSed.
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22305 Merged build finished. Test PASSed.
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22305 **[Test build #98663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98663/testReport)** for PR 22305 at commit [`006b953`](https://github.com/apache/spark/commit/006b9533c6beb90fe93d8bc4ec875a78ec7b50af). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22997: SPARK-25999: make-distribution.sh failure with --r and -...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22997 Can one of the admins verify this patch?
[GitHub] spark issue #22997: SPARK-25999: make-distribution.sh failure with --r and -...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22997 Can one of the admins verify this patch?
[GitHub] spark issue #22997: SPARK-25999: make-distribution.sh failure with --r and -...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22997 Can one of the admins verify this patch?
[GitHub] spark pull request #22997: SPARK-25999: make-distribution.sh failure with --...
GitHub user shanyu opened a pull request: https://github.com/apache/spark/pull/22997 SPARK-25999: make-distribution.sh failure with --r and -Phadoop-provided Signed-off-by: Shanyu Zhao ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shanyu/spark shanyu-25999 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22997.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22997 commit 090c3bc1c43c2286c825d74a82304ef00a75900c Author: Shanyu Zhao Date: 2018-11-10T01:12:55Z SPARK-25999: make-distribution.sh failure with --r and -Phadoop-provided Signed-off-by: Shanyu Zhao
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4905/ Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22932 Merged build finished. Test PASSed.
[GitHub] spark issue #22932: [SPARK-25102][SQL] Write Spark version to ORC/Parquet fi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22932 **[Test build #98671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98671/testReport)** for PR 22932 at commit [`04457be`](https://github.com/apache/spark/commit/04457be5bc8e6023a9b9c2e71f9a123869465cbd).
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232430599

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(

   override def close(): Unit = {
     if (recordWriterInstantiated) {
+      // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
+      try {
+        val writerField = recordWriter.getClass.getDeclaredField("writer")
+        writerField.setAccessible(true)
+        val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+        writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
--- End diff --

For this case, I'll refactor out all the new code (lines 281-289).
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232428893

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(

   override def close(): Unit = {
     if (recordWriterInstantiated) {
+      // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
+      try {
+        val writerField = recordWriter.getClass.getDeclaredField("writer")
+        writerField.setAccessible(true)
+        val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+        writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
--- End diff --

BTW, as you expected, we cannot use a single function for this. The `Writer` types are not the same.
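The guarded reflection pattern under discussion above (reach into a private field, attach metadata, and log rather than fail on any error) can be sketched outside the JVM as well. Below is a minimal Python analogue; `RecordWriter`, the name-mangled `__writer` field, and `add_user_metadata` are illustrative stand-ins, not Spark or Hive APIs.

```python
import logging

logger = logging.getLogger(__name__)


class RecordWriter:
    """Stand-in for a third-party writer that keeps its writer private."""

    def __init__(self):
        # Name-mangled attribute, playing the role of a private JVM field.
        self.__writer = {"metadata": {}}


def add_user_metadata(record_writer, key, value):
    """Best-effort metadata attach, mirroring the PR's shape:
    any failure is logged as a warning and never raised, so a
    missing or renamed field cannot break the write path."""
    try:
        # Python's analogue of getDeclaredField + setAccessible:
        # look up the mangled attribute directly.
        writer = record_writer._RecordWriter__writer
        writer["metadata"][key] = value
        return True
    except AttributeError as e:
        logger.warning("could not attach metadata: %s", e)
        return False
```

The design point is the same as in the diff: the metadata write is optional, so the error handler deliberately swallows everything non-fatal.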
[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22938 Merged build finished. Test PASSed.
[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22938 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98660/ Test PASSed.
[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22938 **[Test build #98660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98660/testReport)** for PR 22938 at commit [`9132af3`](https://github.com/apache/spark/commit/9132af3a8ee7404e3a14c280567a418a85693c07). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232428173

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(

   private[this] val serializer = new OrcSerializer(dataSchema)

   private val recordWriter = {
-    new OrcOutputFormat[OrcStruct]() {
+    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
       override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
         new Path(path)
       }
-    }.getRecordWriter(context)
+    }
+    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+    val writer = OrcFile.createWriter(filename, options)
+    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
+    writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
--- End diff --

Thank you for review, @gatorsmile. Sure. I'll refactor out the following line.

```
writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
```
[GitHub] spark issue #22987: [SPARK-25979][SQL] Window function: allow parentheses ar...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22987 Thanks. Technically, since it's categorized as a `BUG`, I'm +1 to have this in `branch-2.4` as a syntax bug fix.
[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22954 I don't know R well enough to review that code, but the results look awesome! Nice work @HyukjinKwon!!
[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r232425279

--- Diff: R/pkg/R/SQLContext.R ---
@@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0,
         x
       }
     }
+    data[] <- lapply(data, cleanCols)

-    # drop factors and wrap lists
-    data <- setNames(lapply(data, cleanCols), NULL)
+    args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE)
+    if (arrowEnabled) {
+      shouldUseArrow <- tryCatch({
+        stopifnot(length(data) > 0)
+        dataHead <- head(data, 1)
+        # Currently Arrow optimization does not support POSIXct and raw for now.
+        # Also, it does not support explicit float type set by users. It leads to
+        # incorrect conversion. We will fall back to the path without Arrow optimization.
+        if (any(sapply(dataHead, function(x) is(x, "POSIXct")))) {
+          stop("Arrow optimization with R DataFrame does not support POSIXct type yet.")
+        }
+        if (any(sapply(dataHead, is.raw))) {
+          stop("Arrow optimization with R DataFrame does not support raw type yet.")
+        }
+        if (inherits(schema, "structType")) {
+          if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "FloatType"))) {
+            stop("Arrow optimization with R DataFrame does not support FloatType type yet.")
--- End diff --

Any idea what's going on with the `FloatType`? Is it a problem on the Arrow side?
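The R code above inspects only the head of the data for unsupported column types and uses `stop`/`tryCatch` to fall back to the non-Arrow path. The same head-only gate can be sketched in a few lines of Python; `should_use_arrow` and the particular "unsupported" types here are illustrative stand-ins, not Spark's actual rules.

```python
def should_use_arrow(rows):
    """Decide whether a (hypothetical) Arrow fast path applies.

    Mirrors the R logic in the diff: look only at the first row,
    and treat any unsupported column type as a signal to fall back
    to the slower, always-correct conversion path.
    """
    if not rows:
        return False  # empty input: nothing to gain from the fast path
    head = rows[0]
    # Stand-ins for the types the R code rejects (POSIXct, raw, FloatType).
    unsupported = (bytes, bytearray)
    return not any(isinstance(v, unsupported) for v in head.values())
```

Checking only the head keeps the gate cheap, at the cost of missing a bad value deeper in the data; that trade-off is the same one the R diff makes.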
[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r232425031

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -225,4 +226,25 @@ private[sql] object SQLUtils extends Logging {
     }
     sparkSession.sessionState.catalog.listTables(db).map(_.table).toArray
   }
+
+  /**
+   * R callable function to read a file in Arrow stream format and create a `RDD`
--- End diff --

nit: a `RDD` -> an `RDD`
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232424657

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala ---
@@ -274,6 +278,15 @@ private[orc] class OrcOutputWriter(

   override def close(): Unit = {
     if (recordWriterInstantiated) {
+      // Hive 1.2.1 ORC initializes its private `writer` field at the first write.
+      try {
+        val writerField = recordWriter.getClass.getDeclaredField("writer")
+        writerField.setAccessible(true)
+        val writer = writerField.get(recordWriter).asInstanceOf[Writer]
+        writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
--- End diff --

The same comment here.
[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22932#discussion_r232424626

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala ---
@@ -36,11 +41,17 @@ private[orc] class OrcOutputWriter(

   private[this] val serializer = new OrcSerializer(dataSchema)

   private val recordWriter = {
-    new OrcOutputFormat[OrcStruct]() {
+    val orcOutputFormat = new OrcOutputFormat[OrcStruct]() {
       override def getDefaultWorkFile(context: TaskAttemptContext, extension: String): Path = {
         new Path(path)
       }
-    }.getRecordWriter(context)
+    }
+    val filename = orcOutputFormat.getDefaultWorkFile(context, ".orc")
+    val options = OrcMapRedOutputFormat.buildOptions(context.getConfiguration)
+    val writer = OrcFile.createWriter(filename, options)
+    val recordWriter = new OrcMapreduceRecordWriter[OrcStruct](writer)
+    writer.addUserMetadata(SPARK_VERSION_METADATA_KEY, UTF_8.encode(SPARK_VERSION_SHORT))
--- End diff --

Could we create a separate function for adding this metadata?
[GitHub] spark issue #22913: [SPARK-25902][SQL] Add support for dates with millisecon...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22913 Sounds good, thanks @javierluraschi !
[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/22996 Thanks for working on this! I noticed you have the example on/off tags; normally those correspond to the example being included in the documentation somewhere those tags are used -- is that the plan for this PR?
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r232420076

--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,28 @@ def test_timestamp_dst(self):
         self.assertPandasEqual(pdf, df_from_python.toPandas())
         self.assertPandasEqual(pdf, df_from_pandas.toPandas())

+    def test_toPandas_batch_order(self):
+
+        # Collects Arrow RecordBatches out of order in driver JVM then re-orders in Python
+        def run_test(num_records, num_parts, max_records):
+            df = self.spark.range(num_records, numPartitions=num_parts).toDF("a")
+            with self.sql_conf({"spark.sql.execution.arrow.maxRecordsPerBatch": max_records}):
+                pdf, pdf_arrow = self._toPandas_arrow_toggle(df)
+                self.assertPandasEqual(pdf, pdf_arrow)
+
+        cases = [
+            (1024, 512, 2),  # Try large num partitions for good chance of not collecting in order
+            (512, 64, 2),    # Try medium num partitions to test out of order collection
+            (64, 8, 2),      # Try small number of partitions to test out of order collection
+            (64, 64, 1),     # Test single batch per partition
+            (64, 1, 64),     # Test single partition, single batch
+            (64, 1, 8),      # Test single partition, multiple batches
+            (30, 7, 2),      # Test different sized partitions
+        ]
--- End diff --

I like the new tests, I think 0.1 on one of the partitions is enough.
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/22275#discussion_r232420015

--- Diff: python/pyspark/sql/tests.py ---
@@ -4923,6 +4923,34 @@ def test_timestamp_dst(self):
         self.assertPandasEqual(pdf, df_from_python.toPandas())
         self.assertPandasEqual(pdf, df_from_pandas.toPandas())

+    def test_toPandas_batch_order(self):
+
+        def delay_first_part(partition_index, iterator):
+            if partition_index == 0:
+                time.sleep(0.1)
--- End diff --

I like this :)
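The delayed first partition in the test above forces results to arrive at the driver out of partition order, which is exactly the situation the PR's reordering logic must survive. Here is a small self-contained Python sketch of the collect-then-reorder-by-index idea, with the same first-partition delay; all names (`collect_in_order`, `evaluate`) are illustrative, not the PR's actual code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_in_order(partitions):
    """Evaluate partitions concurrently, accept results in arrival
    order (which may not be partition order), then restore partition
    order by index before concatenating -- the same shape as
    re-ordering Arrow RecordBatches in the driver."""

    def evaluate(index, data):
        if index == 0:
            time.sleep(0.05)  # delay partition 0 so it tends to arrive last
        return index, list(data)

    arrived = []
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(evaluate, i, p) for i, p in enumerate(partitions)]
        for f in as_completed(futures):  # arrival order is nondeterministic
            arrived.append(f.result())

    # Restore partition order by the index each result carries with it.
    ordered = [data for _, data in sorted(arrived, key=lambda t: t[0])]
    return [row for part in ordered for row in part]
```

Carrying the partition index alongside each result is what makes out-of-order arrival harmless: the final sort is cheap and deterministic regardless of scheduling.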
[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18610 @JohnHBrock this PR is pretty old so the biggest challenge is going to be updating it to the current master branch. There's some discussion around the types needing to be changed as well. If this is a thing you want to work on I'd love to do what I can to help with the review process.
[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22996 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98670/ Test PASSed.
[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22996 Merged build finished. Test PASSed.
[GitHub] spark issue #22996: [SPARK-25997][ML]add Python example code for Power Itera...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22996 **[Test build #98670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98670/testReport)** for PR 22996 at commit [`905b542`](https://github.com/apache/spark/commit/905b542a8618269bdc079f3c335a80c13d2214fa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22996: add Python example code for Power Iteration Clustering i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22996 **[Test build #98670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98670/testReport)** for PR 22996 at commit [`905b542`](https://github.com/apache/spark/commit/905b542a8618269bdc079f3c335a80c13d2214fa).
[GitHub] spark issue #22995: [SPARK-25998] [CORE] Change TorrentBroadcast to hold wea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22995 Can one of the admins verify this patch?
[GitHub] spark issue #22996: add Python example code for Power Iteration Clustering i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22996 Merged build finished. Test PASSed.
[GitHub] spark issue #22996: add Python example code for Power Iteration Clustering i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22996 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4904/ Test PASSed.
[GitHub] spark pull request #22996: add Python example code for Power Iteration Clust...
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/22996

add Python example code for Power Iteration Clustering in spark.ml

## What changes were proposed in this pull request?
Add python example for Power Iteration Clustering in spark.ml

## How was this patch tested?
Manually tested

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-25997

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22996.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22996

commit 905b542a8618269bdc079f3c335a80c13d2214fa
Author: Huaxin Gao
Date: 2018-11-09T22:32:17Z

    add Python example code for Power Iteration Clustering in spark.ml
[GitHub] spark pull request #22995: [SPARK-25998] [CORE] Change TorrentBroadcast to h...
GitHub user bkrieger opened a pull request: https://github.com/apache/spark/pull/22995

[SPARK-25998] [CORE] Change TorrentBroadcast to hold weak reference of broadcast object

## What changes were proposed in this pull request?
This PR changes the broadcast object in TorrentBroadcast from a strong reference to a weak reference. This allows it to be garbage collected even if the Dataset is held in memory. This is ok, because the broadcast object can always be re-read.

## How was this patch tested?
Tested in Spark shell by taking a heap dump; full repro steps listed in https://issues.apache.org/jira/browse/SPARK-25998.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bkrieger/spark bk/torrent-broadcast-weak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22995.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22995

commit a2683b62985fc9c7d15fb92f3bb170a4b5225058
Author: Brandon Krieger
Date: 2018-11-08T23:04:06Z
    use weak reference for torrent broadcast

commit 99fbeecf43a289648a56d178fa55e188ce75bdb7
Author: Brandon Krieger
Date: 2018-11-09T21:04:51Z
    fix compile

commit 5e0a179c168a70b0166abe4bb51a1d26a2f1d666
Author: Brandon Krieger
Date: 2018-11-09T21:33:22Z
    fix

commit 1908b5b8dfa6c0b55db3bd9a90e21ca713e5bf25
Author: Brandon Krieger
Date: 2018-11-09T21:48:44Z
    no npe

commit 24183e5b8b63e0b4e117856ab4de7eb1b0ea6c9a
Author: Brandon Krieger
Date: 2018-11-09T21:52:21Z
    no option

commit f212da322242386ce3b71e9961a964e60b587287
Author: Brandon Krieger
Date: 2018-11-09T22:08:23Z
    typo
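The PR's idea -- hold the cached broadcast value only weakly, and re-read it on demand if the garbage collector has reclaimed it -- can be sketched compactly in Python with `weakref`. `CachedValue`, `Block`, and `loader` below are illustrative stand-ins, not Spark's TorrentBroadcast API.

```python
import weakref


class Block:
    """Plain object standing in for a deserialized broadcast value
    (plain classes support weak references; builtins like list do not)."""

    def __init__(self, data):
        self.data = data


class CachedValue:
    """Weakly cache a re-creatable value.

    Like the PR's change: the cached object may be garbage collected
    even while this holder is alive, and is transparently re-read
    (here: re-built by `loader`) on the next access."""

    def __init__(self, loader):
        self._loader = loader
        self._ref = None  # becomes a weakref.ref after the first load

    def get(self):
        value = self._ref() if self._ref is not None else None
        if value is None:  # never loaded, or already collected: re-read
            value = self._loader()
            self._ref = weakref.ref(value)
        return value
```

The safety argument mirrors the PR description: dropping the strong reference is acceptable only because the value can always be reconstructed, so collection costs a re-read rather than correctness.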
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Merged build finished. Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98668/ Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98667/ Test FAILed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22994 **[Test build #98669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98669/testReport)** for PR 22994 at commit [`56329bc`](https://github.com/apache/spark/commit/56329bc9d9d28252032fe6fef8da2ffbb1ed0f9e).
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Merged build finished. Test PASSed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4903/ Test PASSed.
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22994 **[Test build #98668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98668/testReport)** for PR 22994 at commit [`c05683b`](https://github.com/apache/spark/commit/c05683bab177b7b203fe0ca440a19810fc2df418).
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22994 **[Test build #98667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98667/testReport)** for PR 22994 at commit [`6bddfec`](https://github.com/apache/spark/commit/6bddfec5cb76584c172552d8a3822e29e12c5654).
[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4902/ Test PASSed.