[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94546/ Test FAILed.
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22037 **[Test build #94546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94546/testReport)** for PR 22037 at commit [`9eefbe5`](https://github.com/apache/spark/commit/9eefbe5dc58bba272dedce7ae0174be89a0a9b28). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22067 **[Test build #94556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94556/testReport)** for PR 22067 at commit [`0a6bccc`](https://github.com/apache/spark/commit/0a6bccc9e6a308d0b064bc0f2f37f7b19294df20).
[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22070 Can one of the admins verify this patch?
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 @jerryshao Could you help trigger a test build, please?
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 ok to test.
[GitHub] spark pull request #22070: Fix typos detected by github.com/client9/misspell
GitHub user seratch opened a pull request: https://github.com/apache/spark/pull/22070 Fix typos detected by github.com/client9/misspell ## What changes were proposed in this pull request? Fixing typos is sometimes very hard. It's not so easy to visually review them. Recently, I discovered a very useful tool for it, [misspell](https://github.com/client9/misspell). This pull request fixes minor typos detected by [misspell](https://github.com/client9/misspell) except for the false positives. If you would like me to work on other files as well, let me know. ## How was this patch tested? ### before
```
$ misspell . | grep -v '.js'
R/pkg/R/SQLContext.R:354:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:424:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:445:43: "definiton" is a misspelling of "definition"
R/pkg/R/SQLContext.R:495:43: "definiton" is a misspelling of "definition"
NOTICE-binary:454:16: "containd" is a misspelling of "contained"
R/pkg/R/context.R:46:43: "definiton" is a misspelling of "definition"
R/pkg/R/context.R:74:43: "definiton" is a misspelling of "definition"
R/pkg/R/DataFrame.R:591:48: "persistance" is a misspelling of "persistence"
R/pkg/R/streaming.R:166:44: "occured" is a misspelling of "occurred"
R/pkg/inst/worker/worker.R:65:22: "ouput" is a misspelling of "output"
R/pkg/tests/fulltests/test_utils.R:106:25: "environemnt" is a misspelling of "environment"
common/kvstore/src/test/java/org/apache/spark/util/kvstore/InMemoryStoreSuite.java:38:39: "existant" is a misspelling of "existent"
common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBSuite.java:83:39: "existant" is a misspelling of "existent"
common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java:243:46: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:234:19: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:238:63: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:244:46: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java:276:39: "transfered" is a misspelling of "transferred"
common/network-common/src/main/java/org/apache/spark/network/util/AbstractFileRegion.java:27:20: "transfered" is a misspelling of "transferred"
common/unsafe/src/test/scala/org/apache/spark/unsafe/types/UTF8StringPropertyCheckSuite.scala:195:15: "orgin" is a misspelling of "origin"
core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala:621:39: "gauranteed" is a misspelling of "guaranteed"
core/src/main/scala/org/apache/spark/status/storeTypes.scala:113:29: "ect" is a misspelling of "etc"
core/src/main/scala/org/apache/spark/storage/DiskStore.scala:282:18: "transfered" is a misspelling of "transferred"
core/src/main/scala/org/apache/spark/util/ListenerBus.scala:64:17: "overriden" is a misspelling of "overridden"
core/src/test/scala/org/apache/spark/ShuffleSuite.scala:211:7: "substracted" is a misspelling of "subtracted"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:1922:49: "agriculteur" is a misspelling of "agriculture"
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:2468:84: "truely" is a misspelling of "truly"
core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala:25:18: "persistance" is a misspelling of "persistence"
core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala:26:69: "persistance" is a misspelling of "persistence"
data/streaming/AFINN-111.txt:1219:0: "humerous" is a misspelling of "humorous"
dev/run-pip-tests:55:28: "enviroments" is a misspelling of "environments"
dev/run-pip-tests:91:37: "virutal" is a misspelling of "virtual"
dev/merge_spark_pr.py:377:72: "accross" is a misspelling of "across"
dev/merge_spark_pr.py:378:66: "accross" is a misspelling of "across"
dev/run-pip-tests:126:25: "enviroments" is a misspelling of "environments"
docs/configuration.md:1830:82: "overriden" is a misspelling of "overridden"
docs/structured-streaming-programming-guide.md:525:45: "processs" is a misspelling of "processes"
docs/structured-streaming-programming-guide.md:1165:61: "BETWEN" is a misspelling of "BETWEEN"
docs/sql-programming-guide.md:1891:810: "behaivor" is a misspelling of "behavior"
examples/src/main/python/sql/arrow.py:98:8: "substract" is a misspelling of "subtract"
examples/src/main/python/sql/arrow.py:103:27: "substract" is a misspelling of "subtract"
```
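Each line of the `misspell` output in this PR description has the fixed shape `path:line:col: "wrong" is a misspelling of "right"`, so it can be post-processed mechanically, for example to see which typos recur before weeding out false positives. The following is a minimal Python sketch that assumes only that line format; `parse_finding`, `count_typos`, and `Finding` are hypothetical helpers, not part of misspell.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# One misspell finding per line, e.g.
#   R/pkg/R/context.R:46:43: "definiton" is a misspelling of "definition"
_FINDING_RE = re.compile(
    r'^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): '
    r'"(?P<wrong>[^"]+)" is a misspelling of "(?P<right>[^"]+)"$'
)


class Finding(NamedTuple):
    path: str
    line: int
    col: int
    wrong: str
    right: str


def parse_finding(text: str) -> Optional[Finding]:
    """Parse one output line; return None for lines that are not findings."""
    m = _FINDING_RE.match(text.strip())
    if m is None:
        return None
    return Finding(m["path"], int(m["line"]), int(m["col"]), m["wrong"], m["right"])


def count_typos(lines: Iterable[str]) -> Counter:
    """Count how many times each (misspelling, correction) pair was reported."""
    findings = (parse_finding(line) for line in lines)
    return Counter((f.wrong, f.right) for f in findings if f is not None)
```

Non-matching lines such as the `$ misspell ...` command echo are simply skipped, so the raw terminal transcript can be fed in as-is.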
[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22069 **[Test build #94555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94555/testReport)** for PR 22069 at commit [`8520df8`](https://github.com/apache/spark/commit/8520df899a3364f2bb41d4155d2bed9e68772a07).
[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22069 ok to test
[GitHub] spark issue #22008: [SPARK-24928][SQL] Optimize cross join according to stat...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22008 cc @wzhfy
[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209209021 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,20 @@ private[spark] object PythonEvalType { */ private[spark] abstract class BasePythonRunner[IN, OUT]( funcs: Seq[ChainedPythonFunctions], -bufferSize: Int, -reuseWorker: Boolean, evalType: Int, -argOffsets: Array[Array[Int]]) +argOffsets: Array[Array[Int]], +conf: SparkConf) extends Logging { require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs") + private val bufferSize = conf.getInt("spark.buffer.size", 65536) + private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) + // each python worker gets an equal part of the allocation. the worker pool will grow to the + // number of concurrent tasks, which is determined by the number of cores in this executor. + private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY) + .map(_ / conf.getInt("spark.executor.cores", 1)) --- End diff -- tiny nit: indentation
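The `memoryMb` line in this diff splits the executor-wide PySpark memory budget evenly across Python workers, because (per the code comment) the worker pool grows to one worker per concurrent task, i.e. per executor core. A rough Python sketch of that arithmetic follows, assuming the configured value is in MiB (as the `memoryMb` name suggests) and keeping the diff's default of 1 core; `per_worker_memory_mb` is a hypothetical name, not a Spark function.

```python
from typing import Optional


def per_worker_memory_mb(pyspark_memory_mb: Optional[int], executor_cores: int = 1) -> Optional[int]:
    """Share of the PySpark memory budget given to each Python worker.

    pyspark_memory_mb: spark.executor.pyspark.memory in MiB, or None if unset.
    executor_cores: spark.executor.cores (the diff falls back to 1).
    Uses integer division, like `_ / cores` on a Scala Long.
    """
    if pyspark_memory_mb is None:
        # Mirrors Option.map on None: no limit configured, nothing to split.
        return None
    if executor_cores <= 0:
        raise ValueError("executor_cores must be positive")
    return pyspark_memory_mb // executor_cores
```

For example, a 2048 MiB budget on a 4-core executor leaves each worker 512 MiB; integer division means any remainder is simply not allocated.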
[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209209726 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -137,13 +135,12 @@ case class AggregateInPandasExec( val columnarBatchIter = new ArrowPythonRunner( pyFuncs, -bufferSize, -reuseWorker, PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF, argOffsets, aggInputSchema, sessionLocalTimeZone, -pythonRunnerConf).compute(projectedRowIter, context.partitionId(), context) +pythonRunnerConf, +sparkContext.conf).compute(projectedRowIter, context.partitionId(), context) --- End diff -- Yea, same question.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2039/ Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #94554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94554/testReport)** for PR 21732 at commit [`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d).
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21732 retest this please
[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22007 **[Test build #94553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94553/testReport)** for PR 22007 at commit [`618de1e`](https://github.com/apache/spark/commit/618de1e71e5ce38b6f9a640a538bdfbf95b3ae7e).
[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21868 ??? why does this still target branch-2.3? is this a backport?
[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22007 ok to test
[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16677
[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...
Github user xuejianbest commented on the issue: https://github.com/apache/spark/pull/22048 After testing, I changed the regular expression to the following: `val regex = """[^\x00-\u2e39]""".r`
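The regex above treats every code point outside `U+0000` to `U+2E39` as a wide character, which is how this PR aims to keep `show` output aligned when cells contain, for example, CJK text. The following is a small Python sketch of how such a class could drive column-width padding; it reuses only the character range from the comment, and `display_width` and `pad_cell` are hypothetical helpers, not Spark code.

```python
import re

# Same character class as the proposed Scala regex """[^\x00-\u2e39]""":
# every code point above U+2E39 (e.g. CJK ideographs) counts as double width.
WIDE_CHAR = re.compile(r"[^\x00-\u2e39]")


def display_width(s: str) -> int:
    """Approximate column width: 1 per character plus 1 extra per wide character."""
    return len(s) + len(WIDE_CHAR.findall(s))


def pad_cell(s: str, width: int) -> str:
    """Right-pad s with spaces so that it occupies `width` terminal columns."""
    return s + " " * max(0, width - display_width(s))
```

A single cutoff is only a heuristic. Python's `unicodedata.east_asian_width` gives a per-character classification (`"W"`/`"F"` for wide and fullwidth) if more precision is needed.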
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/16677 Merging to master. Thanks!
[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22069 Can one of the admins verify this patch?
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22037 **[Test build #94552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94552/testReport)** for PR 22037 at commit [`24dbada`](https://github.com/apache/spark/commit/24dbada0823e47b50892a34d19e1b8e2a63af7c3).
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Merged build finished. Test PASSed.
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2038/ Test PASSed.
[GitHub] spark pull request #22069: [MINOR][DOC] Fix Java example code in Column's co...
GitHub user sadhen opened a pull request: https://github.com/apache/spark/pull/22069 [MINOR][DOC] Fix Java example code in Column's comments ## What changes were proposed in this pull request? Fix scaladoc in Column ## How was this patch tested? None You can merge this pull request into a Git repository by running: $ git pull https://github.com/sadhen/spark fix_doc_minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22069.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22069 commit 8520df899a3364f2bb41d4155d2bed9e68772a07 Author: å¿å¬ Date: 2018-08-10T09:24:08Z Fix Java example code in Column's comments
[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r209188342 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -442,3 +442,186 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +/** + * Merges two given maps into a single map by applying function to the pair of values with + * the same key. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying + function to the pair of values with the same key. For keys only presented in one map, + NULL will be passed as the value for the missing key. If an input map contains duplicated + keys, only the first entry of the duplicated key is passed into the lambda function. +""", + examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); + {1:"ax",2:"by"} + """, + since = "2.4.0") +case class MapZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + @transient lazy val functionForEval: Expression = functionsForEval.head + + @transient lazy val (leftKeyType, leftValueType, leftValueContainsNull) = +HigherOrderFunction.mapKeyValueArgumentType(left.dataType) + + @transient lazy val (rightKeyType, rightValueType, rightValueContainsNull) = +HigherOrderFunction.mapKeyValueArgumentType(right.dataType) + + @transient lazy val keyType = +TypeCoercion.findTightestCommonType(leftKeyType, rightKeyType).getOrElse(NullType) --- End diff -- Even though there is a coercion rule for unifying the key types, the key types may still differ in their nullability flags if they are complex. In theory, we could use ```==``` and ```findTightestCommonType``` in the coercion rule, since there is no codegen to be optimized for ```null``` checks.
But unfortunately, ```bind``` gets called once before the coercion rules are executed, so ```findTightestCommonType``` is important for setting up the correct input type for the lambda function. Maybe we could play with the order of the analysis rules, but I'm not sure about all the consequences. @ueshin, could you shed some light on the ordering of analysis rules?
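The `usage` string of `MapZipWith` in this diff spells out its semantics: keys from both maps are merged, a key missing from one side is passed as NULL, and for duplicated keys only the first entry is used. The following small Python sketch models those rules. Maps are represented as lists of `(key, value)` pairs so that duplicates can be expressed, and SQL NULL is represented as `None`. It is not Spark's implementation, and the left-keys-then-right-only-keys output order is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Pair = Tuple[Any, Any]


def _first_entries(entries: Iterable[Pair]) -> Dict[Any, Any]:
    """Keep only the first value for each duplicated key (dicts keep insertion order)."""
    out: Dict[Any, Any] = {}
    for key, value in entries:
        out.setdefault(key, value)
    return out


def map_zip_with(left: Iterable[Pair], right: Iterable[Pair],
                 fn: Callable[[Any, Optional[Any], Optional[Any]], Any]) -> Dict[Any, Any]:
    l = _first_entries(left)
    r = _first_entries(right)
    # Assumed key order: keys of the left map first, then keys only in the right map.
    keys = list(l) + [k for k in r if k not in l]
    # A key missing from one side gets None (SQL NULL) for that side's value.
    return {k: fn(k, l.get(k), r.get(k)) for k in keys}
```

With a NULL-propagating `concat` (as SQL `concat` returns NULL if any argument is NULL), the docstring example `map(1, 'a', 2, 'b')` zipped with `map(1, 'x', 2, 'y')` yields `{1: "ax", 2: "by"}`.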
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22065 **[Test build #94551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94551/testReport)** for PR 22065 at commit [`a99769d`](https://github.com/apache/spark/commit/a99769dd1aac779e972ed2e23aa7598e6d7c7105).
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22065 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2037/ Test PASSed.
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22065 Merged build finished. Test PASSed.
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/22065 retest this please
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Merged build finished. Test FAILed.
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94547/ Test FAILed.
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22067 **[Test build #94547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94547/testReport)** for PR 22067 at commit [`9e6941c`](https://github.com/apache/spark/commit/9e6941cfc89b16980bd5d4470baf21550ffd0877). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22068 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2036/ Test PASSed.
[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22068 Merged build finished. Test PASSed.
[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22068 **[Test build #94550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94550/testReport)** for PR 22068 at commit [`74aa80c`](https://github.com/apache/spark/commit/74aa80cb63c6ea98f0b9106f0724748931317c05).
[GitHub] spark pull request #22068: [MINOR][DOC]Add missing compression codec .
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/22068 [MINOR][DOC]Add missing compression codec . ## What changes were proposed in this pull request? Parquet provides six codecs: "snappy", "gzip", "lzo", "lz4", "brotli", "zstd". This PR adds the missing compression codecs "lz4", "brotli", and "zstd" to the documentation. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark nosupportlz4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22068.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22068 commit 74aa80cb63c6ea98f0b9106f0724748931317c05 Author: liuxian Date: 2018-08-09T07:22:01Z fix
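Because the documented codec list is what users copy into their configuration, a client-side check against the six names listed in this PR can catch typos before a write job starts. The following Python sketch illustrates this. Case-insensitive matching and the helper name `normalize_parquet_codec` are assumptions of the sketch, not Spark behaviour taken from this PR.

```python
# The six Parquet codecs listed in the PR description (#22068).
PARQUET_CODECS = ("snappy", "gzip", "lzo", "lz4", "brotli", "zstd")


def normalize_parquet_codec(name: str) -> str:
    """Return the canonical lower-case codec name, or raise ValueError if unknown.

    Case-insensitive matching is an assumption of this sketch.
    """
    codec = name.strip().lower()
    if codec not in PARQUET_CODECS:
        raise ValueError(
            f"unsupported Parquet codec {name!r}; expected one of {', '.join(PARQUET_CODECS)}")
    return codec
```

The normalized name would then be passed as the value of `spark.sql.parquet.compression.codec` (or the Parquet writer's `compression` option). Spark itself also accepts non-codec values such as `uncompressed`, which this sketch leaves out.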
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22011 **[Test build #94549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94549/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22011 Merged build finished. Test PASSed.
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22011 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2035/ Test PASSed.
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r209180525 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro case _ => false } - // TODO: if the nullability of field is correct, we can use it to save null check. private def writeStructToBuffer( ctx: CodegenContext, input: String, index: String, - fieldTypes: Seq[DataType], + fieldTypeAndNullables: Seq[(DataType, Boolean)], --- End diff -- I think it would be good, since it is used in `JavaTypeInference` and `higherOrderFunctions`. cc @ueshin
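The TODO removed in this diff ("if the nullability of field is correct, we can use it to save null check") describes the idea behind passing `(DataType, Boolean)` pairs instead of bare types: the code generator can skip the runtime null check for fields known to be non-nullable. The following Python sketch emits Java-like strings to show the effect. The `writer`, `getInt`/`getLong`, and `setNullAt` names in the emitted text are illustrative and not claimed to match Spark's generated code, and `write_struct_fields` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def write_struct_fields(row: str, fields: Sequence[Tuple[str, bool]]) -> List[str]:
    """Emit one Java-like write statement per struct field.

    fields: (getter type suffix, nullable) pairs, standing in for the new
    fieldTypeAndNullables: Seq[(DataType, Boolean)] parameter.
    """
    code = []
    for i, (type_suffix, nullable) in enumerate(fields):
        write = f"writer.write({i}, {row}.get{type_suffix}({i}));"
        if nullable:
            # Only nullable fields pay for a runtime null check.
            code.append(
                f"if ({row}.isNullAt({i})) {{ writer.setNullAt({i}); }} else {{ {write} }}")
        else:
            code.append(write)
    return code
```

For a struct with a non-nullable `Int` and a nullable `Long`, only the second field gets an `isNullAt` branch.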
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r209178573 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -170,6 +174,23 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro val element = CodeGenerator.getValue(tmpInput, et, index) +val primitiveTypeName = if (CodeGenerator.isPrimitiveType(jt)) { --- End diff -- good catch
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/22066 @cloud-fan I am refining and adding tests.
[GitHub] spark pull request #21199: [SPARK-24127][SS] Continuous text socket source
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21199
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22067 **[Test build #94547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94547/testReport)** for PR 22067 at commit [`9e6941c`](https://github.com/apache/spark/commit/9e6941cfc89b16980bd5d4470baf21550ffd0877).
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21199 Merged to master.
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22053 **[Test build #94548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94548/testReport)** for PR 22053 at commit [`d95d357`](https://github.com/apache/spark/commit/d95d35794528702a2de5523ca00334d479598c57).
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22053 Merged build finished. Test PASSed.
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22053 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2034/ Test PASSed.
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22053 retest this please
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22067 ok to test.
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 @cloud-fan @jerryshao
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22037 **[Test build #94546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94546/testReport)** for PR 22037 at commit [`9eefbe5`](https://github.com/apache/spark/commit/9eefbe5dc58bba272dedce7ae0174be89a0a9b28).
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Merged build finished. Test PASSed.
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2033/ Test PASSed.
[GitHub] spark issue #22036: [SPARK-25028][SQL] Avoid NPE when analyzing partition wi...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22036 cc @cloud-fan @gatorsmile
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Can one of the admins verify this patch?
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22066 I propose another way to fix this in #22067. It doesn't need "input" to be a global variable (if distributing by random).
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21439 @gatorsmile Could you take a look at the PR, please?
[GitHub] spark pull request #22067: [SPARK-25084][SQL] distribute by on multiple colu...
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22067 [SPARK-25084][SQL] distribute by on multiple columns may lead to codegen issue ## What changes were proposed in this pull request? "distribute by" on multiple columns may lead to codegen issue ## How was this patch tested? manual test You can merge this pull request into a Git repository by running: $ git pull https://github.com/LantaoJin/spark SPARK-25084 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22067 commit 9e6941cfc89b16980bd5d4470baf21550ffd0877 Author: LantaoJin Date: 2018-08-10T07:12:32Z [SPARK-25084][SQL] distribute by on multiple columns may lead to codegen issue
[GitHub] spark pull request #22038: [SPARK-25056][SQL] Unify the InConversion and Bin...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22038#discussion_r209163143 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala --- @@ -1378,8 +1378,8 @@ class TypeCoercionSuite extends AnalysisTest { ) ruleTest(inConversion, In(Literal("a"), Seq(Literal(1), Literal("b"))), - In(Cast(Literal("a"), StringType), -Seq(Cast(Literal(1), StringType), Cast(Literal("b"), StringType))) + In(Cast(Literal("a"), IntegerType), --- End diff -- mmmh...honestly, in this case I'd rather say that string is a better type for the cast than int. I am not sure what the result of casting "a" and "b" to int would be...
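The concern above can be illustrated with a plain-Scala model; this is not Spark's `Cast` or `In` implementation. It assumes the non-ANSI cast behavior in which an unparsable string such as "a" cast to int yields NULL, and it models SQL NULL as `None`. `InCoercionSketch`, `castToInt` and `sqlIn` are hypothetical names. Under string coercion the predicate is simply false; under int coercion both "a" and "b" become NULL, so the whole `IN` evaluates to NULL.

```scala
import scala.util.Try

object InCoercionSketch {
  // Assumed model of a non-ANSI string-to-int cast: unparsable input becomes NULL (None).
  def castToInt(s: String): Option[Int] = Try(s.toInt).toOption

  // SQL IN with three-valued logic; None stands for NULL.
  def sqlIn[T](value: Option[T], list: Seq[Option[T]]): Option[Boolean] =
    if (value.isEmpty) None                   // NULL IN (...) is NULL
    else if (list.contains(value)) Some(true) // an exact match wins
    else if (list.exists(_.isEmpty)) None     // no match, but the list contains a NULL
    else Some(false)

  def main(args: Array[String]): Unit = {
    // Everything coerced to string: 'a' IN ('1', 'b')
    val asString = sqlIn(Some("a"), Seq(Some("1"), Some("b")))
    // Everything coerced to int: CAST('a' AS INT) IN (1, CAST('b' AS INT))
    val asInt = sqlIn(castToInt("a"), Seq(Some(1), castToInt("b")))
    println(s"string coercion: $asString, int coercion: $asInt")
  }
}
```

Running `main` prints `string coercion: Some(false), int coercion: None`, which is why string reads as the safer common type in this particular test case.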
[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22064 Merged build finished. Test FAILed.
[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22064 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94537/ Test FAILed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94542/ Test FAILed.
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22065 Merged build finished. Test FAILed.
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22065 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94541/ Test FAILed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test FAILed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #94542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94542/testReport)** for PR 21732 at commit [`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` * For example, we build an encoder for `case class Data(a: Int, b: String)` and the real type`
[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22064 **[Test build #94537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94537/testReport)** for PR 22064 at commit [`878e5ca`](https://github.com/apache/spark/commit/878e5ca274a3b9e5fe37f4e0c2ed4b499bc81676). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22053 Merged build finished. Test FAILed.
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22037#discussion_r209162410 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala --- @@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) { bytes case b: Array[Byte] => b case other => throw new RuntimeException(s"$other is not a valid avro binary.") - } updater.set(ordinal, bytes) + case (FIXED, d: DecimalType) => (updater, ordinal, value) => +val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType, + LogicalTypes.decimal(d.precision, d.scale)) --- End diff -- ok, let's leave it. We can always add it later.
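For context on what `fromFixed` decodes: the Avro specification stores a `decimal` logical type as the big-endian, two's-complement bytes of the unscaled integer, with precision and scale coming from the schema. The minimal sketch below uses only `java.math`, not Avro's `Conversions.DecimalConversion`, to show the round trip for a `fixed` field; `AvroDecimalSketch`, `decodeFixed` and `encodeFixed` are hypothetical names chosen for illustration.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

object AvroDecimalSketch {
  // Decode: the bytes hold the unscaled value as big-endian two's complement.
  def decodeFixed(bytes: Array[Byte], precision: Int, scale: Int): JBigDecimal = {
    val value = new JBigDecimal(new BigInteger(bytes), scale)
    require(value.precision() <= precision, s"$value exceeds precision $precision")
    value
  }

  // Encode: rescale (throws ArithmeticException if rounding would be needed),
  // take the unscaled bytes, and sign-extend them to the fixed size.
  def encodeFixed(value: JBigDecimal, scale: Int, size: Int): Array[Byte] = {
    val unscaled = value.setScale(scale).unscaledValue().toByteArray
    require(unscaled.length <= size, s"$value does not fit in $size bytes")
    val signByte: Byte = if (unscaled(0) < 0) -1 else 0
    Array.fill[Byte](size - unscaled.length)(signByte) ++ unscaled
  }

  def main(args: Array[String]): Unit = {
    val bytes = encodeFixed(new JBigDecimal("-12.34"), scale = 2, size = 4)
    println(decodeFixed(bytes, precision = 9, scale = 2))
  }
}
```

Sign extension matters for `fixed`: a negative unscaled value padded with zero bytes instead of `0xFF` would decode as a large positive number.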
[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22065 **[Test build #94541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94541/testReport)** for PR 22065 at commit [`a99769d`](https://github.com/apache/spark/commit/a99769dd1aac779e972ed2e23aa7598e6d7c7105). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94545/ Test FAILed.
[GitHub] spark issue #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22038 **[Test build #94544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94544/testReport)** for PR 22038 at commit [`cb25b78`](https://github.com/apache/spark/commit/cb25b788cfc3cd7799a6671713558a32969f6dff). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22038 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94544/ Test FAILed.
[GitHub] spark issue #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22038 Merged build finished. Test FAILed.
[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22053 **[Test build #94545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94545/testReport)** for PR 22053 at commit [`d95d357`](https://github.com/apache/spark/commit/d95d35794528702a2de5523ca00334d479598c57). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22017#discussion_r209160027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -442,3 +442,186 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +/** + * Merges two given maps into a single map by applying function to the pair of values with + * the same key. + */ +@ExpressionDescription( + usage = +""" + _FUNC_(map1, map2, function) - Merges two given maps into a single map by applying + function to the pair of values with the same key. For keys only presented in one map, + NULL will be passed as the value for the missing key. If an input map contains duplicated + keys, only the first entry of the duplicated key is passed into the lambda function. +""", + examples = """ +Examples: + > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2)); + {1:"ax",2:"by"} + """, + since = "2.4.0") +case class MapZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + @transient lazy val functionForEval: Expression = functionsForEval.head + + @transient lazy val (leftKeyType, leftValueType, leftValueContainsNull) = +HigherOrderFunction.mapKeyValueArgumentType(left.dataType) + + @transient lazy val (rightKeyType, rightValueType, rightValueContainsNull) = +HigherOrderFunction.mapKeyValueArgumentType(right.dataType) + + @transient lazy val keyType = +TypeCoercion.findTightestCommonType(leftKeyType, rightKeyType).getOrElse(NullType) --- End diff -- why do we need this? We are enforcing that the two maps have the same key type, can't we just get one?
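The usage text in the diff above describes the semantics closely enough to model with plain Scala collections. A minimal sketch, assuming both maps already share one key type (which is what the review comment says the analyzer enforces), with `None` standing in for the NULL passed for a missing key. `MapZipWithSketch` and its methods are hypothetical, not Spark's `MapZipWith` expression. Maps are represented as ordered `Seq`s of pairs so duplicated keys can be expressed; the sketch keeps the first entry per key, as the usage text states, and emits left keys first and then right-only keys, which is an arbitrary choice here.

```scala
import scala.collection.mutable

object MapZipWithSketch {
  // Keep only the first entry for each duplicated key, preserving order.
  private def firstEntries[K, V](entries: Seq[(K, V)]): Seq[(K, V)] = {
    val seen = mutable.HashSet.empty[K]
    entries.filter { case (k, _) => seen.add(k) } // add returns false for a repeated key
  }

  def mapZipWith[K, V1, V2, R](left: Seq[(K, V1)], right: Seq[(K, V2)])(
      f: (K, Option[V1], Option[V2]) => R): Seq[(K, R)] = {
    val l = firstEntries(left)
    val r = firstEntries(right)
    val lm = l.toMap
    val rm = r.toMap
    // Union of keys: left keys first, then keys that only appear on the right.
    val keys = (l.map(_._1) ++ r.map(_._1)).distinct
    keys.map(k => k -> f(k, lm.get(k), rm.get(k)))
  }

  // Models SQL concat(v1, v2): NULL if either argument is NULL.
  def concat(v1: Option[String], v2: Option[String]): Option[String] =
    for (a <- v1; b <- v2) yield a + b

  def main(args: Array[String]): Unit = {
    val zipped = mapZipWith(Seq(1 -> "a", 2 -> "b"), Seq(1 -> "x", 2 -> "y"))((_, v1, v2) => concat(v1, v2))
    println(zipped)
  }
}
```

With the example from the usage text, the sketch yields `ax` for key 1 and `by` for key 2; a key present in only one map would receive `None` on the other side.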
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20637 > When spark.sql.fromJsonForceNullableSchema=false, I think that a test is invalid to pass nullable=false in the corresponding schema to the missing field. +1.
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r209161074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -170,6 +174,23 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro val element = CodeGenerator.getValue(tmpInput, et, index) +val primitiveTypeName = if (CodeGenerator.isPrimitiveType(jt)) { --- End diff -- where do we use it?
[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20637#discussion_r209160237 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro case _ => false } - // TODO: if the nullability of field is correct, we can use it to save null check. private def writeStructToBuffer( ctx: CodegenContext, input: String, index: String, - fieldTypes: Seq[DataType], + fieldTypeAndNullables: Seq[(DataType, Boolean)], --- End diff -- shall we create a class for `(DataType, Boolean)`? It can also be used in https://github.com/apache/spark/pull/22063
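The removed TODO (use field nullability to save the null check) and the suggested class for `(DataType, Boolean)` can be illustrated together. The sketch is hypothetical: `FieldSchema` stands in for whatever class the PR might add, the Java type is a plain string rather than Spark's `DataType`, and the emitted strings only approximate the shape of generated row-writer code. The point is that a field declared non-nullable skips the `isNullAt` branch entirely.

```scala
object NullCheckCodegenSketch {
  // Hypothetical named replacement for the (DataType, Boolean) tuple.
  final case class FieldSchema(javaType: String, nullable: Boolean)

  // Emits Java-like source for writing one field; "int" -> getInt, "long" -> getLong, ...
  def writeField(input: String, index: Int, field: FieldSchema): String = {
    val write = s"writer.write($index, $input.get${field.javaType.capitalize}($index));"
    if (field.nullable) {
      s"""if ($input.isNullAt($index)) {
         |  writer.setNullAt($index);
         |} else {
         |  $write
         |}""".stripMargin
    } else {
      write // declared non-nullable: no null check is generated
    }
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq(FieldSchema("int", nullable = false), FieldSchema("long", nullable = true))
    fields.zipWithIndex.foreach { case (f, i) => println(writeField("row", i, f)) }
  }
}
```

A named case class also makes call sites read as `field.nullable` instead of `pair._2`, which is presumably part of the appeal of reusing it in another PR.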
[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22019 SGTM
[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22019 I agree with this proposal @HyukjinKwon. I think it is wrong to treat an empty string as null. An empty string is not a valid value for an int/double/..., so if we encounter one, I think we should fail.
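The two policies being discussed for an empty string in a double/float field can be contrasted in a short sketch. It is hypothetical and independent of any Spark parser: `lenient` maps "" to NULL (`None`), while `strict` rejects it like any other malformed value, as proposed in the comment above.

```scala
import scala.util.{Failure, Success, Try}

object EmptyStringPolicySketch {
  // Lenient policy: treat "" as NULL, parse everything else.
  def lenient(s: String): Try[Option[Double]] =
    if (s.isEmpty) Success(None) else Try(Some(s.toDouble))

  // Strict policy: "" is not a valid double, so fail as for any malformed value.
  def strict(s: String): Try[Option[Double]] =
    if (s.isEmpty) Failure(new IllegalArgumentException("empty string is not a valid double"))
    else Try(Some(s.toDouble))

  def main(args: Array[String]): Unit =
    Seq("1.5", "").foreach(s => println(s"'$s': lenient=${lenient(s)}, strict=${strict(s)}"))
}
```

Both policies agree on well-formed input; they only diverge on "", which is exactly the case the PR is deciding.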
[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22066 can you add a test first?
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22044
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94536/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21933: [SPARK-24917][CORE] make chunk size configurable
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21933 cc @squito too.
[GitHub] spark issue #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistinct
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22044 Thanks! merging to master.
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Merged build finished. Test FAILed.