[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20327 **[Test build #88142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88142/testReport)** for PR 20327 at commit [`ae4ad4a`](https://github.com/apache/spark/commit/ae4ad4a7568cf5845861237d848468c4dc8cf840). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20763 **[Test build #88144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88144/testReport)** for PR 20763 at commit [`c0ac5ef`](https://github.com/apache/spark/commit/c0ac5ef3a1f00eee44dd50be925f983be852fe96). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88145/testReport)** for PR 20779 at commit [`6e45791`](https://github.com/apache/spark/commit/6e4579113fcd4eff7c042c6b1d14e672596bc54c). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20754: [SPARK-23287][CORE] Spark scheduler does not remove init...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20754 **[Test build #88143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88143/testReport)** for PR 20754 at commit [`5a7224e`](https://github.com/apache/spark/commit/5a7224eba039d2c6421a710bdcc562c4b96f9876). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20754: [SPARK-23287][CORE] Spark scheduler does not remove init...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88143/ Test FAILed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88145/ Test FAILed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test FAILed.
[GitHub] spark issue #20754: [SPARK-23287][CORE] Spark scheduler does not remove init...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20754 Merged build finished. Test FAILed.
[GitHub] spark issue #20792: Branch 2.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20792 Can one of the admins verify this patch?
[GitHub] spark pull request #20771: [SPARK-23587][SQL] Add interpreted execution for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20771#discussion_r173616458 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -599,8 +610,79 @@ case class MapObjects private( override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported") + // The data with UserDefinedType are actually stored with the data type of its sqlType. + // When we want to apply MapObjects on it, we have to use it. + lazy private val inputDataType = inputData.dataType match { +case u: UserDefinedType[_] => u.sqlType +case _ => inputData.dataType + } + + private def executeFuncOnCollection(inputCollection: Seq[_]): Seq[_] = { +inputCollection.map { element => + val row = InternalRow.fromSeq(Seq(element)) + lambdaFunction.eval(row) +} + } + + // Executes lambda function on input collection. + private lazy val executeFunc: Any => Seq[_] = inputDataType match { +case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[Seq[_]]) +case ObjectType(cls) if cls.isArray => + x => executeFuncOnCollection(x.asInstanceOf[Array[_]].toSeq) +case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[java.util.List[_]].asScala) +case ObjectType(cls) if cls == classOf[Object] => + (inputCollection) => { +if (inputCollection.getClass.isArray) { + executeFuncOnCollection(inputCollection.asInstanceOf[Array[_]].toSeq) +} else { + executeFuncOnCollection(inputCollection.asInstanceOf[Seq[_]]) +} + } +case ArrayType(et, _) => + x => executeFuncOnCollection(x.asInstanceOf[ArrayData].array) + } + + // Converts the processed collection to custom collection class if any. 
+ private lazy val getResults: Seq[_] => Any = customCollectionCls match { +case Some(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + // Scala sequence + _.toSeq +case Some(cls) if classOf[scala.collection.Set[_]].isAssignableFrom(cls) => + // Scala set + _.toSet +case Some(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) => + // Java list + if (cls == classOf[java.util.List[_]] || cls == classOf[java.util.AbstractList[_]] || --- End diff -- Added.
[GitHub] spark pull request #20771: [SPARK-23587][SQL] Add interpreted execution for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20771#discussion_r173616462 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -599,8 +610,79 @@ case class MapObjects private( override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported") + // The data with UserDefinedType are actually stored with the data type of its sqlType. + // When we want to apply MapObjects on it, we have to use it. + lazy private val inputDataType = inputData.dataType match { +case u: UserDefinedType[_] => u.sqlType +case _ => inputData.dataType + } + + private def executeFuncOnCollection(inputCollection: Seq[_]): Seq[_] = { +inputCollection.map { element => + val row = InternalRow.fromSeq(Seq(element)) + lambdaFunction.eval(row) +} + } + + // Executes lambda function on input collection. + private lazy val executeFunc: Any => Seq[_] = inputDataType match { +case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[Seq[_]]) +case ObjectType(cls) if cls.isArray => + x => executeFuncOnCollection(x.asInstanceOf[Array[_]].toSeq) +case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[java.util.List[_]].asScala) +case ObjectType(cls) if cls == classOf[Object] => + (inputCollection) => { +if (inputCollection.getClass.isArray) { + executeFuncOnCollection(inputCollection.asInstanceOf[Array[_]].toSeq) +} else { + executeFuncOnCollection(inputCollection.asInstanceOf[Seq[_]]) +} + } +case ArrayType(et, _) => + x => executeFuncOnCollection(x.asInstanceOf[ArrayData].array) + } + + // Converts the processed collection to custom collection class if any. 
+ private lazy val getResults: Seq[_] => Any = customCollectionCls match { +case Some(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + // Scala sequence + _.toSeq --- End diff -- Yap.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Merged build finished. Test FAILed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1445/ Test FAILed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20717 **[Test build #88149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88149/testReport)** for PR 20717 at commit [`9e2d993`](https://github.com/apache/spark/commit/9e2d993d691ad37b230c9e14d16148b9dc9727e6).
[GitHub] spark issue #20788: [WIP][SPARK-21030][PYTHON][SQL] Adds more types for hint...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20788 If it's difficult to make a test, let's do a manual test and describe it in the PR description.
[GitHub] spark pull request #20701: [SPARK-23528][ML] Add numIter to ClusteringSummar...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20701#discussion_r173619104 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala --- @@ -127,6 +128,7 @@ class BisectingKMeansSuite assert(clusterSizes.length === k) assert(clusterSizes.sum === numRows) assert(clusterSizes.forall(_ >= 0)) +assert(summary.numIter == 2) --- End diff -- In `KMeansSuite` the value is not `maxIter` (it performs only 1 iteration in that case). In `BisectingKMeans`, `numIter` is always `maxIter`, since we always perform `maxIter` iterations (see https://github.com/apache/spark/blob/b6f837c9d3cb0f76f0a52df37e34aea8944f6867/mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala#L192). Does that answer your comment?
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test FAILed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88147/ Test FAILed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Merged build finished. Test PASSed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1448/ Test PASSed.
[GitHub] spark pull request #20790: AccumulatorV2 subclass isZero scaladoc fix
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20790#discussion_r173621260 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -290,7 +290,8 @@ class LongAccumulator extends AccumulatorV2[jl.Long, jl.Long] { private var _count = 0L /** - * Adds v to the accumulator, i.e. increment sum by v and count by 1. + * Returns false if this accumulator has had any values added to it or the sum is non-zero. + * --- End diff -- I think this duplicates the doc from `AccumulatorV2.isZero`. Can we simply remove this wrong doc and revert the other changes, so that the doc inherited from `AccumulatorV2.isZero` is reused everywhere?
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20779 Ah, I increased the heap size (4GB) in my environment with IntelliJ. Should we create a class like https://github.com/apache/spark/pull/20636?
[GitHub] spark issue #20525: [SPARK-23271[SQL] Parquet output contains only _SUCCESS ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20525 late LGTM too.
[GitHub] spark pull request #20794: [SPARK-23644][CORE][UI] Use absolute path for RES...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/20794 [SPARK-23644][CORE][UI] Use absolute path for REST call in SHS ## What changes were proposed in this pull request? SHS uses a relative path for the REST API call that gets the list of applications. When the SHS is accessed through a proxy, this can be an issue if the path doesn't end with a "/". Therefore, we should use an absolute path for the REST call, as is done for all the other resources. ## How was this patch tested? manual tests Before the change: ![screen shot 2018-03-10 at 4 22 02 pm](https://user-images.githubusercontent.com/8821783/37244190-8ccf9d40-2485-11e8-8fa9-345bc81472fc.png) After the change: ![screen shot 2018-03-10 at 4 36 34 pm 1](https://user-images.githubusercontent.com/8821783/37244201-a1922810-2485-11e8-8856-eeab2bf5e180.png) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-23644 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20794.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20794 commit 17ea399162167092e0362f90b49a03397ae82afe Author: Marco Gaido Date: 2018-03-10T15:49:52Z [SPARK-23644][CORE][UI] Use absolute path for REST call in SHS
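The trailing-slash problem described in this pull request follows from how relative references are resolved against the page URL. The sketch below is only an illustration using Python's standard `urljoin`, not the SHS code; the host `proxy.example` and the mount point `/shs` are hypothetical, and the REST path is used purely as an example reference.

```python
from urllib.parse import urljoin

# Hypothetical proxy mount point for the history server UI (an assumption).
PAGE_NO_SLASH = "http://proxy.example/shs"
PAGE_WITH_SLASH = "http://proxy.example/shs/"


def resolve(page_url, reference):
    """Resolve a link against the page URL, as a browser would."""
    return urljoin(page_url, reference)


# Relative reference: the last path segment of the page URL is dropped
# unless the URL ends with "/", so the two results differ.
relative_no_slash = resolve(PAGE_NO_SLASH, "api/v1/applications")
relative_with_slash = resolve(PAGE_WITH_SLASH, "api/v1/applications")

# Absolute-path reference: the page URL's path is ignored entirely,
# so both page URLs resolve to the same target.
absolute_no_slash = resolve(PAGE_NO_SLASH, "/shs/api/v1/applications")
absolute_with_slash = resolve(PAGE_WITH_SLASH, "/shs/api/v1/applications")

print(relative_no_slash)
print(relative_with_slash)
print(absolute_no_slash)
print(absolute_with_slash)
```

With the relative reference, only the page URL that ends in "/" keeps the `/shs` prefix; the absolute-path reference resolves identically in both cases, which is the behavior the pull request relies on.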
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1451/ Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88154/testReport)** for PR 20779 at commit [`603ce0f`](https://github.com/apache/spark/commit/603ce0fb29bfa5b5c0cfea69fb72e2a3128e772a).
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20719 **[Test build #88151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88151/testReport)** for PR 20719 at commit [`2d64a90`](https://github.com/apache/spark/commit/2d64a9028ea138aa8b538da25637771543109076). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Merged build finished. Test PASSed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88149/ Test PASSed.
[GitHub] spark pull request #20788: [WIP][SPARK-21030][PYTHON][SQL] Adds more types f...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r173623998 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed = [str, list, float, int] for p in parameters: -if not isinstance(p, str): +if not type(p) in allowed: --- End diff -- Didn't know that it was possible, nice!
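The review suggestion this reply answers is not quoted in the message, so the following is only a sketch of one common alternative to `type(p) in allowed`: passing a tuple of types to `isinstance`. The function name `validate_hint_parameters`, the constant name, and the error message are hypothetical and are not taken from the pull request.

```python
# Tuple of accepted parameter types, mirroring the `allowed` list in the diff.
ALLOWED_HINT_TYPES = (str, list, float, int)


def validate_hint_parameters(*parameters):
    """Raise TypeError unless every parameter is an instance of an allowed type."""
    for p in parameters:
        # isinstance accepts a tuple of types and also matches subclasses.
        if not isinstance(p, ALLOWED_HINT_TYPES):
            raise TypeError(
                "all parameters should be in {0}, got {1} of type {2}".format(
                    ALLOWED_HINT_TYPES, p, type(p)))
    return parameters


accepted = validate_hint_parameters("broadcast", 1, 2.0, [1, 2])

try:
    validate_hint_parameters({"not": "allowed"})
    rejected = False
except TypeError:
    rejected = True
```

The two checks are not equivalent: `isinstance` also accepts subclasses, so `True` passes because `bool` is a subclass of `int`, whereas `type(True) in [str, list, float, int]` is `False`.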
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20701 **[Test build #88150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88150/testReport)** for PR 20701 at commit [`b3d0523`](https://github.com/apache/spark/commit/b3d0523e5eed89dc800d0678adde59eb4ac4343e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88150/ Test PASSed.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Merged build finished. Test PASSed.
[GitHub] spark pull request #20793: [SPARK-23643] Shrinking the buffer in hashSeed up...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/20793 [SPARK-23643] Shrinking the buffer in hashSeed up to size of the seed parameter ## What changes were proposed in this pull request? The `hashSeed` method allocates 64 bytes instead of 8. The other bytes are always zero, so they can be excluded from the hash calculation because they don't differentiate inputs. ## How was this patch tested? By running the existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/MaxGekk/spark-1 hash-buff-size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20793.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20793 commit bb40ef2e8d337508d60903a6a824b5aa45d87326 Author: Maxim Gekk Date: 2018-03-10T13:14:33Z Shrinking the buffer up to size of the long type
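The claim that the extra bytes never differentiate inputs can be checked with a small byte-level sketch. This is a Python illustration, not the Spark code: `seed_buffer` is a hypothetical helper, and the big-endian layout is an assumption chosen to resemble a default Java `ByteBuffer`.

```python
import struct

LONG_BYTES = 8     # bytes actually needed to hold a 64-bit seed
PADDED_BYTES = 64  # the oversized allocation described in the pull request


def seed_buffer(seed, size):
    """Write a signed 64-bit seed at offset 0 of a zero-filled buffer of `size` bytes."""
    buf = bytearray(size)                 # every byte starts as zero
    struct.pack_into(">q", buf, 0, seed)  # big-endian long, 8 bytes
    return bytes(buf)


small = seed_buffer(42, LONG_BYTES)
padded = seed_buffer(42, PADDED_BYTES)
other_padded = seed_buffer(-7, PADDED_BYTES)

# Only the first 8 bytes carry the seed; the remaining 56 are zero for any seed.
same_prefix = padded[:LONG_BYTES] == small
constant_tail = padded[LONG_BYTES:] == other_padded[LONG_BYTES:] == bytes(56)
```

Because the tail is identical for every seed, it adds no distinguishing information to the hash input. Note that hashing the shorter buffer will generally produce different hash values than hashing the padded one, so this sketch says nothing about whether existing results stay the same.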
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20779 Let me reduce the number of loops. Another option is to revert this change and use the non-loop version, which worked without an exception.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20043 LGTM
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88151/ Test PASSed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20719 Merged build finished. Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20779 I don't think so. There is an option to change the heap size for test execution, but I am not sure we are allowed to do that, or whether it is a good idea. Let's hear others' opinions...
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20717 **[Test build #88149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88149/testReport)** for PR 20717 at commit [`9e2d993`](https://github.com/apache/spark/commit/9e2d993d691ad37b230c9e14d16148b9dc9727e6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1449/ Test PASSed.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20785 **[Test build #88152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88152/testReport)** for PR 20785 at commit [`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20785 Merged build finished. Test PASSed.
[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20793 Can one of the admins verify this patch?
[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r173625828 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet format does not allow wrting data frame with empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} + } +} + } + + // Formats excluding text and parquet allow writing empty data frames to files. + allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format => +test(s"SPARK-23372 writing empty dataframe and reading from it - $format") { + withTempPath { outputPath => + spark.emptyDataFrame.write.format(format).save(outputPath.toString) + intercept[AnalysisException] { +val df = spark.read.format(format).load(outputPath.toString) --- End diff -- Sorry if I misunderstood. The link is https://github.com/apache/spark/pull/20579#issuecomment-364994881. Is that the right link?
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20794 Merged build finished. Test PASSed.
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20794 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1450/ Test PASSed.
[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20794 **[Test build #88153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88153/testReport)** for PR 20794 at commit [`17ea399`](https://github.com/apache/spark/commit/17ea399162167092e0362f90b49a03397ae82afe).
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Merged build finished. Test FAILed.
[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20763 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88144/ Test FAILed.
[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20327 Merged build finished. Test FAILed.
[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20327 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88142/ Test FAILed.
[GitHub] spark pull request #20792: Branch 2.1
GitHub user dsjch123 opened a pull request: https://github.com/apache/spark/pull/20792

Branch 2.1

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-2.1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20792.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #20792

commit 43084b3cc3918b720fe28053d2037fa22a71264e
Author: Herman van Hovell
Date: 2017-02-23T22:58:02Z

[SPARK-19459][SQL][BRANCH-2.1] Support for nested char/varchar fields in ORC

## What changes were proposed in this pull request?
This is a backport of the two following commits: https://github.com/apache/spark/commit/78eae7e67fd5dec0c2d5b1853ce86cd0f1ae & https://github.com/apache/spark/commit/de8a03e68202647555e30fffba551f65bc77608d
This PR adds support for ORC tables with (nested) char/varchar fields.

## How was this patch tested?
Added a regression test to `OrcSourceSuite`.

Author: Herman van Hovell
Closes #17041 from hvanhovell/SPARK-19459-branch-2.1.

commit 66a7ca28a9de92e67ce24896a851a0c96c92aec6
Author: Takeshi Yamamuro
Date: 2017-02-24T09:54:00Z

[SPARK-19691][SQL][BRANCH-2.1] Fix ClassCastException when calculating percentile of decimal column

## What changes were proposed in this pull request?
This is a backport of the two following commits: https://github.com/apache/spark/commit/93aa4271596a30752dc5234d869c3ae2f6e8e723

This pr fixed a class-cast exception below;
```
scala> spark.range(10).selectExpr("cast (id as decimal) as x").selectExpr("percentile(x, 0.5)").collect()
java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Number
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:141)
  at org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.update(Percentile.scala:58)
  at org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.update(interfaces.scala:514)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$1.apply(AggregationIterator.scala:171)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$1$$anonfun$applyOrElse$1.apply(AggregationIterator.scala:171)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:187)
  at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateProcessRow$1.apply(AggregationIterator.scala:181)
  at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:151)
  at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
  at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:109)
  at
```
This fix simply converts catalyst values (i.e., `Decimal`) into scala ones by using `CatalystTypeConverters`.

## How was this patch tested?
Added a test in `DataFrameSuite`.

Author: Takeshi Yamamuro
Closes #17046 from maropu/SPARK-19691-BACKPORT2.1.
commit 6da6a27f673f6e45fe619e0411fbaaa14ea34bfb
Author: jerryshao
Date: 2017-02-24T17:28:59Z

[SPARK-19707][CORE] Improve the invalid path check for sc.addJar

## What changes were proposed in this pull request?
Currently in Spark there are two issues when we add jars with an invalid path:
* If the jar path is an empty string {--jar ",dummy.jar"}, then Spark will resolve it to the current directory path and add it to the classpath / file server, which is unwanted. This happened when we submitted a Spark application programmatically. From my understanding Spark should defensively filter out such empty paths.
* If the jar path is an invalid path (the file doesn't exist), `addJar` doesn't check it and will still add it to the file server, so the exception is delayed until the job runs. Actually this local path could be checked beforehand; there is no need to wait until the task runs. We have similar check in
[GitHub] spark issue #20792: Branch 2.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20792 Can one of the admins verify this patch?
[GitHub] spark pull request #20775: [PYTHON] Changes input variable to not conflict w...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20775
[GitHub] spark issue #20792: Branch 2.1
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20792 @dsjch123 this seems to have been opened by mistake. Would you mind closing it, please?
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88147/testReport)** for PR 20779 at commit [`6e45791`](https://github.com/apache/spark/commit/6e4579113fcd4eff7c042c6b1d14e672596bc54c).
[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20633 cc @dbtsai if you have time to look at this too.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20043 ping @hvanhovell @cloud-fan Any more comments on this?
[GitHub] spark issue #20775: [PYTHON] Changes input variable to not conflict with bui...
Github user DylanGuedes commented on the issue: https://github.com/apache/spark/pull/20775 Sure. Done.
[GitHub] spark pull request #20701: [SPARK-23528][ML] Add numIter to ClusteringSummar...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20701#discussion_r173618874 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala --- @@ -46,6 +47,10 @@ class KMeansModel @Since("2.4.0") (@Since("1.0.0") val clusterCenters: Array[Vec private val clusterCentersWithNorm = if (clusterCenters == null) null else clusterCenters.map(new VectorWithNorm(_)) + @Since("2.4.0") + def this(clusterCenters: Array[Vector], distanceMeasure: String) = +this(clusterCenters: Array[Vector], distanceMeasure, -1) --- End diff -- Yes, this can happen, for instance when reloading a persisted model. Moreover, this is only for the mllib model, which as far as I know is no longer recommended in favor of the new ml API. Any concerns or suggestions about this?
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Merged build finished. Test PASSed.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20701 **[Test build #88150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88150/testReport)** for PR 20701 at commit [`b3d0523`](https://github.com/apache/spark/commit/b3d0523e5eed89dc800d0678adde59eb4ac4343e).
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1447/ Test PASSed.
[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20719 **[Test build #88151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88151/testReport)** for PR 20719 at commit [`2d64a90`](https://github.com/apache/spark/commit/2d64a9028ea138aa8b538da25637771543109076).
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20779 @kiszk the UT error is valid. How did you test it? Any idea what caused the OOM?
[GitHub] spark issue #20790: AccumulatorV2 subclass isZero scaladoc fix
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20790 Shall we fix the title to `[MINOR][DOCS] AccumulatorV2 ...` to be consistent with other PRs?
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20771 Merged build finished. Test PASSed.
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20771 **[Test build #88146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88146/testReport)** for PR 20771 at commit [`e725608`](https://github.com/apache/spark/commit/e725608d1b38a7a2b1a0677afca947cec6a12801).
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20771 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1443/ Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Merged build finished. Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20779 retest this please.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20779 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1444/ Test PASSed.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20779 LGTM
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1446/ Test PASSed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Merged build finished. Test PASSed.
[GitHub] spark pull request #20787: Documenting months_between direction
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20787#discussion_r173619271 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2708,7 +2708,8 @@ object functions { def minute(e: Column): Column = withExpr { Minute(e.expr) } /** - * Returns number of months between dates `date1` and `date2`. + * Returns number of months between dates `date1` and `date2`. If `date1` is later than `date2`, + * then the result is positive. --- End diff -- Can we make this resemble https://github.com/apache/spark/blob/6e36d8d56279a2c5c92c8df8e89ee99b514817e7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L884-L888 ?
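The sign convention proposed in the doc change above (positive when `date1` is later than `date2`) can be illustrated with a minimal sketch. This is a simplification under stated assumptions, not Spark's `months_between`: it counts whole calendar months only and ignores day of month and time of day. `whole_months_between` is a hypothetical name used only for illustration.

```python
from datetime import date


def whole_months_between(date1, date2):
    """Signed whole-month difference between two dates.

    Positive when date1 falls in a later month than date2, negative when it
    falls in an earlier one. Hypothetical helper: it ignores day of month and
    time of day, so it only illustrates the argument order.
    """
    return (date1.year - date2.year) * 12 + (date1.month - date2.month)


later, earlier = date(2018, 3, 1), date(2018, 1, 1)
forward = whole_months_between(later, earlier)   # first argument is later
backward = whole_months_between(earlier, later)  # first argument is earlier
```

Swapping the arguments flips the sign, which is the behaviour the documentation change is meant to make explicit.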
[GitHub] spark issue #20787: Documenting months_between direction
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20787 Let's fix https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L1120-L1124 too while we are here.
[GitHub] spark pull request #20783: [SPARK-23628][SQL][BACKPORT-2.3] calculateParamLe...
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/20783
[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20692 Any thoughts on the above analysis, @cloud-fan @dongjoon-hyun @gatorsmile @rdblue? Thanks.
[GitHub] spark issue #20790: AccumulatorV2 subclass isZero scaladoc fix
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20790 Wait .. I just found that you opened a JIRA, SPARK-23642. Please link it with `[SPARK-23642][DOCS] ...`. See https://spark.apache.org/contributing.html
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user dvogelbacher commented on the issue: https://github.com/apache/spark/pull/20779 LGTM as well, thanks for making the PR @kiszk
[GitHub] spark issue #20775: [PYTHON] Changes input variable to not conflict with bui...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20775 Merged to master and branch-2.3.
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20649 ping @xysun
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20779 **[Test build #88147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88147/testReport)** for PR 20779 at commit [`6e45791`](https://github.com/apache/spark/commit/6e4579113fcd4eff7c042c6b1d14e672596bc54c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20771 Merged build finished. Test PASSed.
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20771 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88146/ Test PASSed.
[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20771 **[Test build #88146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88146/testReport)** for PR 20771 at commit [`e725608`](https://github.com/apache/spark/commit/e725608d1b38a7a2b1a0677afca947cec6a12801). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20771: [SPARK-23587][SQL] Add interpreted execution for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20771#discussion_r173616466 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -599,8 +610,79 @@ case class MapObjects private( override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported") + // The data with UserDefinedType are actually stored with the data type of its sqlType. + // When we want to apply MapObjects on it, we have to use it. + lazy private val inputDataType = inputData.dataType match { +case u: UserDefinedType[_] => u.sqlType +case _ => inputData.dataType + } + + private def executeFuncOnCollection(inputCollection: Seq[_]): Seq[_] = { +inputCollection.map { element => + val row = InternalRow.fromSeq(Seq(element)) + lambdaFunction.eval(row) +} + } + + // Executes lambda function on input collection. + private lazy val executeFunc: Any => Seq[_] = inputDataType match { +case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[Seq[_]]) +case ObjectType(cls) if cls.isArray => + x => executeFuncOnCollection(x.asInstanceOf[Array[_]].toSeq) +case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[java.util.List[_]].asScala) +case ObjectType(cls) if cls == classOf[Object] => + (inputCollection) => { +if (inputCollection.getClass.isArray) { --- End diff -- Sorry... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20771: [SPARK-23587][SQL] Add interpreted execution for ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20771#discussion_r173616476 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -599,8 +610,79 @@ case class MapObjects private( override def children: Seq[Expression] = lambdaFunction :: inputData :: Nil - override def eval(input: InternalRow): Any = -throw new UnsupportedOperationException("Only code-generated evaluation is supported") + // The data with UserDefinedType are actually stored with the data type of its sqlType. + // When we want to apply MapObjects on it, we have to use it. + lazy private val inputDataType = inputData.dataType match { +case u: UserDefinedType[_] => u.sqlType +case _ => inputData.dataType + } + + private def executeFuncOnCollection(inputCollection: Seq[_]): Seq[_] = { +inputCollection.map { element => + val row = InternalRow.fromSeq(Seq(element)) + lambdaFunction.eval(row) +} + } + + // Executes lambda function on input collection. + private lazy val executeFunc: Any => Seq[_] = inputDataType match { +case ObjectType(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[Seq[_]]) +case ObjectType(cls) if cls.isArray => + x => executeFuncOnCollection(x.asInstanceOf[Array[_]].toSeq) +case ObjectType(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) => + x => executeFuncOnCollection(x.asInstanceOf[java.util.List[_]].asScala) +case ObjectType(cls) if cls == classOf[Object] => + (inputCollection) => { +if (inputCollection.getClass.isArray) { + executeFuncOnCollection(inputCollection.asInstanceOf[Array[_]].toSeq) +} else { + executeFuncOnCollection(inputCollection.asInstanceOf[Seq[_]]) +} + } +case ArrayType(et, _) => + x => executeFuncOnCollection(x.asInstanceOf[ArrayData].array) + } + + // Converts the processed collection to custom collection class if any. 
+ private lazy val getResults: Seq[_] => Any = customCollectionCls match { +case Some(cls) if classOf[Seq[_]].isAssignableFrom(cls) => + // Scala sequence + _.toSeq +case Some(cls) if classOf[scala.collection.Set[_]].isAssignableFrom(cls) => + // Scala set + _.toSet +case Some(cls) if classOf[java.util.List[_]].isAssignableFrom(cls) => + // Java list + if (cls == classOf[java.util.List[_]] || cls == classOf[java.util.AbstractList[_]] || + cls == classOf[java.util.AbstractSequentialList[_]]) { +_.asJava + } else { +(results) => { + val builder = Try(cls.getConstructor(Integer.TYPE)).map { constructor => --- End diff -- I'm not sure I understand correctly. Please check the update again.
[GitHub] spark pull request #20757: [SPARK-23595][SQL] ValidateExternalType should su...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20757#discussion_r173616654 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -1440,7 +1463,7 @@ case class ValidateExternalType(child: Expression, expected: DataType) Seq(classOf[java.math.BigDecimal], classOf[scala.math.BigDecimal], classOf[Decimal]) .map(cls => s"$obj instanceof ${cls.getName}").mkString(" || ") case _: ArrayType => -s"$obj instanceof ${classOf[Seq[_]].getName} || $obj.getClass().isArray()" +s"$obj.getClass().isArray() || $obj instanceof ${classOf[Seq[_]].getName}" --- End diff -- Why do we need to change the codegen implementation?
[GitHub] spark pull request #20788: [WIP][SPARK-21030][PYTHON][SQL] Adds more types f...
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r173618711 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed = [str, list, float, int] for p in parameters: -if not isinstance(p, str): +if not type(p) in allowed: --- End diff -- Oh, good to know, good catch. But then should I replicate `isinstance` for the other types (int, float, etc.)? Or is adding `unicode` to `allowed` also a solution?
[GitHub] spark issue #20600: [SPARK-23412][ML] Add cosine distance to BisectingKMeans
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20600 any more comments @srowen ?
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20717 **[Test build #88148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88148/testReport)** for PR 20717 at commit [`d8a1190`](https://github.com/apache/spark/commit/d8a11901bb2785739caa593b3048df420419d35b).
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Merged build finished. Test FAILed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20717 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88148/ Test FAILed.
[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20717 **[Test build #88148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88148/testReport)** for PR 20717 at commit [`d8a1190`](https://github.com/apache/spark/commit/d8a11901bb2785739caa593b3048df420419d35b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait QueryPlanConstraints `
[GitHub] spark pull request #20788: [WIP][SPARK-21030][PYTHON][SQL] Adds more types f...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r173619046 --- Diff: python/pyspark/sql/dataframe.py --- @@ -437,10 +437,11 @@ def hint(self, name, *parameters): if not isinstance(name, str): raise TypeError("name should be provided as str, got {0}".format(type(name))) +allowed = [str, list, float, int] for p in parameters: -if not isinstance(p, str): +if not type(p) in allowed: --- End diff -- Hm? Can't we simply do `isinstance(p, (basestring, list, float, int))`?
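The difference between the two checks in this thread is that `type(p) in allowed` requires an exact type match, while `isinstance` with a tuple of types also accepts subclasses. A minimal sketch, assuming Python 3 (where `basestring` does not exist and `str` already covers unicode text); `validate_hint_parameters`, `ALLOWED_TYPES`, and `Label` are hypothetical names for illustration, not PySpark code:

```python
# Hypothetical helper illustrating the review suggestion; not PySpark's code.
# On Python 2 the thread suggests basestring in place of str.
ALLOWED_TYPES = (str, list, float, int)


def validate_hint_parameters(*parameters):
    """Raise TypeError if any parameter is not an instance of an allowed type."""
    for p in parameters:
        if not isinstance(p, ALLOWED_TYPES):
            raise TypeError(
                "all parameters should be in {0}, got {1} of type {2}".format(
                    ALLOWED_TYPES, p, type(p)))


class Label(str):
    """A str subclass: accepted by isinstance, rejected by an exact type check."""


# isinstance-based validation accepts the subclass instance without error.
validate_hint_parameters("broadcast", 1, 1.5, ["a", "b"], Label("x"))

# The exact-type check from the diff would reject the same subclass instance.
exact_match = type(Label("x")) in [str, list, float, int]
```

The tuple form keeps the check to a single call, and it is what lets Python 2 `unicode` values through when `basestring` is listed, since `unicode` is a subclass of `basestring`.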