[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21298 **[Test build #90539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90539/testReport)** for PR 21298 at commit [`1c9511e`](https://github.com/apache/spark/commit/1c9511e976f4991185e195bafff388ae7162f24d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90539/
[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21298 Merged build finished. Test PASSed.
[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21311 Can one of the admins verify this patch?
[GitHub] spark pull request #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder...
GitHub user fangshil opened a pull request: https://github.com/apache/spark/pull/21310 [SPARK-24256][SQL] ExpressionEncoder should support user-defined types as fields of Scala case class and tuple

## What changes were proposed in this pull request?

ExpressionEncoder currently supports ser/de of primitive types, Scala case classes, tuples, and Java bean classes. Dataset natively supports these types, but we find Dataset is not flexible for other user-defined types and encoders. For example, spark-avro provides an AvroEncoder for ser/de of Avro types in Dataset. Although we can use AvroEncoder to define a Dataset whose type is an Avro Generic or Specific Record, such an Avro-typed Dataset has several limitations:
1. We cannot use joinWith on this Dataset, since the result is a tuple and Avro types cannot be fields of that tuple.
2. We cannot use some type-safe aggregation methods on this Dataset, such as KeyValueGroupedDataset's reduceGroups, since the result is also a tuple.
3. We cannot combine an Avro SpecificRecord with additional primitive fields in a case class, which we find is a very common use case.

Spark does not support defining a Scala case class/tuple whose fields are of another user-defined type because ExpressionEncoder does not discover the implicit Encoder for user-defined field types, and thus cannot use any Encoder to ser/de user-defined fields in a case class/tuple. To address this issue, we propose a trait as a contract (between ExpressionEncoder and any other user-defined Encoder) that lets the ExpressionEncoder of a case class/tuple/Java bean discover the serializer, deserializer, and schema from the Encoder of the user-defined type.

With this proposed patch and a minor modification in AvroEncoder, we remove these limitations with the cluster-default conf spark.expressionencoder.org.apache.avro.specific.SpecificRecord = com.databricks.spark.avro.AvroEncoder$. This is a patch we have implemented internally and have used for a few quarters. We propose it upstream because we think it is a useful feature that makes Dataset more flexible with respect to user types.

## How was this patch tested?

We have tested this patch internally. We did not write a unit test, since the user-defined Encoder (AvroEncoder) is defined outside Spark; we welcome comments on how to write unit tests for this path.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fangshil/spark SPARK-24256
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21310.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21310

commit 547ff81e0470bed14371996da89924bfed0cc101
Author: Fangshi Li
Date: 2018-02-02T02:16:14Z
[SPARK-24256][SQL] ExpressionEncoder should support user-defined types as fields of Scala case class and tuple
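The proposed contract can be sketched in standalone form. Note that the trait name and method signatures below are hypothetical stand-ins for the actual patch: they only illustrate the idea that a user-defined Encoder exposes its schema, serializer, and deserializer so that ExpressionEncoder can embed the type as a case-class field.

```scala
// Hypothetical sketch of the proposed contract; names are illustrative,
// not the actual PR's API. A user-defined Encoder exposes enough for
// ExpressionEncoder to delegate ser/de of a nested field.
trait UserDefinedFieldEncoder[T] {
  def schema: String                        // stand-in for Spark's StructType
  def serialize(value: T): Array[Byte]
  def deserialize(bytes: Array[Byte]): T
}

// A toy user-defined type (playing the role of an Avro record) and its encoder.
case class Point(x: Int, y: Int)

object PointEncoder extends UserDefinedFieldEncoder[Point] {
  def schema: String = "struct<x:int,y:int>"
  def serialize(p: Point): Array[Byte] = s"${p.x},${p.y}".getBytes("UTF-8")
  def deserialize(bytes: Array[Byte]): Point = {
    val Array(x, y) = new String(bytes, "UTF-8").split(",")
    Point(x.toInt, y.toInt)
  }
}

// With such a contract, a case class mixing a user-defined field with
// primitives (limitation 3 above) becomes encodable, because ser/de of
// the Point field is delegated to PointEncoder.
case class Labeled(label: String, point: Point)
```

A round trip through the encoder (`PointEncoder.deserialize(PointEncoder.serialize(Point(1, 2)))`) returns the original value, which is the invariant the real contract would need to guarantee.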
[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21247#discussion_r187769734

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -138,3 +121,40 @@ private[sql] class JSONOptions(
     factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars)
   }
 }
+
+private[sql] class JSONOptionsInRead(
+    @transient override val parameters: CaseInsensitiveMap[String],
+    defaultTimeZoneId: String,
+    defaultColumnNameOfCorruptRecord: String)
+  extends JSONOptions(parameters, defaultTimeZoneId, defaultColumnNameOfCorruptRecord) {
+
+  def this(
+      parameters: Map[String, String],
+      defaultTimeZoneId: String,
+      defaultColumnNameOfCorruptRecord: String = "") = {
+    this(
+      CaseInsensitiveMap(parameters),
+      defaultTimeZoneId,
+      defaultColumnNameOfCorruptRecord)
+  }
+
+  protected override def checkedEncoding(enc: String): String = {
+    // The following encodings are not supported in per-line mode (multiline is false)
+    // because they cause some problems in reading files with BOM which is supposed to
+    // present in the files with such encodings. After splitting input files by lines,
+    // only the first lines will have the BOM which leads to impossibility for reading
+    // the rest lines. Besides of that, the lineSep option must have the BOM in such
+    // encodings which can never present between lines.
+    val blacklist = Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32"))
+    val isBlacklisted = blacklist.contains(Charset.forName(enc))
+    require(multiLine || !isBlacklisted,

--- End diff --

Do we need to check `blacklist` in the write path?
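The check under discussion can be reproduced in isolation. This is a simplified, self-contained sketch of the diff above (`multiLine` is passed as a parameter here instead of being a class member): in per-line mode, UTF-16 and UTF-32 are rejected because their BOM appears only at the start of the file, so lines after the split cannot be decoded.

```scala
import java.nio.charset.Charset

// Simplified sketch of JSONOptionsInRead.checkedEncoding from the diff:
// reject UTF-16/UTF-32 unless multiLine mode is enabled.
def checkedEncoding(enc: String, multiLine: Boolean): String = {
  val blacklist = Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32"))
  val isBlacklisted = blacklist.contains(Charset.forName(enc))
  // require throws IllegalArgumentException when the condition is false.
  require(multiLine || !isBlacklisted,
    s"$enc encoding is supported only when multiLine is enabled")
  enc
}
```

Note that `Charset.forName` canonicalizes aliases (e.g. "utf-16" and "UTF-16" compare equal), which is why the check compares `Charset` objects rather than raw strings.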
[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...
GitHub user cxzl25 opened a pull request: https://github.com/apache/spark/pull/21311 [SPARK-24257][SQL] LongToUnsafeRowMap calculate the new size may be wrong

## What changes were proposed in this pull request?

LongToUnsafeRowMap calculates the new size simply by multiplying the current size by 2. The resulting allocation may not be large enough to store the data, so some data is lost and the data read back is dirty.

## How was this patch tested?

HashedRelationSuite test("LongToUnsafeRowMap with big values")

Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cxzl25/spark fix_LongToUnsafeRowMap_page_size
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21311.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21311

commit d9d8e62c2de7d9d04534396ab3bbf984ab16c7f5
Author: sychen
Date: 2018-05-12T11:14:17Z
LongToUnsafeRowMap Calculate the new correct size
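The resize issue described above can be illustrated with a standalone sketch. This is not the actual LongToUnsafeRowMap code and the names are hypothetical; it only shows why a single doubling can be insufficient: when the record being appended is larger than the space a single doubling provides, the map must keep growing until the record actually fits.

```scala
// Illustrative fix for the growth policy: doubling once may still leave
// newSize < usedBytes + neededBytes, which is the situation that loses data.
def grownSize(usedBytes: Long, neededBytes: Long, currentSize: Long): Long = {
  var newSize = currentSize * 2
  // Keep doubling until the bytes already used plus the new record fit.
  while (newSize < usedBytes + neededBytes) newSize *= 2
  newSize
}
```

For example, with 10 bytes used, a 30-byte record, and a 16-byte page, one doubling gives 32 bytes, which is still too small; the loop grows the page to 64.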
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @kevinyu98 Spark supports wildcard characters in both local and non-local scenarios; the only restriction Spark imposes is on using wildcards in the folder names of a local file-system path. Currently, because of a bug, wildcards do not work in HDFS file paths; my PR addresses these issues. As per your test result, you are still seeing an issue for the first use case, and I think you are right that there will be a problem if the path is getting truncated. I hope you are fine with me taking care of this problem in my PR. Thanks a lot for your valuable input; please let me know if you need any clarification.
[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21309 @rxin I have also always had trouble understanding which functions are available and which are not; I think it is quite confusing for a user. I am wondering: what do you think about generating this file automatically in order to make all of them available to users? That way we would probably avoid the huge maintenance effort of exposing all of them to users.
[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21246#discussion_r187769261

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = {
+    s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, $inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+      ctx: CodegenContext,
+      sb: String,
+      inputString: String,
+      offset: String,
+      numChars: String): String = {
+    val i = ctx.freshName("i")
+    val codePoint = ctx.freshName("codePoint")
+    s"""
+       |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+       |  ${CodeGenerator.JAVA_INT} $codePoint = $inputString.codePointAt($offset);
+       |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+       |    $upperReplacement, $lowerReplacement,
+       |    $digitReplacement, $defaultMaskedOther));
+       |  $offset += Character.charCount($codePoint);
+       |}
+     """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+      ctx: CodegenContext,
+      sb: String,
+      inputString: String,
+      offset: String,
+      numChars: String): String = {
+    val i = ctx.freshName("i")
+    val codePoint = ctx.freshName("codePoint")
+    s"""
+       |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+       |  ${CodeGenerator.JAVA_INT} $codePoint = $inputString.codePointAt($offset);
+       |  $sb.appendCodePoint($codePoint);
+       |  $offset += Character.charCount($codePoint);
+       |}
+     """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+      sb: StringBuffer,
+      inputString: String,
+      startOffset: Int,
+      numChars: Int): Int = {
+    var offset = startOffset
+    (1 to numChars) foreach { _ =>
+      val codePoint = inputString.codePointAt(offset)
+      sb.appendCodePoint(transformChar(
+        codePoint,
+        upperReplacement,
+        lowerReplacement,
+        digitReplacement,
+        defaultMaskedOther))
+      offset += Character.charCount(codePoint)
+    }
+    offset
+  }
+
+  def appendUnchangedToStringBuffer(
+      sb: StringBuffer,
+      inputString: String,
+      startOffset: Int,
+      numChars: Int): Int = {
+    var offset = startOffset
+    (1 to numChars) foreach { _ =>
+      val codePoint = inputString.codePointAt(offset)
+      sb.appendCodePoint(codePoint)
+      offset += Character.charCount(codePoint)
+    }
+    offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask
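The character-level transform that both the generated code and the StringBuffer helpers above delegate to can be sketched in isolation. The real `MaskExpressionsUtils.transformChar` signature may differ; this is only an illustration of the idea: upper-case, lower-case, and digit code points are replaced, and everything else passes through unchanged.

```scala
// Illustrative masking transform (hypothetical simplification of
// MaskExpressionsUtils.transformChar): map each code-point class to its
// replacement, leaving other characters untouched.
def transformChar(codePoint: Int, upper: Int, lower: Int, digit: Int): Int =
  if (Character.isUpperCase(codePoint)) upper
  else if (Character.isLowerCase(codePoint)) lower
  else if (Character.isDigit(codePoint)) digit
  else codePoint

// Apply the transform across a whole string, as the append helpers above do
// code point by code point.
def mask(s: String): String =
  s.map(c => transformChar(c.toInt, 'X'.toInt, 'x'.toInt, 'n'.toInt).toChar)
```

For example, `mask("Ab1-")` yields `"Xxn-"`: the letter cases and the digit are masked, while punctuation is preserved.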
[GitHub] spark pull request #21298: [SPARK-24198][SparkR][SQL] Adding slice function ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21298
[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21310 Can one of the admins verify this patch?
[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21298 **[Test build #90539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90539/testReport)** for PR 21298 at commit [`1c9511e`](https://github.com/apache/spark/commit/1c9511e976f4991185e195bafff388ae7162f24d).
[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21273 LGTM. It would be nice to have a micro-benchmark suite in this PR.
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90540/
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21307 **[Test build #90540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90540/testReport)** for PR 21307 at commit [`132af85`](https://github.com/apache/spark/commit/132af853bba50561926a0be3bb9b585305b4c8b5).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21307 @huaxingao, feel free to reopen a PR if you run into any problem. It's fine.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21299 Retest this please.
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21307 @HyukjinKwon I think I resolved the problem. Thanks!
[GitHub] spark pull request #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] All...
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/21300
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Merged build finished. Test PASSed.
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3167/
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3169/
[GitHub] spark pull request #20347: [SPARK-20129][Core] JavaSparkContext should use S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20347
[GitHub] spark pull request #20530: [SPARK-23349][SQL]ShuffleExchangeExec: Duplicate ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20530
[GitHub] spark pull request #21303: [BUILD] Close stale PRs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21303
[GitHub] spark pull request #20905: [SPARK-23795][LAUNCHER] Make AbstractLauncher#sel...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20905
[GitHub] spark pull request #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessar...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20078
[GitHub] spark pull request #20966: [SPARK-23854] Update Guava to 16.0.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20966
[GitHub] spark pull request #20825: add impurity stats in tree leaf node debug string
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20825
[GitHub] spark pull request #20857: [SPARK-23735]Optimize the document by adding an i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20857
[GitHub] spark pull request #20458: changed scala example from java "style" to scala
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20458
[GitHub] spark pull request #18227: [SPARK-21005][ML] Fix VectorIndexerModel does not...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18227
[GitHub] spark pull request #20881: Add a note about jobs running in FIFO order in th...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20881
[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20557
[GitHub] spark pull request #20683: [SPARK-8605] Exclude files in StreamingContext. t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20683
[GitHub] spark pull request #19694: [SPARK-22470][DOC][SQL] functions.hash is also us...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19694
[GitHub] spark pull request #19951: [SPARK-22760][CORE][YARN] When sc.stop() is calle...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19951
[GitHub] spark pull request #21281: Branch 1.6
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21281
[GitHub] spark pull request #20635: [SPARK-23053][CORE][BRANCH-2.1] taskBinarySeriali...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20635
[GitHub] spark pull request #21114: [SPARK-22371][CORE] Return None instead of throwi...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21114#discussion_r187763285

--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -209,10 +209,8 @@ class AccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContex
     System.gc()
     assert(ref.get.isEmpty)

-    // Getting a garbage collected accum should throw error
-    intercept[IllegalStateException] {
-      AccumulatorContext.get(accId)
-    }
+    // Getting a garbage collected accum should return None.
+    assert(AccumulatorContext.get(accId).isEmpty)

--- End diff --

Cool!
[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21301 @kiszk . Could you update your PR description like this?
```
~/PR-21301:PR-21301$ dev/lint-java
exec: curl --progress-bar -L https://downloads.typesafe.com/zinc/0.3.15/zinc-0.3.15.tgz
100.0%
exec: curl --progress-bar -L https://downloads.typesafe.com/scala/2.11.8/scala-2.11.8.tgz
100.0%
exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
100.0%
Using `mvn` from path: /home/dongjoon/PR-21301/build/apache-maven-3.3.9/bin/mvn
Checkstyle checks passed.
```
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3168/
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Merged build finished. Test PASSed.
[GitHub] spark pull request #21114: [SPARK-22371][CORE] Return None instead of throwi...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/21114#discussion_r187763308

--- Diff: core/src/test/scala/org/apache/spark/AccumulatorSuite.scala ---
@@ -237,6 +236,65 @@ class AccumulatorSuite extends SparkFunSuite with Matchers with LocalSparkContex
     acc.merge("kindness")
     assert(acc.value === "kindness")
   }
+
+  test("updating garbage collected accumulators") {
+    // Simulate FetchFailedException in the first attempt to force a retry.
+    // Then complete remaining task from the first attempt after the second
+    // attempt started, but before it completes. Completion event for the first
+    // attempt will try to update garbage collected accumulators.
+    val numPartitions = 2
+    sc = new SparkContext("local[2]", "test")
+
+    val attempt0Latch = new TestLatch("attempt0")
+    val attempt1Latch = new TestLatch("attempt1")
+
+    val x = sc.parallelize(1 to 100, numPartitions).groupBy(identity)
+    val sid = x.dependencies.head.asInstanceOf[ShuffleDependency[_, _, _]].shuffleHandle.shuffleId
+    val rdd = x.mapPartitionsWithIndex { case (i, iter) =>
+      val taskContext = TaskContext.get()
+      if (taskContext.stageAttemptNumber() == 0) {
+        if (i == 0) {
+          // Fail the first task in the first stage attempt to force retry.
+          throw new FetchFailedException(
+            SparkEnv.get.blockManager.blockManagerId,
+            sid,
+            taskContext.partitionId(),
+            taskContext.partitionId(),
+            "simulated fetch failure")
+        } else {
+          // Wait till the second attempt starts.
+          attempt0Latch.await()
+          iter
+        }
+      } else {
+        if (i == 0) {
+          // Wait till the first attempt completes.
+          attempt1Latch.await()
+        }
+        iter
+      }
+    }
+
+    sc.addSparkListener(new SparkListener {
+      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
+        if (taskStart.stageId == 1 && taskStart.stageAttemptId == 1) {

--- End diff --

Should we add `taskStart.taskInfo.index == 0` here to make sure it's the partition 0?
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21299 **[Test build #90542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90542/testReport)** for PR 21299 at commit [`2ecabe4`](https://github.com/apache/spark/commit/2ecabe4fd984bb6a3f909364dcee27490c7a5d0a).
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21307 **[Test build #90541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90541/testReport)** for PR 21307 at commit [`4f1513b`](https://github.com/apache/spark/commit/4f1513baccd5d66a0c374c83499bf453a3f590bc).
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Merged build finished. Test PASSed.
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90541/
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21307 **[Test build #90541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90541/testReport)** for PR 21307 at commit [`4f1513b`](https://github.com/apache/spark/commit/4f1513baccd5d66a0c374c83499bf453a3f590bc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20869: Improve implicitNotFound message for Encoder
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20869 **[Test build #4173 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4173/testReport)** for PR 20869 at commit [`588dffc`](https://github.com/apache/spark/commit/588dffc51df53bcbb885305e8ecd5bf39aa2e465). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19680: [SPARK-22461][ML] Refactor Spark ML model summaries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19680 Build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19680: [SPARK-22461][ML] Refactor Spark ML model summaries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19680 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3166/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21307 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21307 **[Test build #90540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90540/testReport)** for PR 21307 at commit [`132af85`](https://github.com/apache/spark/commit/132af853bba50561926a0be3bb9b585305b4c8b5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on a diff in the pull request: https://github.com/apache/spark/pull/21045#discussion_r187775631
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -90,6 +90,112 @@ case class MapKeys(child: Expression)
   override def prettyName: String = "map_keys"
 }

+@ExpressionDescription(
+  usage = """_FUNC_(a1, a2) - Returns a merged array matching the N-th element of the first
+    array with the N-th element of the second.""",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(2, 3, 4));
+       [[1, 2], [2, 3], [3, 4]]
+  """,
+  since = "2.4.0")
+case class ZipLists(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType)
+
+  override def dataType: DataType = ArrayType(StructType(
+    StructField("_1", left.dataType.asInstanceOf[ArrayType].elementType, true) ::
+    StructField("_2", right.dataType.asInstanceOf[ArrayType].elementType, true) ::
+    Nil))
+
+  override def prettyName: String = "zip_lists"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    nullSafeCodeGen(ctx, ev, (arr1, arr2) => {
+      val genericArrayData = classOf[GenericArrayData].getName
+      val genericInternalRow = classOf[GenericInternalRow].getName
+
+      val i = ctx.freshName("i")
+      val values = ctx.freshName("values")
+      val len1 = ctx.freshName("len1")
+      val len2 = ctx.freshName("len2")
+      val pair = ctx.freshName("pair")
+      val getValue1 = CodeGenerator.getValue(
+        arr1, left.dataType.asInstanceOf[ArrayType].elementType, i)
+      val getValue2 = CodeGenerator.getValue(
+        arr2, right.dataType.asInstanceOf[ArrayType].elementType, i)
+
+      s"""
--- End diff --

Done! I am currently using sbt and an editor, actually.
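The semantics of the proposed `zip_lists` expression — pairing the N-th elements of two arrays into structs — can be sketched outside of Catalyst. This is an illustrative Python model of the SQL example in the ExpressionDescription above, not the actual codegen path; the padding behavior for unequal lengths is an assumption, since the diff shown here does not settle it:

```python
def zip_lists(left, right):
    """Pair up elements by index, like the proposed zip_lists expression.

    Elements beyond the shorter array's length are padded with None,
    mirroring SQL-null semantics for the missing side (an assumption:
    the PR discussion does not settle length-mismatch behavior here).
    """
    n = max(len(left), len(right))
    pad = lambda a, i: a[i] if i < len(a) else None
    return [(pad(left, i), pad(right, i)) for i in range(n)]

# The example from the ExpressionDescription:
print(zip_lists([1, 2, 3], [2, 3, 4]))  # [(1, 2), (2, 3), (3, 4)]
```

In the actual expression, each pair would be a `GenericInternalRow` with fields `_1` and `_2` inside a `GenericArrayData`, which is what the generated Java code builds.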
[GitHub] spark pull request #21299: [SPARK-24250][SQL] support accessing SQLConf insi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21299#discussion_r187775593
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -68,16 +68,27 @@ object SQLExecution {
     // sparkContext.getCallSite() would first try to pick up any call site that was previously
     // set, then fall back to Utils.getCallSite(); call Utils.getCallSite() directly on
     // streaming queries would give us call site like "run at :0"
-    val callSite = sparkSession.sparkContext.getCallSite()
+    val callSite = sc.getCallSite()

-    sparkSession.sparkContext.listenerBus.post(SparkListenerSQLExecutionStart(
+    // Set all the specified SQL configs to local properties, so that they can be available at
+    // the executor side.
+    val allConfigs = sparkSession.sessionState.conf.getAllConfs
+    allConfigs.foreach {
+      // Excludes external configs defined by users.
+      case (key, value) if key.startsWith("spark") => sc.setLocalProperty(key, value)
--- End diff --

This causes `scala.MatchError`. We need to cover the other case, too.
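The `MatchError` flagged here arises because a partial function passed to `foreach` throws when no `case` applies to an entry. A minimal sketch of the fix — every entry explicitly handled, so nothing is left unmatched — in plain Python (the function and key names are illustrative, not Spark's actual API):

```python
def local_properties(all_configs):
    """Copy only spark-prefixed configs, silently skipping the rest.

    The explicit else-branch is the point: every entry is handled, so
    nothing analogous to Scala's scala.MatchError can occur.
    """
    props = {}
    for key, value in all_configs.items():
        if key.startswith("spark"):
            props[key] = value
        else:
            pass  # external, user-defined config: deliberately ignored
    return props

print(local_properties({"spark.sql.shuffle.partitions": "200", "my.app.flag": "on"}))
# {'spark.sql.shuffle.partitions': '200'}
```

In Scala the equivalent would be adding a `case _ =>` arm to the `foreach` block so the match is exhaustive.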
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user DylanGuedes commented on the issue: https://github.com/apache/spark/pull/21045 @mgaido91 Thank you for the suggestions and for being so patient. I updated the code with the `zip` name, more tests in CollectionExpression (I'll add more after adding support for any number of arrays as input), the stripMargin syntax, and a small refactor of doGenCode (is this the change you were thinking of? I can make the code shorter, but it will be more complicated, I think). I didn't find the proper way to add support for any number of arrays, since the number of inputs looks limited (unary, binary and ternary expressions; I didn't find an "n-ary" or anything similar). What do you think?
[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21295#discussion_r187778727
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
@@ -147,7 +147,8 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
     this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString);
     this.reader = new ParquetFileReader(
       configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
-    for (BlockMetaData block : blocks) {
+    // use the blocks from the reader in case some do not match filters and will not be read
--- End diff --

Could you be more specific by mentioning the corresponding Parquet JIRA issue or versions (1.10.0)?
[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21247#discussion_r187780271
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -138,3 +121,40 @@ private[sql] class JSONOptions(
     factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars)
   }
 }
+
+private[sql] class JSONOptionsInRead(
+    @transient override val parameters: CaseInsensitiveMap[String],
+    defaultTimeZoneId: String,
+    defaultColumnNameOfCorruptRecord: String)
+  extends JSONOptions(parameters, defaultTimeZoneId, defaultColumnNameOfCorruptRecord) {
+
+  def this(
+      parameters: Map[String, String],
+      defaultTimeZoneId: String,
+      defaultColumnNameOfCorruptRecord: String = "") = {
+    this(
+      CaseInsensitiveMap(parameters),
+      defaultTimeZoneId,
+      defaultColumnNameOfCorruptRecord)
+  }
+
+  protected override def checkedEncoding(enc: String): String = {
+    // The following encodings are not supported in per-line mode (multiline is false)
+    // because they cause some problems in reading files with BOM, which is supposed to
+    // be present in files with such encodings. After splitting input files by lines,
+    // only the first line will have the BOM, which makes it impossible to read
+    // the rest of the lines. Besides that, the lineSep option would have to contain the
+    // BOM in such encodings, which can never appear between lines.
+    val blacklist = Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32"))
+    val isBlacklisted = blacklist.contains(Charset.forName(enc))
+    require(multiLine || !isBlacklisted,
--- End diff --

There are no reasons to blacklist `UTF-16` and `UTF-32` on write. I have checked the content of written JSON files on @gatorsmile's [test](https://github.com/apache/spark/pull/21247/commits/97c4af76addc78a85ceb503a5db16f3285f18a5f).
For example, for `UTF-16`:
```
$ hexdump -C ...c000.json
0000  fe ff 00 7b 00 22 00 5f  00 31 00 22 00 3a 00 22  |...{."._.1.".:."|
0010  00 61 00 22 00 2c 00 22  00 5f 00 32 00 22 00 3a  |.a.".,."._.2.".:|
0020  00 31 00 7d 00 0a 00 7b  00 22 00 5f 00 31 00 22  |.1.}...{."._.1."|
0030  00 3a 00 22 00 63 00 22  00 2c 00 22 00 5f 00 32  |.:.".c.".,."._.2|
0040  00 22 00 3a 00 33 00 7d  00 0a                    |.".:.3.}..|
004a
```
It contains the BOM `fe ff` at the beginning, as expected, and the written line separator doesn't contain a BOM (look at positions 0x24-0x25): `00 7d` **00 0a** `00 7b`.
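The hexdump observation — exactly one BOM at the start of the file, and a plain `00 0a` line separator with no BOM between records — can be reproduced with a few lines of Python (illustrative, not part of the patch). Note that Python's default `utf-16` codec is little-endian, so the BOM comes out as `ff fe` rather than the big-endian `fe ff` shown above:

```python
# Two JSON records separated by '\n', encoded as UTF-16.
# The codec writes a single BOM at the start of the byte stream.
data = '{"_1":"a","_2":1}\n{"_1":"c","_2":3}'.encode("utf-16")

# Exactly one BOM, at the very beginning...
assert data[:2] in (b"\xff\xfe", b"\xfe\xff")
# ...and no BOM anywhere else, including around the newline between records.
assert data.count(b"\xff\xfe") + data.count(b"\xfe\xff") == 1
```

This is why per-line mode breaks for these encodings on read: after splitting by lines, only the first line carries the BOM — but on write the file comes out correct.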
[GitHub] spark pull request #20869: Improve implicitNotFound message for Encoder
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20869
[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21273 @gengliangwang @gatorsmile I added a benchmark for parsing of quoted values. Parsing time dropped by **28%** (look at the commit https://github.com/apache/spark/pull/21273/commits/f3a0072b82c23566d4010d977172578f04f51bff).
[GitHub] spark issue #20869: Improve implicitNotFound message for Encoder
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20869 Merged to master.
[GitHub] spark pull request #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReade...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21295#discussion_r187778648
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala ---
@@ -879,6 +879,18 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
       }
     }
   }
+
+  test("SPARK-24230: filter row group using dictionary") {
+    withSQLConf(("parquet.filter.dictionary.enabled", "true")) {
--- End diff --

Is this a valid way to control this configuration? It seems to pass with `false`, too.
[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds zip function to sparksql
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21045 @DylanGuedes you can see an example of functions supporting a variable number of arguments in `Coalesce`, for instance. Please refer to it.
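Coalesce-style variadic expressions hold a `Seq[Expression]` of children rather than the fixed slots of unary/binary/ternary expressions. The shape of an n-ary zip over any number of arrays can be sketched like this (plain Python stand-in, not the Catalyst implementation; the None-padding is an illustrative assumption):

```python
def zip_n(*arrays):
    """Zip any number of arrays, padding shorter ones with None.

    Mirrors how a variadic Catalyst expression would hold a sequence
    of children instead of fixed left/right fields.
    """
    if not arrays:
        return []
    n = max(len(a) for a in arrays)
    return [tuple(a[i] if i < len(a) else None for a in arrays)
            for i in range(n)]

print(zip_n([1, 2], ["a", "b"], [True, False]))  # [(1, 'a', True), (2, 'b', False)]
```

The key design point is that nothing in the logic depends on the argument count, which is exactly what moving from `BinaryExpression` to a children-sequence buys.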
[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21273 **[Test build #90543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90543/testReport)** for PR 21273 at commit [`f3a0072`](https://github.com/apache/spark/commit/f3a0072b82c23566d4010d977172578f04f51bff).
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Merged build finished. Test FAILed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90542/
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21299 **[Test build #90542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90542/testReport)** for PR 21299 at commit [`2ecabe4`](https://github.com/apache/spark/commit/2ecabe4fd984bb6a3f909364dcee27490c7a5d0a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20869: Improve implicitNotFound message for Encoder
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20869 **[Test build #4173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4173/testReport)** for PR 20869 at commit [`588dffc`](https://github.com/apache/spark/commit/588dffc51df53bcbb885305e8ecd5bf39aa2e465).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21298 Merged to master.
[GitHub] spark pull request #21106: [SPARK-23711][SQL][WIP] Add fallback logic for Un...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21106#discussion_r187786817
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodegenObjectFactory.scala ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.codehaus.commons.compiler.CompileException
+import org.codehaus.janino.InternalCompilerException
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Utils
+
+/**
+ * Catches compile errors during code generation.
+ */
+object CodegenError {
+  def unapply(throwable: Throwable): Option[Exception] = throwable match {
+    case e: InternalCompilerException => Some(e)
+    case e: CompileException => Some(e)
+    case _ => None
+  }
+}
+
+/**
+ * Defines values for the `SQLConf` config of the fallback mode. Used for tests only.
+ */
+object CodegenObjectFactoryMode extends Enumeration {
+  val AUTO, CODEGEN_ONLY, NO_CODEGEN = Value
+
+  def currentMode: CodegenObjectFactoryMode.Value = {
+    // If we are not in task execution, access that config.
+    if (TaskContext.get == null) {
+      val config = SQLConf.get.getConf(SQLConf.CODEGEN_FACTORY_MODE)
+      CodegenObjectFactoryMode.withName(config)
+    } else {
+      CodegenObjectFactoryMode.AUTO
+    }
+  }
+}
+
+/**
+ * A factory which can be used to create objects that have both codegen and interpreted
+ * implementations. This tries to create the codegen object first; if any compile error
+ * happens, it falls back to the interpreted version.
+ */
+abstract class CodegenObjectFactory[IN, OUT] {
--- End diff --

Ok. Renamed.
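The fallback pattern described by `CodegenObjectFactory` — try the compiled path first, and on a compile error construct the interpreted version instead — can be sketched generically. This is a Python stand-in; the class and error names are illustrative, not Spark's actual API:

```python
class CodegenError(Exception):
    """Stand-in for Janino's CompileException / InternalCompilerException."""

def create_object(create_codegen, create_interpreted):
    """Prefer the codegen implementation; fall back only on compile errors.

    Any other exception still propagates: the fallback is meant to mask
    compiler failures, not logic bugs in the expression itself.
    """
    try:
        return create_codegen()
    except CodegenError:
        return create_interpreted()

def broken_codegen():
    raise CodegenError("compilation failed")

assert create_object(broken_codegen, lambda: "interpreted") == "interpreted"
assert create_object(lambda: "compiled", lambda: "interpreted") == "compiled"
```

Narrowing the caught exception type is the important design choice; catching everything would silently hide real bugs behind the slower interpreted path.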
[GitHub] spark pull request #21299: [SPARK-24250][SQL] support accessing SQLConf insi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21299#discussion_r187787083
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/CreateJacksonParser.scala ---
@@ -78,17 +78,4 @@ private[sql] object CreateJacksonParser extends Serializable {
   def inputStream(enc: String, jsonFactory: JsonFactory, is: InputStream): JsonParser = {
     jsonFactory.createParser(new InputStreamReader(is, enc))
   }
-
-  def internalRow(jsonFactory: JsonFactory, row: InternalRow): JsonParser = {
-    val ba = row.getBinary(0)
-
-    jsonFactory.createParser(ba, 0, ba.length)
-  }
-
-  def internalRow(enc: String, jsonFactory: JsonFactory, row: InternalRow): JsonParser = {
-    val binary = row.getBinary(0)
-    val sd = getStreamDecoder(enc, binary, binary.length)
-
-    jsonFactory.createParser(sd)
-  }
--- End diff --

Why were these two removed? There seems to be no SQLConf involved here.
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21106 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3172/
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21106 **[Test build #90546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90546/testReport)** for PR 21106 at commit [`67f8701`](https://github.com/apache/spark/commit/67f870133ab22a32e2af020a1b8893595dcef7cf).
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r187787343
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala ---
@@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
   override def reset(): Unit = {
     super.reset()
     elementWriter.reset()
+    valueVector.clear()
--- End diff --

Looks like @BryanCutler added the `reset()` interface in 0.9.0, mentioned in: https://github.com/apache/spark/blob/eb386be1ed383323da6e757f63f3b8a7ced38cc4/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala#L132 at https://github.com/apache/arrow/commit/4dbce607d50031a405af39d36e08cd03c5ffc764 and https://issues.apache.org/jira/browse/ARROW-1962, but if we think about backporting, I guess we can go this way as a bug fix as is? Roughly looks like it makes sense. Would it also be safe to do:
```
valueVector match {
  case fixedWidthVector: BaseFixedWidthVector => fixedWidthVector.reset()
  case variableWidthVector: BaseVariableWidthVector => variableWidthVector.reset()
  case repeatedValueVector: BaseRepeatedValueVector => repeatedValueVector.clear()
  case _ =>
}
```
? @icexelloss, @BryanCutler and @viirya?
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3173/
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 Thanks @HyukjinKwon
[GitHub] spark pull request #21307: [SPARK-24186][R][SQL]change reverse and concat to...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21307#discussion_r187788406
--- Diff: R/pkg/R/functions.R ---
@@ -2055,20 +2058,10 @@ setMethod("countDistinct",
 #' @details
 #' \code{concat}: Concatenates multiple input columns together into a single column.
-#' If all inputs are binary, concat returns an output as binary. Otherwise, it returns as string.
+#' The function works with strings, binary and compatible array columns.
--- End diff --

btw, what does "compatible array columns" mean?
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.
[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90544/
[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21307 LGTM
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r187787511
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala ---
@@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
   override def reset(): Unit = {
     super.reset()
     elementWriter.reset()
+    valueVector.clear()
--- End diff --

Yeah, I think so.
[GitHub] spark issue #21304: Fix typo in UDF type match error message
Github user robinske commented on the issue: https://github.com/apache/spark/pull/21304 Looked through the other text/error messages and didn't see any other typos jump out.
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21106 **[Test build #90545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90545/testReport)** for PR 21106 at commit [`129b6ac`](https://github.com/apache/spark/commit/129b6acc5a24c7ef48f94e407c91575645cd46b2).
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21106 Merged build finished. Test PASSed.
[GitHub] spark pull request #21299: [SPARK-24250][SQL] support accessing SQLConf insi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21299#discussion_r187787271
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -898,7 +898,6 @@ object SparkSession extends Logging {
    * @since 2.0.0
    */
  def getOrCreate(): SparkSession = synchronized {
-   assertOnDriver()
--- End diff --

Does this mean we can now create a SparkSession off the driver?
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r187787882
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala ---
@@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
   override def reset(): Unit = {
     super.reset()
     elementWriter.reset()
+    valueVector.clear()
--- End diff --

I've also noticed that @BryanCutler added `reset` to `ListVector`. But we can only use `clear` for now.
[GitHub] spark issue #21304: Fix typo in UDF type match error message
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21304 **[Test build #90548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90548/testReport)** for PR 21304 at commit [`c1d79d2`](https://github.com/apache/spark/commit/c1d79d2258d79e19fc015a533a2026b026376961).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21106 **[Test build #90545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90545/testReport)** for PR 21106 at commit [`129b6ac`](https://github.com/apache/spark/commit/129b6acc5a24c7ef48f94e407c91575645cd46b2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21304: Fix typo in UDF type match error message
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21304 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90548/
[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21106 Merged build finished. Test FAILed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21299 **[Test build #90549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90549/testReport)** for PR 21299 at commit [`bf8b42d`](https://github.com/apache/spark/commit/bf8b42d494d4a8f21bd08b2fd6ed531e21e4eb49).
[GitHub] spark issue #21304: Fix typo in UDF type match error message
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21304 Merged build finished. Test PASSed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Merged build finished. Test PASSed.
[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3174/