[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22014#discussion_r208091824
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -709,6 +709,7 @@ object ScalaReflection extends ScalaReflection {
def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] match {
case Schema(s: StructType, _) =>
s.toAttributes
+case _ => throw new RuntimeException(s"$schemaFor is not supported at attributesFor()")
--- End diff --
How about this:
```scala
case other => throw new UnsupportedOperationException(s"Attributes for type $other is not supported")
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
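The reviewer wants the fallback case to name the unexpected value and to throw a more specific exception than `RuntimeException`. The sketch below shows that pattern. It assumes a hypothetical sealed hierarchy: `DataTypeLike`, `StructTypeLike`, `IntegerTypeLike` and `AttributesSketch` are illustrative stand-ins, not Spark classes.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy; not the real Spark classes.
sealed trait DataTypeLike
case class StructTypeLike(fieldNames: Seq[String]) extends DataTypeLike
case object IntegerTypeLike extends DataTypeLike

object AttributesSketch {
  // Without the catch-all, scalac warns "match may not be exhaustive"
  // because IntegerTypeLike is not covered.
  def attributesFor(schema: DataTypeLike): Seq[String] = schema match {
    case StructTypeLike(names) => names
    case other =>
      // Name the offending value instead of returning null or a generic RuntimeException.
      throw new UnsupportedOperationException(s"Attributes for type $other is not supported")
  }
}
```

Calling `AttributesSketch.attributesFor(IntegerTypeLike)` fails with the message "Attributes for type IntegerTypeLike is not supported". That message makes the unsupported input visible in logs.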
[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22014#discussion_r208091445
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala ---
@@ -67,6 +67,7 @@ case class ApproxCountDistinctForIntervals(
(endpointsExpression.dataType, endpointsExpression.eval()) match {
case (ArrayType(elementType, _), arrayData: ArrayData) =>
arrayData.toObjectArray(elementType).map(_.toString.toDouble)
+ case _ => throw new RuntimeException("not found at endpoints")
--- End diff --
Can we do this like:
```scala
val endpointsType = endpointsExpression.dataType.asInstanceOf[ArrayType]
val endpoints = endpointsExpression.eval().asInstanceOf[ArrayData]
endpoints.toObjectArray(endpointsType.elementType).map(_.toString.toDouble)
```
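The suggestion replaces the match with direct casts. This relies on the endpoints having already been validated as an array, presumably during type checking. If that assumption is ever broken, the cast fails fast with a `ClassCastException`. The simplified sketch below uses `Seq[Any]` as a stand-in for Spark's `ArrayData`; `EndpointsSketch` is a hypothetical name.

```scala
object EndpointsSketch {
  // `value` plays the role of endpointsExpression.eval(). The cast encodes the
  // invariant that it is always an array, instead of matching on a case that
  // should be unreachable.
  def endpoints(value: Any): Array[Double] = {
    val elements = value.asInstanceOf[Seq[Any]]
    elements.map(_.toString.toDouble).toArray
  }
}
```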
[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22014#discussion_r208090085
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -471,6 +471,7 @@ class CodegenContext {
case NewFunctionSpec(functionName, None, None) => functionName
case NewFunctionSpec(functionName, Some(_), Some(innerClassInstance)) => innerClassInstance + "." + functionName
+ case _ => null // nothing to do since addNewFunctionInteral() must return one of them
--- End diff --
Shall we throw an `IllegalArgumentException`?
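Returning `null` from the fallback case would hide a broken invariant, which would then likely surface later as a confusing failure in generated code. The sketch below throws `IllegalArgumentException` for the combinations that should be impossible. It uses a hypothetical `FunctionSpecLike` case class with the same three fields as the pattern in the diff; `FunctionNameSketch` is also illustrative.

```scala
// Hypothetical mirror of the spec shape in the diff; not Spark's actual NewFunctionSpec.
case class FunctionSpecLike(
    functionName: String,
    innerClassName: Option[String],
    innerClassInstance: Option[String])

object FunctionNameSketch {
  def qualifiedName(spec: FunctionSpecLike): String = spec match {
    case FunctionSpecLike(name, None, None) => name
    case FunctionSpecLike(name, Some(_), Some(instance)) => instance + "." + name
    // Mixed Some/None combinations should be impossible: fail loudly instead of
    // returning null and letting the problem show up somewhere else.
    case other => throw new IllegalArgumentException(s"Inconsistent function spec: $other")
  }
}
```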
[GitHub] spark pull request #22014: [SPARK-25036][SQL] avoid match may not be exhaust...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22014#discussion_r208089613
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala ---
@@ -86,6 +87,7 @@ object ValueInterval {
val newMax = if (n1.max <= n2.max) n1.max else n2.max
(Some(EstimationUtils.fromDouble(newMin, dt)), Some(EstimationUtils.fromDouble(newMax, dt)))
+ case _ => throw new RuntimeException(s"Not supported pair: $r1, $r2 at intersect()")
--- End diff --
Shall we throw an `UnsupportedOperationException`?
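For context, the numeric branch in this diff computes the overlap of two ranges. The simplified sketch below uses hypothetical `IntervalLike` types. Any non-numeric pair reaches the catch-all and throws `UnsupportedOperationException`, as suggested. The `newMin` line is an assumption that mirrors the visible `newMax` line, and the sketch does not check for an empty overlap.

```scala
// Hypothetical simplified intervals; not Spark's ValueInterval classes.
sealed trait IntervalLike
case class NumericIntervalLike(min: Double, max: Double) extends IntervalLike
case object UnsupportedIntervalLike extends IntervalLike

object IntersectSketch {
  // Overlap of two numeric ranges: the larger of the mins and the smaller of the maxes.
  def intersect(r1: IntervalLike, r2: IntervalLike): (Double, Double) = (r1, r2) match {
    case (n1: NumericIntervalLike, n2: NumericIntervalLike) =>
      val newMin = if (n1.min <= n2.min) n2.min else n1.min
      val newMax = if (n1.max <= n2.max) n1.max else n2.max
      (newMin, newMax)
    case _ =>
      throw new UnsupportedOperationException(s"Not supported pair: $r1, $r2 at intersect()")
  }
}
```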
[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r208091782 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -60,14 +61,26 @@ private[spark] object PythonEvalType { */ private[spark] abstract class BasePythonRunner[IN, OUT]( funcs: Seq[ChainedPythonFunctions], -bufferSize: Int, -reuseWorker: Boolean, evalType: Int, -argOffsets: Array[Array[Int]]) +argOffsets: Array[Array[Int]], +conf: SparkConf) extends Logging { require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs") + private val bufferSize = conf.getInt("spark.buffer.size", 65536) + private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true) + private val memoryMb = { +val allocation = conf.get(PYSPARK_EXECUTOR_MEMORY) +if (reuseWorker) { --- End diff -- No, I'm not sure where that is. Is it on the python side? If you can point me to it, I'll have a closer look.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94336/testReport)** for PR 22009 at commit [`cab6d28`](https://github.com/apache/spark/commit/cab6d2828dacaca6e62d3409c684d18a1fc861f2).
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1884/ Test PASSed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test PASSed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test FAILed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94326/ Test FAILed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #94326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94326/testReport)** for PR 21898 at commit [`1f71e65`](https://github.com/apache/spark/commit/1f71e6583f9f9f270d07323f15c731717e13518d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21991 The failed test is `FlatMapGroupsWithStateSuite.flatMapGroupsWithState`. I have seen it fail occasionally before, so I don't think it is related to this change. @HyukjinKwon @dbtsai
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r208090280 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeWriteCompatibilitySuite.scala --- @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.types + +import scala.collection.mutable + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.analysis +import org.apache.spark.sql.catalyst.expressions.Cast + +class DataTypeWriteCompatibilitySuite extends SparkFunSuite { --- End diff -- I'm planning on adding this, but it would be great to get this PR in first; I'll add the tests next. Getting it in means I no longer have to keep rebasing it! Thanks!
[GitHub] spark pull request #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21305#discussion_r208090428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -336,4 +337,124 @@ object DataType { case (fromDataType, toDataType) => fromDataType == toDataType } } + + private val SparkGeneratedName = """col\d+""".r + private def isSparkGeneratedName(name: String): Boolean = name match { +case SparkGeneratedName(_*) => true +case _ => false + } + + /** + * Returns true if the write data type can be read using the read data type. + * + * The write type is compatible with the read type if: + * - Both types are arrays, the array element types are compatible, and element nullability is + * compatible (read allows nulls or write does not contain nulls). + * - Both types are maps and the map key and value types are compatible, and value nullability + * is compatible (read allows nulls or write does not contain nulls). + * - Both types are structs and each field in the read struct is present in the write struct and + * compatible (including nullability), or is nullable if the write struct does not contain the + * field. Write-side structs are not compatible if they contain fields that are not present in + * the read-side struct. + * - Both types are atomic and the write type can be safely cast to the read type. + * + * Extra fields in write-side structs are not allowed to avoid accidentally writing data that + * the read schema will not read, and to ensure map key equality is not changed when data is read. 
+ * + * @param write a write-side data type to validate against the read type + * @param read a read-side data type + * @return true if data written with the write type can be read using the read type + */ + def canWrite( + write: DataType, + read: DataType, + resolver: Resolver, + context: String, + addError: String => Unit = (_: String) => {}): Boolean = { +(write, read) match { + case (wArr: ArrayType, rArr: ArrayType) => +// run compatibility check first to produce all error messages +val typesCompatible = + canWrite(wArr.elementType, rArr.elementType, resolver, context + ".element", addError) + +if (wArr.containsNull && !rArr.containsNull) { + addError(s"Cannot write nullable elements to array of non-nulls: '$context'") + false +} else { + typesCompatible +} + + case (wMap: MapType, rMap: MapType) => +// map keys cannot include data fields not in the read schema without changing equality when +// read. map keys can be missing fields as long as they are nullable in the read schema. + +// run compatibility check first to produce all error messages +val keyCompatible = + canWrite(wMap.keyType, rMap.keyType, resolver, context + ".key", addError) +val valueCompatible = + canWrite(wMap.valueType, rMap.valueType, resolver, context + ".value", addError) +val typesCompatible = keyCompatible && valueCompatible + +if (wMap.valueContainsNull && !rMap.valueContainsNull) { + addError(s"Cannot write nullable values to map of non-nulls: '$context'") + false +} else { + typesCompatible +} + + case (StructType(writeFields), StructType(readFields)) => +var fieldCompatible = true +readFields.zip(writeFields).foreach { + case (rField, wField) => +val namesMatch = resolver(wField.name, rField.name) || isSparkGeneratedName(wField.name) +val fieldContext = s"$context.${rField.name}" +val typesCompatible = + canWrite(wField.dataType, rField.dataType, resolver, fieldContext, addError) + +if (!namesMatch) { + addError(s"Struct '$context' field name does not match (may be out of order): 
" + + s"expected '${rField.name}', found '${wField.name}'") + fieldCompatible = false +} else if (!rField.nullable && wField.nullable) { + addError(s"Cannot write nullable values to non-null field: '$fieldContext'") + fieldCompatible = false +} else if (!typesCompatible) { + // errors are added in the recursive call to canWrite above + fieldCompatible = false +} +} + +if (readFields.size > writeFields.size) { + val missingFieldsStr = readFields.takeRight(rea
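The compatibility rules in the `canWrite` doc comment above can be illustrated with a much smaller type model. The sketch below is hypothetical throughout:

* The type model (`TypeLike`, `FieldLike`, `StructLike`, etc.), the error messages, and the choice of Int-to-Long widening as the only "safe cast" are all assumptions.
* Case-insensitive name comparison stands in for Spark's `Resolver`.
* Maps are omitted.
* Missing and extra struct fields are handled as the doc comment describes, because the diff is truncated before that code.

```scala
// Hypothetical, simplified type model; these are not Spark's DataType classes.
sealed trait TypeLike
case object IntLike extends TypeLike
case object LongLike extends TypeLike
case class ArrayLike(element: TypeLike, containsNull: Boolean) extends TypeLike
case class FieldLike(name: String, dataType: TypeLike, nullable: Boolean)
case class StructLike(fields: Seq[FieldLike]) extends TypeLike

object CanWriteSketch {
  // Same idea as isSparkGeneratedName: positional names like "col1" match any read name.
  private def isGeneratedName(name: String): Boolean = name.matches("col\\d+")

  // Assumption: the only safe cast modeled here is widening Int to Long.
  private def canSafelyCast(w: TypeLike, r: TypeLike): Boolean =
    w == r || (w == IntLike && r == LongLike)

  def canWrite(write: TypeLike, read: TypeLike, context: String, addError: String => Unit): Boolean =
    (write, read) match {
      case (ArrayLike(wElem, wNull), ArrayLike(rElem, rNull)) =>
        // check element types first so that all errors are reported
        val typesCompatible = canWrite(wElem, rElem, context + ".element", addError)
        if (wNull && !rNull) {
          addError(s"Cannot write nullable elements to array of non-nulls: '$context'")
          false
        } else {
          typesCompatible
        }

      case (StructLike(writeFields), StructLike(readFields)) =>
        var compatible = true
        readFields.zip(writeFields).foreach { case (rField, wField) =>
          val fieldContext = s"$context.${rField.name}"
          val typesCompatible = canWrite(wField.dataType, rField.dataType, fieldContext, addError)
          val namesMatch = wField.name.equalsIgnoreCase(rField.name) || isGeneratedName(wField.name)
          if (!namesMatch) {
            addError(s"Struct '$context' field name does not match: expected '${rField.name}', found '${wField.name}'")
            compatible = false
          } else if (!rField.nullable && wField.nullable) {
            addError(s"Cannot write nullable values to non-null field: '$fieldContext'")
            compatible = false
          } else if (!typesCompatible) {
            compatible = false
          }
        }
        // Read fields missing from the write struct must be nullable.
        readFields.drop(writeFields.size).filterNot(_.nullable).foreach { f =>
          addError(s"Struct '$context' missing required field: '${f.name}'")
          compatible = false
        }
        // Extra write-side fields are rejected.
        if (writeFields.size > readFields.size) {
          addError(s"Struct '$context' has extra write-side fields")
          compatible = false
        }
        compatible

      case (w, r) =>
        val ok = canSafelyCast(w, r)
        if (!ok) addError(s"Cannot safely cast '$context': $w to $r")
        ok
    }
}
```

As in the diff, the recursive checks run before the nullability checks. This way, a single call reports every problem it finds rather than stopping at the first one.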
[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...
Github user fuqiliang commented on the issue: https://github.com/apache/spark/pull/20666 To be specific, the JSON file (Sanity4.json) is:

    {"a":"a1","int":1,"other":4.4}
    {"a":"a2","int":"","other":""}

The code is:

> val config = new SparkConf().setMaster("local[5]").setAppName("test")
> val sc = SparkContext.getOrCreate(config)
> val sql = new SQLContext(sc)
>
> val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
> val df = sql.read.schema(null).json(file_path)
> df.show(30)

In Spark 1.6, the result is:

    +---+----+-----+
    |  a| int|other|
    +---+----+-----+
    | a1|   1|  4.4|
    | a2|null| null|
    +---+----+-----+

    root
     |-- a: string (nullable = true)
     |-- int: long (nullable = true)
     |-- other: double (nullable = true)

In Spark 2.2, however, the result is:

    +----+----+-----+
    |   a| int|other|
    +----+----+-----+
    |  a1|   1|  4.4|
    |null|null| null|
    +----+----+-----+

    root
     |-- a: string (nullable = true)
     |-- int: long (nullable = true)
     |-- other: double (nullable = true)
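The two outputs differ in how a bad field is handled. With partial results (1.6), only the fields that fail to convert become null. Without partial results (2.2), one bad field nulls out the entire row. The sketch below models both behaviors without Spark. It is hypothetical: the object, the schema encoding, and the `parsePartial`/`parseWhole` names are illustrative, not Spark APIs or Spark's actual parsing logic.

```scala
import scala.util.Try

// Hypothetical sketch contrasting the two reported behaviors; not Spark's JSON reader.
object PartialParseSketch {
  // Assumed schema matching the example file: a: string, int: long, other: double.
  val schema: Seq[(String, String => Any)] = Seq(
    "a" -> ((s: String) => s),
    "int" -> ((s: String) => s.toLong),
    "other" -> ((s: String) => s.toDouble))

  // Partial results: a value that fails to convert becomes null (None) on its own.
  def parsePartial(record: Map[String, String]): Seq[Option[Any]] =
    schema.map { case (name, convert) =>
      record.get(name).flatMap(v => Try(convert(v)).toOption)
    }

  // No partial results: any conversion failure turns the whole row into nulls.
  def parseWhole(record: Map[String, String]): Seq[Option[Any]] = {
    val fields = parsePartial(record)
    val anyFailed = schema.zip(fields).exists { case ((name, _), parsed) =>
      record.contains(name) && parsed.isEmpty
    }
    if (anyFailed) schema.map(_ => None) else fields
  }
}
```

For the second record, `parsePartial` keeps `a2` and nulls the other two fields, matching the 1.6 output. `parseWhole` nulls all three fields, matching the 2.2 output.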
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user skambha commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r208089990 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -169,25 +181,50 @@ package object expressions { }) } - // Find matches for the given name assuming that the 1st part is a qualifier (i.e. table name, - // alias, or subquery alias) and the 2nd part is the actual name. This returns a tuple of + // Find matches for the given name assuming that the 1st two parts are qualifier + // (i.e. database name and table name) and the 3rd part is the actual column name. + // + // For example, consider an example where "db1" is the database name, "a" is the table name + // and "b" is the column name and "c" is the struct field name. + // If the name parts is db1.a.b.c, then Attribute will match + // Attribute(b, qualifier("db1,"a")) and List("c") will be the second element + var matches: (Seq[Attribute], Seq[String]) = nameParts match { +case dbPart +: tblPart +: name +: nestedFields => + val key = (dbPart.toLowerCase(Locale.ROOT), +tblPart.toLowerCase(Locale.ROOT), name.toLowerCase(Locale.ROOT)) + val attributes = collectMatches(name, qualified3Part.get(key)).filter { +a => (resolver(dbPart, a.qualifier.head) && resolver(tblPart, a.qualifier.last)) + } + (attributes, nestedFields) +case all => + (Seq.empty, Seq.empty) + } + + // If there are no matches, then find matches for the given name assuming that + // the 1st part is a qualifier (i.e. table name, alias, or subquery alias) and the + // 2nd part is the actual name. This returns a tuple of // matched attributes and a list of parts that are to be resolved. // // For example, consider an example where "a" is the table name, "b" is the column name, // and "c" is the struct field name, i.e. "a.b.c". In this case, Attribute will be "a.b", // and the second element will be List("c"). 
- val matches = nameParts match { -case qualifier +: name +: nestedFields => - val key = (qualifier.toLowerCase(Locale.ROOT), name.toLowerCase(Locale.ROOT)) - val attributes = collectMatches(name, qualified.get(key)).filter { a => -resolver(qualifier, a.qualifier.get) + matches = matches match { --- End diff -- done.
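The diff comments describe a lookup order: try a three-part `db.table.column` name first, and fall back to `table.column` only if that finds nothing. Any remaining parts are treated as nested struct fields. The sketch below shows this with plain collections. `AttrLike` and `ResolveSketch` are hypothetical. The real code instead uses precomputed lookup maps (`qualified3Part`, `qualified`) and the session's resolver, not a linear scan with case-insensitive comparison.

```scala
// Hypothetical attribute with a (database, table) qualifier; not Spark's Attribute class.
case class AttrLike(name: String, qualifier: Seq[String])

object ResolveSketch {
  // Case-insensitive comparison standing in for Spark's resolver.
  private def same(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

  // Returns the matched attributes and the remaining (nested field) name parts.
  def resolve(nameParts: Seq[String], input: Seq[AttrLike]): (Seq[AttrLike], Seq[String]) = {
    // First try db.table.column[.nested...]
    val threePart: (Seq[AttrLike], Seq[String]) = nameParts match {
      case db +: tbl +: name +: nested =>
        val found = input.filter { a =>
          a.qualifier.size == 2 && same(a.qualifier.head, db) &&
            same(a.qualifier.last, tbl) && same(a.name, name)
        }
        (found, nested)
      case _ => (Seq.empty, Seq.empty)
    }
    if (threePart._1.nonEmpty) {
      threePart
    } else {
      // If nothing matched, fall back to table.column[.nested...]
      nameParts match {
        case tbl +: name +: nested =>
          val found = input.filter { a =>
            a.qualifier.nonEmpty && same(a.qualifier.last, tbl) && same(a.name, name)
          }
          (found, nested)
        case _ => (Seq.empty, Seq.empty)
      }
    }
  }
}
```

With an attribute `b` qualified by `db1.a`, both `db1.a.b.c` and `a.b.c` resolve to that attribute with `c` left over as a nested field. The second name resolves only because the three-part attempt (`db=a`, `table=b`, `column=c`) finds nothing.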
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user skambha commented on the issue: https://github.com/apache/spark/pull/17185 Thanks for the review. I have addressed your comments and pushed the changes. @cloud-fan, please take a look.
[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17185 **[Test build #94335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94335/testReport)** for PR 17185 at commit [`5f7e5d7`](https://github.com/apache/spark/commit/5f7e5d7bddca593d72818b07d71f678bd0a1982d).
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94324/ Test FAILed.
[GitHub] spark issue #22018: [SPARK-25038][SQL] Accelerate Spark Plan generation when...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22018 Can one of the admins verify this patch?
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Merged build finished. Test FAILed.
[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r208089043 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala --- @@ -196,6 +237,18 @@ trait ProgressReporter extends Logging { currentStatus = currentStatus.copy(isTriggerActive = false) } + /** Extract writer from the executed query plan. */ + private def dataSourceWriter: Option[DataSourceWriter] = { +if (lastExecution == null) return None +lastExecution.executedPlan.collect { + case p if p.isInstanceOf[WriteToDataSourceV2Exec] => --- End diff -- This only works for micro-batch mode; do we have a plan to support continuous mode?
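The `dataSourceWriter` helper walks the executed plan and picks out the write node. The self-contained sketch below shows that traversal on a hypothetical three-node tree, using a typed pattern in place of the `isInstanceOf` guard. `PlanNode`, `WriteNode` and `PlanSketch` are illustrative; Spark's own `TreeNode.collect` performs a similar pre-order walk.

```scala
// Hypothetical minimal plan tree; not Spark's SparkPlan/TreeNode classes.
sealed trait PlanNode { def children: Seq[PlanNode] }
case class ScanNode(source: String) extends PlanNode { def children: Seq[PlanNode] = Nil }
case class ProjectNode(child: PlanNode) extends PlanNode { def children: Seq[PlanNode] = Seq(child) }
case class WriteNode(writer: String, child: PlanNode) extends PlanNode { def children: Seq[PlanNode] = Seq(child) }

object PlanSketch {
  // Pre-order traversal that applies a partial function wherever it is defined.
  def collect[B](node: PlanNode)(pf: PartialFunction[PlanNode, B]): Seq[B] =
    pf.lift(node).toSeq ++ node.children.flatMap(c => collect(c)(pf))

  // A typed pattern replaces `case p if p.isInstanceOf[...]` plus a later cast.
  def writerOf(plan: PlanNode): Option[String] =
    collect(plan) { case w: WriteNode => w.writer }.headOption
}
```

With a typed pattern, the matched value already has the node's type. As a result, the writer can be read directly without a separate `asInstanceOf`.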
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21991 **[Test build #94324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94324/testReport)** for PR 21991 at commit [`272d8fd`](https://github.com/apache/spark/commit/272d8fd4c6a46164069e2e3a892f016e9664cf5f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK][MASTER] Use SessionExtensions in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21990 @RussellSpitzer, let's close the other ones except this one and name it `[SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark`. Let me review this one within a few days. Also, I don't think we should do it for branch-2.2, at least. The logic here is quite convoluted, and I would rather avoid backporting it even to branch-2.3, actually.
[GitHub] spark issue #21988: [SPARK-25003][PYSPARK][BRANCH-2.2] Use SessionExtensions...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21988 Yeah, let's just close all of them except the master one.
[GitHub] spark pull request #22018: [SPARK-25038][SQL] Accelerate Spark Plan generati...
GitHub user habren opened a pull request: https://github.com/apache/spark/pull/22018
[SPARK-25038][SQL] Accelerate Spark Plan generation when Spark SQL re…
https://issues.apache.org/jira/browse/SPARK-25038
When Spark SQL reads a large amount of data, it takes a long time (more than 10 minutes) to generate the physical plan and then the ActiveJob.
Example: There is a table partitioned by date and hour, with more than 13 TB of data each hour and 185 TB per day. Even for a very simple SQL query, it takes a long time to generate the ActiveJob. The SQL statement is:
    select count(device_id) from test_tbl where date=20180731 and hour='21';
Before the optimization, it takes 2 minutes and 12 seconds to generate the job: the SQL is issued at 2018-08-07 09:07:41, and the job is submitted at 2018-08-07 09:09:53, 2 minutes and 12 seconds later.
After the optimization, it takes only 4 seconds to generate the job: the SQL is issued at 2018-08-07 09:20:15, and the job is submitted at 2018-08-07 09:20:19, 4 seconds later.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/habren/spark SPARK-25038
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22018.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22018
commit 2bb5924e04eba5accfe58a4fbae094d46cc36488
Author: Jason Guo
Date: 2018-08-07T03:13:03Z
[SPARK-25038][SQL] Accelerate Spark Plan generation when Spark SQL read large amount of data
[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...
Github user fuqiliang commented on the issue: https://github.com/apache/spark/pull/20666 Hi all, I am a Spark user. I have a question about the "JSON doesn't support partial results for corrupted records." behavior. Spark 1.6 returns partial results, but after upgrading to 2.2, I lose some meaningful data from my JSON file. Can I get that data back in Spark 2+? @viirya Thanks for the help.
[GitHub] spark pull request #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21937
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94327/ Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94327/testReport)** for PR 21860 at commit [`f290668`](https://github.com/apache/spark/commit/f2906684f49c84183cf1f5e64ab4b887d4a77ca1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21937 Thanks! Merging to master.
[GitHub] spark issue #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21937 LGTM.
[GitHub] spark pull request #21937: [SPARK-23914][SQL][follow-up] refactor ArrayUnion
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21937#discussion_r208085448 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3767,230 +3767,160 @@ object ArraySetLike { """, since = "2.4.0") case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike -with ComplexTypeMergingExpression { - var hsInt: OpenHashSet[Int] = _ - var hsLong: OpenHashSet[Long] = _ - - def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { -val elem = array.getInt(idx) -if (!hsInt.contains(elem)) { - if (resultArray != null) { -resultArray.setInt(pos, elem) - } - hsInt.add(elem) - true -} else { - false -} - } - - def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = { -val elem = array.getLong(idx) -if (!hsLong.contains(elem)) { - if (resultArray != null) { -resultArray.setLong(pos, elem) - } - hsLong.add(elem) - true -} else { - false -} - } + with ComplexTypeMergingExpression { - def evalIntLongPrimitiveType( - array1: ArrayData, - array2: ArrayData, - resultArray: ArrayData, - isLongType: Boolean): Int = { -// store elements into resultArray -var nullElementSize = 0 -var pos = 0 -Seq(array1, array2).foreach { array => - var i = 0 - while (i < array.numElements()) { -val size = if (!isLongType) hsInt.size else hsLong.size -if (size + nullElementSize > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) { - ArraySetLike.throwUnionLengthOverflowException(size) -} -if (array.isNullAt(i)) { - if (nullElementSize == 0) { -if (resultArray != null) { - resultArray.setNullAt(pos) + @transient lazy val evalUnion: (ArrayData, ArrayData) => ArrayData = { +if (elementTypeSupportEquals) { + (array1, array2) => +val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any] +val hs = new OpenHashSet[Any] +var foundNullElement = false +Seq(array1, array2).foreach { array => + var i = 0 + while (i < 
array.numElements()) { +if (array.isNullAt(i)) { + if (!foundNullElement) { +arrayBuffer += null +foundNullElement = true + } +} else { + val elem = array.get(i, elementType) + if (!hs.contains(elem)) { +if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) { + ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.size) +} +arrayBuffer += elem +hs.add(elem) + } } -pos += 1 -nullElementSize = 1 +i += 1 } -} else { - val assigned = if (!isLongType) { -assignInt(array, i, resultArray, pos) +} +new GenericArrayData(arrayBuffer) +} else { + (array1, array2) => +val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any] +var alreadyIncludeNull = false +Seq(array1, array2).foreach(_.foreach(elementType, (_, elem) => { + var found = false + if (elem == null) { +if (alreadyIncludeNull) { + found = true +} else { + alreadyIncludeNull = true +} } else { -assignLong(array, i, resultArray, pos) +// check elem is already stored in arrayBuffer or not? +var j = 0 +while (!found && j < arrayBuffer.size) { + val va = arrayBuffer(j) + if (va != null && ordering.equiv(va, elem)) { +found = true + } + j = j + 1 +} } - if (assigned) { -pos += 1 + if (!found) { +if (arrayBuffer.length > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) { + ArraySetLike.throwUnionLengthOverflowException(arrayBuffer.length) +} +arrayBuffer += elem } -} -i += 1 - } +})) +new GenericArrayData(arrayBuffer) } -pos } override def nullSafeEval(input1: Any, input2: Any):
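The refactored `evalUnion` chooses between two strategies depending on whether the element type supports `equals`. The stand-alone sketch below works over `Seq[Any]`; its names are hypothetical. Spark's version operates on `ArrayData`, uses `OpenHashSet` and `ordering.equiv`, and also enforces the `MAX_ROUNDED_ARRAY_LENGTH` limit, which is omitted here. Byte arrays show why the second path exists: two arrays with the same contents are not `equals` to each other.

```scala
import scala.collection.mutable

object ArrayUnionSketch {
  // Fast path: elements have value-based equals/hashCode, so a hash set finds duplicates.
  def unionWithHashSet(a1: Seq[Any], a2: Seq[Any]): Seq[Any] = {
    val buffer = mutable.ArrayBuffer.empty[Any]
    val seen = mutable.HashSet.empty[Any]
    var foundNull = false
    (a1 ++ a2).foreach { elem =>
      if (elem == null) {
        // keep at most one null, at the position of its first occurrence
        if (!foundNull) { buffer += null; foundNull = true }
      } else if (!seen.contains(elem)) {
        buffer += elem
        seen += elem
      }
    }
    buffer.toList
  }

  // Slow path: no usable equals, so scan the result so far with an explicit equivalence.
  def unionWithEquiv(a1: Seq[Any], a2: Seq[Any])(equiv: (Any, Any) => Boolean): Seq[Any] = {
    val buffer = mutable.ArrayBuffer.empty[Any]
    var foundNull = false
    (a1 ++ a2).foreach { elem =>
      if (elem == null) {
        if (!foundNull) { buffer += null; foundNull = true }
      } else if (!buffer.exists(v => v != null && equiv(v, elem))) {
        buffer += elem
      }
    }
    buffer.toList
  }
}
```

Both paths keep the first occurrence of each element in input order. The slow path is quadratic in the result size, which is why it is reserved for types whose `equals` cannot be trusted.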
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #94334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94334/testReport)** for PR 20611 at commit [`5b5bb52`](https://github.com/apache/spark/commit/5b5bb52e1c334eeec49c318e4c437d04c489671b).
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21222 Thanks @zsxwing for merging and thanks all for reviewing!
[GitHub] spark pull request #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `S...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21991
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21622 Thanks @HyukjinKwon for merging, and thanks all for reviewing!
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20611 retest this please
[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21721
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21991 https://github.com/apache/spark/commit/51bee7aca13451167fa3e701fcd60f023eae5e61 looks good :-)
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21991 Merged to master.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21721 Merged to master.
[GitHub] spark issue #22016: Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22016 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94328/ Test PASSed.
[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94319/ Test PASSed.
[GitHub] spark issue #22016: Fix typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22016 Merged build finished. Test PASSed.
[GitHub] spark issue #22016: Fix typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22016 **[Test build #94328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94328/testReport)** for PR 22016 at commit [`0d26901`](https://github.com/apache/spark/commit/0d2690185f6f8765accb78d39a3e74c1df5a4536). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21451 Merged build finished. Test PASSed.
[GitHub] spark issue #21451: [SPARK-24296][CORE] Replicate large blocks as a stream.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21451 **[Test build #94319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94319/testReport)** for PR 21451 at commit [`6d059f2`](https://github.com/apache/spark/commit/6d059f25f3595243a8dd6195a5ee938a78e40d99). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94330/ Test FAILed.
[GitHub] spark pull request #21622: [SPARK-24637][SS] Add metrics regarding state and...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21622
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test FAILed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94330/testReport)** for PR 22009 at commit [`2f6d1d2`](https://github.com/apache/spark/commit/2f6d1d27a2a5aabc0db87b2e97f7f8e6fd6fe91c). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #94333 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94333/testReport)** for PR 21898 at commit [`1f71e65`](https://github.com/apache/spark/commit/1f71e6583f9f9f270d07323f15c731717e13518d).
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test PASSed.
[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21622 Merged to master.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1883/ Test PASSed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21898 Is there a way to increase the build timeout? cc @shaneknapp
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user skambha commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r208079529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -201,7 +204,7 @@ case class Alias(child: Expression, name: String)( } override def sql: String = { -val qualifierPrefix = qualifier.map(_ + ".").getOrElse("") +val qualifierPrefix = if (qualifier.nonEmpty) qualifier.mkString(".") + "." else "" --- End diff -- ok, sounds good.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21898 retest this please
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r208079335 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -201,7 +204,7 @@ case class Alias(child: Expression, name: String)( } override def sql: String = { -val qualifierPrefix = qualifier.map(_ + ".").getOrElse("") +val qualifierPrefix = if (qualifier.nonEmpty) qualifier.mkString(".") + "." else "" --- End diff -- Ah, my bad, I thought it would return an empty string for an empty seq. Let's leave it.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21991 Let me leave it to you @dbtsai. I thought you lived in a timezone completely different from mine.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94322/ Test FAILed.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Merged build finished. Test FAILed.
[GitHub] spark issue #21978: SPARK-25006: Add CatalogTableIdentifier.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21978 I'd like to wait for https://github.com/apache/spark/pull/17185 first. #17185 allows users to write `db1.table1.col1`, and we can later extend it to `catalog1.db1.table1.col1`. We should also update the column resolution logic to consider the catalog name.
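Here is a rough sketch of the multi-part matching described above. It assumes each attribute carries a qualifier list such as `Seq("db1", "table1")`. A reference matches when its last part equals the column name and its leading parts form a suffix of the qualifier. Under that rule, `col1`, `table1.col1`, and `db1.table1.col1` all resolve, and a catalog-qualified name works once the catalog is prepended to the qualifier. `Attr` and `ResolveSketch` are hypothetical. Case-insensitive comparison is an assumption, and struct-field access such as `col1.field` is ignored. This is not Spark's resolution code.

```scala
// Hypothetical model of a resolved column: its qualifier parts plus its name.
final case class Attr(qualifier: Seq[String], name: String)

object ResolveSketch {
  // True when nameParts = <suffix of qualifier> :+ name (case-insensitive).
  def matches(nameParts: Seq[String], attr: Attr): Boolean =
    nameParts.nonEmpty &&
      nameParts.last.equalsIgnoreCase(attr.name) && {
        val prefix = nameParts.init
        // The leading parts must line up with the rightmost qualifier parts.
        prefix.length <= attr.qualifier.length &&
          attr.qualifier.takeRight(prefix.length).zip(prefix).forall {
            case (q, p) => q.equalsIgnoreCase(p)
          }
      }
}
```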
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21991 **[Test build #94322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94322/testReport)** for PR 21991 at commit [`11887ae`](https://github.com/apache/spark/commit/11887aefdda4f1a21cde9ad7d1099c91b0744264). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user skambha commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r208078928 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -794,19 +795,37 @@ case class LocalLimit(limitExpr: Expression, child: LogicalPlan) extends OrderPr /** * Aliased subquery. * - * @param alias the alias name for this subquery. + * @param name the alias identifier for this subquery. * @param child the logical plan of this subquery. */ case class SubqueryAlias( -alias: String, +name: AliasIdentifier, child: LogicalPlan) extends OrderPreservingUnaryNode { - override def doCanonicalize(): LogicalPlan = child.canonicalized + def alias: String = name.identifier - override def output: Seq[Attribute] = child.output.map(_.withQualifier(Some(alias))) + override def output: Seq[Attribute] = { +val qualifierList = name.database.map(Seq(_, alias)).getOrElse(Seq(alias)) +child.output.map(_.withQualifier(qualifierList)) + } + override def doCanonicalize(): LogicalPlan = child.canonicalized } +object SubqueryAlias { + def apply( + identifier: String, + child: LogicalPlan): SubqueryAlias = { +SubqueryAlias(AliasIdentifier(identifier), child) + } + + def apply( + identifier: String, + database: Option[String], --- End diff -- good point! I'll take care of this in the next push.
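The new `output` in the diff builds the qualifier list from an optional database plus the alias. Below is a minimal sketch of just that computation. It uses a hypothetical `AliasIdentifier` case class whose shape is inferred from the diff: an `identifier` plus an optional `database`. It is not Spark's class.

```scala
// Hypothetical stand-in for the AliasIdentifier used in the diff.
final case class AliasIdentifier(identifier: String, database: Option[String] = None)

object QualifierSketch {
  // Mirrors the diff: name.database.map(Seq(_, alias)).getOrElse(Seq(alias))
  def qualifierFor(name: AliasIdentifier): Seq[String] =
    name.database.map(db => Seq(db, name.identifier)).getOrElse(Seq(name.identifier))
}
```

An equivalent spelling is `name.database.toList :+ name.identifier`. Which form reads better is a style call.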
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21991 Yes, I think so. I was about to merge this in that way :-). It seems to me we are good to merge now, since the current change is only checked by the Python linter, which has already passed.
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21980 **[Test build #94332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94332/testReport)** for PR 21980 at commit [`d4d8d0f`](https://github.com/apache/spark/commit/d4d8d0fd2597d52dd2da5b36da6f05a60d89d25e).
[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...
Github user skambha commented on a diff in the pull request: https://github.com/apache/spark/pull/17185#discussion_r208078754 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -201,7 +204,7 @@ case class Alias(child: Expression, name: String)( } override def sql: String = { -val qualifierPrefix = qualifier.map(_ + ".").getOrElse("") +val qualifierPrefix = if (qualifier.nonEmpty) qualifier.mkString(".") + "." else "" --- End diff -- This won't work for the case when we have Seq.empty. The suffix "." gets returned even for an empty sequence. For a non-empty Seq, the above call will be fine. Shall we leave the 'if' as is, or is there an equivalent preferred style that would work?
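The distinction being discussed is easy to check directly. This sketch assumes the alternative under review was a three-argument `mkString`, since the suggestion itself is not quoted here. That form appends its end string even to an empty collection, so it yields `"."` where an empty prefix is wanted. The sketch compares it with the guarded form from the diff. It also includes one guard-free option, `map(_ + ".").mkString`, offered only as a possible answer to the style question and not as what the PR adopted. `QualifierPrefixSketch` is a hypothetical name.

```scala
object QualifierPrefixSketch {
  // Form used in the diff: append "." only when a qualifier is present.
  def guarded(qualifier: Seq[String]): String =
    if (qualifier.nonEmpty) qualifier.mkString(".") + "." else ""

  // mkString(start, sep, end) always emits start and end, so an empty
  // qualifier produces "." instead of "".
  def threeArg(qualifier: Seq[String]): String =
    qualifier.mkString("", ".", ".")

  // Guard-free alternative: suffix each part with "." and concatenate.
  def mapped(qualifier: Seq[String]): String =
    qualifier.map(_ + ".").mkString
}
```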
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1882/ Test PASSed.
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21980 Merged build finished. Test PASSed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94318/ Test FAILed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21898 Merged build finished. Test FAILed.
[GitHub] spark issue #21898: [SPARK-24817][Core] Implement BarrierTaskContext.barrier...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21898 **[Test build #94318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94318/testReport)** for PR 21898 at commit [`1f71e65`](https://github.com/apache/spark/commit/1f71e6583f9f9f270d07323f15c731717e13518d). * This patch **fails from timeout after a configured wait of \`300m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21980 **[Test build #94331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94331/testReport)** for PR 21980 at commit [`f60a238`](https://github.com/apache/spark/commit/f60a2384f335b1c95e81a0c232299af9bb426654).
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21980 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1881/ Test PASSed.
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21980 Merged build finished. Test PASSed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94330/testReport)** for PR 22009 at commit [`2f6d1d2`](https://github.com/apache/spark/commit/2f6d1d27a2a5aabc0db87b2e97f7f8e6fd6fe91c).
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1880/ Test PASSed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test PASSed.
[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21980#discussion_r208078032 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -854,6 +854,26 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi assert(uuids.distinct.size == 2) } + test("Rand/Randn in streaming query should not produce results in each execution") { --- End diff -- oops, fixed typo.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21991 @HyukjinKwon thanks! Is it possible to use this script to merge this PR, which has many people involved? It would be a good demonstration of collaboration in the community.
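Some background on the trailers in this PR's title: Git trailers are `Key: value` lines in the last paragraph of a commit message. GitHub credits each `Co-authored-by: Name <email>` line as a co-author of the commit. The hypothetical Scala sketch below composes such a message for a PR with several contributors. `TrailerSketch`, `Person`, and the sample names are illustrative only. The actual merge script is not reproduced here.

```scala
object TrailerSketch {
  // Hypothetical contributor record used only for this illustration.
  final case class Person(name: String, email: String)

  // Builds "<body>\n\n<Co-authored-by lines>\n<Signed-off-by line>".
  def withTrailers(body: String, coAuthors: Seq[Person], signer: Person): String = {
    val coAuthorLines = coAuthors.map(p => s"Co-authored-by: ${p.name} <${p.email}>")
    val signOff = s"Signed-off-by: ${signer.name} <${signer.email}>"
    // Trailers form the final paragraph, separated from the body by a blank line.
    ((Seq(body, "") ++ coAuthorLines) :+ signOff).mkString("\n")
  }
}
```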
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21991 **[Test build #94329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94329/testReport)** for PR 21991 at commit [`272d8fd`](https://github.com/apache/spark/commit/272d8fd4c6a46164069e2e3a892f016e9664cf5f).
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1879/ Test PASSed.
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21991 Merged build finished. Test PASSed.
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21199 Merged build finished. Test PASSed.
[GitHub] spark pull request #21956: [MINOR][DOCS] Fix grammatical error in SortShuffl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21956
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94321/ Test PASSed.
[GitHub] spark issue #21632: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21632 cc also @jkbradley
[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21199 **[Test build #94321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94321/testReport)** for PR 21199 at commit [`f4a39d9`](https://github.com/apache/spark/commit/f4a39d9ebae2d6f6ae59caf3140310b17e75b602). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TextSocketContinuousReader(options: DataSourceOptions) extends ContinuousReader with Logging `
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21991 retest this please
[GitHub] spark issue #21991: [SPARK-25018] [Infra] Use `Co-authored-by` and `Signed-o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21991 I am merging this since this change is not actually tested; the only check is the Python linter, which has already passed.
[GitHub] spark issue #22016: Fix typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22016 **[Test build #94328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94328/testReport)** for PR 22016 at commit [`0d26901`](https://github.com/apache/spark/commit/0d2690185f6f8765accb78d39a3e74c1df5a4536).
[GitHub] spark issue #21956: [MINOR][DOCS] Fix grammatical error in SortShuffleManage...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21956 @kiszk, I would appreciate it if you felt free to open a PR fixing them, or to suggest them in someone else's PR that fixes them.
[GitHub] spark issue #21956: [MINOR][DOCS] Fix grammatical error in SortShuffleManage...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21956 Merged to master.
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21980 LGTM
[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21980#discussion_r208075258 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -854,6 +854,26 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi assert(uuids.distinct.size == 2) } + test("Rand/Randn in streaming query should not produce results in each execution") { --- End diff -- `produce results` -> `produce same results`
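Some context for the test name: a pseudo-random generator built from a fixed seed replays the same sequence. A plan whose `Rand`/`Randn` seed never changes will therefore repeat the same values on every execution. The sketch below shows this with `scala.util.Random`. `SeedSketch.seedFor` is a hypothetical illustration of deriving a fresh seed per execution, for example per micro-batch. It is not the change made in this PR.

```scala
import scala.util.Random

object SeedSketch {
  // A generator created from the same seed yields the same draws every time.
  def draws(seed: Long, n: Int): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextDouble())
  }

  // Hypothetical per-execution seed: combine a base seed with an execution id
  // so that each execution starts its generator from a different state.
  def seedFor(baseSeed: Long, executionId: Long): Long = baseSeed + executionId
}
```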