[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23121 I had better ask about the target branches. :) Thanks, @jerryshao . Now, it lands for 2.4.1. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22979 **[Test build #99211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99211/testReport)** for PR 22979 at commit [`15a09b8`](https://github.com/apache/spark/commit/15a09b8f5a8181e2e758f108a2734c22af4928de).
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23080 Merged build finished. Test PASSed.
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5304/ Test PASSed.
[GitHub] spark issue #23080: [SPARK-26108][SQL] Support custom lineSep in CSV datasou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23080 **[Test build #99210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99210/testReport)** for PR 23080 at commit [`a4c4b67`](https://github.com/apache/spark/commit/a4c4b6710cb67bddd9badbb53aa07b0d93242bc5).
[GitHub] spark issue #23103: [SPARK-26121] [Structured Streaming] Allow users to defi...
Github user zouzias commented on the issue: https://github.com/apache/spark/pull/23103 @koeninger, I will make the doc changes asap. FYI, I plan to change `structured-streaming-kafka-integration.md`, which seems the most relevant doc for this diff.
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23121 Merged to `branch-2.4`.
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99206/ Test PASSed.
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23121 Merged build finished. Test PASSed.
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23121 **[Test build #99206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99206/testReport)** for PR 23121 at commit [`c6351f6`](https://github.com/apache/spark/commit/c6351f68b4e24834fde503c8d068d2e6d3966348).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zooke...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23119
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 Merged to master to recover Maven testing on the `master` branch.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99205/ Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99205/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235852779

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map")
+
+  private lazy val keyToIndex = keyType match {
+    case _: AtomicType | _: CalendarIntervalType => mutable.HashMap.empty[Any, Int]
+    case _ =>
+      // for complex types, use interpreted ordering to be able to compare unsafe data with safe
+      // data, e.g. UnsafeRow vs GenericInternalRow.
+      mutable.TreeMap.empty[Any, Int](TypeUtils.getInterpretedOrdering(keyType))
--- End diff --

```scala
scala> sql("select map(null,2)")
res1: org.apache.spark.sql.DataFrame = [map(NULL, 2): map]

scala> sql("select map(null,2)").collect
scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
  at org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:67)
  at org.apache.spark.sql.catalyst.util.ArrayBasedMapBuilder.keyToIndex$lzycompute(ArrayBasedMapBuilder.scala:37)
```
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 Hi, @cloud-fan . Could you review this, please? [Test build #99205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99205/testReport) has almost passed; it is at the `Python` testing stage. I'd like to merge this because the underlying problem causes UT failures in both `maven` builds (master-maven and master-maven-scala-2.12). Also, this PR has suffered a lot from flaky tests today.
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235851923

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala ---
@@ -558,8 +558,11 @@ private[parquet] class ParquetRowConverter(
 
     override def getConverter(fieldIndex: Int): Converter = keyValueConverter
 
-    override def end(): Unit =
+    override def end(): Unit = {
+      // The parquet map may contains null or duplicated map keys. When it happens, the behavior is
+      // undefined.
--- End diff --

What about creating a Spark JIRA issue for this and embedding that ID here?
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235851798

--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -19,6 +19,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, users can create map values with map type key via built-in function like `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it's not allowed to create map values with map type key with these built-in functions. Users can still read map values with map type key from data source or Java/Scala collections, though they are not very useful.
 
+  - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy.
--- End diff --

Can we merge this with the above sentence at line 20? The two are different, but they are very closely related.
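
The "last wins" behavior described in the migration note matches what Scala's own immutable `Map` does with repeated keys. A minimal sketch in plain Scala, with no Spark involved (the object and value names are illustrative only):

```scala
object LastWinsDemo {
  // Building a Scala Map from pairs with a repeated key keeps the last value.
  val fromPairs: Map[Int, String] = Map(1 -> "a", 2 -> "b", 1 -> "c")

  // The same holds when converting a sequence of pairs.
  val fromSeq: Map[Int, String] = Seq(1 -> "a", 1 -> "c").toMap

  def main(args: Array[String]): Unit = {
    println(fromPairs(1))
    println(fromSeq.size)
  }
}
```

This only illustrates the policy the PR says it follows; it does not show how Spark's built-in functions implement it.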
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235851554

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.{AtomicType, CalendarIntervalType, DataType, MapType}
+
+/**
+ * A builder of [[ArrayBasedMapData]], which fails if a null map key is detected, and removes
+ * duplicated map keys w.r.t. the last wins policy.
+ */
+class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Serializable {
+  assert(!keyType.existsRecursively(_.isInstanceOf[MapType]), "key of map cannot be/contain map")
--- End diff --

Shall we add an assert to prevent `NullType` here, too?
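
The suggestion above amounts to failing fast on an unsupported key type at construction time rather than hitting a `MatchError` later. A rough sketch of that idea, using a hypothetical, simplified `KeyType` hierarchy in place of Spark's `DataType` classes (none of these names come from Spark):

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait KeyType
case object IntKey extends KeyType
case object StringKey extends KeyType
case object NullKey extends KeyType // plays the role of NullType
final case class ArrayKey(element: KeyType) extends KeyType
final case class MapKey(key: KeyType, value: KeyType) extends KeyType

object KeyTypeCheck {
  // True if `t` itself, or any type nested inside it, satisfies `p`.
  def existsRecursively(t: KeyType)(p: KeyType => Boolean): Boolean = {
    val nested = t match {
      case ArrayKey(e)  => existsRecursively(e)(p)
      case MapKey(k, v) => existsRecursively(k)(p) || existsRecursively(v)(p)
      case _            => false
    }
    p(t) || nested
  }

  // Validate a key type up front so that bad types fail with a clear message.
  def validate(t: KeyType): Unit = {
    require(!existsRecursively(t)(_.isInstanceOf[MapKey]), "key of map cannot be/contain map")
    require(t != NullKey, "key of map cannot be null type")
  }
}
```

Whether the real check should also reject a null type nested inside a complex key is a design question this sketch leaves open; it only rejects a top-level null key type.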
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235849825

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -751,171 +739,46 @@ case class MapFromEntries(child: Expression) extends UnaryExpression {
         s"${child.dataType.catalogString} type. $prettyName accepts only arrays of pair structs.")
   }
 
+  private lazy val mapBuilder = new ArrayBasedMapBuilder(dataType.keyType, dataType.valueType)
+
   override protected def nullSafeEval(input: Any): Any = {
-    val arrayData = input.asInstanceOf[ArrayData]
-    val numEntries = arrayData.numElements()
+    val entries = input.asInstanceOf[ArrayData]
+    val numEntries = entries.numElements()
     var i = 0
-    if(nullEntries) {
+    if (nullEntries) {
       while (i < numEntries) {
-        if (arrayData.isNullAt(i)) return null
+        if (entries.isNullAt(i)) return null
         i += 1
       }
     }
-    val keyArray = new Array[AnyRef](numEntries)
-    val valueArray = new Array[AnyRef](numEntries)
+
+    mapBuilder.reset()
     i = 0
     while (i < numEntries) {
-      val entry = arrayData.getStruct(i, 2)
-      val key = entry.get(0, dataType.keyType)
-      if (key == null) {
-        throw new RuntimeException("The first field from a struct (key) can't be null.")
-      }
-      keyArray.update(i, key)
-      val value = entry.get(1, dataType.valueType)
-      valueArray.update(i, value)
+      mapBuilder.put(entries.getStruct(i, 2))
       i += 1
     }
-    ArrayBasedMapData(keyArray, valueArray)
+    mapBuilder.build()
   }
 
   override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
     nullSafeCodeGen(ctx, ev, c => {
       val numEntries = ctx.freshName("numEntries")
-      val isKeyPrimitive = CodeGenerator.isPrimitiveType(dataType.keyType)
-      val isValuePrimitive = CodeGenerator.isPrimitiveType(dataType.valueType)
-      val code = if (isKeyPrimitive && isValuePrimitive) {
-        genCodeForPrimitiveElements(ctx, c, ev.value, numEntries)
--- End diff --

Since we need to check for duplicated map keys, it's not possible to apply this trick anymore, as we need to overwrite values if the key has appeared before.
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23124#discussion_r235849697

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -125,22 +125,36 @@ object InternalRow {
    * actually takes a `SpecializedGetters` input because it can be generalized to other classes
    * that implements `SpecializedGetters` (e.g., `ArrayData`) too.
    */
-  def getAccessor(dataType: DataType): (SpecializedGetters, Int) => Any = dataType match {
-    case BooleanType => (input, ordinal) => input.getBoolean(ordinal)
-    case ByteType => (input, ordinal) => input.getByte(ordinal)
-    case ShortType => (input, ordinal) => input.getShort(ordinal)
-    case IntegerType | DateType => (input, ordinal) => input.getInt(ordinal)
-    case LongType | TimestampType => (input, ordinal) => input.getLong(ordinal)
-    case FloatType => (input, ordinal) => input.getFloat(ordinal)
-    case DoubleType => (input, ordinal) => input.getDouble(ordinal)
-    case StringType => (input, ordinal) => input.getUTF8String(ordinal)
-    case BinaryType => (input, ordinal) => input.getBinary(ordinal)
-    case CalendarIntervalType => (input, ordinal) => input.getInterval(ordinal)
-    case t: DecimalType => (input, ordinal) => input.getDecimal(ordinal, t.precision, t.scale)
-    case t: StructType => (input, ordinal) => input.getStruct(ordinal, t.size)
-    case _: ArrayType => (input, ordinal) => input.getArray(ordinal)
-    case _: MapType => (input, ordinal) => input.getMap(ordinal)
-    case u: UserDefinedType[_] => getAccessor(u.sqlType)
-    case _ => (input, ordinal) => input.get(ordinal, dataType)
+  def getAccessor(dt: DataType, nullable: Boolean = true): (SpecializedGetters, Int) => Any = {
--- End diff --

I can move it to a new PR if others think it's necessary. It's a little dangerous to ask the caller side to take care of null values.
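
The point about not leaving null handling to the caller can be sketched as an accessor wrapper that checks for null before reading. The `Getters` trait, `SeqRow`, and `nullSafe` below are hypothetical stand-ins written for illustration, not Spark's `SpecializedGetters` or the actual body of the new `getAccessor`:

```scala
// Hypothetical stand-in for SpecializedGetters: typed reads plus a null check.
trait Getters {
  def isNullAt(ordinal: Int): Boolean
  def getInt(ordinal: Int): Int
}

// A toy row backed by a sequence of optional values (None marks a null slot).
final class SeqRow(values: IndexedSeq[Option[Any]]) extends Getters {
  def isNullAt(ordinal: Int): Boolean = values(ordinal).isEmpty
  def getInt(ordinal: Int): Int = values(ordinal).get.asInstanceOf[Int]
}

object Accessors {
  type Accessor = (Getters, Int) => Any

  // Wrap a raw accessor so that null slots yield null instead of being read.
  // When the field is known to be non-nullable, the raw accessor is returned as-is.
  def nullSafe(raw: Accessor, nullable: Boolean = true): Accessor = {
    if (!nullable) raw
    else { (input: Getters, ordinal: Int) =>
      if (input.isNullAt(ordinal)) null else raw(input, ordinal)
    }
  }

  val intAccessor: Accessor = nullSafe((input, ordinal) => input.getInt(ordinal))
}
```

With the check inside the accessor, a caller can read any ordinal without first calling `isNullAt` itself.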
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Merged build finished. Test PASSed.
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23124 **[Test build #99209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99209/testReport)** for PR 23124 at commit [`cbcd5d7`](https://github.com/apache/spark/commit/cbcd5d7a937f8120ef8527f1f26150ed93f1de0a).
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5303/ Test PASSed.
[GitHub] spark issue #23124: [SPARK-25829][SQL] remove duplicated map keys with last ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23124 cc @dongjoon-hyun @gatorsmile @viirya @kiszk @mgaido91
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test FAILed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99202/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
* This patch **fails from timeout after a configured wait of `400m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99202/ Test FAILed.
[GitHub] spark pull request #23124: [SPARK-25829][SQL] remove duplicated map keys wit...
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23124

[SPARK-25829][SQL] remove duplicated map keys with last wins policy

## What changes were proposed in this pull request?

Currently duplicated map keys are not handled consistently. For example, map lookup respects the duplicated key that appears first, `Dataset.collect` only keeps the duplicated key that appears last, `MapKeys` returns duplicated keys, etc.

This PR proposes to remove duplicated map keys with a last wins policy, to follow Java/Scala and Presto. It only applies to built-in functions, as users can create maps with duplicated keys via private APIs anyway.

For other places:
1. Data source v1 doesn't have this problem, as users need to provide a Java/Scala map, which can't have duplicated keys.
2. Data source v2 may have this problem. I've added a note to `ArrayBasedMapData` to ask the caller to take care of duplicated keys. In the future we should enforce it in the stable data APIs for data source v2.
3. UDFs don't have this problem, as users need to provide a Java/Scala map. Same as data source v1.
4. File formats. I checked all of them and only parquet does not enforce it. For backward compatibility reasons I change nothing but leave a note saying that the behavior will be undefined if users write maps with duplicated keys to parquet files. Maybe we can add a config and fail by default if parquet files have maps with duplicated keys. This can be done in a followup.

## How was this patch tested?

updated tests and new tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark map

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23124.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #23124

commit cbcd5d7a937f8120ef8527f1f26150ed93f1de0a
Author: Wenchen Fan
Date: 2018-11-15T02:49:22Z

remove duplicated map keys with last wins policy
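
The policy described in this PR (reject null keys, and let a later value overwrite an earlier one for the same key) can be sketched with a key-to-index map over parallel key and value buffers. The class below is a simplified, hypothetical analogue written for illustration; it is not the `ArrayBasedMapBuilder` from the PR, and its name and method signatures are assumptions:

```scala
import scala.collection.mutable

// Hypothetical, simplified analogue of a last-wins map builder (not Spark's class).
final class LastWinsMapBuilder[K, V] {
  // Maps each distinct key to its position in the `keys`/`values` buffers.
  private val keyToIndex = mutable.HashMap.empty[K, Int]
  private val keys = mutable.ArrayBuffer.empty[K]
  private val values = mutable.ArrayBuffer.empty[V]

  def put(key: K, value: V): Unit = {
    require(key != null, "Cannot use null as map key.")
    keyToIndex.get(key) match {
      case Some(i) =>
        values(i) = value // key seen before: overwrite in place (last wins)
      case None =>
        keyToIndex(key) = keys.length
        keys += key
        values += value
    }
  }

  // Clears all state so the builder can be reused for the next map.
  def reset(): Unit = {
    keyToIndex.clear()
    keys.clear()
    values.clear()
  }

  // Returns parallel key and value sequences, in first-insertion order of keys.
  def build(): (Vector[K], Vector[V]) = (keys.toVector, values.toVector)
}
```

Because a repeated key has to find and overwrite its earlier slot, the builder needs the index lookup; simply appending keys and values to two arrays would keep the duplicates.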
[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23123 Merged build finished. Test PASSed.
[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23123 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99208/ Test PASSed.
[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23123 **[Test build #99208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99208/testReport)** for PR 23123 at commit [`28a0a92`](https://github.com/apache/spark/commit/28a0a923703e2d751d409773ec8995bbc731440b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23122 Merged build finished. Test PASSed.
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99207/ Test PASSed.
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23122 **[Test build #99207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99207/testReport)** for PR 23122 at commit [`de3aa78`](https://github.com/apache/spark/commit/de3aa789490e87e44da7a998455c31d03ffe2aa3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/23104 Yes, I tested it and understand now; you are right, @mgaido91.
[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23123 **[Test build #99208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99208/testReport)** for PR 23123 at commit [`28a0a92`](https://github.com/apache/spark/commit/28a0a923703e2d751d409773ec8995bbc731440b).
[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23123 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5302/ Test PASSed.
[GitHub] spark issue #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnecessary `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23123 Merged build finished. Test PASSed.
[GitHub] spark pull request #23123: [SPARK-26153][ML] GBT & RandomForest avoid unnece...
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/23123

[SPARK-26153][ML] GBT & RandomForest avoid unnecessary `first` job to compute `numFeatures`

## What changes were proposed in this pull request?

Use the base models' `numFeatures` instead of running a `first` job.

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark avoid_first_job

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23123.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #23123

commit 28a0a923703e2d751d409773ec8995bbc731440b
Author: zhengruifeng
Date: 2018-11-23T03:55:01Z

init
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23122 **[Test build #99207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99207/testReport)** for PR 23122 at commit [`de3aa78`](https://github.com/apache/spark/commit/de3aa789490e87e44da7a998455c31d03ffe2aa3).
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23122 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5301/ Test PASSed.
[GitHub] spark issue #23122: [MINOR][ML] add missing params to Instr
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23122 Merged build finished. Test PASSed.
[GitHub] spark pull request #23122: [MINOR][ML] add missing params to Instr
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/23122

[MINOR][ML] add missing params to Instr

## What changes were proposed in this pull request?

Add the following params to instr:
- GBTC: validationTol
- GBTR: validationTol, validationIndicatorCol
- ALS: coldStartStrategy

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark instr_append_missing_params

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23122.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #23122

commit de3aa789490e87e44da7a998455c31d03ffe2aa3
Author: zhengruifeng
Date: 2018-11-23T03:41:03Z

init
[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21732#discussion_r235840903

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
@@ -257,6 +251,11 @@ case class ExpressionEncoder[T](
    */
   def isSerializedAsStruct: Boolean = objSerializer.dataType.isInstanceOf[StructType]
+  /**
+   * Returns true if the type `T` is `Option`.
+   */
+  def isOptionType: Boolean = classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)
--- End diff --

sorry typo: `isSerializedAsStruct && !isOption`
[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21732#discussion_r235840884

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
@@ -257,6 +251,11 @@ case class ExpressionEncoder[T](
    */
   def isSerializedAsStruct: Boolean = objSerializer.dataType.isInstanceOf[StructType]
+  /**
+   * Returns true if the type `T` is `Option`.
+   */
+  def isOptionType: Boolean = classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)
--- End diff --

what do you mean? What I asked is a code style change, to make the code more maintainable.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99203/ Test FAILed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99203/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test FAILed.
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23121 **[Test build #99206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99206/testReport)** for PR 23121 at commit [`c6351f6`](https://github.com/apache/spark/commit/c6351f68b4e24834fde503c8d068d2e6d3966348).
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23121 Merged build finished. Test PASSed.
[GitHub] spark issue #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5300/ Test PASSed.
[GitHub] spark pull request #23121: [SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unn...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/23121

[SPARK-24553][UI][FOLLOWUP][2.4 Backport] Fix unnecessary UI redirect

## What changes were proposed in this pull request?

This is a backport PR of #23116.

This PR is a follow-up PR of #21600 to fix the unnecessary UI redirect.

## How was this patch tested?

Local verification

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-24553-branch-2.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23121.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23121

commit c6351f68b4e24834fde503c8d068d2e6d3966348
Author: jerryshao
Date: 2018-11-22T22:54:00Z

    [SPARK-24553][UI][FOLLOWUP] Fix unnecessary UI redirect

    ## What changes were proposed in this pull request?

    This PR is a follow-up PR of #21600 to fix the unnecessary UI redirect.

    ## How was this patch tested?

    Local verification

    Closes #23116 from jerryshao/SPARK-24553.

    Authored-by: jerryshao
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 76aae7f1fd512f150ffcdb618107b12e1e97fe43)
    Signed-off-by: jerryshao
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5299/ Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99205/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 Retest this please
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99204/ Test FAILed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99204/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test FAILed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5298/ Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99204/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
[GitHub] spark issue #23116: [SPARK-24553][UI][FOLLOWUP] Fix unnecessary UI redirect
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/23116 @dongjoon-hyun, this should also be backported to branch 2.4; let me create a backport PR.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 Retest this please.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 The current one hangs on `BroadcastSuite`.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23120 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99201/ Test PASSed.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23120 Merged build finished. Test PASSed.
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23120 **[Test build #99201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99201/testReport)** for PR 23120 at commit [`8f2d69d`](https://github.com/apache/spark/commit/8f2d69d848b8242c529118436249019016069ca2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235834225

--- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala ---
@@ -48,7 +48,8 @@ private[spark] trait ShuffleManager {
       handle: ShuffleHandle,
       startPartition: Int,
       endPartition: Int,
-      context: TaskContext): ShuffleReader[K, C]
+      context: TaskContext,
+      metrics: ShuffleMetricsReporter): ShuffleReader[K, C]
--- End diff --

IIUC, we should pass a read metrics reporter here, as this method is `getReader` which is called by the reducers to read shuffle files.
[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235834136

--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+/**
+ * An interface for reporting shuffle read metrics, for each shuffle. This interface assumes
+ * all the methods are called on a single-threaded, i.e. concrete implementations would not need
+ * to synchronize.
+ *
+ * All methods have additional Spark visibility modifier to allow public, concrete implementations
+ * that still have these methods marked as private[spark].
+ */
+private[spark] trait ShuffleReadMetricsReporter {
--- End diff --

how do we plan to use this interface later on? It's not used in this PR.
[GitHub] spark pull request #23105: [SPARK-26140] Enable custom metrics implementatio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23105#discussion_r235834088

--- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleMetricsReporter.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.shuffle
+
+/**
+ * An interface for reporting shuffle information, for each shuffle. This interface assumes
--- End diff --

`for each shuffle` -> `for each reducer of a shuffle`?
[GitHub] spark pull request #23043: [SPARK-26021][SQL] replace minus zero with zero i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23043
[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Platf...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23043 thanks, merging to master/2.4!
[GitHub] spark issue #23103: [SPARK-26121] [Structured Streaming] Allow users to defi...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23103 @zouzias can you add the new option to docs/structured-streaming-kafka-integration.md as part of this PR? Instructions for building docs are in docs/README.md; ping me if you need a hand.
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22598 Merged build finished. Test PASSed.
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22598 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99197/ Test PASSed.
[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22598 **[Test build #99197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99197/testReport)** for PR 22598 at commit [`30df8f1`](https://github.com/apache/spark/commit/30df8f129ad0ae8a373c65f9244db257dfdc0633).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23080: [SPARK-26108][SQL] Support custom lineSep in CSV ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23080#discussion_r235830894

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
@@ -377,6 +377,8 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
    * `multiLine` (default `false`): parse one record, which may span multiple lines.
    * `locale` (default is `en-US`): sets a locale as language tag in IETF BCP 47 format.
    * For instance, this is used while parsing dates and timestamps.
+   * `lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
+   * that should be used for parsing. Maximum length is 2.
--- End diff --

I'm sorry. can you fix `Maximum length is 2` as well? should be good to go.
[GitHub] spark issue #23111: [SPARK-26148][PYTHON][TESTS] Increases default paralleli...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23111 Yea, the improvement looks persistent: `Tests passed in 1027 seconds`
[GitHub] spark issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23117 It's not urgent :) so it's okay. Actually, I'm on vacation for a week as well. Thanks for taking a look, @shaneknapp!!
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5297/ Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test PASSed.
[GitHub] spark pull request #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support fo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23098#discussion_r235830558

--- Diff: bin/load-spark-env.cmd ---
@@ -21,37 +21,42 @@ rem This script loads spark-env.cmd if it exists, and ensures it is only loaded
 rem spark-env.cmd is loaded from SPARK_CONF_DIR if set, or within the current directory's
 rem conf\ subdirectory.
+set SPARK_ENV_CMD="spark-env.cmd"
 if [%SPARK_ENV_LOADED%] == [] (
   set SPARK_ENV_LOADED=1
   if [%SPARK_CONF_DIR%] == [] (
     set SPARK_CONF_DIR=%~dp0..\conf
   )
-  call :LoadSparkEnv
+  set SPARK_ENV_CMD="%SPARK_CONF_DIR%/%SPARK_ENV_CMD%"
+  if exist "%SPARK_ENV_CMD%" (
+    call "%SPARK_ENV_CMD%"
+  )
 )
 rem Setting SPARK_SCALA_VERSION if not already set.
-set ASSEMBLY_DIR2="%SPARK_HOME%\assembly\target\scala-2.11"
-set ASSEMBLY_DIR1="%SPARK_HOME%\assembly\target\scala-2.12"
-
-if [%SPARK_SCALA_VERSION%] == [] (
-
-  if exist %ASSEMBLY_DIR2% if exist %ASSEMBLY_DIR1% (
-    echo "Presence of build for multiple Scala versions detected."
-    echo "Either clean one of them or, set SPARK_SCALA_VERSION in spark-env.cmd."
-    exit 1
-  )
-  if exist %ASSEMBLY_DIR2% (
-    set SPARK_SCALA_VERSION=2.11
-  ) else (
-    set SPARK_SCALA_VERSION=2.12
-  )
-)
+rem TODO: revisit for Scala 2.13 support
+set SPARK_SCALA_VERSION=2.12
--- End diff --

Here, I ran some simple commands:

```cmd
C:\>set A=aa

C:\>ECHO %A%
aa

C:\>set A="aa"

C:\>ECHO %A%
"aa"

C:\>call "python.exe"
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> exit(0)

C:\>call python.exe
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> exit(0)
```
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99203/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 Retest this please.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23119 The previous successful one was `sbt`. The current on-going one fails with another flaky test, [SPARK-25903](https://issues.apache.org/jira/browse/SPARK-25903).

```
BarrierTaskContextSuite:
...
- throw exception if the number of barrier() calls are not the same on every task *** FAILED ***
```
[GitHub] spark pull request #23118: [SPARK-26144][BUILD] `build/mvn` should detect `s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/23118#discussion_r235830226

--- Diff: build/mvn ---
@@ -116,7 +116,8 @@ install_zinc() {
 # the build/ folder
 install_scala() {
   # determine the Scala version used in Spark
-  local scala_version=`grep "scala.version" "${_DIR}/../pom.xml" | head -n1 | awk -F '[<>]' '{print $3}'`
+  local scala_binary_version=`grep "scala.binary.version" "${_DIR}/../pom.xml" | head -n1 | awk -F '[<>]' '{print $3}'`
+  local scala_version=`grep "scala.version" "${_DIR}/../pom.xml" | grep ${scala_binary_version} | head -n1 | awk -F '[<>]' '{print $3}'`
--- End diff --

oh, I see. Thanks @dongjoon-hyun
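A minimal sketch of what the two-step lookup in the `build/mvn` diff above does, run against a made-up `pom.xml` fragment. The sample file, its version numbers, and the variable name `naive_version` are assumptions for illustration; only the `grep`/`head`/`awk` pipeline is taken from the diff. The sample assumes a file in which the first `scala.version` entry does not belong to the selected `scala.binary.version`:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for pom.xml; the versions below are illustrative only.
pom="$(mktemp)"
cat > "$pom" <<'EOF'
<project>
  <properties>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.12</scala.binary.version>
  </properties>
  <profiles>
    <profile>
      <id>scala-2.12</id>
      <properties>
        <scala.version>2.12.7</scala.version>
      </properties>
    </profile>
  </profiles>
</project>
EOF

# Old lookup: take the first scala.version line, whatever it is.
# awk splits on '<' and '>', so field 3 is the text between the tags.
naive_version=$(grep "scala.version" "$pom" | head -n1 | awk -F '[<>]' '{print $3}')

# New lookup: read the binary version first, then keep only the
# scala.version lines that mention it.
scala_binary_version=$(grep "scala.binary.version" "$pom" | head -n1 | awk -F '[<>]' '{print $3}')
scala_version=$(grep "scala.version" "$pom" | grep "${scala_binary_version}" | head -n1 | awk -F '[<>]' '{print $3}')

echo "naive: ${naive_version}, binary: ${scala_binary_version}, filtered: ${scala_version}"
rm -f "$pom"
```

With this sample, `naive_version` is `2.11.12` while `scala_version` is `2.12.7`. Note that `grep` treats the dots in both patterns as regex wildcards, so the filter is a loose match rather than an exact one.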
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99196/ Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test PASSed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99196/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22979 @HyukjinKwon Could it be related to recent changes in python tests?
[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22979 No chance to pass tests in the PR ;-)

```
test_aggregator (pyspark.sql.tests.test_group.GroupTests) ... #
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0080, pid=40070, tid=139648880690944
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x0080
```
[GitHub] spark pull request #23116: [SPARK-24553][UI][FOLLOWUP] Fix unnecessary UI re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23116
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5296/ Test FAILed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23119 Merged build finished. Test FAILed.
[GitHub] spark issue #23119: [SPARK-25954][SS][FOLLOWUP][test-maven] Add Zookeeper 3....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23119 **[Test build #99202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99202/testReport)** for PR 23119 at commit [`9c5d796`](https://github.com/apache/spark/commit/9c5d7961406dccfb16685a18467e02f3ea8c1ce8).
[GitHub] spark pull request #23118: [SPARK-26144][BUILD] `build/mvn` should detect `s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23118
[GitHub] spark issue #23116: [SPARK-24553][UI][FOLLOWUP] Fix unnecessary UI redirect
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23116 Thank you, @jerryshao. Merged to master.