[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20201#discussion_r162733684 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Partitioning.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * An interface to represent output data partitioning for a data source, which is returned by + * {@link SupportsReportPartitioning#outputPartitioning()}. Note that this should work like a + * snapshot, once created, it should be deterministic and always report same number of partitions --- End diff -- `, once` -> `. Once` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20201#discussion_r162733629 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Partitioning.java --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * An interface to represent output data partitioning for a data source, which is returned by --- End diff -- `output` -> `the output` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20201#discussion_r162733463 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Distribution.java --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * An interface to represent data distribution requirement, which specifies how the records should + * be distributed among the {@link ReadTask}s that are returned by + * {@link DataSourceV2Reader#createReadTasks()}. Note that this interface has nothing to do with + * the data ordering inside one partition(the output records of a single {@link ReadTask}). + * + * The instance of this interface is created and provided by Spark, then consumed by + * {@link Partitioning#satisfy(Distribution)}. This means users don't need to implement --- End diff -- `users ` -> `data source developers` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20201#discussion_r162733351 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourcePartitioning.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.v2 + +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap} +import org.apache.spark.sql.catalyst.plans.physical +import org.apache.spark.sql.sources.v2.reader.{ClusteredDistribution, Partitioning} + +/** + * An adapter from public data source partitioning to catalyst internal partitioning. --- End diff -- `Partitioning ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20299: [SPARK-23135][ui] Fix rendering of accumulators in the s...
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20299 LGTM. Merging this to master/2.3. Thanks!
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20201#discussion_r162733141 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ClusteredDistribution.java --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import org.apache.spark.annotation.InterfaceStability; + +/** + * A concrete implementation of {@link Distribution}. Represents a distribution where records that + * share the same values for the {@link #clusteredColumns} will be produced by the same + * {@link ReadTask}. + */ +@InterfaceStability.Evolving +public class ClusteredDistribution implements Distribution { + public String[] clusteredColumns; --- End diff -- Need to emphasize these columns are order insensitive. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
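The point about order insensitivity can be illustrated with a small sketch. It is not code from the PR: `ClusteredColumns` and `satisfies` are hypothetical stand-ins (not Spark's `ClusteredDistribution` or `Partitioning#satisfy`), and the subset rule is an assumption made for illustration. Column names are compared as sets, so `Seq("a", "b")` and `Seq("b", "a")` describe the same requirement:

```scala
// Hypothetical stand-in for the clustered columns carried by a distribution
// requirement; this is NOT the Spark ClusteredDistribution class.
final case class ClusteredColumns(columns: Seq[String])

object OrderInsensitiveClustering {
  // Assumed rule for this sketch: a source partitioned by `partitionColumns`
  // keeps rows that agree on all required columns in the same partition when
  // every partition column is one of the required columns. Both sides are
  // compared as sets, so the order of the names never matters.
  def satisfies(partitionColumns: Seq[String], required: ClusteredColumns): Boolean =
    partitionColumns.nonEmpty &&
      partitionColumns.toSet.subsetOf(required.columns.toSet)

  def main(args: Array[String]): Unit = {
    val required = ClusteredColumns(Seq("b", "a"))
    println(satisfies(Seq("a", "b"), required)) // true: same columns, different order
    println(satisfies(Seq("a"), required))      // true: partitioning by a subset is enough
    println(satisfies(Seq("a", "c"), required)) // false: "c" is not a required column
  }
}
```

Treating the names as a set keeps the check independent of how the caller happened to list the grouping columns.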
[GitHub] spark pull request #20201: [SPARK-22389][SQL] data source v2 partitioning re...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20201#discussion_r162732939 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala --- @@ -95,6 +96,34 @@ class DataSourceV2Suite extends QueryTest with SharedSQLContext { } } + test("partitioning reporting") { +import org.apache.spark.sql.functions.{count, sum} +Seq(classOf[PartitionAwareDataSource], classOf[JavaPartitionAwareDataSource]).foreach { cls => + withClue(cls.getName) { +val df = spark.read.format(cls.getName).load() +checkAnswer(df, Seq(Row(1, 4), Row(1, 4), Row(3, 6), Row(2, 6), Row(4, 2), Row(4, 2))) + +val groupByColA = df.groupBy('a).agg(sum('b)) +checkAnswer(groupByColA, Seq(Row(1, 8), Row(2, 6), Row(3, 6), Row(4, 4))) +assert(groupByColA.queryExecution.executedPlan.collectFirst { + case e: ShuffleExchangeExec => e +}.isEmpty) + +val groupByColAB = df.groupBy('a, 'b).agg(count("*")) --- End diff -- Try `df.groupBy('a + 'b).agg(count("*")).show()`. At the very least it should not fail, even if we do not support complex `ClusteredDistribution` expressions.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20203 **[Test build #86400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86400/testReport)** for PR 20203 at commit [`cf6e0c9`](https://github.com/apache/spark/commit/cf6e0c919e151c26772ec78a10abc6d2454f7dd5).
[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20091 @mridulm Thank you!
[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20091 @jiangxb1987 Thanks for clarifying, looks good to me. I will merge it later this evening (assuming no one else does first :) )
[GitHub] spark issue #20319: [SPARK-22884][ML][TESTS] ML test for StructuredStreaming...
Github user smurakozi commented on the issue: https://github.com/apache/spark/pull/20319 @jkbradley could you check out this change, please?
[GitHub] spark issue #20235: [Spark-22887][ML][TESTS][WIP] ML test for StructuredStre...
Github user smurakozi commented on the issue: https://github.com/apache/spark/pull/20235 @jkbradley could you check out this change, please?
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162721177 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -427,23 +435,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -val formattedJobDescription = - UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) +val jobDescription = UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) --- End diff -- Sure, but don't you want the same behavior as above here (falling back to the job name)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
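The fallback being asked about could look roughly like the sketch below. `DescriptionFallback` and `displayDescription` are hypothetical names chosen for illustration, not code from the PR; the only assumption is that a missing or blank description should be replaced by the job name:

```scala
object DescriptionFallback {
  // Hypothetical helper: prefer a non-blank description, otherwise fall back
  // to the job name.
  def displayDescription(description: Option[String], jobName: String): String =
    description.map(_.trim).filter(_.nonEmpty).getOrElse(jobName)
}
```

Folding the fallback into one small function would keep both call sites consistent, which seems to be the concern raised in the comment.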
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #86399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86399/testReport)** for PR 20330 at commit [`d5fdabb`](https://github.com/apache/spark/commit/d5fdabb678f4df7c101d8660cb7c37086e35489a).
[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162719519 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -201,9 +184,13 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String @Since("1.4.0") override def transformSchema(schema: StructType): StructType = { -if (isBucketizeMultipleColumns()) { +ParamValidators.checkExclusiveParams(this, "inputCol", "inputCols") --- End diff -- my initial implementation (with @hhbyyh's comments) was more generic and checked what you said. Afterwards, @MLnick and @viirya asked to switch to a more generic approach, which is the current one you see. I'm fine with either of those, but I think we need to choose one way and go in that direction, otherwise we just lose time.
[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19993 @jkbradley sure no problem, let me know how I can help.
[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162717263 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -166,6 +167,8 @@ private[ml] object Param { @DeveloperApi object ParamValidators { + private val LOGGER = LoggerFactory.getLogger(ParamValidators.getClass) --- End diff -- Let's switch this to use the Logging trait, to match other MLlib patterns.
[GitHub] spark pull request #19993: [SPARK-22799][ML] Bucketizer should throw excepti...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/19993#discussion_r162717142 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -201,9 +184,13 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String @Since("1.4.0") override def transformSchema(schema: StructType): StructType = { -if (isBucketizeMultipleColumns()) { +ParamValidators.checkExclusiveParams(this, "inputCol", "inputCols") --- End diff -- The problem with trying to use a general method like this is that it's hard to capture model-specific requirements. This currently misses checking to make sure that exactly one (not just <= 1) of each pair is available, plus that all of the single-column OR all of the multi-column Params are available. (The same issue occurs in https://github.com/apache/spark/pull/20146 ) It will also be hard to check these items and account for defaults. I'd argue that it's not worth trying to use generic checking functions here.
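A model-specific check along the lines described above might look like the following sketch. It is not code from either PR: `BucketizerParamCheck`, `validate`, and the idea of passing the explicitly set param names as a `Set[String]` are assumptions made for illustration. The names `outputCol`, `splits`, `outputCols`, and `splitsArray` are assumed to be the remaining members of each group, and defaults are deliberately ignored:

```scala
object BucketizerParamCheck {
  // Assumed grouping of the params for this sketch.
  private val singleColumn = Seq("inputCol", "outputCol", "splits")
  private val multiColumn = Seq("inputCols", "outputCols", "splitsArray")

  // `explicitlySet` is a hypothetical stand-in for the names of the params
  // that were explicitly set on the stage (defaults are not considered).
  def validate(explicitlySet: Set[String]): Unit = {
    val single = singleColumn.filter(explicitlySet.contains)
    val multi = multiColumn.filter(explicitlySet.contains)

    // The two groups must not be mixed.
    require(single.isEmpty || multi.isEmpty,
      s"Single-column params (${single.mkString(", ")}) and multi-column params " +
        s"(${multi.mkString(", ")}) cannot be set together.")

    // Exactly one group must be set, and it must be complete.
    require(single.size == singleColumn.size || multi.size == multiColumn.size,
      "Either all of " + singleColumn.mkString(", ") + " or all of " +
        multiColumn.mkString(", ") + " must be set.")
  }
}
```

`require` throws `IllegalArgumentException` on failure, so a caller such as `transformSchema` would fail fast with a message naming the offending params. Handling defaults would need extra logic that this sketch leaves out.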
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20203 btw another way you could test out having a bad host would be something like this (untested):
```scala
import java.net.InetAddress

// Run in spark-shell, where `sc` is in scope.
// Collect the host names seen by the tasks that process a record.
val hosts = sc.parallelize(1 to 1, 100).map { _ =>
  InetAddress.getLocalHost.getHostName
}.collect().toSet
val badHost = hosts.head

// Throw for any record that is processed on the chosen host.
sc.parallelize(1 to 1, 10).map { x =>
  if (InetAddress.getLocalHost.getHostName == badHost) throw new RuntimeException("Bad host")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
```
that way you make sure the failures are consistently on one host, not dependent on higher executor ids getting concentrated on one host.
[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20203#discussion_r162716271 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetBlacklistSuite.scala --- @@ -59,31 +60,55 @@ class TaskSetBlacklistSuite extends SparkFunSuite with BeforeAndAfterEach with M val shouldBeBlacklisted = (executor == "exec1" && index == 0) assert(taskSetBlacklist.isExecutorBlacklistedForTask(executor, index) === shouldBeBlacklisted) } + assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerExecutorBlacklistedForStage])) + assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerNodeBlacklistedForStage])) // Mark task 1 failed on exec1 -- this pushes the executor into the blacklist taskSetBlacklist.updateBlacklistForFailedTask( "hostA", exec = "exec1", index = 1, failureReason = "testing") + assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1")) -assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) verify(listenerBusMock).post( SparkListenerExecutorBlacklistedForStage(0, "exec1", 2, 0, attemptId)) + +assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerNodeBlacklistedForStage])) + // Mark one task as failed on exec2 -- not enough for any further blacklisting yet. 
taskSetBlacklist.updateBlacklistForFailedTask( "hostA", exec = "exec2", index = 0, failureReason = "testing") assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1")) + assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec2")) +verify(listenerBusMock, never()).post( + SparkListenerNodeBlacklistedForStage(0, "hostA", 2, 0, attemptId)) + assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerNodeBlacklistedForStage])) --- End diff -- yes, you are right
[GitHub] spark pull request #20146: [SPARK-11215][ML] Add multiple columns support to...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20146#discussion_r162715788 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -249,6 +249,16 @@ object ParamValidators { def arrayLengthGt[T](lowerBound: Double): Array[T] => Boolean = { (value: Array[T]) => value.length > lowerBound } + + /** Check if more than one param in a set of exclusive params are set. */ + def checkExclusiveParams(model: Params, params: String*): Unit = { +if (params.filter(paramName => model.hasParam(paramName) && --- End diff -- Why is this checking to see if the Param belongs to the Model? If this method is called with irrelevant Params, shouldn't it throw an error?
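One way to read the question above is that an unknown param name should be reported rather than silently skipped. A minimal sketch under that assumption follows. `StrictExclusiveParams`, `knownParams`, and `setParams` are hypothetical stand-ins for the model's declared and explicitly set params; they are not the `Params` API, and the choice of exception types is also an assumption:

```scala
object StrictExclusiveParams {
  // `knownParams`: names of the params the model declares (hypothetical input).
  // `setParams`: names of the params that were explicitly set (hypothetical input).
  def checkExclusiveParams(
      knownParams: Set[String],
      setParams: Set[String],
      names: String*): Unit = {
    // Fail loudly when the caller passes a param the model does not have.
    val unknown = names.filterNot(knownParams.contains)
    if (unknown.nonEmpty) {
      throw new NoSuchElementException(
        s"Params not defined on this model: ${unknown.mkString(", ")}")
    }
    // Only then check that at most one of the exclusive params is set.
    val set = names.filter(setParams.contains)
    if (set.size > 1) {
      throw new IllegalArgumentException(
        s"Only one of the exclusive params may be set, but found: ${set.mkString(", ")}")
    }
  }
}
```

Separating the two failure modes means a typo in the caller's param list surfaces as a programming error instead of being treated as "param not set".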
[GitHub] spark pull request #20203: [SPARK-22577] [core] executor page blacklist stat...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20203#discussion_r162714257 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetBlacklistSuite.scala --- @@ -59,31 +60,55 @@ class TaskSetBlacklistSuite extends SparkFunSuite with BeforeAndAfterEach with M val shouldBeBlacklisted = (executor == "exec1" && index == 0) assert(taskSetBlacklist.isExecutorBlacklistedForTask(executor, index) === shouldBeBlacklisted) } + assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerExecutorBlacklistedForStage])) + assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerNodeBlacklistedForStage])) // Mark task 1 failed on exec1 -- this pushes the executor into the blacklist taskSetBlacklist.updateBlacklistForFailedTask( "hostA", exec = "exec1", index = 1, failureReason = "testing") + assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1")) -assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) verify(listenerBusMock).post( SparkListenerExecutorBlacklistedForStage(0, "exec1", 2, 0, attemptId)) + +assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerNodeBlacklistedForStage])) + // Mark one task as failed on exec2 -- not enough for any further blacklisting yet. 
taskSetBlacklist.updateBlacklistForFailedTask( "hostA", exec = "exec2", index = 0, failureReason = "testing") assert(taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec1")) + assert(!taskSetBlacklist.isExecutorBlacklistedForTaskSet("exec2")) +verify(listenerBusMock, never()).post( + SparkListenerNodeBlacklistedForStage(0, "hostA", 2, 0, attemptId)) + assert(!taskSetBlacklist.isNodeBlacklistedForTaskSet("hostA")) +verify(listenerBusMock, never()) + .post(isA(classOf[SparkListenerNodeBlacklistedForStage])) --- End diff -- the `verify` you add just above this is pointless with this one too, right? I think you only need this one.
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20203 The node blacklisting is tested by unit tests:
- HistoryServerSuite
- TaskSetBlacklistSuite
- AppStatusListenerSuite

And manually with a 2 node cluster: https://issues.apache.org/jira/secure/attachment/12906833/node_blacklisting_for_stage.png

Here you can see apiros3.gce.test.com was node blacklisted for the stage because of failures on executors 4 and 5. As expected, executor 3 is also blacklisted: it has no failures itself, but it shares the node with executors 4 and 5.

Spark was started as:
```bash
./bin/spark-shell --master yarn --deploy-mode client --executor-memory=2G --num-executors=8 \
  --conf "spark.blacklist.enabled=true" \
  --conf "spark.blacklist.stage.maxFailedTasksPerExecutor=1" \
  --conf "spark.blacklist.stage.maxFailedExecutorsPerNode=1" \
  --conf "spark.blacklist.application.maxFailedTasksPerExecutor=10" \
  --conf "spark.eventLog.enabled=true"
```
And the job was:
```scala
import org.apache.spark.SparkEnv

sc.parallelize(1 to 1, 10).map { x =>
  if (SparkEnv.get.executorId.toInt >= 4) throw new RuntimeException("Bad executor")
  else (x % 3, x)
}.reduceByKey((a, b) => a + b).collect()
```
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Merged build finished. Test PASSed.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86397/ Test PASSed.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20332 **[Test build #86397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86397/testReport)** for PR 20332 at commit [`58d973e`](https://github.com/apache/spark/commit/58d973e204bd62128567fd3dfb2e5a335ac46bf1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #20284: [SPARK-23103][core] Ensure correct sort order for...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20284
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162712767 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -427,23 +435,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -val formattedJobDescription = - UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) +val jobDescription = UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) --- End diff -- `lastStageDescription` may be empty, but that will not cause problems: `makeDescription` handles it properly, just as in the version before `lastStageAttempt` was used:
```
val jobDescription =
  UIUtils.makeDescription(jobData.description.getOrElse(""), basePath, plainText = false)
```
[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20203 **[Test build #86398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86398/testReport)** for PR 20203 at commit [`41dd7bb`](https://github.com/apache/spark/commit/41dd7bbc1f62e093738e730bf3f5bfeb3dff16fb).
[GitHub] spark issue #20284: [SPARK-23103][core] Ensure correct sort order for negati...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20284 even though we don't *know* of this causing a bug in 2.3, I still think we should merge it in there just because there may be some case we aren't thinking of, and this is a relatively small, safe fix. so, I'm merging to master & 2.3
[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20138
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test FAILed.
[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...
Github user squito commented on the issue: https://github.com/apache/spark/pull/20138 as RC1 failed and RC2 is going to be cut soon, I'm going to merge this to master & 2.3
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86393/ Test FAILed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86393/testReport)** for PR 20331 at commit [`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with SharedSQLContext `
   * `class JsonHadoopFsRelationSuite extends HadoopFsRelationTest `
   * `class OrcHadoopFsRelationSuite extends HadoopFsRelationTest `
   * `class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest `
   * `class SimpleTextHadoopFsRelationSuite extends HadoopFsRelationTest with PredicateHelper `
   * `class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister `
   * `class SimpleTextOutputWriter(path: String, dataSchema: StructType, context: TaskAttemptContext)`
   * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite `
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20332 **[Test build #86397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86397/testReport)** for PR 20332 at commit [`58d973e`](https://github.com/apache/spark/commit/58d973e204bd62128567fd3dfb2e5a335ac46bf1).
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/48/ Test PASSed.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Merged build finished. Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test FAILed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86392/ Test FAILed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86392/testReport)** for PR 20331 at commit [`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with SharedSQLContext ` * `class JsonHadoopFsRelationSuite extends HadoopFsRelationTest ` * `class OrcHadoopFsRelationSuite extends HadoopFsRelationTest ` * `class ParquetHadoopFsRelationSuite extends HadoopFsRelationTest ` * `class SimpleTextHadoopFsRelationSuite extends HadoopFsRelationTest with PredicateHelper ` * `class SimpleTextSource extends TextBasedFileFormat with DataSourceRegister ` * `class SimpleTextOutputWriter(path: String, dataSchema: StructType, context: TaskAttemptContext)` * `class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite `
[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17123 But please resolve the conflicts first. :) Bucketizer has since added multiple-column support, so the code is different now.
[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/19993 Since RC1 for 2.3 failed, it'd be great to get this into 2.3. @mgaido91 do you mind if I send my comments along with a PR to update this PR of yours? I'm rushing because of the time pressure to get this into 2.3 (to avoid a behavior change between 2.3 and 2.4). Thanks in advance!
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Merged build finished. Test PASSed.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/47/ Test PASSed.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86396/ Test FAILed.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20332 **[Test build #86396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86396/testReport)** for PR 20332 at commit [`cb6c811`](https://github.com/apache/spark/commit/cb6c811e98d9739a7c1608880b2d0037cdeb5990). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass logistic regression su...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20332 Merged build finished. Test FAILed.
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162703711 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -171,23 +176,23 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] { * Binary searching in several buckets to place each data point. * @param splits array of split points * @param feature data point - * @param keepInvalid NaN flag. - *Set "true" to make an extra bucket for NaN values; - *Set "false" to report an error for NaN values + * @param keepInvalid NaN/NULL flag. + *Set "true" to make an extra bucket for NaN/NULL values; + *Set "false" to report an error for NaN/NULL values * @return bucket for each data point * @throws SparkException if a feature is < splits.head or > splits.last */ private[feature] def binarySearchForBuckets( splits: Array[Double], - feature: Double, + feature: java.lang.Double, --- End diff -- Also change to `Option[Double]` here.
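The `keepInvalid` contract in the doc comment above can be illustrated with a short Java sketch. This is a hypothetical illustration, not the Bucketizer code: the class name `BucketSearchSketch` is invented, a nullable `java.lang.Double` stands in for the `Option[Double]` suggested in the review, and `IllegalArgumentException` stands in for the `SparkException` named in the doc comment. It assumes `splits` is strictly increasing and, since the doc comment only rejects features `> splits.last`, that the last split point is inclusive.

```java
// Hypothetical sketch; names and exception type are assumptions, not Spark's API.
final class BucketSearchSketch {

  /**
   * Returns the index of the bucket that contains {@code feature}.
   * Regular buckets are [splits[i], splits[i + 1]); the last one also includes
   * its upper bound. NaN and null go to one extra bucket when keepInvalid is true.
   */
  static int binarySearchForBuckets(double[] splits, Double feature, boolean keepInvalid) {
    if (feature == null || feature.isNaN()) {
      if (keepInvalid) {
        return splits.length - 1; // extra bucket, placed after the regular ones
      }
      throw new IllegalArgumentException("NaN/NULL feature and keepInvalid is false");
    }
    double value = feature;
    if (value == splits[splits.length - 1]) {
      return splits.length - 2; // upper bound belongs to the last regular bucket
    }
    int idx = java.util.Arrays.binarySearch(splits, value);
    if (idx >= 0) {
      return idx; // exact hit on a split point: the bucket starting there
    }
    int insertPos = -idx - 1; // binarySearch encodes the insertion point this way
    if (insertPos == 0 || insertPos == splits.length) {
      throw new IllegalArgumentException("Feature " + value + " is outside the split range");
    }
    return insertPos - 1;
  }
}
```

With splits `{0.0, 1.0, 2.0}` there are two regular buckets (indices 0 and 1) and, when `keepInvalid` is set, index 2 collects NaN and null values.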
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r162703633 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String transformSchema(dataset.schema) val (filteredDataset, keepInvalid) = { if (getHandleInvalid == Bucketizer.SKIP_INVALID) { -// "skip" NaN option is set, will filter out NaN values in the dataset +// "skip" NaN/NULL option is set, will filter out NaN/NULL values in the dataset (dataset.na.drop().toDF(), false) } else { (dataset.toDF(), getHandleInvalid == Bucketizer.KEEP_INVALID) } } -val bucketizer: UserDefinedFunction = udf { (feature: Double) => --- End diff -- As @cloud-fan suggested, `Option[Double]` is better. :-)
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass summary example and us...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/20332 @jkbradley @MLnick
[GitHub] spark issue #20332: [SPARK-23138][ML][DOC] Multiclass summary example and us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20332 **[Test build #86396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86396/testReport)** for PR 20332 at commit [`cb6c811`](https://github.com/apache/spark/commit/cb6c811e98d9739a7c1608880b2d0037cdeb5990).
[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass summary example...
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/20332 [SPARK-23138][ML][DOC] Multiclass summary example and user guide ## What changes were proposed in this pull request? User guide and examples are updated to reflect the multiclass logistic regression summary, which was added in [SPARK-17139](https://issues.apache.org/jira/browse/SPARK-17139). I did not make a separate summary example, but added the summary code to the multiclass example that already existed. I don't see the need for a separate example for the summary. ## How was this patch tested? Docs and examples only. Ran all examples locally using spark-submit. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sethah/spark multiclass_summary_example Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20332.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20332 commit 9299fc83d2edab956bd13b2e1c985f64dcd2643e Author: sethah Date: 2018-01-19T17:52:10Z adding examples for python, scala, and java commit bf076ed09abb3bb474e0925b3b9c4dbc6e90771a Author: sethah Date: 2018-01-19T18:43:01Z use binaryTrainingSummary commit d0aa9f19550deb620e515ec33004be365c5439be Author: sethah Date: 2018-01-19T18:46:16Z import cleanup commit cb6c811e98d9739a7c1608880b2d0037cdeb5990 Author: sethah Date: 2018-01-19T18:51:28Z clarify user guide
[GitHub] spark issue #18983: [SPARK-21771][SQL]remove useless hive client in SparkSQL...
Github user liufengdb commented on the issue: https://github.com/apache/spark/pull/18983 LGTM! It is only created once, though. Frankly, we should completely remove the implementation of the `newSession()` method.
[GitHub] spark issue #20091: [SPARK-22465][FOLLOWUP] Update the number of partitions ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20091 @mridulm Great write-up! Yes, it's exactly what you described, and I've copied it to the PR description.
[GitHub] spark pull request #20025: [SPARK-22837][SQL]Session timeout checker does no...
Github user liufengdb commented on a diff in the pull request: https://github.com/apache/spark/pull/20025#discussion_r162698093 --- Diff: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java --- @@ -80,7 +76,6 @@ public synchronized void init(HiveConf hiveConf) { } createBackgroundOperationPool(); addService(operationManager); -super.init(hiveConf); --- End diff -- Hmm, I think we should revert this and keep this line too.
[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17894 Build finished. Test PASSed.
[GitHub] spark issue #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operations in Lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17894 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/46/ Test PASSed.
[GitHub] spark issue #20328: [SPARK-23000] [TEST] Keep Derby DB Location Unchanged Af...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20328 A late LGTM! :)
[GitHub] spark pull request #20185: Branch 2.3
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20185
[GitHub] spark pull request #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes sc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20314
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #86395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86395/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20314 Merging to master / 2.3.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/45/ Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20297 Merged build finished. Test PASSed.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4067 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4067/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20297 I kicked off a couple of extra builds in addition to the one that should auto-trigger.
[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20297 **[Test build #4066 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4066/testReport)** for PR 20297 at commit [`95bac27`](https://github.com/apache/spark/commit/95bac2773ee7adab9f57aa4377ff2e998353f02f).
[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20297#discussion_r162694343 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java --- @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException { timeout.cancel(); } close(); +if (handle != null) { + handle.dispose(); +} } finally { timeoutTimer.purge(); } } @Override public void close() throws IOException { + if (!isOpen()) { +return; + } + synchronized (clients) { clients.remove(this); } - super.close(); - if (handle != null) { -if (!handle.getState().isFinal()) { - LOG.log(Level.WARNING, "Lost connection to spark application."); - handle.setState(SparkAppHandle.State.LOST); -} -handle.disconnect(); --- End diff -- See https://github.com/apache/spark/pull/20297#pullrequestreview-89568079
[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20297#discussion_r162694174 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherServer.java --- @@ -331,23 +358,27 @@ protected void handle(Message msg) throws IOException { timeout.cancel(); } close(); +if (handle != null) { + handle.dispose(); +} } finally { timeoutTimer.purge(); } } @Override public void close() throws IOException { + if (!isOpen()) { +return; + } + synchronized (clients) { clients.remove(this); } - super.close(); - if (handle != null) { -if (!handle.getState().isFinal()) { - LOG.log(Level.WARNING, "Lost connection to spark application."); - handle.setState(SparkAppHandle.State.LOST); -} -handle.disconnect(); + + synchronized (this) { +super.close(); +notifyAll(); --- End diff -- See L239.
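The shape of the new `close()` in this diff (return early when already closed, deregister from `clients`, then close and `notifyAll()` under the connection's own lock) can be sketched in isolation. The Java below is a minimal, hypothetical sketch, not the LauncherServer code: `ClosableClientSketch`, `open(...)`, `closeCount()`, and `waitForClose(...)` are invented for illustration, and a boolean flag stands in for the real socket close done by `super.close()`.

```java
// Hypothetical sketch of an idempotent close() that wakes up waiting threads.
final class ClosableClientSketch {
  private final java.util.Set<ClosableClientSketch> clients;
  private boolean open = true;  // guarded by "this"
  private int closeCount = 0;   // guarded by "this"; counts real close work

  private ClosableClientSketch(java.util.Set<ClosableClientSketch> clients) {
    this.clients = clients;
  }

  /** Creates a client and registers it in the shared set. */
  static ClosableClientSketch open(java.util.Set<ClosableClientSketch> clients) {
    ClosableClientSketch client = new ClosableClientSketch(clients);
    synchronized (clients) {
      clients.add(client);
    }
    return client;
  }

  synchronized boolean isOpen() {
    return open;
  }

  synchronized int closeCount() {
    return closeCount;
  }

  void close() {
    if (!isOpen()) {
      return; // already closed: repeated calls are no-ops
    }
    synchronized (clients) {
      clients.remove(this);
    }
    synchronized (this) {
      if (!open) {
        return; // another thread finished closing in the meantime
      }
      open = false; // stands in for the real resource close
      closeCount++;
      notifyAll(); // wake threads blocked in waitForClose()
    }
  }

  /** Waits up to timeoutMs for close(); returns true if the client is closed. */
  synchronized boolean waitForClose(long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    try {
      while (open) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false;
        }
        wait(remaining);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return false;
    }
  }
}
```

The two locks are never held at the same time in this sketch, which avoids lock-ordering problems between the shared `clients` set and the per-connection monitor.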
[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20297#discussion_r162693890 --- Diff: launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java --- @@ -95,15 +95,15 @@ protected synchronized void send(Message msg) throws IOException { } @Override - public void close() throws IOException { + public synchronized void close() throws IOException { --- End diff -- We never *needed* to change it, but the extra code wasn't doing anything useful, so I chose the simpler version.
[GitHub] spark pull request #20297: [SPARK-23020][CORE] Fix races in launcher code, t...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20297#discussion_r162693731 --- Diff: launcher/src/main/java/org/apache/spark/launcher/ChildProcAppHandle.java --- @@ -48,14 +48,16 @@ public synchronized void disconnect() { @Override public synchronized void kill() { -disconnect(); -if (childProc != null) { - if (childProc.isAlive()) { -childProc.destroyForcibly(); +if (!isDisposed()) { + setState(State.KILLED); --- End diff -- None of the calls below should raise exceptions. Even the socket close is wrapped in a try..catch.
[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19340 **[Test build #4065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4065/testReport)** for PR 19340 at commit [`fda93ae`](https://github.com/apache/spark/commit/fda93aeadd782d520f32eb34475e3a7fa349c425). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #4063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4063/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20324 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86390/ Test PASSed.
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20324 Merged build finished. Test PASSed.
[GitHub] spark issue #20324: [SPARK-23091][ML] Incorrect unit test for approxQuantile
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20324 **[Test build #86390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86390/testReport)** for PR 20324 at commit [`673c520`](https://github.com/apache/spark/commit/673c52042a70b5dfc061dd053ae2e6553a4a2612). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20277 **[Test build #86394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86394/testReport)** for PR 20277 at commit [`3972093`](https://github.com/apache/spark/commit/397209342646a253a56650df8a00dfb6d66c834e).
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Merged build finished. Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/44/ Test PASSed.
[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20277 retest this please, since the `ColumnarBatch` PR has been merged.
[GitHub] spark issue #20326: [SPARK-23155][DEPLOY] log.server.url links in SHS
Github user gerashegalov commented on the issue: https://github.com/apache/spark/pull/20326 @vanzin do you mind considering this issue?
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162687023 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -65,10 +68,13 @@ private[ui] class AllJobsPage(parent: JobsTab, store: AppStatusStore) extends We }.map { job => val jobId = job.jobId val status = job.status - val jobDescription = store.lastStageAttempt(job.stageIds.max).description - val displayJobDescription = jobDescription -.map(UIUtils.makeDescription(_, "", plainText = true).text) -.getOrElse("") + val (_, lastStageDescription) = lastStageNameAndDescription(store, job) + val displayJobDescription = +if (lastStageDescription.isEmpty) { --- End diff -- nit: I generally prefer the opposite check. ``` if (data is good) do something with data else fallback to something else ```
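The ordering suggested in this nit (test for usable data first, fall back in the `else` branch) might look as follows in Java. This is only an illustrative sketch: `DescriptionSketch`, `displayDescription`, and the `fallbackName` parameter are hypothetical, and the trimming step merely stands in for whatever formatting the real page applies to the description.

```java
// Hypothetical sketch of the "positive check first" ordering from the review nit.
final class DescriptionSketch {

  /** Returns the stage description when one is present, otherwise the fallback. */
  static String displayDescription(String lastStageDescription, String fallbackName) {
    if (lastStageDescription != null && !lastStageDescription.isEmpty()) {
      // Data is good: do something with it.
      return lastStageDescription.trim();
    } else {
      // No usable data: fall back to something else.
      return fallbackName;
    }
  }
}
```

Putting the normal path first keeps the main logic next to the condition and leaves the fallback as the clearly secondary branch.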
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162687347 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -18,7 +18,7 @@ package org.apache.spark.ui.jobs import java.net.URLEncoder -import java.util.Date +import java.util.{Collections, Date} --- End diff -- New import is unused?
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162687444 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala --- @@ -31,6 +31,7 @@ import org.apache.spark.SparkConf import org.apache.spark.internal.config._ import org.apache.spark.scheduler.TaskLocality import org.apache.spark.status._ +import org.apache.spark.status.api.v1 --- End diff -- Just use `JobData` since it's already imported?
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162687232 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala --- @@ -336,8 +336,14 @@ private[ui] class JobPage(parent: JobsTab, store: AppStatusStore) extends WebUIP content ++= makeTimeline(activeStages ++ completedStages ++ failedStages, store.executorList(false), appStartTime) -content ++= UIUtils.showDagVizForJob( - jobId, store.operationGraphForJob(jobId)) +val operationGraphContent = store.asOption(store.operationGraphForJob(jobId)) match { + case Some(operationGraph) => UIUtils.showDagVizForJob(jobId, operationGraph) + case None => + --- End diff -- Indentation is off.
[GitHub] spark pull request #20330: [SPARK-23121][core] Fix for ui becoming unaccessi...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20330#discussion_r162687160 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala --- @@ -427,23 +435,21 @@ private[ui] class JobDataSource( val formattedDuration = duration.map(d => UIUtils.formatDuration(d)).getOrElse("Unknown") val submissionTime = jobData.submissionTime val formattedSubmissionTime = submissionTime.map(UIUtils.formatDate).getOrElse("Unknown") -val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max) -val lastStageDescription = lastStageAttempt.description.getOrElse("") +val (lastStageName, lastStageDescription) = lastStageNameAndDescription(store, jobData) -val formattedJobDescription = - UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) +val jobDescription = UIUtils.makeDescription(lastStageDescription, basePath, plainText = false) --- End diff -- No need to check for empty description here?
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Merged build finished. Test FAILed.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20330 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86389/ Test FAILed.
[GitHub] spark issue #20330: [SPARK-23121][core] Fix for ui becoming unaccessible for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20330 **[Test build #86389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86389/testReport)** for PR 20330 at commit [`6525ef4`](https://github.com/apache/spark/commit/6525ef4eda0bf65bbbcb842495341afc8c5971ad). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86393/testReport)** for PR 20331 at commit [`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/43/ Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20331 retest this please
[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20331#discussion_r162683746 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala --- @@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest { } } } - - test("SPARK-13543: Support for specifying compression codec for ORC via option()") { -withTempPath { dir => - val path = s"${dir.getCanonicalPath}/table1" - val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b") - df.write -.option("compression", "ZlIb") -.orc(path) - - // Check if this is compressed as ZLIB. - val maybeOrcFile = new File(path).listFiles().find { f => -!f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc") - } - assert(maybeOrcFile.isDefined) - val orcFilePath = maybeOrcFile.get.toPath.toString - val expectedCompressionKind = -OrcFileOperator.getFileReader(orcFilePath).get.getCompression --- End diff -- The same here.
[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20331#discussion_r162683705 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala --- @@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest { } } } - - test("SPARK-13543: Support for specifying compression codec for ORC via option()") { -withTempPath { dir => - val path = s"${dir.getCanonicalPath}/table1" - val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b") - df.write -.option("compression", "ZlIb") -.orc(path) - - // Check if this is compressed as ZLIB. - val maybeOrcFile = new File(path).listFiles().find { f => -!f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc") - } - assert(maybeOrcFile.isDefined) - val orcFilePath = maybeOrcFile.get.toPath.toString - val expectedCompressionKind = -OrcFileOperator.getFileReader(orcFilePath).get.getCompression - assert("ZLIB" === expectedCompressionKind.name()) - - val copyDf = spark -.read -.orc(path) - checkAnswer(df, copyDf) -} - } - - test("Default compression codec is snappy for ORC compression") { -withTempPath { file => - spark.range(0, 10).write -.orc(file.getCanonicalPath) - val expectedCompressionKind = - OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression --- End diff -- `OrcFileOperator` is defined in `sql/hive`.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20331 **[Test build #86392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86392/testReport)** for PR 20331 at commit [`f7693f0`](https://github.com/apache/spark/commit/f7693f0abfe0923868c1918ddcaeaece2c107c5d).
[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20331#discussion_r162683627 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala --- @@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest { } } } - - test("SPARK-13543: Support for specifying compression codec for ORC via option()") { -withTempPath { dir => - val path = s"${dir.getCanonicalPath}/table1" - val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b") - df.write -.option("compression", "ZlIb") -.orc(path) - - // Check if this is compressed as ZLIB. - val maybeOrcFile = new File(path).listFiles().find { f => -!f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc") - } - assert(maybeOrcFile.isDefined) - val orcFilePath = maybeOrcFile.get.toPath.toString - val expectedCompressionKind = -OrcFileOperator.getFileReader(orcFilePath).get.getCompression - assert("ZLIB" === expectedCompressionKind.name()) - - val copyDf = spark -.read -.orc(path) - checkAnswer(df, copyDf) -} - } - - test("Default compression codec is snappy for ORC compression") { -withTempPath { file => - spark.range(0, 10).write -.orc(file.getCanonicalPath) - val expectedCompressionKind = - OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression - assert("SNAPPY" === expectedCompressionKind.name()) -} - } -} - -class HiveOrcHadoopFsRelationSuite extends OrcHadoopFsRelationSuite { --- End diff -- This is Hive only.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/42/ Test PASSed.
[GitHub] spark issue #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20331 Merged build finished. Test PASSed.