[GitHub] spark pull request #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21582
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18544 @stanzhai Could we just fix it by changing `failFunctionLookup`?
[GitHub] spark issue #21582: [SPARK-24576][BUILD] Upgrade Apache ORC to 1.5.2
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21582 Thanks! Merged to master.
[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r203265365

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---

```
@@ -47,15 +48,27 @@ object CommandUtils extends Logging {
     }
   }

-  def calculateTotalSize(sessionState: SessionState, catalogTable: CatalogTable): BigInt = {
+  def calculateTotalSize(spark: SparkSession, catalogTable: CatalogTable): BigInt = {
+    val sessionState = spark.sessionState
     if (catalogTable.partitionColumnNames.isEmpty) {
       calculateLocationSize(sessionState, catalogTable.identifier, catalogTable.storage.locationUri)
     } else {
       // Calculate table size as a sum of the visible partitions. See SPARK-21079
       val partitions = sessionState.catalog.listPartitions(catalogTable.identifier)
-      partitions.map { p =>
-        calculateLocationSize(sessionState, catalogTable.identifier, p.storage.locationUri)
-      }.sum
+      val paths = partitions.map(x => new Path(x.storage.locationUri.get))
+      val stagingDir = sessionState.conf.getConfString("hive.exec.stagingdir", ".hive-staging")
+      val pathFilter = new PathFilter with Serializable {
+        override def accept(path: Path): Boolean = {
+          val fileName = path.getName
+          (!fileName.startsWith(stagingDir) &&
+            // Ignore metadata files starting with "_"
+            !fileName.startsWith("_"))
```

--- End diff --

We already have a function to check whether a path is a data file. Can we reuse it? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L232
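A minimal sketch of the check being pointed to, assuming it still has the shape it had in the 2.x `PartitioningAwareFileIndex`; the `hive.exec.stagingdir` test from the diff would still need to be layered on top, since this helper knows nothing about staging directories:

```scala
import org.apache.hadoop.fs.Path

// Sketch only: mirrors the linked data-file check. Metadata files ("_SUCCESS")
// and hidden files (".part-0.crc") are rejected, while partition-style names
// containing "=" are kept.
def isDataPath(path: Path): Boolean = {
  val name = path.getName
  !((name.startsWith("_") && !name.contains("=")) || name.startsWith("."))
}
```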
[GitHub] spark pull request #21608: [SPARK-24626] [SQL] Improve location size calcula...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21608#discussion_r203264804

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---

```
@@ -148,6 +148,19 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
     }
   }

+  test("verify table size calculation is accurate") {
```

--- End diff --

I originally meant you'd better test whether the calculation is done in parallel, as in https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala#L99
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21764 **[Test build #93217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93217/testReport)** for PR 21764 at commit [`84f1a6b`](https://github.com/apache/spark/commit/84f1a6b5cba08df8684179e9d7195545be655e76).
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21801 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1085/ Test PASSed.
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21764 Merged build finished. Test PASSed.
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21764 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1086/ Test PASSed.
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21801 Merged build finished. Test PASSed.
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21698

Ah I see, then we'd need to change the DAGScheduler a lot to fix it, which may not be worth it. I still insist that `repartition` should be treated as a bug, but it's not easy to hit (when the input order is random) and very hard to fix. Maybe documenting it is the best we can do for now.
[GitHub] spark issue #21770: [SPARK-24806][SQL] Brush up generated code so that JDK c...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21770 yea, if we reach a consensus to implement #21777, it sounds ok to me.
[GitHub] spark pull request #21770: [SPARK-24806][SQL] Brush up generated code so tha...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21770#discussion_r203261248

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---

```
@@ -318,7 +318,8 @@ case class SampleExec(
       v => s"""
          | $v = new $samplerClass($lowerBound, $upperBound, false);
          | $v.setSeed(${seed}L + partitionIndex);
-        """.stripMargin.trim)
+        """.stripMargin.trim,
+        forceInline = true)
```

--- End diff --

aha, ok. let's wait for the other developers' comments.
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21801 **[Test build #93216 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93216/testReport)** for PR 21801 at commit [`7f78d75`](https://github.com/apache/spark/commit/7f78d750411a4098527b2b332495f5dd4f20c63e).
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21801 Jenkins, retest this please.
[GitHub] spark issue #21777: [WIP][SPARK-24498][SQL] Add JDK compiler for runtime cod...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21777

Yea, as you said, although the JDK compiler generates different bytecode, I couldn't get obvious performance gains for TPCDS compared to the Janino one. So I couldn't find a strong reason to implement this in terms of performance; https://docs.google.com/spreadsheets/d/1Mgdd9dfFaACXOUHqKfaeKrj09hB3X1j9sKTJlJ6UM6w/edit#gid=1236423798

As another viewpoint, I think it might be useful to check whether the generated Java code of Spark can be compiled by the JDK compiler (JDK8 code compatibility checks). But since compilation with the JDK compiler is too slow (see the performance values in the Google spreadsheet above), IMO it is impractical to check this in Jenkins (I found it took 7-8 hours to run the tests of `spark/sql` alone on AWS instances). WDYT?
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21801 Merged build finished. Test FAILed.
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21801 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93214/ Test FAILed.
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21801

**[Test build #93214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93214/testReport)** for PR 21801 at commit [`7f78d75`](https://github.com/apache/spark/commit/7f78d750411a4098527b2b332495f5dd4f20c63e).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21131: [SPARK-23433][CORE] Late zombie task completions update ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21131 a late LGTM
[GitHub] spark pull request #21131: [SPARK-23433][CORE] Late zombie task completions ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21131#discussion_r203257926

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---

```
@@ -764,6 +769,19 @@ private[spark] class TaskSetManager(
     maybeFinishTaskSet()
   }

+  private[scheduler] def markPartitionCompleted(partitionId: Int): Unit = {
+    partitionToIndex.get(partitionId).foreach { index =>
+      if (!successful(index)) {
+        tasksSuccessful += 1
+        successful(index) = true
+        if (tasksSuccessful == numTasks) {
+          isZombie = true
+        }
+        maybeFinishTaskSet()
```

--- End diff --

Is this line needed? We will call `maybeFinishTaskSet()` at the end of `handleSuccessfulTask`.
[GitHub] spark pull request #21488: SPARK-18057 Update structured streaming kafka fro...
Github user ijuma commented on a diff in the pull request: https://github.com/apache/spark/pull/21488#discussion_r203256766

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala ---

```
@@ -115,7 +116,7 @@ private[kafka010] class KafkaOffsetReader(
   def fetchTopicPartitions(): Set[TopicPartition] = runUninterruptibly {
     assert(Thread.currentThread().isInstanceOf[UninterruptibleThread])
     // Poll to get the latest assigned partitions
-    consumer.poll(0)
+    consumer.poll(JDuration.ofMillis(0))
```

--- End diff --

@zsxwing Why do you want to support Kafka clients jars from 0.10 to 2.0? Since newer clients jars support older brokers, we recommend people use the latest Kafka clients jar whenever possible.
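For context, the one-line change above tracks the Kafka 2.0 client API (KIP-266): `poll(long)` is deprecated in favor of `poll(java.time.Duration)`, which also bounds the time spent fetching metadata instead of potentially blocking indefinitely. A sketch of the new call shape, with a hypothetical helper name:

```scala
import java.time.{Duration => JDuration}
import org.apache.kafka.clients.consumer.Consumer

// Hypothetical helper: poll with a zero timeout just to refresh assignments,
// using the Duration overload introduced in Kafka 2.0.
def pollForAssignment(consumer: Consumer[Array[Byte], Array[Byte]]): Unit = {
  consumer.poll(JDuration.ofMillis(0)) // replaces the deprecated consumer.poll(0L)
}
```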
[GitHub] spark issue #21386: [SPARK-23928][SQL][WIP] Add shuffle collection function.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21386 Okay, I'll take this over, and ping you when I submit a PR to ask for a review. Thanks!
[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21795 **[Test build #93215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93215/testReport)** for PR 21795 at commit [`de5a232`](https://github.com/apache/spark/commit/de5a2323b5b46a4c073e3ff1dce6daea395dd1dd).
[GitHub] spark issue #21795: [SPARK-24165][SQL][followup] Fixing conditional expressi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21795 Merged build finished. Test PASSed.
[GitHub] spark issue #21795: [SPARK-24165][SQL][followup] Fixing conditional expressi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21795 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1084/ Test PASSed.
[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Merged build finished. Test PASSed.
[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93212/ Test PASSed.
[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21488

**[Test build #93212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93212/testReport)** for PR 21488 at commit [`e7318a9`](https://github.com/apache/spark/commit/e7318a9ac7597c0284d5b0732926fce5caec40ad).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21800 Merged build finished. Test PASSed.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93210/ Test PASSed.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21800

**[Test build #93210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93210/testReport)** for PR 21800 at commit [`d6f41e0`](https://github.com/apache/spark/commit/d6f41e08e3111e6c878afc0845e3f2a091519dd1).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21772#discussion_r203252809

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---

```
@@ -726,8 +726,9 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
     writeLong(array.length)
     writeLongArray(writeBuffer, array, array.length)

-    val used = ((cursor - Platform.LONG_ARRAY_OFFSET) / 8).toInt
-    writeLong(used)
+    val cursorFlag = cursor - Platform.LONG_ARRAY_OFFSET
+    writeLong(cursorFlag)
+    val used = (cursorFlag / 8).toInt
```

--- End diff --

Can you post the image in this PR? The web site you refer to contains too many ads.
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203251175

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala ---

```
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.io
+
+import java.nio.channels.WritableByteChannel
+
+import io.netty.channel.FileRegion
+import io.netty.util.AbstractReferenceCounted
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.AbstractFileRegion
+
+
+/**
+ * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow sending > 2gb in one netty
+ * message. This is because netty cannot send a ByteBuf > 2g, but it can send a large FileRegion,
+ * even though the data is not backed by a file.
+ */
+private[io] class ChunkedByteBufferFileRegion(
+    private val chunkedByteBuffer: ChunkedByteBuffer,
+    private val ioChunkSize: Int) extends AbstractFileRegion {
+
+  private var _transferred: Long = 0
+  // this duplicates the original chunks, so we're free to modify the position, limit, etc.
+  private val chunks = chunkedByteBuffer.getChunks()
+  private val size = chunks.foldLeft(0L) { _ + _.remaining() }
+
+  protected def deallocate: Unit = {}
+
+  override def count(): Long = size
+
+  // this is the "start position" of the overall Data in the backing file, not our current position
+  override def position(): Long = 0
+
+  override def transferred(): Long = _transferred
+
+  private var currentChunkIdx = 0
+
+  def transferTo(target: WritableByteChannel, position: Long): Long = {
+    assert(position == _transferred)
+    if (position == size) return 0L
+    var keepGoing = true
+    var written = 0L
+    var currentChunk = chunks(currentChunkIdx)
+    while (keepGoing) {
+      while (currentChunk.hasRemaining && keepGoing) {
+        val ioSize = Math.min(currentChunk.remaining(), ioChunkSize)
+        val originalLimit = currentChunk.limit()
+        currentChunk.limit(currentChunk.position() + ioSize)
+        val thisWriteSize = target.write(currentChunk)
+        currentChunk.limit(originalLimit)
+        written += thisWriteSize
+        if (thisWriteSize < ioSize) {
```

--- End diff --

I see, thanks for explaining.
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203250619

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---

```
@@ -166,6 +170,34 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
 }

+object ChunkedByteBuffer {
+  // TODO eliminate this method if we switch BlockManager to getting InputStreams
+  def fromManagedBuffer(data: ManagedBuffer, maxChunkSize: Int): ChunkedByteBuffer = {
+    data match {
+      case f: FileSegmentManagedBuffer =>
+        map(f.getFile, maxChunkSize, f.getOffset, f.getLength)
+      case other =>
+        new ChunkedByteBuffer(other.nioByteBuffer())
+    }
+  }
+
+  def map(file: File, maxChunkSize: Int, offset: Long, length: Long): ChunkedByteBuffer = {
+    Utils.tryWithResource(new FileInputStream(file).getChannel()) { channel =>
```

--- End diff --

I've already updated some of them in SPARK-21475 in the shuffle-related code path, but not the rest, which are not so critical.
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Build finished. Test FAILed.
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21669 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93211/ Test FAILed.
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21669

**[Test build #93211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93211/testReport)** for PR 21669 at commit [`13b3adc`](https://github.com/apache/spark/commit/13b3adc5ffb55fbfd6572089b1f54e8bca393494).

 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93209/ Test PASSed.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21800 Merged build finished. Test PASSed.
[GitHub] spark pull request #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix l...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21801#discussion_r203249250

--- Diff: external/avro/src/test/resources/log4j.properties ---

```
@@ -46,4 +46,4 @@ log4j.additivity.org.apache.hadoop.hive.metastore.RetryingHMSHandler=false
 log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=OFF

 log4j.additivity.hive.ql.metadata.Hive=false
-log4j.logger.hive.ql.metadata.Hive=OFF
\ No newline at end of file
+log4j.logger.hive.ql.metadata.Hive=OFF
```

--- End diff --

Me neither, actually.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21800

**[Test build #93209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93209/testReport)** for PR 21800 at commit [`6a89c65`](https://github.com/apache/spark/commit/6a89c658c6a56c34d5e0578f09db77b4ed9b41f3).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20146 Ah, I know why. Because the latest commit https://github.com/apache/spark/pull/20146/commits/a6551b02a10428d66e0dadcfcb5a8da3798ec814 doesn't contain the changes to trigger the test. Rebasing and squashing it will trigger the test.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21542 Hm, this happened to me too and I just pushed https://github.com/apache/spark/commit/fc2e18963efdf4b50258f85c8779122742876910. Mine was Java 8. There's a reproducer in the PR description.
[GitHub] spark pull request #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix l...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21801#discussion_r203248740

--- Diff: external/avro/src/test/resources/log4j.properties ---

```
@@ -46,4 +46,4 @@ log4j.additivity.org.apache.hadoop.hive.metastore.RetryingHMSHandler=false
 log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=OFF

 log4j.additivity.hive.ql.metadata.Hive=false
-log4j.logger.hive.ql.metadata.Hive=OFF
\ No newline at end of file
+log4j.logger.hive.ql.metadata.Hive=OFF
```

--- End diff --

hm I didn't know Maven's checkstyle checks resources too.
[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93213/ Test FAILed.
[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21488 Merged build finished. Test FAILed.
[GitHub] spark issue #21488: SPARK-18057 Update structured streaming kafka from 0.10....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21488

**[Test build #93213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93213/testReport)** for PR 21488 at commit [`13a7884`](https://github.com/apache/spark/commit/13a7884279103ed06778e4351616e316beb7566f).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21801 **[Test build #93214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93214/testReport)** for PR 21801 at commit [`7f78d75`](https://github.com/apache/spark/commit/7f78d750411a4098527b2b332495f5dd4f20c63e).
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21801 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1083/ Test PASSed.
[GitHub] spark issue #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21801 Merged build finished. Test PASSed.
[GitHub] spark pull request #21801: [SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix l...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/21801

[SPARK-24386][SPARK-24768][BUILD][FOLLOWUP] Fix lint-java and Scala 2.12 build.

## What changes were proposed in this pull request?

This pr fixes lint-java and the Scala 2.12 build.

lint-java:

```
[ERROR] src/test/resources/log4j.properties:[0] (misc) NewlineAtEndOfFile: File does not end with a newline.
```

Scala 2.12 build:

```
[error] /.../sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala:121: overloaded method value addTaskCompletionListener with alternatives:
[error]   (f: org.apache.spark.TaskContext => Unit)org.apache.spark.TaskContext <and>
[error]   (listener: org.apache.spark.util.TaskCompletionListener)org.apache.spark.TaskContext
[error]  cannot be applied to (org.apache.spark.TaskContext => java.util.List[Runnable])
[error]     context.addTaskCompletionListener { ctx =>
[error]             ^
```

## How was this patch tested?

Manually executed lint-java and the Scala 2.12 build in my local environment.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-24386_24768/fix_build

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21801.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21801

commit 2fed7d487d90da5ea65d35d50208f88b5f829145
Author: Takuya UESHIN
Date: 2018-07-17T09:51:01Z

    Fix lint-java.

commit 7f78d750411a4098527b2b332495f5dd4f20c63e
Author: Takuya UESHIN
Date: 2018-07-18T03:45:34Z

    Fix Scala 2.12 build.
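The Scala 2.12 failure above is an overload-resolution problem: per the quoted error, the closure types as `TaskContext => java.util.List[Runnable]` (the return type of `ExecutorService.shutdownNow()`), so it matches neither alternative. A minimal sketch of one way to fix it (helper name hypothetical), forcing the closure to return `Unit`:

```scala
import java.util.concurrent.ExecutorService
import org.apache.spark.TaskContext

// Sketch only: end the closure with () so it becomes TaskContext => Unit and
// unambiguously selects the function-typed overload under Scala 2.12.
def registerCleanup(context: TaskContext, pool: ExecutorService): Unit = {
  context.addTaskCompletionListener { ctx =>
    pool.shutdownNow() // returns java.util.List[Runnable]
    ()                 // explicit Unit return
  }
}
```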
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r20324

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala ---

```
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.io
+
+import java.nio.channels.WritableByteChannel
+
+import io.netty.channel.FileRegion
+import io.netty.util.AbstractReferenceCounted
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.AbstractFileRegion
+
+
+/**
+ * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow sending > 2gb in one netty
+ * message. This is because netty cannot send a ByteBuf > 2g, but it can send a large FileRegion,
+ * even though the data is not backed by a file.
+ */
+private[io] class ChunkedByteBufferFileRegion(
+    private val chunkedByteBuffer: ChunkedByteBuffer,
+    private val ioChunkSize: Int) extends AbstractFileRegion {
+
+  private var _transferred: Long = 0
+  // this duplicates the original chunks, so we're free to modify the position, limit, etc.
+  private val chunks = chunkedByteBuffer.getChunks()
+  private val size = chunks.foldLeft(0L) { _ + _.remaining() }
+
+  protected def deallocate: Unit = {}
+
+  override def count(): Long = size
+
+  // this is the "start position" of the overall Data in the backing file, not our current position
+  override def position(): Long = 0
+
+  override def transferred(): Long = _transferred
+
+  private var currentChunkIdx = 0
+
+  def transferTo(target: WritableByteChannel, position: Long): Long = {
+    assert(position == _transferred)
+    if (position == size) return 0L
+    var keepGoing = true
+    var written = 0L
+    var currentChunk = chunks(currentChunkIdx)
+    while (keepGoing) {
+      while (currentChunk.hasRemaining && keepGoing) {
+        val ioSize = Math.min(currentChunk.remaining(), ioChunkSize)
+        val originalLimit = currentChunk.limit()
+        currentChunk.limit(currentChunk.position() + ioSize)
+        val thisWriteSize = target.write(currentChunk)
+        currentChunk.limit(originalLimit)
+        written += thisWriteSize
+        if (thisWriteSize < ioSize) {
```

--- End diff --

Actually this is a totally normal condition; it just means the channel is not currently ready to accept any more data. This is something netty expects, and it will make sure the rest of the data is put on the channel eventually (it'll get called the next time with the correct `position` argument indicating how far along it is). The added unit tests cover this.
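A hedged sketch of the retry contract described here: netty keeps invoking `transferTo` with an advanced `position` until `transferred() == count()`, so a short write is fine. The hypothetical channel below accepts at most a few bytes per call, simulating a peer that is not ready to drain everything at once:

```scala
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Test-style stub: drains at most `maxPerCall` bytes per write() call.
class CappedChannel(maxPerCall: Int) extends WritableByteChannel {
  private var openFlag = true
  override def write(src: ByteBuffer): Int = {
    val n = math.min(maxPerCall, src.remaining())
    src.position(src.position() + n) // pretend n bytes were consumed
    n
  }
  override def isOpen: Boolean = openFlag
  override def close(): Unit = { openFlag = false }
}
```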
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20146 it's odd that AppVeyor tests are not getting triggered.
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203245221

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---

```
@@ -166,6 +170,34 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
 }

+object ChunkedByteBuffer {
+  // TODO eliminate this method if we switch BlockManager to getting InputStreams
+  def fromManagedBuffer(data: ManagedBuffer, maxChunkSize: Int): ChunkedByteBuffer = {
+    data match {
+      case f: FileSegmentManagedBuffer =>
+        map(f.getFile, maxChunkSize, f.getOffset, f.getLength)
+      case other =>
+        new ChunkedByteBuffer(other.nioByteBuffer())
+    }
+  }
+
+  def map(file: File, maxChunkSize: Int, offset: Long, length: Long): ChunkedByteBuffer = {
+    Utils.tryWithResource(new FileInputStream(file).getChannel()) { channel =>
```

--- End diff --

I wasn't aware of that issue, thanks for sharing it; I'll update this. Should we also update other uses? There seem to be a lot of other cases, e.g. `UnsafeShuffleWriter`, `DiskBlockObjectWriter`, etc.
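The issue being referenced (SPARK-21475) is that `FileInputStream`/`FileOutputStream` rely on finalizers for cleanup; opening the channel through the NIO Files API avoids them. A minimal sketch of the swap, assuming `FileChannel.open` is acceptable at these call sites:

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// Instead of new FileInputStream(file).getChannel(), open the channel
// directly; no finalizer-backed stream object is created.
def openReadChannel(path: String): FileChannel =
  FileChannel.open(Paths.get(path), StandardOpenOption.READ)
```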
[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203245118

--- Diff: R/pkg/R/mllib_fpm.R ---

```
@@ -154,3 +160,74 @@ setMethod("write.ml", signature(object = "FPGrowthModel", path = "character"),
           function(object, path, overwrite = FALSE) {
             write_internal(object, path, overwrite)
           })
+
+#' PrefixSpan
+#'
+#' A parallel PrefixSpan algorithm to mine frequent sequential patterns.
+#' \code{spark.prefixSpan} returns an instance of PrefixSpan.
+#' \code{spark.findFrequentSequentialPatterns} returns a complete set of frequent sequential
+#' patterns.
+#' For more details, see
+#' \href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#prefixspan}{
+#' PrefixSpan}.
+#'
+#' @param minSupport Minimal support level.
+#' @param maxPatternLength Maximal pattern length.
+#' @param maxLocalProjDBSize Maximum number of items (including delimiters used in the internal
+#'                           storage format) allowed in a projected database before local
+#'                           processing.
+#' @param sequenceCol name of the sequence column in dataset.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.prefixSpan} returns an instance of PrefixSpan
+#' @rdname spark.prefixSpan
+#' @name spark.prefixSpan
+#' @aliases spark.prefixSpan,ANY-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
+#'                            list(list(list(1L), list(3L, 2L), list(1L, 2L))),
+#'                            list(list(list(1L, 2L), list(5L))),
+#'                            list(list(list(6L)))), schema = c("sequence"))
+#' prefix_Span <- spark.prefixSpan(minSupport = 0.5, maxPatternLength = 5L,
+#'                                 maxLocalProjDBSize = 3200L)
+#' frequency <- spark.findFrequentSequentialPatterns(prefix_Span, df)
+#' showDF(frequency)
+#' }
+#' @note spark.prefixSpan since 2.4.0
+setMethod("spark.prefixSpan", signature(),
+          function(minSupport=0.1, maxPatternLength=10L,
+                   maxLocalProjDBSize=3200L, sequenceCol="sequence") {
+            if (!is.numeric(minSupport) || minSupport < 0) {
+              stop("minSupport should be a number with value >= 0.")
+            }
+            if (!is.integer(maxPatternLength) || maxPatternLength <= 0) {
+              stop("maxPatternLength should be a number with value > 0.")
+            }
+            if (!is.numeric(maxLocalProjDBSize) || maxLocalProjDBSize <= 0) {
+              stop("maxLocalProjDBSize should be a number with value > 0.")
+            }
+
+            jobj <- callJStatic("org.apache.spark.ml.r.PrefixSpanWrapper", "getPrefixSpan",
+                                as.numeric(minSupport), as.integer(maxPatternLength),
+                                as.numeric(maxLocalProjDBSize), as.character(sequenceCol))
+            new("PrefixSpan", jobj = jobj)
+          })
+
+# Find frequent sequential patterns.
+
+#' @param object a prefixSpan object.
+#' @param data A SparkDataFrame.
+#' @return A complete set of frequent sequential patterns in the input sequences of itemsets.
+#'         The returned \code{SparkDataFrame} contains columns of sequence and corresponding
+#'         frequency. The schema of it will be:
+#'         \code{sequence: ArrayType(ArrayType(T))} (T is the item type)
+#'         \code{freq: Long}
+#' @rdname spark.prefixSpan
```

--- End diff --

ditto here
[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21710#discussion_r203245010

--- Diff: R/pkg/R/generics.R ---

```
@@ -1415,6 +1415,13 @@ setGeneric("spark.freqItemsets", function(object) { standardGeneric("spark.freqI
 #' @rdname spark.fpGrowth
 setGeneric("spark.associationRules", function(object) { standardGeneric("spark.associationRules") })

+#' @rdname spark.prefixSpan
+setGeneric("spark.prefixSpan", function(...) { standardGeneric("spark.prefixSpan") })
+
+#' @rdname spark.prefixSpan
```

--- End diff --

i think he meant the rdname - yes ok to have both in one rd
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203244832

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---

```
@@ -166,6 +170,34 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
 }

+object ChunkedByteBuffer {
+  // TODO eliminate this method if we switch BlockManager to getting InputStreams
+  def fromManagedBuffer(data: ManagedBuffer, maxChunkSize: Int): ChunkedByteBuffer = {
+    data match {
+      case f: FileSegmentManagedBuffer =>
+        map(f.getFile, maxChunkSize, f.getOffset, f.getLength)
+      case other =>
+        new ChunkedByteBuffer(other.nioByteBuffer())
+    }
+  }
+
+  def map(file: File, maxChunkSize: Int, offset: Long, length: Long): ChunkedByteBuffer = {
+    Utils.tryWithResource(new FileInputStream(file).getChannel()) { channel =>
+      var remaining = length
+      var pos = offset
+      val chunks = new ListBuffer[ByteBuffer]()
+      while (remaining > 0) {
+        val chunkSize = math.min(remaining, maxChunkSize)
+        val chunk = channel.map(FileChannel.MapMode.READ_ONLY, pos, chunkSize)
```

--- End diff --

I think your concern is that when we are going to send data that is backed by a file, e.g. a remote read of an RDD cached on disk, we should be able to send it using something more efficient than memory-mapping the entire file. Is that correct? That actually isn't a problem. This `map()` method isn't called for sending disk-cached RDDs. That is already handled correctly with `FileSegmentManagedBuffer.convertToNetty()`, which uses the `DefaultFileRegion` you had in mind. The `map` method is only used on the receiving end, after the data has already been transferred, and just to pass the data on to other spark code locally in the executor. (And that will avoid the `map()` entirely after the TODO above.) I needed to add `ChunkedByteBufferFileRegion` for data that is already in memory as a ChunkedByteBuffer, e.g. for memory-cached RDDs.
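For reference, a sketch of the zero-copy send path described above, assuming netty's `DefaultFileRegion(File, position, count)` constructor; the file segment is handed to the transport directly, with no memory mapping involved:

```scala
import java.io.File
import io.netty.channel.DefaultFileRegion

// Roughly the shape of what convertToNetty() does for a file segment: wrap it
// as a FileRegion so netty can stream it from disk.
def toNettyRegion(file: File, offset: Long, length: Long): DefaultFileRegion =
  new DefaultFileRegion(file, offset, length)
```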
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Merged build finished. Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93208/ Test PASSed.
[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21305

**[Test build #93208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93208/testReport)** for PR 21305 at commit [`f041019`](https://github.com/apache/spark/commit/f041019b1ffe8187e47ba89de96daf631dfd56da).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21542 "permission" stuff might be Java 9 related?
[GitHub] spark issue #21787: [SPARK-24568] Code refactoring for DataType equalsXXX me...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21787

Constant variables and squashing the logic into one function don't look worthwhile; they're overkill. Less duplication is good of course, but it doesn't look worth it in either case. I would focus on more important stuff. -1 from me.
[GitHub] spark issue #21787: [SPARK-24568] Code refactoring for DataType equalsXXX me...
Github user swapnilushinde commented on the issue: https://github.com/apache/spark/pull/21787

True. Currently we have just 3 variations of comparing two datatypes for equality. Adding even one more equality function would easily lead to writing the same repetitive code, which would negate the observed increase in code lines. Comparing two datasets with different criteria for datatype equality is very handy for unit testing purposes too. Can you please let me know what you find unreadable/complex in the new code so I can try to simplify it if possible?
[GitHub] spark pull request #21795: [SPARK-24165][SQL][followup] Fixing conditional e...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21795#discussion_r203242468

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---

```
@@ -2336,46 +2336,40 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {

     val sourceDF = spark.createDataFrame(rows, schema)

-    val structWhenDF = sourceDF
+    def structWhenDF: DataFrame = sourceDF
       .select(when('cond, struct(lit("a").as("val1"), lit(10).as("val2"))).otherwise('s) as "res")
       .select('res.getField("val1"))
-    val arrayWhenDF = sourceDF
+    def arrayWhenDF: DataFrame = sourceDF
       .select(when('cond, array(lit("a"), lit("b"))).otherwise('a) as "res")
       .select('res.getItem(0))
-    val mapWhenDF = sourceDF
+    def mapWhenDF: DataFrame = sourceDF
       .select(when('cond, map(lit(0), lit("a"))).otherwise('m) as "res")
       .select('res.getItem(0))

-    val structIfDF = sourceDF
+    def structIfDF: DataFrame = sourceDF
       .select(expr("if(cond, struct('a' as val1, 10 as val2), s)") as "res")
       .select('res.getField("val1"))
-    val arrayIfDF = sourceDF
+    def arrayIfDF: DataFrame = sourceDF
       .select(expr("if(cond, array('a', 'b'), a)") as "res")
       .select('res.getItem(0))
-    val mapIfDF = sourceDF
+    def mapIfDF: DataFrame = sourceDF
       .select(expr("if(cond, map(0, 'a'), m)") as "res")
       .select('res.getItem(0))

-    def checkResult(df: DataFrame, codegenExpected: Boolean): Unit = {
-      assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec] == codegenExpected)
-      checkAnswer(df, Seq(Row("a"), Row(null)))
+    def checkResult(): Unit = {
+      checkAnswer(structWhenDF, Seq(Row("a"), Row(null)))
+      checkAnswer(arrayWhenDF, Seq(Row("a"), Row(null)))
+      checkAnswer(mapWhenDF, Seq(Row("a"), Row(null)))
+      checkAnswer(structIfDF, Seq(Row("a"), Row(null)))
+      checkAnswer(arrayIfDF, Seq(Row("a"), Row(null)))
+      checkAnswer(mapIfDF, Seq(Row("a"), Row(null)))
     }

-    // without codegen
-    checkResult(structWhenDF, false)
-    checkResult(arrayWhenDF, false)
-    checkResult(mapWhenDF, false)
-    checkResult(structIfDF, false)
-    checkResult(arrayIfDF, false)
-    checkResult(mapIfDF, false)
-
-    // with codegen
-    checkResult(structWhenDF.filter('cond.isNotNull), true)
```

--- End diff --

I saw some tests using similar dummy filters in `DataFrameFunctionsSuite`. Should we fix them as well?
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user liutang123 commented on the issue: https://github.com/apache/spark/pull/21772 @hvanhovell Thanks for reviewing. Data is lost because the variable **cursor** in the executor is 0 and serialization depends on it. I will add a UT later.
[GitHub] spark pull request #21772: [SPARK-24809] [SQL] Serializing LongHashedRelatio...
Github user liutang123 commented on a diff in the pull request: https://github.com/apache/spark/pull/21772#discussion_r203241485

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---

```
@@ -726,8 +726,9 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
     writeLong(array.length)
     writeLongArray(writeBuffer, array, array.length)

-    val used = ((cursor - Platform.LONG_ARRAY_OFFSET) / 8).toInt
-    writeLong(used)
+    val cursorFlag = cursor - Platform.LONG_ARRAY_OFFSET
+    writeLong(cursorFlag)
+    val used = (cursorFlag / 8).toInt
```

--- End diff --

Data is lost when serializing LongHashedRelation in the executor; can you see [this picture](http://oi67.tinypic.com/2z5pzs7.jpg)? In the executor, the cursor is 0.
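A toy model of the failure mode as I read this thread, a sketch only, under the assumption that deserialization fills `page` but never restores `cursor` (names mirror `LongToUnsafeRowMap`; the offset constant stands in for `Platform.LONG_ARRAY_OFFSET`):

```scala
object CursorBugSketch {
  val LONG_ARRAY_OFFSET = 16L // stand-in for Platform.LONG_ARRAY_OFFSET
  var cursor: Long = 0L       // advanced by append() on the driver; 0 after read()

  // How write() derives the number of used words from the cursor.
  def usedWords: Int = ((cursor - LONG_ARRAY_OFFSET) / 8).toInt

  def main(args: Array[String]): Unit = {
    cursor = LONG_ARRAY_OFFSET + 10 * 8 // driver: 10 words appended
    println(usedWords)                  // 10: serializing on the driver is correct
    cursor = 0L                         // executor: read() never restores cursor
    println(usedWords)                  // -2: re-serializing drops the page contents
  }
}
```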
[GitHub] spark pull request #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad ...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/18138
[GitHub] spark issue #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it con...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19804 Thank you @wangyum.
[GitHub] spark pull request #21795: [SPARK-24165][SQL][followup] Fixing conditional e...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21795#discussion_r203240428

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---

```
@@ -2336,46 +2336,40 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {

     val sourceDF = spark.createDataFrame(rows, schema)

-    val structWhenDF = sourceDF
+    def structWhenDF: DataFrame = sourceDF
       .select(when('cond, struct(lit("a").as("val1"), lit(10).as("val2"))).otherwise('s) as "res")
       .select('res.getField("val1"))
-    val arrayWhenDF = sourceDF
+    def arrayWhenDF: DataFrame = sourceDF
       .select(when('cond, array(lit("a"), lit("b"))).otherwise('a) as "res")
       .select('res.getItem(0))
-    val mapWhenDF = sourceDF
+    def mapWhenDF: DataFrame = sourceDF
       .select(when('cond, map(lit(0), lit("a"))).otherwise('m) as "res")
       .select('res.getItem(0))

-    val structIfDF = sourceDF
+    def structIfDF: DataFrame = sourceDF
       .select(expr("if(cond, struct('a' as val1, 10 as val2), s)") as "res")
       .select('res.getField("val1"))
-    val arrayIfDF = sourceDF
+    def arrayIfDF: DataFrame = sourceDF
       .select(expr("if(cond, array('a', 'b'), a)") as "res")
       .select('res.getItem(0))
-    val mapIfDF = sourceDF
+    def mapIfDF: DataFrame = sourceDF
       .select(expr("if(cond, map(0, 'a'), m)") as "res")
       .select('res.getItem(0))

-    def checkResult(df: DataFrame, codegenExpected: Boolean): Unit = {
-      assert(df.queryExecution.executedPlan.isInstanceOf[WholeStageCodegenExec] == codegenExpected)
-      checkAnswer(df, Seq(Row("a"), Row(null)))
+    def checkResult(): Unit = {
+      checkAnswer(structWhenDF, Seq(Row("a"), Row(null)))
+      checkAnswer(arrayWhenDF, Seq(Row("a"), Row(null)))
+      checkAnswer(mapWhenDF, Seq(Row("a"), Row(null)))
+      checkAnswer(structIfDF, Seq(Row("a"), Row(null)))
+      checkAnswer(arrayIfDF, Seq(Row("a"), Row(null)))
+      checkAnswer(mapIfDF, Seq(Row("a"), Row(null)))
     }

-    // without codegen
-    checkResult(structWhenDF, false)
-    checkResult(arrayWhenDF, false)
-    checkResult(mapWhenDF, false)
-    checkResult(structIfDF, false)
-    checkResult(arrayIfDF, false)
-    checkResult(mapIfDF, false)
-
-    // with codegen
-    checkResult(structWhenDF.filter('cond.isNotNull), true)
```

--- End diff --

Ah, that's tricky. Because filter pushdown runs first, the local relation optimization can't be applied. To prevent confusion like this, how about we use a local/cached relation to test it?
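Whichever relation is used, the assertion the dummy filter stood in for can be made explicit; a minimal sketch (hypothetical helper, assuming the 2.x `queryExecution` API):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.WholeStageCodegenExec

// Sketch: assert whole-stage codegen kicked in, instead of inferring it from
// a pushed-down dummy filter.
def assertWholeStageCodegen(df: DataFrame): Unit = {
  val plan = df.queryExecution.executedPlan
  assert(plan.find(_.isInstanceOf[WholeStageCodegenExec]).isDefined)
}
```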
[GitHub] spark issue #21784: [SPARK-24182][YARN][FOLLOW-UP] Turn off noisy log output
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21784 It's noisy when typing something: ![spark-24128](https://user-images.githubusercontent.com/5399861/42857022-8afee254-8a7a-11e8-8ee0-47a19af99fa7.gif)
[GitHub] spark issue #21792: [SPARK-23231][ML][DOC] Add doc for string indexer orderi...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/21792

@srowen I think we need to update the docs:

1. The current doc in `StringIndexer` is somewhat misleading: "The indices are in `[0, numLabels)`, ordered by label frequencies, so the most frequent label gets index `0`." This is true only with the default ordering type.

2. In RFormula, `stringOrderType` only affects feature columns, not the label column. This needs to be emphasized, as it is somewhat unexpected.

@MLnick your thoughts?
[GitHub] spark issue #21765: [MINOR][CORE] Add test cases for RDD.cartesian
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21765

**[Test build #4218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4218/testReport)** for PR 21765 at commit [`e5f469a`](https://github.com/apache/spark/commit/e5f469a0b83d35b8735eeba30dfca4fe0320810b).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21789: [SPARK-24829][SQL]CAST AS FLOAT inconsistent with Hive
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/21789 Yes, only in STS. I will update the title.
[GitHub] spark issue #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it con...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19804 Thanks @HyukjinKwon, I've already closed some.
[GitHub] spark pull request #21789: [SPARK-24829][SQL]CAST AS FLOAT inconsistent with...
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/21789#discussion_r203239370

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala ---
@@ -766,6 +774,14 @@ class HiveThriftHttpServerSuite extends HiveThriftJdbcTest {
       assert(resultSet.getString(2) === HiveUtils.builtinHiveVersion)
     }
   }
+
+  test("Checks cast as float") {
--- End diff --

For two different modes: binary & HTTP.
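For context, a rough sketch of what such a test can look like; `withJdbcStatement` is the suite's existing JDBC helper, and the literal plus the expected string are assumptions rather than the PR's actual values:

```scala
// Hypothetical sketch: Hive renders CAST('4.56' AS FLOAT) as "4.56", while a
// double-based rendering would yield "4.559999942779541"; duplicating this
// test in both suites covers the binary and HTTP transport modes.
test("Checks cast as float") {
  withJdbcStatement() { statement =>
    val rs = statement.executeQuery("SELECT CAST('4.56' AS FLOAT)")
    rs.next()
    assert(rs.getString(1) === "4.56")
  }
}
```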
[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21754 ping
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user maropu commented on the issue: https://github.com/apache/spark/pull/15945 I'll close for now.
[GitHub] spark pull request #15945: [SPARK-12978][SQL] Merge unnecessary partial aggr...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/15945
[GitHub] spark issue #16605: [SPARK-18884][SQL] Throw an exception in compile time if...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16605 I'll close for now.
[GitHub] spark pull request #16605: [SPARK-18884][SQL] Throw an exception in compile ...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/16605
[GitHub] spark pull request #21460: [SPARK-23442][SQL] Increase reading tasks when re...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/21460
[GitHub] spark pull request #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/19804
[GitHub] spark pull request #20248: [SPARK-23058][SQL] Show non printable field delim...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/20248
[GitHub] spark issue #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it con...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19804 @wangyum, I think you could close some of the PRs that you are currently not working on and have no explicit plan to start working on soon. You can reopen them, or create a new one, when you start to work on that again. See also http://apache-spark-developers-list.1001551.n3.nabble.com/Stale-PR-update-and-review-request-td24430.html
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203237484

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBufferFileRegion.scala ---
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util.io
+
+import java.nio.channels.WritableByteChannel
+
+import io.netty.channel.FileRegion
+import io.netty.util.AbstractReferenceCounted
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.network.util.AbstractFileRegion
+
+
+/**
+ * This exposes a ChunkedByteBuffer as a netty FileRegion, just to allow sending > 2gb in one netty
+ * message. This is because netty cannot send a ByteBuf > 2g, but it can send a large FileRegion,
+ * even though the data is not backed by a file.
+ */
+private[io] class ChunkedByteBufferFileRegion(
+    private val chunkedByteBuffer: ChunkedByteBuffer,
+    private val ioChunkSize: Int) extends AbstractFileRegion {
+
+  private var _transferred: Long = 0
+  // this duplicates the original chunks, so we're free to modify the position, limit, etc.
+  private val chunks = chunkedByteBuffer.getChunks()
+  private val size = chunks.foldLeft(0L) { _ + _.remaining() }
+
+  protected def deallocate: Unit = {}
+
+  override def count(): Long = size
+
+  // this is the "start position" of the overall Data in the backing file, not our current position
+  override def position(): Long = 0
+
+  override def transferred(): Long = _transferred
+
+  private var currentChunkIdx = 0
+
+  def transferTo(target: WritableByteChannel, position: Long): Long = {
+    assert(position == _transferred)
+    if (position == size) return 0L
+    var keepGoing = true
+    var written = 0L
+    var currentChunk = chunks(currentChunkIdx)
+    while (keepGoing) {
+      while (currentChunk.hasRemaining && keepGoing) {
+        val ioSize = Math.min(currentChunk.remaining(), ioChunkSize)
+        val originalLimit = currentChunk.limit()
+        currentChunk.limit(currentChunk.position() + ioSize)
+        val thisWriteSize = target.write(currentChunk)
+        currentChunk.limit(originalLimit)
+        written += thisWriteSize
+        if (thisWriteSize < ioSize) {
--- End diff --

What happens if `thisWriteSize` is smaller than `ioSize`? Will Spark throw an exception, or something else?
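As background for the question, a hedged sketch of the usual answer in Netty's `FileRegion` world: a short write is not an error. The method stops, reports how much it moved, and is called again once the channel is writable. The helper below is illustrative, not the PR's code:

```scala
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Drain a buffer into a channel that may accept fewer bytes than offered
// (e.g. a non-blocking socket). On a short write, stop and report progress;
// Netty re-invokes transferTo() when the channel becomes writable again.
def transferAsMuchAsPossible(target: WritableByteChannel, buf: ByteBuffer): Long = {
  var written = 0L
  var keepGoing = true
  while (buf.hasRemaining && keepGoing) {
    val offered = buf.remaining()
    val n = target.write(buf) // may be smaller than offered
    written += n
    if (n < offered) keepGoing = false // channel is full; resume on the next call
  }
  written
}
```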
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/18544 cc @gatorsmile The changes in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala` have been reverted.
[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21192 To me, yes. If you find some time, I would appreciate it if you could take a look at https://github.com/apache/spark/pull/21192#issuecomment-391405633 too, so that we can review each other's work, whoever makes a PR first.
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203236014

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -166,6 +170,34 @@ private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
   }
 }

+object ChunkedByteBuffer {
+  // TODO eliminate this method if we switch BlockManager to getting InputStreams
+  def fromManagedBuffer(data: ManagedBuffer, maxChunkSize: Int): ChunkedByteBuffer = {
+    data match {
+      case f: FileSegmentManagedBuffer =>
+        map(f.getFile, maxChunkSize, f.getOffset, f.getLength)
+      case other =>
+        new ChunkedByteBuffer(other.nioByteBuffer())
+    }
+  }
+
+  def map(file: File, maxChunkSize: Int, offset: Long, length: Long): ChunkedByteBuffer = {
+    Utils.tryWithResource(new FileInputStream(file).getChannel()) { channel =>
--- End diff --

Can we please use `FileChannel#open` instead? FileInputStream/FileOutputStream have some issues (https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful).
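A sketch of the suggested alternative, keeping the same `tryWithResource` shape; the JDK calls below are standard, while the surrounding usage is an assumption:

```scala
import java.io.File
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption

// FileChannel.open avoids FileInputStream's finalize()-based cleanup, which
// is the GC-pressure problem described in the linked article.
def openReadChannel(file: File): FileChannel =
  FileChannel.open(file.toPath, StandardOpenOption.READ)

// e.g.: Utils.tryWithResource(openReadChannel(file)) { channel => ... }
```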
[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r203235747

--- Diff: R/pkg/R/context.R ---
@@ -437,3 +437,33 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment
+#' and potentially available to jobs submitted via the Spark context.
+#'
--- End diff --

And `This method is experimental, and its behavior can be changed in the next releases.` too.
[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r203235671

--- Diff: R/pkg/R/context.R ---
@@ -437,3 +437,33 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment
+#' and potentially available to jobs submitted via the Spark context.
+#'
--- End diff --

@MaxGekk, is `The number reflects current status of the cluster and can change in the future` intentionally taken out here, or a mistake?
[GitHub] spark pull request #21440: [SPARK-24307][CORE] Support reading remote cached...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21440#discussion_r203235292

--- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -17,17 +17,21 @@
 package org.apache.spark.util.io

-import java.io.InputStream
+import java.io.{File, FileInputStream, InputStream}
 import java.nio.ByteBuffer
-import java.nio.channels.WritableByteChannel
+import java.nio.channels.{FileChannel, WritableByteChannel}
+
+import scala.collection.mutable.ListBuffer

 import com.google.common.primitives.UnsignedBytes
-import io.netty.buffer.{ByteBuf, Unpooled}

 import org.apache.spark.SparkEnv
 import org.apache.spark.internal.config
+import org.apache.spark.network.buffer.{FileSegmentManagedBuffer, ManagedBuffer}
 import org.apache.spark.network.util.ByteArrayWritableChannel
 import org.apache.spark.storage.StorageUtils
+import org.apache.spark.util.Utils
+
--- End diff --

Nit: this blank line seems unnecessary.
[GitHub] spark issue #21800: [SPARK-24825][K8S][TEST] Kubernetes integration tests bu...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21800 LGTM
[GitHub] spark issue #21784: [SPARK-24182][YARN][FOLLOW-UP] Turn off noisy log output
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21784 Hm, yeah. I don't find this super noisy, though, to be honest.
[GitHub] spark pull request #19819: [SPARK-22606][Streaming]Add threadId to the Cache...
Github user eatoncys closed the pull request at: https://github.com/apache/spark/pull/19819
[GitHub] spark pull request #21785: [SPARK-24529][BUILD][test-maven][FOLLOW-UP] Set s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21785
[GitHub] spark issue #21785: [SPARK-24529][BUILD][test-maven][FOLLOW-UP] Set spotbugs...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21785 Merged to master.
[GitHub] spark issue #21785: [SPARK-24529][BUILD][test-maven][FOLLOW-UP] Set spotbugs...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21785 @kiszk, let me just push this in since it fixes an actual issue and the build passes; however, please make a follow-up if you see anything else to fix.
[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21635#discussion_r203232034

--- Diff: docs/monitoring.md ---
@@ -435,6 +435,7 @@ set of sinks to which metrics are reported. The following instances are currentl
 * `executor`: A Spark executor.
 * `driver`: The Spark driver process (the process in which your SparkContext is created).
 * `shuffleService`: The Spark shuffle service.
+* `applicationMaster`: The Spark application master on YARN.
--- End diff --

I think it would be better to clarify it as "The Spark ApplicationMaster when running on YARN."
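For illustration, a hedged sketch of how the new instance would be wired up in conf/metrics.properties; the sink class below is Spark's bundled ConsoleSink, and the polling values are made up:

```properties
# Hypothetical metrics.properties entries for the new applicationMaster instance.
applicationMaster.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
applicationMaster.sink.console.period=10
applicationMaster.sink.console.unit=seconds
```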
[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/19819 I've seen your PR: https://github.com/apache/spark/pull/20997. A good solution, @gaborgsomogyi!