[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107312905 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,110 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSC: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSR: SparseMatrix = toSparseMatrix(colMajor = false) + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toSparse: SparseMatrix = toSparseMatrix(colMajor = true) --- End diff -- I'm debating that should we keep the same ordering of layout when we call `toSparse` or `toDense`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17381: [SPARK-20023] [SQL] Output table comment for DESC FORMAT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17381 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17381: [SPARK-20023] [SQL] Output table comment for DESC FORMAT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17381 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75009/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107315291 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde { } def generateObjAttr[T : Encoder]: Attribute = { -AttributeReference("obj", encoderFor[T].deserializer.dataType, nullable = false)() +val deserializer = encoderFor[T].deserializer +val dataType = deserializer.dataType +val nullable = if (deserializer.childrenResolved) { --- End diff -- The point is whether `deserializer` is resolved or not. It is not a point whether `encodeFor[T]` is resolved or not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17100 **[Test build #75013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75013/testReport)** for PR 17100 at commit [`7bb6f35`](https://github.com/apache/spark/commit/7bb6f35d9153ea522b8ccb4bcd44bf1c70cc64f0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17100: [SPARK-13947][PYTHON][SQL] PySpark DataFrames: The error...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17100 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75013/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17348: [SPARK-20018][SQL] Pivot with timestamp and count should...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17348 @ueshin, you are right. I think we should consider the timezone. ```scala val timestamp = java.sql.Timestamp.valueOf("2012-12-31 16:00:10.011") spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles") Seq(timestamp).toDF("a").groupBy("a").pivot("a").count() ``` ``` ++---+ | a|2012-12-31 16:00:10.011| ++---+ |2012-12-30 23:00:...| 1| ++---+ ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17348: [SPARK-20018][SQL] Pivot with timestamp and count...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17348#discussion_r107319052 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -486,14 +486,16 @@ class Analyzer( case Pivot(groupByExprs, pivotColumn, pivotValues, aggregates, child) => val singleAgg = aggregates.size == 1 def outputName(value: Literal, aggregate: Expression): String = { + val utf8val = Cast(value, StringType, Some(conf.sessionLocalTimeZone)).eval(EmptyRow) --- End diff -- BTW, is this a correct way for handling timezone - @ueshin ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/17335 Ping @vanzin , mind reviewing again? Thanks a lot. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17348: [SPARK-20018][SQL] Pivot with timestamp and count...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17348#discussion_r107319641 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -486,14 +486,16 @@ class Analyzer( case Pivot(groupByExprs, pivotColumn, pivotValues, aggregates, child) => val singleAgg = aggregates.size == 1 def outputName(value: Literal, aggregate: Expression): String = { + val utf8Value = Cast(value, StringType, Some(conf.sessionLocalTimeZone)).eval(EmptyRow) --- End diff -- It seems we can cast into `StringType` in all the ways - https://github.com/apache/spark/blob/e9e2c612d58a19ddcb4b6abfb7389a4b0f7ef6f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L41 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17348: [SPARK-20018][SQL] Pivot with timestamp and count should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17348 **[Test build #75019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75019/testReport)** for PR 17348 at commit [`4e4cfa7`](https://github.com/apache/spark/commit/4e4cfa76727cc213f71d2f9b5bf2bf7c5905c54e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17382: [SPARK-20051][SS] Fix StreamSuite flaky test - recover f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17382 **[Test build #75014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75014/testReport)** for PR 17382 at commit [`8c0fcb1`](https://github.com/apache/spark/commit/8c0fcb1f5406e76908ee045d4530d09e5da1c017). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17350 Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107321520 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde { } def generateObjAttr[T : Encoder]: Attribute = { -AttributeReference("obj", encoderFor[T].deserializer.dataType, nullable = false)() +val deserializer = encoderFor[T].deserializer +val dataType = deserializer.dataType +val nullable = if (deserializer.childrenResolved) { --- End diff -- `deserializer` is resolved at [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L251-L260). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17378: [SPARK-20046][SQL] Facilitate loop optimizations in a JI...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/17378 cc @hvanhovell --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r107322905 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -219,18 +219,22 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) case None => logInfo("Started reading broadcast variable " + id) val startTimeMs = System.currentTimeMillis() - val blocks = readBlocks().flatMap(_.getChunks()) + val blocks = readBlocks() logInfo("Reading broadcast variable " + id + " took" + Utils.getUsedTimeMs(startTimeMs)) - val obj = TorrentBroadcast.unBlockifyObject[T]( -blocks, SparkEnv.get.serializer, compressionCodec) - // Store the merged copy in BlockManager so other tasks on this executor don't - // need to re-fetch it. - val storageLevel = StorageLevel.MEMORY_AND_DISK - if (!blockManager.putSingle(broadcastId, obj, storageLevel, tellMaster = false)) { -throw new SparkException(s"Failed to store $broadcastId in BlockManager") + try { +val obj = TorrentBroadcast.unBlockifyObject[T]( + blocks.map(_.toInputStream()), SparkEnv.get.serializer, compressionCodec) +// Store the merged copy in BlockManager so other tasks on this executor don't +// need to re-fetch it. +val storageLevel = StorageLevel.MEMORY_AND_DISK +if (!blockManager.putSingle(broadcastId, obj, storageLevel, tellMaster = false)) { + throw new SparkException(s"Failed to store $broadcastId in BlockManager") +} +obj + } finally { +blocks.foreach(_.dispose()) --- End diff -- ah good catch! we should dispose the blocks here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r107323246 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -63,12 +84,27 @@ private[spark] object CryptoStreamUtils extends Logging { is: InputStream, sparkConf: SparkConf, key: Array[Byte]): InputStream = { -val properties = toCryptoConf(sparkConf) val iv = new Array[Byte](IV_LENGTH_IN_BYTES) -is.read(iv, 0, iv.length) -val transformationStr = sparkConf.get(IO_CRYPTO_CIPHER_TRANSFORMATION) -new CryptoInputStream(transformationStr, properties, is, - new SecretKeySpec(key, "AES"), new IvParameterSpec(iv)) +ByteStreams.readFully(is, iv) +val params = new CryptoParams(key, sparkConf) +new CryptoInputStream(params.transformation, params.conf, is, params.keySpec, + new IvParameterSpec(iv)) + } + + /** + * Wrap a `ReadableByteChannel` for decryption. + */ + def createReadableChannel( + channel: ReadableByteChannel, + sparkConf: SparkConf, + key: Array[Byte]): ReadableByteChannel = { +val iv = new Array[Byte](IV_LENGTH_IN_BYTES) +val buf = ByteBuffer.wrap(iv) +JavaUtils.readFully(channel, buf) --- End diff -- why not use `ByteStreams.readFully`? the `buf` is not used else where --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107323253 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameImplicitsSuite.scala --- @@ -51,4 +51,15 @@ class DataFrameImplicitsSuite extends QueryTest with SharedSQLContext { sparkContext.parallelize(1 to 10).map(_.toString).toDF("stringCol"), (1 to 10).map(i => Row(i.toString))) } + + test("SPARK-19959: df[java.lang.Long].collect includes null throws NullPointerException") { +val dfInt = sparkContext.parallelize(Seq[java.lang.Integer](0, null, 2), 1).toDF +assert(dfInt.collect === Array(Row(0), Row(null), Row(2))) --- End diff -- Use `checkAnswer`? ```Scala checkAnswer( sparkContext.parallelize(Seq[java.lang.Integer](0, null, 2), 1).toDF, Seq(Row(0), Row(null), Row(2))) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r107323552 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -48,12 +51,30 @@ private[spark] object CryptoStreamUtils extends Logging { os: OutputStream, sparkConf: SparkConf, key: Array[Byte]): OutputStream = { -val properties = toCryptoConf(sparkConf) -val iv = createInitializationVector(properties) +val params = new CryptoParams(key, sparkConf) +val iv = createInitializationVector(params.conf) os.write(iv) -val transformationStr = sparkConf.get(IO_CRYPTO_CIPHER_TRANSFORMATION) -new CryptoOutputStream(transformationStr, properties, os, - new SecretKeySpec(key, "AES"), new IvParameterSpec(iv)) +new CryptoOutputStream(params.transformation, params.conf, os, params.keySpec, + new IvParameterSpec(iv)) + } + + /** + * Wrap a `WritableByteChannel` for encryption. + */ + def createWritableChannel( + channel: WritableByteChannel, + sparkConf: SparkConf, + key: Array[Byte]): WritableByteChannel = { +val params = new CryptoParams(key, sparkConf) +val iv = createInitializationVector(params.conf) +val buf = ByteBuffer.wrap(iv) +while (buf.hasRemaining()) { --- End diff -- actually this logic is same with `CryptoHelperChannel`. Shall we create `CryptoHelperChannel` first and simply call `helper.write(buf)` here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15628 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15628 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75016/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307445 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala --- @@ -519,6 +588,52 @@ class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest with BeforeAndAf ) } + test("flatMapGroupsWithState - streaming with event time timeout") { +// Function to maintain the max event time +// Returns the max event time in the state, or -1 if the state was removed by timeout +val stateFunc = ( +key: String, +values: Iterator[(String, Long)], +state: KeyedState[Long]) => { + val timeoutDelay = 5 + if (key != "a") { +Iterator.empty + } else { +if (state.hasTimedOut) { + state.remove() + Iterator((key, -1)) +} else { + val valuesSeq = values.toSeq + val maxEventTime = math.max(valuesSeq.map(_._2).max, state.getOption.getOrElse(0L)) + val timeoutTimestampMs = maxEventTime + timeoutDelay + state.update(maxEventTime) + state.setTimeoutTimestamp(timeoutTimestampMs * 1000) + Iterator((key, maxEventTime.toInt)) +} + } +} +val inputData = MemoryStream[(String, Int)] +val result = + inputData.toDS +.select($"_1".as("key"), $"_2".cast("timestamp").as("eventTime")) +.withWatermark("eventTime", "10 seconds") +.as[(String, Long)] +.groupByKey[String]((x: (String, Long)) => x._1) +.flatMapGroupsWithState[Long, (String, Int)](Update, EventTimeTimeout)(stateFunc) --- End diff -- I was debugging and I left them there thinking it help readability of tests. I can remove them. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307617 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala --- @@ -519,6 +588,52 @@ class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest with BeforeAndAf ) } + test("flatMapGroupsWithState - streaming with event time timeout") { +// Function to maintain the max event time +// Returns the max event time in the state, or -1 if the state was removed by timeout +val stateFunc = ( +key: String, +values: Iterator[(String, Long)], +state: KeyedState[Long]) => { + val timeoutDelay = 5 + if (key != "a") { +Iterator.empty + } else { +if (state.hasTimedOut) { + state.remove() + Iterator((key, -1)) +} else { + val valuesSeq = values.toSeq + val maxEventTime = math.max(valuesSeq.map(_._2).max, state.getOption.getOrElse(0L)) + val timeoutTimestampMs = maxEventTime + timeoutDelay + state.update(maxEventTime) + state.setTimeoutTimestamp(timeoutTimestampMs * 1000) + Iterator((key, maxEventTime.toInt)) +} + } +} +val inputData = MemoryStream[(String, Int)] +val result = + inputData.toDS +.select($"_1".as("key"), $"_2".cast("timestamp").as("eventTime")) +.withWatermark("eventTime", "10 seconds") +.as[(String, Long)] +.groupByKey[String]((x: (String, Long)) => x._1) +.flatMapGroupsWithState[Long, (String, Int)](Update, EventTimeTimeout)(stateFunc) --- End diff -- As long as they aren't required its okay. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307589 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/streaming/KeyedStateTimeout.java --- @@ -34,9 +32,20 @@ @InterfaceStability.Evolving public class KeyedStateTimeout { - /** Timeout based on processing time. */ + /** + * Timeout based on processing time. The duration of timeout can be set for each group in + * `map/flatMapGroupsWithState` by calling `KeyedState.setTimeoutDuration()`. + */ public static KeyedStateTimeout ProcessingTimeTimeout() { return ProcessingTimeTimeout$.MODULE$; } --- End diff -- Its just that if someone this does `import KeyedStateTimeout._` the code boils down to `flatMapGroupsWithState(Update, ProcessingTime) { ... } ` with no reference to timeout. Fine either way. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/17361 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107309220 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object UnsupportedOperationChecker { throwError("Commands like CreateTable*, AlterTable*, Show* are not supported with " + "streaming DataFrames/Datasets") -// mapGroupsWithState: Allowed only when no aggregation + Update output mode -case m: FlatMapGroupsWithState if m.isStreaming && m.isMapGroupsWithState => - if (collectStreamingAggregates(plan).isEmpty) { -if (outputMode != InternalOutputModes.Update) { - throwError("mapGroupsWithState is not supported with " + -s"$outputMode output mode on a streaming DataFrame/Dataset") -} else { - // Allowed when no aggregation + Update output mode -} - } else { -throwError("mapGroupsWithState is not supported with aggregation " + - "on a streaming DataFrame/Dataset") - } - -// flatMapGroupsWithState without aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && collectStreamingAggregates(plan).isEmpty => - m.outputMode match { -case InternalOutputModes.Update => - if (outputMode != InternalOutputModes.Update) { -throwError("flatMapGroupsWithState in update mode is not supported with " + +// mapGroupsWithState and flatMapGroupsWithState +case m: FlatMapGroupsWithState if m.isStreaming => + + // Check compatibility with output modes and aggregations in query + val aggsAfterFlatMapGroups = collectStreamingAggregates(plan) + + if (m.isMapGroupsWithState) { // check mapGroupsWithState +// allowed only in update query output mode and without aggregation +if (aggsAfterFlatMapGroups.nonEmpty) { + throwError( +"mapGroupsWithState is not supported with aggregation " + + "on a streaming DataFrame/Dataset") +} else if (outputMode != InternalOutputModes.Update) { + throwError( +"mapGroupsWithState is not supported with " + s"$outputMode output mode on a streaming DataFrame/Dataset") +} + } else { // check latMapGroupsWithState +if (aggsAfterFlatMapGroups.isEmpty) { + // flatMapGroupsWithState without aggregation: operation's output mode must + // match query output mode + m.outputMode match { +case InternalOutputModes.Update if outputMode != InternalOutputModes.Update => + throwError( +"flatMapGroupsWithState in update mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case InternalOutputModes.Append if outputMode != InternalOutputModes.Append => + throwError( +"flatMapGroupsWithState in append mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case _ => } -case InternalOutputModes.Append => - if (outputMode != InternalOutputModes.Append) { -throwError("flatMapGroupsWithState in append mode is not supported with " + - s"$outputMode output mode on a streaming DataFrame/Dataset") +} else { + // flatMapGroupsWithState with aggregation: update operation mode not allowed, and + // *groupsWithState after aggregation not allowed + if (m.outputMode == InternalOutputModes.Update) { +throwError( + "flatMapGroupsWithState in update mode is not supported with " + +"aggregation on a streaming DataFrame/Dataset") + } else if (collectStreamingAggregates(m).nonEmpty) { +throwError( + "flatMapGroupsWithState in append mode is not supported after " + +s"aggregation on a streaming DataFrame/Dataset") } +} } -// flatMapGroupsWithState(Update) with aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && m.outputMode == InternalOutputModes.Update -&&
[GitHub] spark issue #17256: [SPARK-19919][SQL] Defer throwing the exception for empt...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17256 LGTM, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17316: [SPARK-15040][ML][PYSPARK] Add Imputer to PySpark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17316 **[Test build #75012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75012/testReport)** for PR 17316 at commit [`7fd17dd`](https://github.com/apache/spark/commit/7fd17dd43441b2c7212f964efd921e8c2d429a9b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17361 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75010/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17361 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13932: [SPARK-15354] [CORE] Topology aware block replica...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13932#discussion_r107310985 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala --- @@ -53,6 +53,48 @@ trait BlockReplicationPolicy { numReplicas: Int): List[BlockManagerId] } +object BlockReplicationUtils { + // scalastyle:off line.size.limit + /** + * Uses sampling algorithm by Robert Floyd. Finds a random sample in O(n) while + * minimizing space usage. Please see http://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin;> + * here. + * + * @param n total number of indices + * @param m number of samples needed + * @param r random number generator + * @return list of m random unique indices + */ + // scalastyle:on line.size.limit + private def getSampleIds(n: Int, m: Int, r: Random): List[Int] = { +val indices = (n - m + 1 to n).foldLeft(Set.empty[Int]) {case (set, i) => + val t = r.nextInt(i) + 1 + if (set.contains(t)) set + i else set + t +} +// we shuffle the result to ensure a random arrangement within the sample +// to avoid any bias from set implementations +r.shuffle(indices.map(_ - 1).toList) --- End diff -- shall we use `LinkedHashSet` so that we don't need this extra shuffle? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13932: [SPARK-15354] [CORE] Topology aware block replica...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13932#discussion_r107311406 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala --- @@ -88,26 +131,94 @@ class RandomBlockReplicationPolicy logDebug(s"Prioritized peers : ${prioritizedPeers.mkString(", ")}") prioritizedPeers } +} + +@DeveloperApi +class BasicBlockReplicationPolicy + extends BlockReplicationPolicy +with Logging { - // scalastyle:off line.size.limit /** - * Uses sampling algorithm by Robert Floyd. Finds a random sample in O(n) while - * minimizing space usage. Please see http://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin;> - * here. + * Method to prioritize a bunch of candidate peers of a block manager. This implementation + * replicates the behavior of block replication in HDFS, a peer is chosen within the rack, --- End diff -- can we explain the replicating logic for any replication factor? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15628: [SPARK-17471][ML] Add compressed method to ML mat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/15628#discussion_r107306774 --- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala --- @@ -161,6 +162,109 @@ sealed trait Matrix extends Serializable { */ @Since("2.0.0") def numActives: Int + + /** + * Converts this matrix to a sparse matrix. + * + * @param colMajor Whether the values of the resulting sparse matrix should be in column major + *or row major order. If `false`, resulting matrix will be row major. + */ + private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix + + /** + * Converts this matrix to a sparse matrix in column major order. + */ + @Since("2.2.0") + def toCSC: SparseMatrix = toSparseMatrix(colMajor = true) + + /** + * Converts this matrix to a sparse matrix in row major order. + */ + @Since("2.2.0") + def toCSR: SparseMatrix = toSparseMatrix(colMajor = false) --- End diff -- Same question. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17348: [SPARK-20018][SQL] Pivot with timestamp and count should...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17348 What if session local timezone is changed? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17166 **[Test build #75005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75005/testReport)** for PR 17166 at commit [`5707715`](https://github.com/apache/spark/commit/570771555c877fc0b7a8c989e14fdaf4aa79c217). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15009 **[Test build #75006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75006/testReport)** for PR 15009 at commit [`fe5b5d6`](https://github.com/apache/spark/commit/fe5b5d64b56dd55ad4a956619bf41c3492975f89). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17361 **[Test build #75017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75017/testReport)** for PR 17361 at commit [`9c9668b`](https://github.com/apache/spark/commit/9c9668b9e2d76e3ef56f6a8094c76b5a38178d1b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17381: [SPARK-20023] [SQL] Output table comment for DESC FORMAT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17381 **[Test build #75009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75009/testReport)** for PR 17381 at commit [`6890bbc`](https://github.com/apache/spark/commit/6890bbc36cbefcb886fb772c8199ef64188d4db8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17350: [SPARK-20017][SQL] change the nullability of function 'S...
Github user zhaorongsheng commented on the issue: https://github.com/apache/spark/pull/17350 @gatorsmile OK, I will do it and I will give you feedback as soon as possible. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13932: [SPARK-15354] [CORE] Topology aware block replica...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13932#discussion_r107316887 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala --- @@ -88,26 +131,94 @@ class RandomBlockReplicationPolicy logDebug(s"Prioritized peers : ${prioritizedPeers.mkString(", ")}") prioritizedPeers } +} + +@DeveloperApi +class BasicBlockReplicationPolicy + extends BlockReplicationPolicy +with Logging { - // scalastyle:off line.size.limit /** - * Uses sampling algorithm by Robert Floyd. Finds a random sample in O(n) while - * minimizing space usage. Please see http://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin;> - * here. + * Method to prioritize a bunch of candidate peers of a block manager. This implementation + * replicates the behavior of block replication in HDFS, a peer is chosen within the rack, + * one outside and that's it. This works best with a total replication factor of 3. * - * @param n total number of indices - * @param m number of samples needed - * @param r random number generator - * @return list of m random unique indices + * @param blockManagerIdId of the current BlockManager for self identification + * @param peers A list of peers of a BlockManager + * @param peersReplicatedTo Set of peers already replicated to + * @param blockId BlockId of the block being replicated. This can be used as a source of + * randomness if needed. + * @param numReplicas Number of peers we need to replicate to + * @return A prioritized list of peers. Lower the index of a peer, higher its priority */ - // scalastyle:on line.size.limit - private def getSampleIds(n: Int, m: Int, r: Random): List[Int] = { -val indices = (n - m + 1 to n).foldLeft(Set.empty[Int]) {case (set, i) => - val t = r.nextInt(i) + 1 - if (set.contains(t)) set + i else set + t + override def prioritize( + blockManagerId: BlockManagerId, + peers: Seq[BlockManagerId], + peersReplicatedTo: mutable.HashSet[BlockManagerId], + blockId: BlockId, + numReplicas: Int): List[BlockManagerId] = { + +logDebug(s"Input peers : $peers") +logDebug(s"BlockManagerId : $blockManagerId") + +val random = new Random(blockId.hashCode) + +// if block doesn't have topology info, we can't do much, so we randlomly shuffle +// if there is, we see what's needed from peersReplicatedTo and based on numReplicas, +// we choose whats needed +if (blockManagerId.topologyInfo.isEmpty || numReplicas == 0) { + // no topology info for the block. The best we can do is randomly choose peers + BlockReplicationUtils.getRandomSample(peers, numReplicas, random) +} else { + // we have topology information, we see what is left to be done from peersReplicatedTo + val doneWithinRack = peersReplicatedTo.exists(_.topologyInfo == blockManagerId.topologyInfo) + val doneOutsideRack = peersReplicatedTo.exists { p => +p.topologyInfo.isDefined && p.topologyInfo != blockManagerId.topologyInfo + } + + if (doneOutsideRack && doneWithinRack) { --- End diff -- what? I think this branch is where we should do smart replication --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13932: [SPARK-15354] [CORE] Topology aware block replica...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13932#discussion_r107316905 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala --- @@ -88,26 +131,94 @@ class RandomBlockReplicationPolicy logDebug(s"Prioritized peers : ${prioritizedPeers.mkString(", ")}") prioritizedPeers } +} + +@DeveloperApi +class BasicBlockReplicationPolicy + extends BlockReplicationPolicy +with Logging { - // scalastyle:off line.size.limit /** - * Uses sampling algorithm by Robert Floyd. Finds a random sample in O(n) while - * minimizing space usage. Please see http://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin;> - * here. + * Method to prioritize a bunch of candidate peers of a block manager. This implementation + * replicates the behavior of block replication in HDFS, a peer is chosen within the rack, + * one outside and that's it. This works best with a total replication factor of 3. * - * @param n total number of indices - * @param m number of samples needed - * @param r random number generator - * @return list of m random unique indices + * @param blockManagerIdId of the current BlockManager for self identification + * @param peers A list of peers of a BlockManager + * @param peersReplicatedTo Set of peers already replicated to + * @param blockId BlockId of the block being replicated. This can be used as a source of + * randomness if needed. + * @param numReplicas Number of peers we need to replicate to + * @return A prioritized list of peers. Lower the index of a peer, higher its priority */ - // scalastyle:on line.size.limit - private def getSampleIds(n: Int, m: Int, r: Random): List[Int] = { -val indices = (n - m + 1 to n).foldLeft(Set.empty[Int]) {case (set, i) => - val t = r.nextInt(i) + 1 - if (set.contains(t)) set + i else set + t + override def prioritize( + blockManagerId: BlockManagerId, + peers: Seq[BlockManagerId], + peersReplicatedTo: mutable.HashSet[BlockManagerId], + blockId: BlockId, + numReplicas: Int): List[BlockManagerId] = { + +logDebug(s"Input peers : $peers") +logDebug(s"BlockManagerId : $blockManagerId") + +val random = new Random(blockId.hashCode) + +// if block doesn't have topology info, we can't do much, so we randlomly shuffle +// if there is, we see what's needed from peersReplicatedTo and based on numReplicas, +// we choose whats needed +if (blockManagerId.topologyInfo.isEmpty || numReplicas == 0) { + // no topology info for the block. The best we can do is randomly choose peers + BlockReplicationUtils.getRandomSample(peers, numReplicas, random) +} else { + // we have topology information, we see what is left to be done from peersReplicatedTo + val doneWithinRack = peersReplicatedTo.exists(_.topologyInfo == blockManagerId.topologyInfo) + val doneOutsideRack = peersReplicatedTo.exists { p => +p.topologyInfo.isDefined && p.topologyInfo != blockManagerId.topologyInfo + } + + if (doneOutsideRack && doneWithinRack) { +// we are done, we just return a random sample --- End diff -- what? I think this branch is where we should do smart replication --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17348: [SPARK-20018][SQL] Pivot with timestamp and count should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17348 **[Test build #75018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75018/testReport)** for PR 17348 at commit [`93f05f3`](https://github.com/apache/spark/commit/93f05f3545d9af335ca1f6c711b6f84b9938b95e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17361 **[Test build #3604 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3604/testReport)** for PR 17361 at commit [`9c9668b`](https://github.com/apache/spark/commit/9c9668b9e2d76e3ef56f6a8094c76b5a38178d1b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107318916 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -83,3 +84,146 @@ private[kinesis] final case class STSCredentialsProvider( } } } + +@InterfaceStability.Stable +object SerializableCredentialsProvider { --- End diff -- I agree we should definitely come up with a better name here. What about ```SparkAWSCredentials```? Obviously it's not as succinct as ```AWSCredentials``` but I think it's a clear name that avoids collisions. I'm okay with ```CredentialsProvider``` otherwise. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17379 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307253 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object UnsupportedOperationChecker { throwError("Commands like CreateTable*, AlterTable*, Show* are not supported with " + "streaming DataFrames/Datasets") -// mapGroupsWithState: Allowed only when no aggregation + Update output mode -case m: FlatMapGroupsWithState if m.isStreaming && m.isMapGroupsWithState => - if (collectStreamingAggregates(plan).isEmpty) { -if (outputMode != InternalOutputModes.Update) { - throwError("mapGroupsWithState is not supported with " + -s"$outputMode output mode on a streaming DataFrame/Dataset") -} else { - // Allowed when no aggregation + Update output mode -} - } else { -throwError("mapGroupsWithState is not supported with aggregation " + - "on a streaming DataFrame/Dataset") - } - -// flatMapGroupsWithState without aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && collectStreamingAggregates(plan).isEmpty => - m.outputMode match { -case InternalOutputModes.Update => - if (outputMode != InternalOutputModes.Update) { -throwError("flatMapGroupsWithState in update mode is not supported with " + +// mapGroupsWithState and flatMapGroupsWithState +case m: FlatMapGroupsWithState if m.isStreaming => + + // Check compatibility with output modes and aggregations in query + val aggsAfterFlatMapGroups = collectStreamingAggregates(plan) + + if (m.isMapGroupsWithState) { // check mapGroupsWithState +// allowed only in update query output mode and without aggregation +if (aggsAfterFlatMapGroups.nonEmpty) { + throwError( +"mapGroupsWithState is not supported with aggregation " + + "on a streaming DataFrame/Dataset") +} else if (outputMode != InternalOutputModes.Update) { + throwError( +"mapGroupsWithState is not supported with " + s"$outputMode output mode on a streaming DataFrame/Dataset") +} + } else { // check latMapGroupsWithState +if (aggsAfterFlatMapGroups.isEmpty) { + // flatMapGroupsWithState without aggregation: operation's output mode must + // match query output mode + m.outputMode match { +case InternalOutputModes.Update if outputMode != InternalOutputModes.Update => + throwError( +"flatMapGroupsWithState in update mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case InternalOutputModes.Append if outputMode != InternalOutputModes.Append => + throwError( +"flatMapGroupsWithState in append mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case _ => } -case InternalOutputModes.Append => - if (outputMode != InternalOutputModes.Append) { -throwError("flatMapGroupsWithState in append mode is not supported with " + - s"$outputMode output mode on a streaming DataFrame/Dataset") +} else { + // flatMapGroupsWithState with aggregation: update operation mode not allowed, and + // *groupsWithState after aggregation not allowed + if (m.outputMode == InternalOutputModes.Update) { +throwError( + "flatMapGroupsWithState in update mode is not supported with " + +"aggregation on a streaming DataFrame/Dataset") + } else if (collectStreamingAggregates(m).nonEmpty) { +throwError( + "flatMapGroupsWithState in append mode is not supported after " + +s"aggregation on a streaming DataFrame/Dataset") } +} } -// flatMapGroupsWithState(Update) with aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && m.outputMode == InternalOutputModes.Update -&&
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/KeyedStateImpl.scala --- @@ -17,37 +17,45 @@ package org.apache.spark.sql.execution.streaming +import java.sql.Date + import org.apache.commons.lang3.StringUtils -import org.apache.spark.sql.streaming.KeyedState +import org.apache.spark.sql.catalyst.plans.logical.{EventTimeTimeout, ProcessingTimeTimeout} +import org.apache.spark.sql.execution.streaming.KeyedStateImpl._ +import org.apache.spark.sql.streaming.{KeyedState, KeyedStateTimeout} import org.apache.spark.unsafe.types.CalendarInterval + /** * Internal implementation of the [[KeyedState]] interface. Methods are not thread-safe. * @param optionalValue Optional value of the state * @param batchProcessingTimeMs Processing time of current batch, used to calculate timestamp * for processing time timeouts - * @param isTimeoutEnabled Whether timeout is enabled. This will be used to check whether the user - * is allowed to configure timeouts. + * @param timeoutConf Type of timeout configured. Based on this, different operations will + *be supported. --- End diff -- nit: indent is inconsistent --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307283 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object UnsupportedOperationChecker { throwError("Commands like CreateTable*, AlterTable*, Show* are not supported with " + "streaming DataFrames/Datasets") -// mapGroupsWithState: Allowed only when no aggregation + Update output mode -case m: FlatMapGroupsWithState if m.isStreaming && m.isMapGroupsWithState => - if (collectStreamingAggregates(plan).isEmpty) { -if (outputMode != InternalOutputModes.Update) { - throwError("mapGroupsWithState is not supported with " + -s"$outputMode output mode on a streaming DataFrame/Dataset") -} else { - // Allowed when no aggregation + Update output mode -} - } else { -throwError("mapGroupsWithState is not supported with aggregation " + - "on a streaming DataFrame/Dataset") - } - -// flatMapGroupsWithState without aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && collectStreamingAggregates(plan).isEmpty => - m.outputMode match { -case InternalOutputModes.Update => - if (outputMode != InternalOutputModes.Update) { -throwError("flatMapGroupsWithState in update mode is not supported with " + +// mapGroupsWithState and flatMapGroupsWithState +case m: FlatMapGroupsWithState if m.isStreaming => + + // Check compatibility with output modes and aggregations in query + val aggsAfterFlatMapGroups = collectStreamingAggregates(plan) + + if (m.isMapGroupsWithState) { // check mapGroupsWithState +// allowed only in update query output mode and without aggregation +if (aggsAfterFlatMapGroups.nonEmpty) { + throwError( +"mapGroupsWithState is not supported with aggregation " + + "on a streaming DataFrame/Dataset") +} else if (outputMode != InternalOutputModes.Update) { + throwError( +"mapGroupsWithState is not supported with " + s"$outputMode output mode on a streaming DataFrame/Dataset") +} + } else { // check latMapGroupsWithState +if (aggsAfterFlatMapGroups.isEmpty) { + // flatMapGroupsWithState without aggregation: operation's output mode must + // match query output mode + m.outputMode match { +case InternalOutputModes.Update if outputMode != InternalOutputModes.Update => + throwError( +"flatMapGroupsWithState in update mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case InternalOutputModes.Append if outputMode != InternalOutputModes.Append => + throwError( +"flatMapGroupsWithState in append mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case _ => } -case InternalOutputModes.Append => - if (outputMode != InternalOutputModes.Append) { -throwError("flatMapGroupsWithState in append mode is not supported with " + - s"$outputMode output mode on a streaming DataFrame/Dataset") +} else { + // flatMapGroupsWithState with aggregation: update operation mode not allowed, and + // *groupsWithState after aggregation not allowed + if (m.outputMode == InternalOutputModes.Update) { +throwError( + "flatMapGroupsWithState in update mode is not supported with " + +"aggregation on a streaming DataFrame/Dataset") + } else if (collectStreamingAggregates(m).nonEmpty) { +throwError( + "flatMapGroupsWithState in append mode is not supported after " + +s"aggregation on a streaming DataFrame/Dataset") } +} } -// flatMapGroupsWithState(Update) with aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && m.outputMode == InternalOutputModes.Update -&&
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -147,49 +147,68 @@ object UnsupportedOperationChecker { throwError("Commands like CreateTable*, AlterTable*, Show* are not supported with " + "streaming DataFrames/Datasets") -// mapGroupsWithState: Allowed only when no aggregation + Update output mode -case m: FlatMapGroupsWithState if m.isStreaming && m.isMapGroupsWithState => - if (collectStreamingAggregates(plan).isEmpty) { -if (outputMode != InternalOutputModes.Update) { - throwError("mapGroupsWithState is not supported with " + -s"$outputMode output mode on a streaming DataFrame/Dataset") -} else { - // Allowed when no aggregation + Update output mode -} - } else { -throwError("mapGroupsWithState is not supported with aggregation " + - "on a streaming DataFrame/Dataset") - } - -// flatMapGroupsWithState without aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && collectStreamingAggregates(plan).isEmpty => - m.outputMode match { -case InternalOutputModes.Update => - if (outputMode != InternalOutputModes.Update) { -throwError("flatMapGroupsWithState in update mode is not supported with " + +// mapGroupsWithState and flatMapGroupsWithState +case m: FlatMapGroupsWithState if m.isStreaming => + + // Check compatibility with output modes and aggregations in query + val aggsAfterFlatMapGroups = collectStreamingAggregates(plan) + + if (m.isMapGroupsWithState) { // check mapGroupsWithState +// allowed only in update query output mode and without aggregation +if (aggsAfterFlatMapGroups.nonEmpty) { + throwError( +"mapGroupsWithState is not supported with aggregation " + + "on a streaming DataFrame/Dataset") +} else if (outputMode != InternalOutputModes.Update) { + throwError( +"mapGroupsWithState is not supported with " + s"$outputMode output mode on a streaming DataFrame/Dataset") +} + } else { // check latMapGroupsWithState +if (aggsAfterFlatMapGroups.isEmpty) { + // flatMapGroupsWithState without aggregation: operation's output mode must + // match query output mode + m.outputMode match { +case InternalOutputModes.Update if outputMode != InternalOutputModes.Update => + throwError( +"flatMapGroupsWithState in update mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case InternalOutputModes.Append if outputMode != InternalOutputModes.Append => + throwError( +"flatMapGroupsWithState in append mode is not supported with " + + s"$outputMode output mode on a streaming DataFrame/Dataset") + +case _ => } -case InternalOutputModes.Append => - if (outputMode != InternalOutputModes.Append) { -throwError("flatMapGroupsWithState in append mode is not supported with " + - s"$outputMode output mode on a streaming DataFrame/Dataset") +} else { + // flatMapGroupsWithState with aggregation: update operation mode not allowed, and + // *groupsWithState after aggregation not allowed + if (m.outputMode == InternalOutputModes.Update) { +throwError( + "flatMapGroupsWithState in update mode is not supported with " + +"aggregation on a streaming DataFrame/Dataset") + } else if (collectStreamingAggregates(m).nonEmpty) { +throwError( + "flatMapGroupsWithState in append mode is not supported after " + +s"aggregation on a streaming DataFrame/Dataset") } +} } -// flatMapGroupsWithState(Update) with aggregation -case m: FlatMapGroupsWithState - if m.isStreaming && m.outputMode == InternalOutputModes.Update -&&
[GitHub] spark issue #17256: [SPARK-19919][SQL] Defer throwing the exception for empt...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17256 cc @cloud-fan, could you see if it sounds good? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17361: [SPARK-20030][SS] Event-time-based timeout for Ma...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/17361#discussion_r107307806 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/streaming/KeyedStateTimeout.java --- @@ -34,9 +32,20 @@ @InterfaceStability.Evolving public class KeyedStateTimeout { - /** Timeout based on processing time. */ + /** + * Timeout based on processing time. The duration of timeout can be set for each group in + * `map/flatMapGroupsWithState` by calling `KeyedState.setTimeoutDuration()`. + */ public static KeyedStateTimeout ProcessingTimeTimeout() { return ProcessingTimeTimeout$.MODULE$; } --- End diff -- I'd probably still remove it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17361: [SPARK-20030][SS] Event-time-based timeout for MapGroups...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17361 **[Test build #75010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75010/testReport)** for PR 17361 at commit [`64b6abf`](https://github.com/apache/spark/commit/64b6abf89189fe8a3f3d2f58f341a9b58a95a6d2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17256: [SPARK-19919][SQL] Defer throwing the exception f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17256 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17256: [SPARK-19919][SQL] Defer throwing the exception for empt...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17256 Thank you @cloud-fan. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17348: [SPARK-20018][SQL] Pivot with timestamp and count...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17348#discussion_r107311172 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -486,14 +486,16 @@ class Analyzer( case Pivot(groupByExprs, pivotColumn, pivotValues, aggregates, child) => val singleAgg = aggregates.size == 1 def outputName(value: Literal, aggregate: Expression): String = { + val scalaValue = CatalystTypeConverters.convertToScala(value.value, value.dataType) + val stringValue = Option(scalaValue).getOrElse("null").toString --- End diff -- The impact is not only on the data type `timestamp`. Any test case to cover `null`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...
Github user davies commented on the issue: https://github.com/apache/spark/pull/17375 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17374: [SPARK-19019][PYTHON][BRANCH-2.0] Fix hijacked `collecti...
Github user davies commented on the issue: https://github.com/apache/spark/pull/17374 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75004/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9 **[Test build #75004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75004/consoleFull)** for PR 9 at commit [`6f169eb`](https://github.com/apache/spark/commit/6f169ebf8c0c832010d2dbd8f971cfabff7870f2). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9 Build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75005/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15009 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75006/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16781 I checked HIVE-12767 and reviewed this pr roughly. And I collected my thoughts about the desired behavior of this issue, please correct me if I'm wrong. when creating table: - if `SQLConf.PARQUET_TABLE_INCLUDE_TIMEZONE` == true - if a property `PARQUET_TIMEZONE_TABLE_PROPERTY` exists - include the table property `PARQUET_TIMEZONE_TABLE_PROPERTY` - use the `PARQUET_TIMEZONE_TABLE_PROPERTY` value - else - include the table property `PARQUET_TIMEZONE_TABLE_PROPERTY` - use session local timezone as the default `PARQUET_TIMEZONE_TABLE_PROPERTY` value - else - if a property `PARQUET_TIMEZONE_TABLE_PROPERTY` exists - include the table property `PARQUET_TIMEZONE_TABLE_PROPERTY` - use the `PARQUET_TIMEZONE_TABLE_PROPERTY` value - else - don't include table property `PARQUET_TIMEZONE_TABLE_PROPERTY` when writing/reading data: - if a table property `PARQUET_TIMEZONE_TABLE_PROPERTY` exists - use the `PARQUET_TIMEZONE_TABLE_PROPERTY` value to adjust timezone - else - don't adjust timezone Timezone related expressions respect session local timezone now, so we should also use session local timezone as the default value of `PARQUET_TIMEZONE_TABLE_PROPERTY` instead of system timezone, i.e. use `sparkSession.sessionState.conf.sessionLocalTimeZone` instead of `TimeZone.getDefault()`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107314977 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde { } def generateObjAttr[T : Encoder]: Attribute = { -AttributeReference("obj", encoderFor[T].deserializer.dataType, nullable = false)() +val deserializer = encoderFor[T].deserializer +val dataType = deserializer.dataType +val nullable = if (deserializer.childrenResolved) { --- End diff -- The point is whether `deserializer` is resolved or not. It is not a point whether `encoderFor[T]` is resolved or not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17381: [SPARK-20023] [SQL] Output table comment for DESC FORMAT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17381 cc @cloud-fan @wzhfy --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13932: [SPARK-15354] [CORE] Topology aware block replica...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13932#discussion_r107316810 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala --- @@ -88,26 +131,94 @@ class RandomBlockReplicationPolicy logDebug(s"Prioritized peers : ${prioritizedPeers.mkString(", ")}") prioritizedPeers } +} + +@DeveloperApi +class BasicBlockReplicationPolicy + extends BlockReplicationPolicy +with Logging { - // scalastyle:off line.size.limit /** - * Uses sampling algorithm by Robert Floyd. Finds a random sample in O(n) while - * minimizing space usage. Please see http://math.stackexchange.com/questions/178690/whats-the-proof-of-correctness-for-robert-floyds-algorithm-for-selecting-a-sin;> - * here. + * Method to prioritize a bunch of candidate peers of a block manager. This implementation + * replicates the behavior of block replication in HDFS, a peer is chosen within the rack, + * one outside and that's it. This works best with a total replication factor of 3. * - * @param n total number of indices - * @param m number of samples needed - * @param r random number generator - * @return list of m random unique indices + * @param blockManagerIdId of the current BlockManager for self identification + * @param peers A list of peers of a BlockManager + * @param peersReplicatedTo Set of peers already replicated to + * @param blockId BlockId of the block being replicated. This can be used as a source of + * randomness if needed. + * @param numReplicas Number of peers we need to replicate to + * @return A prioritized list of peers. Lower the index of a peer, higher its priority */ - // scalastyle:on line.size.limit - private def getSampleIds(n: Int, m: Int, r: Random): List[Int] = { -val indices = (n - m + 1 to n).foldLeft(Set.empty[Int]) {case (set, i) => - val t = r.nextInt(i) + 1 - if (set.contains(t)) set + i else set + t + override def prioritize( + blockManagerId: BlockManagerId, + peers: Seq[BlockManagerId], + peersReplicatedTo: mutable.HashSet[BlockManagerId], + blockId: BlockId, + numReplicas: Int): List[BlockManagerId] = { + +logDebug(s"Input peers : $peers") +logDebug(s"BlockManagerId : $blockManagerId") + +val random = new Random(blockId.hashCode) + +// if block doesn't have topology info, we can't do much, so we randlomly shuffle +// if there is, we see what's needed from peersReplicatedTo and based on numReplicas, +// we choose whats needed +if (blockManagerId.topologyInfo.isEmpty || numReplicas == 0) { + // no topology info for the block. The best we can do is randomly choose peers + BlockReplicationUtils.getRandomSample(peers, numReplicas, random) +} else { + // we have topology information, we see what is left to be done from peersReplicatedTo + val doneWithinRack = peersReplicatedTo.exists(_.topologyInfo == blockManagerId.topologyInfo) + val doneOutsideRack = peersReplicatedTo.exists { p => --- End diff -- calculate the `inRackPeers` and `outOfRackPeers` here to reduce duplicated code --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17382: [SPARK-20051][SS] Fix StreamSuite flaky test - recover f...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/17382 Scala tests have passed. I am merging this to unblock other PRs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107317074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde { } def generateObjAttr[T : Encoder]: Attribute = { -AttributeReference("obj", encoderFor[T].deserializer.dataType, nullable = false)() +val deserializer = encoderFor[T].deserializer +val dataType = deserializer.dataType +val nullable = if (deserializer.childrenResolved) { --- End diff -- In the above context, I confirmed that `encodeFor[T]` is not resolved and `encodeFor[T].deserializer` is resolved. Do you want to see my log output? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107318150 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -71,7 +75,256 @@ private[kinesis] class KinesisInputDStream[T: ClassTag]( override def getReceiver(): Receiver[T] = { new KinesisReceiver(streamName, endpointUrl, regionName, initialPositionInStream, - checkpointAppName, checkpointInterval, storageLevel, messageHandler, - kinesisCredsProvider) + checkpointAppName, checkpointInterval, _storageLevel, messageHandler, + kinesisCredsProvider, dynamoDBCredsProvider, cloudWatchCredsProvider) } } + +@InterfaceStability.Stable +object KinesisInputDStream { + /** + * Builder for [[KinesisInputDStream]] instances. + * + * @since 2.2.0 + */ + @InterfaceStability.Stable + class Builder { +// Required params +private var streamingContext: Option[StreamingContext] = None +private var streamName: Option[String] = None +private var checkpointAppName: Option[String] = None + +// Params with defaults +private var endpointUrl: Option[String] = None +private var regionName: Option[String] = None +private var initialPositionInStream: Option[InitialPositionInStream] = None +private var checkpointInterval: Option[Duration] = None +private var storageLevel: Option[StorageLevel] = None +private var kinesisCredsProvider: Option[SerializableCredentialsProvider] = None +private var dynamoDBCredsProvider: Option[SerializableCredentialsProvider] = None +private var cloudWatchCredsProvider: Option[SerializableCredentialsProvider] = None + +/** + * Sets the StreamingContext that will be used to construct the Kinesis DStream. This is a + * required parameter. + * + * @param ssc [[StreamingContext]] used to construct Kinesis DStreams + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def streamingContext(ssc: StreamingContext): Builder = { + streamingContext = Option(ssc) + this +} + +/** + * Sets the StreamingContext that will be used to construct the Kinesis DStream. This is a + * required parameter. + * + * @param jssc [[JavaStreamingContext]] used to construct Kinesis DStreams + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def streamingContext(jssc: JavaStreamingContext): Builder = { + streamingContext = Option(jssc.ssc) + this +} + +/** + * Sets the name of the Kinesis stream that the DStream will read from. This is a required + * parameter. + * + * @param streamName Name of Kinesis stream that the DStream will read from + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def streamName(streamName: String): Builder = { + this.streamName = Option(streamName) + this +} + +/** + * Sets the KCL application name to use when checkpointing state to DynamoDB. This is a + * required parameter. + * + * @param appName Value to use for the KCL app name (used when creating the DynamoDB checkpoint + *table and when writing metrics to CloudWatch) + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def checkpointAppName(appName: String): Builder = { + checkpointAppName = Option(appName) + this +} + +/** + * Sets the AWS Kinesis endpoint URL. Defaults to "https://kinesis.us-east-1.amazonaws.com; if + * no custom value is specified + * + * @param url Kinesis endpoint URL to use + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def endpointUrl(url: String): Builder = { + endpointUrl = Option(url) + this +} + +/** + * Sets the AWS region to construct clients for. Defaults to "us-east-1" if no custom value + * is specified. + * + * @param regionName Name of AWS region to use (e.g. "us-west-2") + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def regionName(regionName: String): Builder = { + this.regionName = Option(regionName) + this +} + +/** + * Sets the initial position data is read from in the Kinesis stream. Defaults to + * [[InitialPositionInStream.LATEST]] if no custom value is specified. + * + * @param initialPosition InitialPositionInStream value specifying where Spark Streaming
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107318178 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -71,7 +75,256 @@ private[kinesis] class KinesisInputDStream[T: ClassTag]( override def getReceiver(): Receiver[T] = { new KinesisReceiver(streamName, endpointUrl, regionName, initialPositionInStream, - checkpointAppName, checkpointInterval, storageLevel, messageHandler, - kinesisCredsProvider) + checkpointAppName, checkpointInterval, _storageLevel, messageHandler, + kinesisCredsProvider, dynamoDBCredsProvider, cloudWatchCredsProvider) } } + +@InterfaceStability.Stable --- End diff -- Will do --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107318157 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -71,7 +75,256 @@ private[kinesis] class KinesisInputDStream[T: ClassTag]( override def getReceiver(): Receiver[T] = { new KinesisReceiver(streamName, endpointUrl, regionName, initialPositionInStream, - checkpointAppName, checkpointInterval, storageLevel, messageHandler, - kinesisCredsProvider) + checkpointAppName, checkpointInterval, _storageLevel, messageHandler, + kinesisCredsProvider, dynamoDBCredsProvider, cloudWatchCredsProvider) } } + +@InterfaceStability.Stable +object KinesisInputDStream { + /** + * Builder for [[KinesisInputDStream]] instances. + * + * @since 2.2.0 + */ + @InterfaceStability.Stable + class Builder { +// Required params +private var streamingContext: Option[StreamingContext] = None +private var streamName: Option[String] = None +private var checkpointAppName: Option[String] = None + +// Params with defaults +private var endpointUrl: Option[String] = None +private var regionName: Option[String] = None +private var initialPositionInStream: Option[InitialPositionInStream] = None +private var checkpointInterval: Option[Duration] = None +private var storageLevel: Option[StorageLevel] = None +private var kinesisCredsProvider: Option[SerializableCredentialsProvider] = None +private var dynamoDBCredsProvider: Option[SerializableCredentialsProvider] = None +private var cloudWatchCredsProvider: Option[SerializableCredentialsProvider] = None + +/** + * Sets the StreamingContext that will be used to construct the Kinesis DStream. This is a + * required parameter. + * + * @param ssc [[StreamingContext]] used to construct Kinesis DStreams + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def streamingContext(ssc: StreamingContext): Builder = { + streamingContext = Option(ssc) + this +} + +/** + * Sets the StreamingContext that will be used to construct the Kinesis DStream. This is a + * required parameter. + * + * @param jssc [[JavaStreamingContext]] used to construct Kinesis DStreams + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def streamingContext(jssc: JavaStreamingContext): Builder = { + streamingContext = Option(jssc.ssc) + this +} + +/** + * Sets the name of the Kinesis stream that the DStream will read from. This is a required + * parameter. + * + * @param streamName Name of Kinesis stream that the DStream will read from + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def streamName(streamName: String): Builder = { + this.streamName = Option(streamName) + this +} + +/** + * Sets the KCL application name to use when checkpointing state to DynamoDB. This is a + * required parameter. + * + * @param appName Value to use for the KCL app name (used when creating the DynamoDB checkpoint + *table and when writing metrics to CloudWatch) + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def checkpointAppName(appName: String): Builder = { + checkpointAppName = Option(appName) + this +} + +/** + * Sets the AWS Kinesis endpoint URL. Defaults to "https://kinesis.us-east-1.amazonaws.com; if + * no custom value is specified + * + * @param url Kinesis endpoint URL to use + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def endpointUrl(url: String): Builder = { + endpointUrl = Option(url) + this +} + +/** + * Sets the AWS region to construct clients for. Defaults to "us-east-1" if no custom value + * is specified. + * + * @param regionName Name of AWS region to use (e.g. "us-west-2") + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def regionName(regionName: String): Builder = { + this.regionName = Option(regionName) + this +} + +/** + * Sets the initial position data is read from in the Kinesis stream. Defaults to + * [[InitialPositionInStream.LATEST]] if no custom value is specified. + * + * @param initialPosition InitialPositionInStream value specifying where Spark Streaming
[GitHub] spark pull request #17382: [SPARK-20051][SS] Fix StreamSuite flaky test - re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17382 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107318164 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -83,3 +84,146 @@ private[kinesis] final case class STSCredentialsProvider( } } } + +@InterfaceStability.Stable --- End diff -- For sure. Honestly I just cribbed these annotations from ```SparkSession.Builder``` so I appreciate you letting me know what the proper convention is. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107318994 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -83,3 +84,146 @@ private[kinesis] final case class STSCredentialsProvider( } } } + +@InterfaceStability.Stable +object SerializableCredentialsProvider { + /** + * Builder for [[SerializableCredentialsProvider]] instances. + * + * @since 2.2.0 + */ + @InterfaceStability.Stable + class Builder { +private var awsAccessKeyId: Option[String] = None +private var awsSecretKey: Option[String] = None +private var stsRoleArn: Option[String] = None +private var stsSessionName: Option[String] = None +private var stsExternalId: Option[String] = None + + +/** + * Sets the AWS access key ID when using a basic AWS keypair for long-lived authorization + * credentials. A value must also be provided for the AWS secret key. + * + * @param accessKeyId AWS access key ID + * @return Reference to this [[SerializableCredentialsProvider.Builder]] + */ +def awsAccessKeyId(accessKeyId: String): Builder = { --- End diff -- I'll rework this builder to take multiple arguments for the long-lived keypair and STS --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17348: [SPARK-20018][SQL] Pivot with timestamp and count...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17348#discussion_r107318998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -486,14 +486,16 @@ class Analyzer( case Pivot(groupByExprs, pivotColumn, pivotValues, aggregates, child) => val singleAgg = aggregates.size == 1 def outputName(value: Literal, aggregate: Expression): String = { + val utf8val = Cast(value, StringType, Some(conf.sessionLocalTimeZone)).eval(EmptyRow) --- End diff -- It seems we can cast into `StringType` in all the ways - https://github.com/apache/spark/blob/e9e2c612d58a19ddcb4b6abfb7389a4b0f7ef6f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L41 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17250: [SPARK-19911][STREAMING] Add builder interface fo...
Github user budde commented on a diff in the pull request: https://github.com/apache/spark/pull/17250#discussion_r107319004 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SerializableCredentialsProvider.scala --- @@ -83,3 +84,146 @@ private[kinesis] final case class STSCredentialsProvider( } } } + +@InterfaceStability.Stable +object SerializableCredentialsProvider { + /** + * Builder for [[SerializableCredentialsProvider]] instances. + * + * @since 2.2.0 + */ + @InterfaceStability.Stable + class Builder { +private var awsAccessKeyId: Option[String] = None +private var awsSecretKey: Option[String] = None +private var stsRoleArn: Option[String] = None +private var stsSessionName: Option[String] = None +private var stsExternalId: Option[String] = None + + +/** + * Sets the AWS access key ID when using a basic AWS keypair for long-lived authorization + * credentials. A value must also be provided for the AWS secret key. + * + * @param accessKeyId AWS access key ID + * @return Reference to this [[SerializableCredentialsProvider.Builder]] + */ +def awsAccessKeyId(accessKeyId: String): Builder = { + awsAccessKeyId = Option(accessKeyId) + this +} + +/** + * Sets the AWS secret key when using a basic AWS keypair for long-lived authorization + * credentials. A value must also be provided for the AWS access key ID. + * + * @param secretKey AWS secret key + * @return Reference to this [[SerializableCredentialsProvider.Builder]] + */ +def awsSecretKey(secretKey: String): Builder = { + awsSecretKey = Option(secretKey) + this +} + +/** + * Sets the ARN of the IAM role to assume when using AWS STS for temporary session-based + * authentication. A value must also be provided for the STS session name. + * + * @param roleArn ARN of IAM role to assume via STS + * @return Reference to this [[SerializableCredentialsProvider.Builder]] + */ +def stsRoleArn(roleArn: String): Builder = { --- End diff -- Will do --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17348: [SPARK-20018][SQL] Pivot with timestamp and count...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17348#discussion_r107319664 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -486,14 +486,16 @@ class Analyzer( case Pivot(groupByExprs, pivotColumn, pivotValues, aggregates, child) => val singleAgg = aggregates.size == 1 def outputName(value: Literal, aggregate: Expression): String = { + val utf8Value = Cast(value, StringType, Some(conf.sessionLocalTimeZone)).eval(EmptyRow) --- End diff -- BTW, is this a correct way for handling timezone - @ueshin ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17382: [SPARK-20051][SS] Fix StreamSuite flaky test - recover f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75014/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17382: [SPARK-20051][SS] Fix StreamSuite flaky test - recover f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17382 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17382: [SPARK-20051][SS] Fix StreamSuite flaky test - recover f...
Github user kunalkhamar commented on the issue: https://github.com/apache/spark/pull/17382 @tdas all tests passed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75015/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17379: [SPARK-20048][SQL] Cloning SessionState does not clone q...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17379 **[Test build #75015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75015/testReport)** for PR 17379 at commit [`ad77fe9`](https://github.com/apache/spark/commit/ad77fe9ad258eac224f069bbc89294818ee6b549). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13932: [SPARK-15354] [CORE] Topology aware block replication st...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13932 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13932: [SPARK-15354] [CORE] Topology aware block replication st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13932 **[Test build #75020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75020/testReport)** for PR 13932 at commit [`ec601bd`](https://github.com/apache/spark/commit/ec601bd1e619a3f6f35597753954b82536de6bc9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17358: [SPARK-20027][DOCS] Compilation fix in java docs.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17358 If my suggestion is the reason to hold off this PR currently and you are not sure of it, I am fine. I can double-check and deal with this in a separate PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17302: [SPARK-19959][SQL] Fix to throw NullPointerExcept...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17302#discussion_r107323158 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -41,7 +41,20 @@ object CatalystSerde { } def generateObjAttr[T : Encoder]: Attribute = { -AttributeReference("obj", encoderFor[T].deserializer.dataType, nullable = false)() +val deserializer = encoderFor[T].deserializer +val dataType = deserializer.dataType +val nullable = if (deserializer.childrenResolved) { --- End diff -- I know `resolveAndBind` will resolve it. But looks at the code snippet above: val deserializer = encoderFor[T].deserializer val dataType = deserializer.dataType val nullable = if (deserializer.childrenResolved) { // or deserializer.resolved I don't see you have a chance to call `resolveAndBind` here... `encoderFor[T]` will call `assertUnresolved` on the encoder to return, it is implemented like: def assertUnresolved(): Unit = { (deserializer +: serializer).foreach(_.foreach { case a: AttributeReference if a.name != "loopVar" => sys.error(s"Unresolved encoder expected, but $a was found.") case _ => }) } It makes sure that both `deserializer` and `serializer` are unresolved. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r107323085 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -48,12 +51,30 @@ private[spark] object CryptoStreamUtils extends Logging { os: OutputStream, sparkConf: SparkConf, key: Array[Byte]): OutputStream = { -val properties = toCryptoConf(sparkConf) -val iv = createInitializationVector(properties) +val params = new CryptoParams(key, sparkConf) +val iv = createInitializationVector(params.conf) os.write(iv) -val transformationStr = sparkConf.get(IO_CRYPTO_CIPHER_TRANSFORMATION) -new CryptoOutputStream(transformationStr, properties, os, - new SecretKeySpec(key, "AES"), new IvParameterSpec(iv)) +new CryptoOutputStream(params.transformation, params.conf, os, params.keySpec, + new IvParameterSpec(iv)) + } + + /** + * Wrap a `WritableByteChannel` for encryption. + */ + def createWritableChannel( + channel: WritableByteChannel, + sparkConf: SparkConf, + key: Array[Byte]): WritableByteChannel = { +val params = new CryptoParams(key, sparkConf) +val iv = createInitializationVector(params.conf) +val buf = ByteBuffer.wrap(iv) +while (buf.hasRemaining()) { --- End diff -- is there any possibility this may be an infinite loop? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15628 **[Test build #75016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75016/testReport)** for PR 15628 at commit [`17dec63`](https://github.com/apache/spark/commit/17dec637d33c997f724e06ab28487f1e9441394c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17014 ping @hhbyyh I updated the PR, can you please help reviewing this? Thank in advance. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r107323833 --- Diff: core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala --- @@ -167,30 +167,26 @@ private[spark] class SerializerManager( val byteStream = new BufferedOutputStream(outputStream) val autoPick = !blockId.isInstanceOf[StreamBlockId] val ser = getSerializer(implicitly[ClassTag[T]], autoPick).newInstance() -ser.serializeStream(wrapStream(blockId, byteStream)).writeAll(values).close() --- End diff -- the `wrapStream` and `wrapForEncryption` methods can be removed from this class --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17295 **[Test build #74991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74991/testReport)** for PR 17295 at commit [`107e3e7`](https://github.com/apache/spark/commit/107e3e72e81d2c7813d832d3e9c2beab89e01379). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17302 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74992/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17166 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74993/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17336 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17336: [SPARK-20003] [ML] FPGrowthModel setMinConfidence should...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17336 **[Test build #74995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74995/testReport)** for PR 17336 at commit [`9c046c3`](https://github.com/apache/spark/commit/9c046c3bb8dfd6dd0fa2799d434a4f92cbb1b802). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17295 **[Test build #74997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74997/testReport)** for PR 17295 at commit [`6bda670`](https://github.com/apache/spark/commit/6bda6701bf0c266047a5fa81fd29f4fb826728c7). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org