[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717065671 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu opened a new pull request #30157: [WIP][SPARK-33228][SQL][2.4] Don't uncache data when replacing a view having the same logical plan
maropu opened a new pull request #30157: URL: https://github.com/apache/spark/pull/30157

### What changes were proposed in this pull request?

SPARK-30494 updated the `CreateViewCommand` code to implicitly drop the cache when replacing an existing view. However, this change drops the cache even when the replacement view has the same logical plan. The following sequence of queries reproduces the issue:

```
// Spark v2.4.6+
scala> val df = spark.range(1).selectExpr("id a", "id b")
scala> df.cache()
scala> df.explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)

scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)

// If one re-runs the same query `df.createOrReplaceTempView("t")`, the cache is swept away
scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) Project [id#0L AS a#2L, id#0L AS b#3L]
+- *(1) Range (0, 1, step=1, splits=4)

// Until v2.4.6
scala> val df = spark.range(1).selectExpr("id a", "id b")
scala> df.cache()
scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
20/10/23 22:33:42 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
== Physical Plan ==
*(1) InMemoryTableScan [a#2L, b#3L]
   +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
            +- *(1) Range (0, 1, step=1, splits=4)

scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) InMemoryTableScan [a#2L, b#3L]
   +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
            +- *(1) Range (0, 1, step=1, splits=4)
```

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests.
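The behaviour named in the PR title can be pictured with a toy model. The following is only a sketch under loud assumptions: `ToyViewCatalog`, its methods, and the use of plain strings as "logical plans" are all hypothetical and are not Spark APIs or the actual patch. It shows a replace operation that drops the cached data of the old plan only when the new plan differs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: plans are plain strings and none of these names are Spark APIs.
final class ToyViewCatalog {
    private final Map<String, String> tempViews = new HashMap<>();
    private final Set<String> cachedPlans = new HashSet<>();

    // Marks a plan as cached (stands in for df.cache()).
    void cache(String plan) {
        cachedPlans.add(plan);
    }

    boolean isCached(String plan) {
        return cachedPlans.contains(plan);
    }

    // Registers or replaces the view. The previous plan is uncached only
    // when the replacement plan is different from it.
    void createOrReplaceTempView(String name, String plan) {
        String previous = tempViews.put(name, plan);
        if (previous != null && !previous.equals(plan)) {
            cachedPlans.remove(previous);
        }
    }
}
```

In this model, re-running `createOrReplaceTempView("t", samePlan)` leaves the cache entry in place, which matches the pre-regression output shown above.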
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717065639 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34921/
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717064724 **[Test build #130322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130322/testReport)** for PR 29800 at commit [`ca574d9`](https://github.com/apache/spark/commit/ca574d993939db573aa1b5e6692b9ae56df0aea7).
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717063665 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34923/
[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512481955

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
     }
     inputIndex += 1
   }
+}

-  override def currentLowerBound(): Int = throw new UnsupportedOperationException()
+/**
+ * The unbounded offset window frame calculates frames containing NTH_VALUE statements.
+ * The unbounded offset window frame return the same value for all rows in the window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+    target: InternalRow,
+    ordinal: Int,
+    expressions: Array[OffsetWindowSpec],
+    inputSchema: Seq[Attribute],
+    newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection,
+    offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+    target, ordinal, expressions, inputSchema, newMutableProjection, offset) {

-  override def currentUpperBound(): Int = throw new UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+    super.prepare(rows)
+    if (inputIndex >= 0 && inputIndex < input.length) {
+      val r = WindowFunctionFrame.getNextOrNull(inputIterator)

Review comment: OK
[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512478210

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
         // Create the factory to produce WindowFunctionFrame.
         val factory = key match {
-          // Offset Frame
-          case ("OFFSET", _, IntegerLiteral(offset), _) =>
+          // Frameless offset Frame
+          case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
             target: InternalRow =>
-              new OffsetWindowFunctionFrame(
+              new FrameLessOffsetWindowFunctionFrame(
                 target,
                 ordinal,
                 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment: OK

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
         // Create the factory to produce WindowFunctionFrame.
         val factory = key match {
-          // Offset Frame
-          case ("OFFSET", _, IntegerLiteral(offset), _) =>
+          // Frameless offset Frame
+          case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
             target: InternalRow =>
-              new OffsetWindowFunctionFrame(
+              new FrameLessOffsetWindowFunctionFrame(
                 target,
                 ordinal,
                 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
-                functions.map(_.asInstanceOf[OffsetWindowFunction]),
+                functions.map(_.asInstanceOf[OffsetWindowSpec]),
                 child.output,
                 (expressions, schema) =>
                   MutableProjection.create(expressions, schema),
                 offset)
+          case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+            target: InternalRow => {
+              new UnboundedOffsetWindowFunctionFrame(
+                target,
+                ordinal,
+                // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment: OK

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
         // Create the factory to produce WindowFunctionFrame.
         val factory = key match {
-          // Offset Frame
-          case ("OFFSET", _, IntegerLiteral(offset), _) =>
+          // Frameless offset Frame
+          case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
             target: InternalRow =>
-              new OffsetWindowFunctionFrame(
+              new FrameLessOffsetWindowFunctionFrame(
                 target,
                 ordinal,
                 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
-                functions.map(_.asInstanceOf[OffsetWindowFunction]),
+                functions.map(_.asInstanceOf[OffsetWindowSpec]),
                 child.output,
                 (expressions, schema) =>
                   MutableProjection.create(expressions, schema),
                 offset)
+          case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+            target: InternalRow => {
+              new UnboundedOffsetWindowFunctionFrame(
+                target,
+                ordinal,
+                // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
+                functions.map(_.asInstanceOf[OffsetWindowSpec]),
+                child.output,
+                (expressions, schema) =>
+                  MutableProjection.create(expressions, schema),
+                offset - 1)
+            }
+          case ("UNBOUNDED_PRECEDING_OFFSET", _, IntegerLiteral(offset), _) =>
+            target: InternalRow => {
+              new UnboundedPrecedingOffsetWindowFunctionFrame(
+                target,
+                ordinal,
+                // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment: OK
[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512477822

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala ##
@@ -58,7 +58,7 @@ import org.apache.spark.sql.types.{CalendarIntervalType, DateType, IntegerType,
  *     4. 1 PRECEDING AND 1 FOLLOWING
  *     5. 1 FOLLOWING AND 2 FOLLOWING
  * - Offset frame: The frame consist of one row, which is an offset number of rows away from the
- *   current row. Only [[OffsetWindowFunction]]s can be processed in an offset frame.
+ *   current row. Only [[FrameLessOffsetWindowFunction]]s can be processed in an offset frame.

Review comment: I will update the doc.
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717055292 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34922/
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717051971 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34921/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717049528 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34920/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717049518 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717049483 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34920/
[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717049518
[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
beliefer commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512468539

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ##
@@ -355,6 +344,36 @@ abstract class OffsetWindowFunction
    */
   val offset: Expression

+  /**
+   * Default result value for the function when the `offset`th row does not exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * A frameless offset window function is a window function that cannot specify window frame and
+ * returns the value of the input column offset by a number of rows within the partition.

Review comment: I made a mistake.
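The `isRelative` and `default` semantics documented in the diff above can be illustrated with a small lookup. This is a minimal sketch, not Spark code: `OffsetLookupSketch` and `valueAt` are hypothetical names, a `List` stands in for the rows of one window partition, and `ignoreNulls` is deliberately left out. It assumes the absolute case uses a 1-based position, as NTH_VALUE does, which is consistent with `offset - 1` being passed to the unbounded frame in the `WindowExecBase` diff.

```java
import java.util.List;

// Toy model of the `isRelative` flag; not Spark code.
final class OffsetLookupSketch {
    // Relative: the target row is `offset` rows away from `currentRow` (LEAD/LAG style).
    // Absolute: `offset` is a 1-based position counted from the first row (NTH_VALUE style).
    static <T> T valueAt(List<T> partition, int currentRow, int offset,
                         boolean isRelative, T defaultValue) {
        int index = isRelative ? currentRow + offset : offset - 1;
        if (index < 0 || index >= partition.size()) {
            return defaultValue; // the `offset`th row does not exist
        }
        return partition.get(index);
    }
}
```

With a relative offset the result changes from row to row, whereas with an absolute offset every row of the partition sees the same value, which is the property the unbounded frame relies on.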
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717048147
[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717048147
[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717048124 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34919/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717045452
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717045434 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34918/
[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717045452
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717038908 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34920/
[GitHub] [spark] stijndehaes edited a comment on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s
stijndehaes edited a comment on pull request #29533: URL: https://github.com/apache/spark/pull/29533#issuecomment-717038227 @redsk @jkleckner The error line you are seeing comes from the class `ExecutorPodsWatchSnapshotSource`, which is somewhere else in the code. I thought there was another mechanism for the driver-to-executor watches. These also look like driver logs? The fix here is in the spark-submit application, so you should watch its logs instead of the driver's. Can you tell me whether these were driver logs or spark-submit logs?
[GitHub] [spark] stijndehaes commented on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s
stijndehaes commented on pull request #29533: URL: https://github.com/apache/spark/pull/29533#issuecomment-717038227 @redsk @jkleckner The error line you are seeing comes from the class `ExecutorPodsWatchSnapshotSource`, which is somewhere else. I thought there was another mechanism for the executors to cope with connection problems. These also look like driver logs? The fix here is in the spark-submit application, so you should watch its logs instead of the driver's. Can you tell me whether these were driver logs or spark-submit logs?
[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717037578 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34919/
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717036263 **[Test build #130321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130321/testReport)** for PR 30097 at commit [`7612695`](https://github.com/apache/spark/commit/7612695c78456155a95ad4f7d54ef70e53f88921).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th
AmplabJenkins removed a comment on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035560 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130308/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717036225 **[Test build #130320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130320/testReport)** for PR 30156 at commit [`8a18234`](https://github.com/apache/spark/commit/8a18234f325ffa7cf7c3d164d0f03b9402a5f4b8).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717034895 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130316/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717035349
[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZh commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717036010

> Sorry but your latest change doesn't actually lock properly. Long is immutable, and you always replace the object when you do the calculation and assign to the field, which is used as a lock. Your best try would be changing it to long (to avoid box/unbox), and have a separate Object field as locking purpose.

Updated and checked:

**_1. With UT change_**

![image](https://user-images.githubusercontent.com/46485123/97267072-84dffc80-1864-11eb-8125-9dc05b92886b.png)

Current change, with lock:

```
OneForOneStreamManager fetch data duration test:
Stream Size    Max      Min      Avg
1              2834     704      1188.0
5              14048    10789    12055.7
```

Only using AtomicLong:

```
OneForOneStreamManager fetch data duration test:
Stream Size    Max      Min      Avg
1              6723     724      1527.5
5              14673    10712    12080.5
```

**_2. Without UT change_**

Current change, with lock:

```
OneForOneStreamManager fetch data duration test:
Stream Size    Max      Min      Avg
1              822      176      360.1
5              4190     1263     2387.9
10             6785     3870     4811.9
```

Only using AtomicLong:

```
OneForOneStreamManager fetch data duration test:
Stream Size    Max      Min      Avg
1              1037     167      548.5
5              2982     1102     1712.8
10             8779     3165     4831.9
```
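The two variants compared in the benchmark can be sketched as follows. This is an illustration of the reviewer's point, not the code in the PR: `ChunkCounterSketch`, `LockedCounter`, `AtomicCounter`, and the field names are all hypothetical. The pattern the reviewer objects to is `synchronized (counter)` on a boxed `Long` field followed by `counter = counter + delta`; because each assignment stores a different `Long` object, threads can end up holding different monitors.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch; class and field names are hypothetical, not the PR's code.
final class ChunkCounterSketch {

    // Reviewer's suggestion: a primitive long guarded by a dedicated lock object.
    // The lock reference never changes, so every thread contends on the same monitor.
    static final class LockedCounter {
        private final Object lock = new Object();
        private long chunks = 0L;

        void add(long delta) {
            synchronized (lock) {
                chunks += delta;
            }
        }

        long get() {
            synchronized (lock) {
                return chunks;
            }
        }
    }

    // Lock-free alternative: AtomicLong performs the read-modify-write atomically.
    static final class AtomicCounter {
        private final AtomicLong chunks = new AtomicLong();

        void add(long delta) {
            chunks.addAndGet(delta);
        }

        long get() {
            return chunks.get();
        }
    }
}
```

Both variants keep the counter consistent under concurrent updates; which one is faster depends on contention, which is what the timings above try to measure.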
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717034822 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130317/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th
AmplabJenkins removed a comment on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035401
[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem
AmplabJenkins commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035552
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717034727 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130318/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717034885 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem
AmplabJenkins commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035401
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717034718 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
AmplabJenkins commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717035334
[GitHub] [spark] leanken edited a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
leanken edited a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717035399 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th
AmplabJenkins removed a comment on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035055
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717034813 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then sch
SparkQA removed a comment on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-716970169
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717035252 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34918/
[GitHub] [spark] SparkQA removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA removed a comment on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-716945800 **[Test build #130311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130311/testReport)** for PR 30097 at commit [`f8e103b`](https://github.com/apache/spark/commit/f8e103bcdef514c27627daea59bfafd00f22dcd9).
[GitHub] [spark] AngersZhuuuu commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema
AngersZh commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035082 retest this please
[GitHub] [spark] SparkQA removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA removed a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717013857 **[Test build #130316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130316/testReport)** for PR 30139 at commit [`a32d6f3`](https://github.com/apache/spark/commit/a32d6f31e285c767825545aa5c3033b08bd2f271).
[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem
AmplabJenkins commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717035055
[GitHub] [spark] leanken commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
leanken commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717035399 retest this, please
[GitHub] [spark] SparkQA removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA removed a comment on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717017189 **[Test build #130318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130318/testReport)** for PR 29800 at commit [`cde6170`](https://github.com/apache/spark/commit/cde61709ad60b5c8df294fe46c1957afb73fa57d).
[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717034813
[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AmplabJenkins commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717034885
[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
SparkQA commented on pull request #30156: URL: https://github.com/apache/spark/pull/30156#issuecomment-717034649
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717034646 **[Test build #130318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130318/testReport)** for PR 29800 at commit [`cde6170`](https://github.com/apache/spark/commit/cde61709ad60b5c8df294fe46c1957afb73fa57d).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FrameLessOffsetWindowFunctionFrame(`
[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]
SparkQA commented on pull request #30097: URL: https://github.com/apache/spark/pull/30097#issuecomment-717034660 **[Test build #130311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130311/testReport)** for PR 30097 at commit [`f8e103b`](https://github.com/apache/spark/commit/f8e103bcdef514c27627daea59bfafd00f22dcd9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717034645 **[Test build #130317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130317/testReport)** for PR 29882 at commit [`df9b54f`](https://github.com/apache/spark/commit/df9b54fa7a78bfb8f760e54fcc3e402c0a0bb490).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BrokenColumnarAdd(`
[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
AmplabJenkins commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717034718
[GitHub] [spark] SparkQA removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717013900 **[Test build #130317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130317/testReport)** for PR 29882 at commit [`df9b54f`](https://github.com/apache/spark/commit/df9b54fa7a78bfb8f760e54fcc3e402c0a0bb490).
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717034652 **[Test build #130316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130316/testReport)** for PR 30139 at commit [`a32d6f3`](https://github.com/apache/spark/commit/a32d6f31e285c767825545aa5c3033b08bd2f271).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717033467 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130312/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717033455 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717033455
[GitHub] [spark] SparkQA removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-716950296 **[Test build #130312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130312/testReport)** for PR 29882 at commit [`14c45e9`](https://github.com/apache/spark/commit/14c45e9d029e1165053cd1cbec5573fa9df508b2).
[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717032670 **[Test build #130312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130312/testReport)** for PR 29882 at commit [`14c45e9`](https://github.com/apache/spark/commit/14c45e9d029e1165053cd1cbec5573fa9df508b2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717032105 **[Test build #130319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130319/testReport)** for PR 30139 at commit [`b23dbd0`](https://github.com/apache/spark/commit/b23dbd03d0f4219ab8db3fa3557d7a50634821f1).
[GitHub] [spark] cloud-fan commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
cloud-fan commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512451640

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
         // Create the factory to produce WindowFunctionFrame.
         val factory = key match {
-          // Offset Frame
-          case ("OFFSET", _, IntegerLiteral(offset), _) =>
+          // Frameless offset Frame
+          case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
             target: InternalRow =>
-              new OffsetWindowFunctionFrame(
+              new FrameLessOffsetWindowFunctionFrame(
                 target,
                 ordinal,
                 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment: OffsetWindowSpec

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
         // Create the factory to produce WindowFunctionFrame.
         val factory = key match {
-          // Offset Frame
-          case ("OFFSET", _, IntegerLiteral(offset), _) =>
+          // Frameless offset Frame
+          case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
             target: InternalRow =>
-              new OffsetWindowFunctionFrame(
+              new FrameLessOffsetWindowFunctionFrame(
                 target,
                 ordinal,
                 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
-                functions.map(_.asInstanceOf[OffsetWindowFunction]),
+                functions.map(_.asInstanceOf[OffsetWindowSpec]),
                 child.output,
                 (expressions, schema) =>
                   MutableProjection.create(expressions, schema),
                 offset)
+          case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+            target: InternalRow => {
+              new UnboundedOffsetWindowFunctionFrame(
+                target,
+                ordinal,
+                // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment: OffsetWindowSpec

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
         // Create the factory to produce WindowFunctionFrame.
         val factory = key match {
-          // Offset Frame
-          case ("OFFSET", _, IntegerLiteral(offset), _) =>
+          // Frameless offset Frame
+          case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
             target: InternalRow =>
-              new OffsetWindowFunctionFrame(
+              new FrameLessOffsetWindowFunctionFrame(
                 target,
                 ordinal,
                 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
-                functions.map(_.asInstanceOf[OffsetWindowFunction]),
+                functions.map(_.asInstanceOf[OffsetWindowSpec]),
                 child.output,
                 (expressions, schema) =>
                   MutableProjection.create(expressions, schema),
                 offset)
+          case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+            target: InternalRow => {
+              new UnboundedOffsetWindowFunctionFrame(
+                target,
+                ordinal,
+                // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
+                functions.map(_.asInstanceOf[OffsetWindowSpec]),
+                child.output,
+                (expressions, schema) =>
+                  MutableProjection.create(expressions, schema),
+                offset - 1)
+            }
+          case ("UNBOUNDED_PRECEDING_OFFSET", _, IntegerLiteral(offset), _) =>
+            target: InternalRow => {
+              new UnboundedPrecedingOffsetWindowFunctionFrame(
+                target,
+                ordinal,
+                // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment: OffsetWindowSpec
[GitHub] [spark] cloud-fan commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
cloud-fan commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512451428

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala ##
@@ -58,7 +58,7 @@ import org.apache.spark.sql.types.{CalendarIntervalType, DateType, IntegerType,
  *     4. 1 PRECEDING AND 1 FOLLOWING
  *     5. 1 FOLLOWING AND 2 FOLLOWING
  *   - Offset frame: The frame consist of one row, which is an offset number of rows away from the
- *     current row. Only [[OffsetWindowFunction]]s can be processed in an offset frame.
+ *     current row. Only [[FrameLessOffsetWindowFunction]]s can be processed in an offset frame.

Review comment: How about `UnboundedOffsetWindowFunctionFrame`? Is it also an "Offset frame"?
[GitHub] [spark] cloud-fan commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
cloud-fan commented on a change in pull request #29800: URL: https://github.com/apache/spark/pull/29800#discussion_r512450768

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ##
@@ -355,6 +344,36 @@ abstract class OffsetWindowFunction
    */
   val offset: Expression
+  /**
+   * Default result value for the function when the `offset`th row does not exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * A frameless offset window function is a window function that cannot specify window frame and
+ * returns the value of the input column offset by a number of rows within the partition.

Review comment: `by a number of rows according to the current row`
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
HeartSaVioR edited a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717022097 Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so every time you do the calculation and assign the result to the field, you replace the object that is being used as the lock. Your best bet would be to change the field to a primitive `long` (to avoid boxing/unboxing) and add a separate `Object` field for locking.
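The suggestion above can be sketched as follows. This is only an illustration, not the code from the PR: `ChunkCounter`, its method names, and the `runContended` helper are hypothetical. The counter is a primitive `long`, and a separate `final` object that is never reassigned serves as the monitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical counter; not the class from the PR. */
final class ChunkCounter {
    // Broken pattern being avoided (shown only as a comment):
    //   private Long count = 0L;
    //   synchronized (count) { count = count + 1; }
    // Each assignment points the field at a different Long object, so two
    // threads can end up holding different monitors at the same time.

    // Dedicated monitor that is never reassigned.
    private final Object lock = new Object();
    // Primitive long: no boxing, and the field itself is never used as a lock.
    private long numChunksBeingTransferred = 0L;

    void chunkBeingSent() {
        synchronized (lock) { numChunksBeingTransferred++; }
    }

    void chunkSent() {
        synchronized (lock) { numChunksBeingTransferred--; }
    }

    long chunksBeingTransferred() {
        synchronized (lock) { return numChunksBeingTransferred; }
    }

    /** Starts `threads` workers; each adds two chunks and completes one, `perThread` times. */
    static long runContended(int threads, int perThread) {
        ChunkCounter counter = new ChunkCounter();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.chunkBeingSent();
                    counter.chunkBeingSent();
                    counter.chunkSent();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.chunksBeingTransferred();
    }
}
```

Because every access goes through the same monitor, the final count after `runContended(threads, perThread)` is `threads * perThread`.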
[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717023200

> Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so every time you do the calculation and assign the result to the field, you replace the object that is being used as the lock. Your best bet would be to change the field to a primitive `long` and add a separate `Object` field for locking.

A big mistake; thanks for your suggestion.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
HeartSaVioR edited a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717022097 Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so every time you do the calculation and assign the result to the field, you replace the object that is being used as the lock. Your best bet would be to change the field to a primitive `long` and add a separate `Object` field for locking.
[GitHub] [spark] HeartSaVioR commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
HeartSaVioR commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717022097 Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so every time you do the calculation you replace the object that is being used as the lock. Your best bet would be to change the field to a primitive `long` and add a separate `Object` field for locking.
[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717019830

> Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for the lock on object `totalChunksBeingTransferred`

We reduce many race conditions on `streams` and just add a very quick lock on `numChunksBeingTransferred`.
[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction
SparkQA commented on pull request #29800: URL: https://github.com/apache/spark/pull/29800#issuecomment-717017189 **[Test build #130318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130318/testReport)** for PR 29800 at commit [`cde6170`](https://github.com/apache/spark/commit/cde61709ad60b5c8df294fe46c1957afb73fa57d).
[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu edited a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717016273

> Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for the lock on object `totalChunksBeingTransferred` if we add the `synchronized(totalChunksBeingTransferred)`. This would increase the time for these operations. This would mean that to speed up `chunksBeingTransferred`, we are increasing the time of updates to `streamState`.

I know that, but as @jiangxb1987 and @mridulm mentioned, we need to ensure that the streamState and `totalChunksBeingTransferred` are updated synchronously. Adding this lock is a strong guarantee. The code that runs while the lock is held is very fast, so the impact is not really significant. Also, it's a huge leap in performance compared to what it was before.
[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717016273

> Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for the lock on object `totalChunksBeingTransferred` if we add the `synchronized(totalChunksBeingTransferred)`. This would increase the time for these operations. This would mean that to speed up `chunksBeingTransferred`, we are increasing the time of updates to `streamState`.

I know that, but as @jiangxb1987 and @mridulm mentioned, we need to ensure that the streamState and `totalChunksBeingTransferred` are updated synchronously. Adding this lock is a strong guarantee. The code that runs while the lock is held is very fast, so the impact is not really significant.
[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
SparkQA commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717013857 **[Test build #130316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130316/testReport)** for PR 30139 at commit [`a32d6f3`](https://github.com/apache/spark/commit/a32d6f31e285c767825545aa5c3033b08bd2f271).
[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
SparkQA commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717013900 **[Test build #130317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130317/testReport)** for PR 29882 at commit [`df9b54f`](https://github.com/apache/spark/commit/df9b54fa7a78bfb8f760e54fcc3e402c0a0bb490).
[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717013113

> Yes, we should ensure the streamState and `totalChunksBeingTransferred` are updated synchronously. Other than that the PR looks good!

How about the current change? It doesn't use `AtomicLong` but uses `synchronized` to keep strong consistency. The test result is:

```
OneForOneStreamManager fetch data duration test:
Stream Size    Max      Min     Avg
1              1796     187     497.5
5              4214     1295    2267.9
10             10635    3643    5800.3

Process finished with exit code 0
```
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu commented on a change in pull request #30139: URL: https://github.com/apache/spark/pull/30139#discussion_r512439581

## File path: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java ##
@@ -43,6 +43,7 @@
   private final AtomicLong nextStreamId;
   private final ConcurrentHashMap streams;
+  private final AtomicLong totalChunksBeingTransferred = new AtomicLong(0);

Review comment:

> nit: maybe rename to `numChunksBeingTransferred`? Because it's not accumulating all the chunks that are transferred in history.

Updated
[GitHub] [spark] otterc commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
otterc commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717008775

> > This should have a considerable impact on the performance when there are multiple open streams because updates of different streams would lock on a single object `totalChunksBeingTransferred`. Isn't that the case?
>
> Since `totalChunksBeingTransferred` is atomic, its updates already compete with each other, so adding a lock on `totalChunksBeingTransferred` won't have much impact. It also lets us keep strong consistency: the streamState and `totalChunksBeingTransferred` are updated synchronously.

Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for the lock on object `totalChunksBeingTransferred` if we add the `synchronized(totalChunksBeingTransferred)`. This would increase the time for these operations. This would mean that to speed up `chunksBeingTransferred`, we are increasing the time of updates to `streamState`.
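The trade-off debated in this thread can be sketched in Java as below. This is a simplified model rather than the PR's `OneForOneStreamManager`: `StreamCounters`, its fields, and `sumOverStreams` are hypothetical, and the per-stream state is reduced to a single count per stream id. Every update from every stream takes the same lock, which is the contention cost raised above. In exchange, the total is read in constant time and always agrees with the per-stream counts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of per-stream counts plus a running total. */
final class StreamCounters {
    private final Object lock = new Object();
    // Guarded by `lock`: chunks in flight per stream id.
    private final Map<Long, Long> chunksPerStream = new HashMap<>();
    // Guarded by `lock`: running total across all streams.
    private long numChunksBeingTransferred = 0L;

    void chunkBeingSent(long streamId) {
        synchronized (lock) { // all streams contend on this one monitor
            chunksPerStream.merge(streamId, 1L, Long::sum);
            numChunksBeingTransferred++;
        }
    }

    void chunkSent(long streamId) {
        synchronized (lock) {
            chunksPerStream.merge(streamId, -1L, Long::sum);
            numChunksBeingTransferred--;
        }
    }

    /** Constant-time read of the running total. */
    long chunksBeingTransferred() {
        synchronized (lock) { return numChunksBeingTransferred; }
    }

    /** Linear-time recomputation, kept only to check the invariant. */
    long sumOverStreams() {
        synchronized (lock) {
            long sum = 0L;
            for (long count : chunksPerStream.values()) {
                sum += count;
            }
            return sum;
        }
    }
}
```

Replacing the lock with an `AtomicLong` for the total would remove the shared monitor, but the total and the per-stream state could then be observed out of step with each other, which is the consistency concern raised by the reviewers.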
[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu edited a comment on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717004690

> This should have a considerable impact on the performance when there are multiple open streams because updates of different streams would lock on a single object `totalChunksBeingTransferred`. Isn't that the case?

Since `totalChunksBeingTransferred` is atomic, its updates already compete with each other, so adding a lock on `totalChunksBeingTransferred` won't have much impact. It also lets us keep strong consistency: the streamState and `totalChunksBeingTransferred` are updated synchronously. I also ran the test above and can't see any apparent impact.
[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service
AngersZhuuuu commented on pull request #30139: URL: https://github.com/apache/spark/pull/30139#issuecomment-717004690

> This should have a considerable impact on the performance when there are multiple open streams because updates of different streams would lock on a single object `totalChunksBeingTransferred`. Isn't that the case?

Since `totalChunksBeingTransferred` is atomic, its updates already compete with each other, so adding a lock on `totalChunksBeingTransferred` won't have much impact. It also lets us keep strong consistency: the streamState and `totalChunksBeingTransferred` are updated synchronously.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717003726 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130310/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins removed a comment on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717003715 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result
AmplabJenkins commented on pull request #29882: URL: https://github.com/apache/spark/pull/29882#issuecomment-717003715