[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717065671







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu opened a new pull request #30157: [WIP][SPARK-33228][SQL][2.4] Don't uncache data when replacing a view having the same logical plan

2020-10-27 Thread GitBox


maropu opened a new pull request #30157:
URL: https://github.com/apache/spark/pull/30157


   
   
   ### What changes were proposed in this pull request?
   
   SPARK-30494 updated the `CreateViewCommand` code to implicitly drop the cache when replacing an existing view. However, this change drops the cache even when the replacement view has the same logical plan. The following sequence of queries reproduces the issue:
   ```
   // Spark v2.4.6+
   scala> val df = spark.range(1).selectExpr("id a", "id b")
   scala> df.cache()
   scala> df.explain()
   == Physical Plan ==
   *(1) ColumnarToRow
   +- InMemoryTableScan [a#2L, b#3L]
         +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
               +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
                  +- *(1) Range (0, 1, step=1, splits=4)
   
   scala> df.createOrReplaceTempView("t")
   scala> sql("select * from t").explain()
   == Physical Plan ==
   *(1) ColumnarToRow
   +- InMemoryTableScan [a#2L, b#3L]
         +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
               +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
                  +- *(1) Range (0, 1, step=1, splits=4)
   
   // Re-running the same `df.createOrReplaceTempView("t")` call drops the cache
   scala> df.createOrReplaceTempView("t")
   scala> sql("select * from t").explain()
   == Physical Plan ==
   *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
   +- *(1) Range (0, 1, step=1, splits=4)
   
   
   // Until v2.4.6
   scala> val df = spark.range(1).selectExpr("id a", "id b")
   scala> df.cache()
   scala> df.createOrReplaceTempView("t")
   scala> sql("select * from t").explain()
   20/10/23 22:33:42 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
   == Physical Plan ==
   *(1) InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)
   
   scala> df.createOrReplaceTempView("t")
   scala> sql("select * from t").explain()
   == Physical Plan ==
   *(1) InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)
   ```
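
The behavior the reproduction above is after can be illustrated with a toy model. This is only a sketch under loud assumptions: plans are plain strings, and `ToyViewCatalog`, `cache`, and `isCached` are hypothetical names for illustration, not Spark APIs or the code in this PR. It shows one possible rule: drop the cached data on replace only when the logical plan actually changes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: plans are plain strings and none of these names are Spark APIs.
final class ToyViewCatalog {
    private final Map<String, String> views = new HashMap<>();  // view name -> plan
    private final Set<String> cachedPlans = new HashSet<>();    // plans with cached data

    void cache(String plan) {
        cachedPlans.add(plan);
    }

    boolean isCached(String viewName) {
        String plan = views.get(viewName);
        return plan != null && cachedPlans.contains(plan);
    }

    // Replace (or create) a view. The old plan's cache is dropped only when
    // the plan actually changes; re-registering the same plan keeps the cache.
    void createOrReplaceTempView(String viewName, String newPlan) {
        String oldPlan = views.put(viewName, newPlan);
        if (oldPlan != null && !oldPlan.equals(newPlan)) {
            cachedPlans.remove(oldPlan);
        }
    }
}

final class ToyViewCatalogDemo {
    public static void main(String[] args) {
        ToyViewCatalog catalog = new ToyViewCatalog();
        String plan = "Project(Range(0, 1))";
        catalog.cache(plan);
        catalog.createOrReplaceTempView("t", plan);
        catalog.createOrReplaceTempView("t", plan);  // same plan: cache survives
        System.out.println(catalog.isCached("t"));   // true
        catalog.createOrReplaceTempView("t", "Project(Range(0, 2))");
        System.out.println(catalog.isCached("t"));   // false
    }
}
```

In the toy model, string equality stands in for the comparison of logical plans; how the real comparison is done is not shown in this message.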
   
   ### Why are the changes needed?
   
   Bug fix: replacing a view with one that has the same logical plan should not drop the cached data.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added tests.






[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717065639


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34921/
   






[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717064724


   **[Test build #130322 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130322/testReport)**
 for PR 29800 at commit 
[`ca574d9`](https://github.com/apache/spark/commit/ca574d993939db573aa1b5e6692b9ae56df0aea7).






[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717063665


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34923/
   






[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512481955



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -151,10 +169,69 @@ final class OffsetWindowFunctionFrame(
 }
 inputIndex += 1
   }
+}
 
-  override def currentLowerBound(): Int = throw new 
UnsupportedOperationException()
+/**
+ * The unbounded offset window frame calculates frames containing NTH_VALUE 
statements.
+ * The unbounded offset window frame return the same value for all rows in the 
window partition.
+ */
+class UnboundedOffsetWindowFunctionFrame(
+target: InternalRow,
+ordinal: Int,
+expressions: Array[OffsetWindowSpec],
+inputSchema: Seq[Attribute],
+newMutableProjection: (Seq[Expression], Seq[Attribute]) => 
MutableProjection,
+offset: Int)
+  extends OffsetWindowFunctionFrameBase(
+target, ordinal, expressions, inputSchema, newMutableProjection, offset) {
 
-  override def currentUpperBound(): Int = throw new 
UnsupportedOperationException()
+  override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
+super.prepare(rows)
+if (inputIndex >= 0 && inputIndex < input.length) {
+  val r = WindowFunctionFrame.getNextOrNull(inputIterator)

Review comment:
   OK
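
The doc comment in the hunk above says the unbounded offset frame returns the same value for every row of the partition. A minimal sketch of that idea follows, assuming a partition is just a list of integers; `ToyUnboundedOffsetFrame` and its methods are hypothetical stand-ins, not the classes in this PR. The value is looked up once per partition and then reused for each row.

```java
import java.util.List;

// Toy model: a partition is a list of values, not Spark's row buffer.
final class ToyUnboundedOffsetFrame {
    private Integer cached;  // value shared by every row; null if the offset is out of range

    // Called once per partition: look up the offset-th row (1-based) a single time.
    void prepare(List<Integer> partition, int offset) {
        int index = offset - 1;
        cached = (index >= 0 && index < partition.size()) ? partition.get(index) : null;
    }

    // Called once per row: no scanning, just hand back the precomputed value.
    Integer write() {
        return cached;
    }
}
```

For a partition `[10, 20, 30]` and offset 2, every call to `write()` yields 20; with offset 5 it yields `null`, which stands in for the function's default value here.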








[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512478210



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.

Review comment:
   OK

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.
-functions.map(_.asInstanceOf[OffsetWindowFunction]),
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
 child.output,
 (expressions, schema) =>
   MutableProjection.create(expressions, schema),
 offset)
+  case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.

Review comment:
   OK

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.
-functions.map(_.asInstanceOf[OffsetWindowFunction]),
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
 child.output,
 (expressions, schema) =>
   MutableProjection.create(expressions, schema),
 offset)
+  case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
+child.output,
+(expressions, schema) =>
+  MutableProjection.create(expressions, schema),
+offset - 1)
+}
+  case ("UNBOUNDED_PRECEDING_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedPrecedingOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be 
OffsetWindowFunctions.

Review comment:
   OK








[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512477822



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala
##
@@ -58,7 +58,7 @@ import org.apache.spark.sql.types.{CalendarIntervalType, 
DateType, IntegerType,
  * 4. 1 PRECEDING AND 1 FOLLOWING
  * 5. 1 FOLLOWING AND 2 FOLLOWING
  * - Offset frame: The frame consist of one row, which is an offset number of 
rows away from the
- *   current row. Only [[OffsetWindowFunction]]s can be processed in an offset 
frame.
+ *   current row. Only [[FrameLessOffsetWindowFunction]]s can be processed in 
an offset frame.

Review comment:
   I will update the doc.








[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717055292


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34922/
   






[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717051971


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34921/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717049528


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34920/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717049518


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717049483


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34920/
   






[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717049518










[GitHub] [spark] beliefer commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


beliefer commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512468539



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -355,6 +344,36 @@ abstract class OffsetWindowFunction
*/
   val offset: Expression
 
+  /**
+   * Default result value for the function when the `offset`th row does not 
exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function 
should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is 
true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts 
with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * A frameless offset window function is a window function that cannot specify 
window frame and
+ * returns the value of the input column offset by a number of rows within the 
partition.

Review comment:
   I made a mistake.
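
The `isRelative` and `default` fields documented in the hunk above can be read as an index calculation. The sketch below is an illustration only, assuming a frame is a list of strings and the current row is a 0-based index; `ToyOffsetLookup.valueAt` is a hypothetical helper, and `ignoreNulls` is left out to keep it short.

```java
import java.util.List;

final class ToyOffsetLookup {
    // Resolve the row an offset function reads.
    // isRelative = true:  offset counts from the current row.
    // isRelative = false: offset counts from the first row of the frame (1-based).
    // When the resolved row does not exist, the default value is returned.
    static String valueAt(List<String> frame, int currentRow, int offset,
                          boolean isRelative, String defaultValue) {
        int index = isRelative ? currentRow + offset : offset - 1;
        return (index >= 0 && index < frame.size()) ? frame.get(index) : defaultValue;
    }
}
```

With the frame `["a", "b", "c"]` and the current row at index 1, a relative offset of 1 reads `"c"`, while a non-relative offset of 1 reads `"a"` regardless of the current row.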








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717048147










[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717048147










[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717048124


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34919/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717045452










[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717045434


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34918/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717045452










[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717038908


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34920/
   






[GitHub] [spark] stijndehaes edited a comment on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-10-27 Thread GitBox


stijndehaes edited a comment on pull request #29533:
URL: https://github.com/apache/spark/pull/29533#issuecomment-717038227


   @redsk @jkleckner The error line you are seeing comes from the class `ExecutorPodsWatchSnapshotSource`, which is elsewhere in the code. I thought there was another mechanism for the driver-to-executor watches.
   
   These also look like driver logs? The fix here is in the spark-submit application, so you should watch its logs instead of the driver's. Can you tell me whether these were driver logs or spark-submit logs?
   






[GitHub] [spark] stijndehaes commented on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s

2020-10-27 Thread GitBox


stijndehaes commented on pull request #29533:
URL: https://github.com/apache/spark/pull/29533#issuecomment-717038227


   @redsk @jkleckner The error line you are seeing comes from the class `ExecutorPodsWatchSnapshotSource`, which is somewhere else. I thought there was another mechanism for the executors to cope with connection problems.
   
   These also look like driver logs? The fix here is in the spark-submit application, so you should watch its logs instead of the driver's. Can you tell me whether these were driver logs or spark-submit logs?
   






[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717037578


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34919/
   






[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717036263


   **[Test build #130321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130321/testReport)**
 for PR 30097 at commit 
[`7612695`](https://github.com/apache/spark/commit/7612695c78456155a95ad4f7d54ef70e53f88921).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035560


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130308/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717036225


   **[Test build #130320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130320/testReport)**
 for PR 30156 at commit 
[`8a18234`](https://github.com/apache/spark/commit/8a18234f325ffa7cf7c3d164d0f03b9402a5f4b8).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717034895


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130316/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717035349










[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717036010


   > Sorry but your latest change doesn't actually lock properly. `Long` is immutable, and you always replace the object when you do the calculation and assign to the field, which is used as a lock. Your best try would be changing it to `long` (to avoid box/unbox), and having a separate `Object` field for locking purposes.
   
   Updated and re-checked. Benchmark results:
   
   **_1. With UT change_**
   
![image](https://user-images.githubusercontent.com/46485123/97267072-84dffc80-1864-11eb-8125-9dc05b92886b.png)
   
   Current change (with lock)
   ```
   OneForOneStreamManager fetch data duration test:
   Stream Size    Max      Min      Avg
   1              2834     704      1188.0
   5              14048    10789    12055.7
   ```
   
   Using only AtomicLong
   ```
   OneForOneStreamManager fetch data duration test:
   Stream Size    Max      Min      Avg
   1              6723     724      1527.5
   5              14673    10712    12080.5
   ```
   
   **_2. Without UT change_**
   Current change (with lock)
   ```
   OneForOneStreamManager fetch data duration test:
   Stream Size    Max      Min      Avg
   1              822      176      360.1
   5              4190     1263     2387.9
   10             6785     3870     4811.9
   ```
   Using only AtomicLong
   ```
   OneForOneStreamManager fetch data duration test:
   Stream Size    Max      Min      Avg
   1              1037     167      548.5
   5              2982     1102     1712.8
   10             8779     3165     4831.9
   ```
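
The locking problem described in the quoted review remark can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: the class name `ChunkCounter`, its `add`/`get` methods, and the `runDemo` helper are hypothetical. The counter is a primitive `long`, and all access synchronizes on a separate `final Object`, so the monitor's identity never changes. Synchronizing on a boxed `Long` field instead would lock a different object after each reassignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only; not taken from the pull request.
final class ChunkCounter {
    // Dedicated lock object: final, so every thread synchronizes on the same monitor.
    private final Object lock = new Object();
    // Primitive long avoids boxing/unboxing; only accessed while holding `lock`.
    private long chunksBeingTransferred = 0L;

    void add(long delta) {
        synchronized (lock) {
            chunksBeingTransferred += delta;
        }
    }

    long get() {
        synchronized (lock) {
            return chunksBeingTransferred;
        }
    }

    // Starts `threads` workers that each add 1 `perThread` times, waits for
    // all of them, and returns the final counter value.
    static long runDemo(int threads, int perThread) {
        ChunkCounter counter = new ChunkCounter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000));
    }
}
```

`java.util.concurrent.atomic.AtomicLong` is the lock-free alternative, which is what the "AtomicLong" runs above compare against.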






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717034822


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130317/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035401










[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035552










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717034727


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130318/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717034885


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035401










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717034718


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717035334










[GitHub] [spark] leanken edited a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


leanken edited a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717035399


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less th

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035055










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717034813


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then sch

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-716970169










[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717035252


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34918/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-716945800


   **[Test build #130311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130311/testReport)** for PR 30097 at commit [`f8e103b`](https://github.com/apache/spark/commit/f8e103bcdef514c27627daea59bfafd00f22dcd9).






[GitHub] [spark] AngersZhuuuu commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035082


   retest this please






[GitHub] [spark] SparkQA removed a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717013857


   **[Test build #130316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130316/testReport)** for PR 30139 at commit [`a32d6f3`](https://github.com/apache/spark/commit/a32d6f31e285c767825545aa5c3033b08bd2f271).






[GitHub] [spark] AmplabJenkins commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schem

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717035055










[GitHub] [spark] leanken commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


leanken commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717035399


   retest this, please






[GitHub] [spark] SparkQA removed a comment on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717017189


   **[Test build #130318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130318/testReport)** for PR 29800 at commit [`cde6170`](https://github.com/apache/spark/commit/cde61709ad60b5c8df294fe46c1957afb73fa57d).






[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717034813










[GitHub] [spark] AmplabJenkins commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717034885










[GitHub] [spark] SparkQA commented on pull request #30156: [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size

2020-10-27 Thread GitBox


SparkQA commented on pull request #30156:
URL: https://github.com/apache/spark/pull/30156#issuecomment-717034649










[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717034646


   **[Test build #130318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130318/testReport)** for PR 29800 at commit [`cde6170`](https://github.com/apache/spark/commit/cde61709ad60b5c8df294fe46c1957afb73fa57d).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class FrameLessOffsetWindowFunctionFrame(`






[GitHub] [spark] SparkQA commented on pull request #30097: [SPARK-33140][SQL] remove SQLConf and SparkSession in all sub-class of Rule[QueryPlan]

2020-10-27 Thread GitBox


SparkQA commented on pull request #30097:
URL: https://github.com/apache/spark/pull/30097#issuecomment-717034660


   **[Test build #130311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130311/testReport)** for PR 30097 at commit [`f8e103b`](https://github.com/apache/spark/commit/f8e103bcdef514c27627daea59bfafd00f22dcd9).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717034645


   **[Test build #130317 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130317/testReport)** for PR 29882 at commit [`df9b54f`](https://github.com/apache/spark/commit/df9b54fa7a78bfb8f760e54fcc3e402c0a0bb490).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class BrokenColumnarAdd(`






[GitHub] [spark] AmplabJenkins commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717034718










[GitHub] [spark] SparkQA removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717013900


   **[Test build #130317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130317/testReport)** for PR 29882 at commit [`df9b54f`](https://github.com/apache/spark/commit/df9b54fa7a78bfb8f760e54fcc3e402c0a0bb490).






[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717034652


   **[Test build #130316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130316/testReport)** for PR 30139 at commit [`a32d6f3`](https://github.com/apache/spark/commit/a32d6f31e285c767825545aa5c3033b08bd2f271).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717033467


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130312/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717033455


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717033455










[GitHub] [spark] SparkQA removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-716950296


   **[Test build #130312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130312/testReport)** for PR 29882 at commit [`14c45e9`](https://github.com/apache/spark/commit/14c45e9d029e1165053cd1cbec5573fa9df508b2).






[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717032670


   **[Test build #130312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130312/testReport)** for PR 29882 at commit [`14c45e9`](https://github.com/apache/spark/commit/14c45e9d029e1165053cd1cbec5573fa9df508b2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717032105


   **[Test build #130319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130319/testReport)** for PR 30139 at commit [`b23dbd0`](https://github.com/apache/spark/commit/b23dbd03d0f4219ab8db3fa3557d7a50634821f1).






[GitHub] [spark] cloud-fan commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


cloud-fan commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512451640



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment:
   OffsetWindowSpec

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
-functions.map(_.asInstanceOf[OffsetWindowFunction]),
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
 child.output,
 (expressions, schema) =>
   MutableProjection.create(expressions, schema),
 offset)
+  case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment:
   OffsetWindowSpec

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -171,18 +178,42 @@ trait WindowExecBase extends UnaryExecNode {
 
 // Create the factory to produce WindowFunctionFrame.
 val factory = key match {
-  // Offset Frame
-  case ("OFFSET", _, IntegerLiteral(offset), _) =>
+  // Frameless offset Frame
+  case ("FRAME_LESS_OFFSET", _, IntegerLiteral(offset), _) =>
 target: InternalRow =>
-  new OffsetWindowFunctionFrame(
+  new FrameLessOffsetWindowFunctionFrame(
 target,
 ordinal,
 // OFFSET frame functions are guaranteed be OffsetWindowFunctions.
-functions.map(_.asInstanceOf[OffsetWindowFunction]),
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
 child.output,
 (expressions, schema) =>
   MutableProjection.create(expressions, schema),
 offset)
+  case ("UNBOUNDED_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be OffsetWindowFunctions.
+functions.map(_.asInstanceOf[OffsetWindowSpec]),
+child.output,
+(expressions, schema) =>
+  MutableProjection.create(expressions, schema),
+offset - 1)
+}
+  case ("UNBOUNDED_PRECEDING_OFFSET", _, IntegerLiteral(offset), _) =>
+target: InternalRow => {
+  new UnboundedPrecedingOffsetWindowFunctionFrame(
+target,
+ordinal,
+// OFFSET frame functions are guaranteed be OffsetWindowFunctions.

Review comment:
   OffsetWindowSpec








[GitHub] [spark] cloud-fan commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


cloud-fan commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512451428



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala
##
@@ -58,7 +58,7 @@ import org.apache.spark.sql.types.{CalendarIntervalType, DateType, IntegerType,
  * 4. 1 PRECEDING AND 1 FOLLOWING
  * 5. 1 FOLLOWING AND 2 FOLLOWING
  * - Offset frame: The frame consist of one row, which is an offset number of rows away from the
- *   current row. Only [[OffsetWindowFunction]]s can be processed in an offset frame.
+ *   current row. Only [[FrameLessOffsetWindowFunction]]s can be processed in an offset frame.

Review comment:
   How about `UnboundedOffsetWindowFunctionFrame`? Is it also an "Offset frame"?








[GitHub] [spark] cloud-fan commented on a change in pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


cloud-fan commented on a change in pull request #29800:
URL: https://github.com/apache/spark/pull/29800#discussion_r512450768



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -355,6 +344,36 @@ abstract class OffsetWindowFunction
*/
   val offset: Expression
 
+  /**
+   * Default result value for the function when the `offset`th row does not exist.
+   */
+  val default: Expression
+
+  /**
+   * An optional specification that indicates the offset window function should skip null values in
+   * the determination of which row to use.
+   */
+  val ignoreNulls: Boolean
+
+  /**
+   * Whether the offset is starts with the current row. If `isRelative` is true, `offset` means
+   * the offset is start with the current row. otherwise, the offset is starts with the first
+   * row of the entire window frame.
+   */
+  val isRelative: Boolean
+
+  lazy val fakeFrame = SpecifiedWindowFrame(RowFrame, offset, offset)
+}
+
+/**
+ * A frameless offset window function is a window function that cannot specify window frame and
+ * returns the value of the input column offset by a number of rows within the partition.

Review comment:
   `by a number of rows according to the current row`
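
   The distinction that `isRelative` draws in the diff above can be illustrated with a minimal Java sketch. This is not Spark's implementation: the class `OffsetSketch` and both method names are hypothetical, the partition is modelled as a plain list, and `ignoreNulls` and frame boundaries are deliberately left out. The `offset - 1` conversion assumes a 1-based absolute offset.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of relative vs. absolute offsets; not Spark code.
final class OffsetSketch {

    // Relative offset: count `offset` rows away from the current row.
    // Returns defaultValue when the target row does not exist.
    static <T> T relativeOffset(List<T> partition, int currentIndex, int offset, T defaultValue) {
        int target = currentIndex + offset;
        return (target >= 0 && target < partition.size()) ? partition.get(target) : defaultValue;
    }

    // Absolute offset: count from the first row, treating `offset` as 1-based
    // (assumption), so the 0-based index is offset - 1.
    static <T> T absoluteOffset(List<T> partition, int offset, T defaultValue) {
        int target = offset - 1;
        return (target >= 0 && target < partition.size()) ? partition.get(target) : defaultValue;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c", "d");
        System.out.println(relativeOffset(partition, 1, 2, "none")); // prints "d"
        System.out.println(relativeOffset(partition, 3, 1, "none")); // prints "none"
        System.out.println(absoluteOffset(partition, 2, "none"));    // prints "b"
    }
}
```

   With a relative offset the result changes as the current row moves; with an absolute offset every row in the partition sees the same target row.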








[GitHub] [spark] HeartSaVioR edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717022097


   Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so each calculation assigns a new object to the field that is used as the lock, and threads may end up synchronizing on different objects. Your best bet would be to change the field to a primitive `long` (to avoid boxing/unboxing) and add a separate `Object` field for locking.
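
   The suggestion above can be sketched in Java as follows. This is only an illustration under assumptions: the class `ChunkCounter`, its method names, and the field `lock` are hypothetical and are not taken from the PR.

```java
// Hypothetical sketch of "primitive long plus a dedicated lock object".
final class ChunkCounter {
    // Dedicated monitor: final, so every thread synchronizes on the same object.
    private final Object lock = new Object();

    // Primitive long avoids boxing; all access is guarded by `lock`.
    private long numChunksBeingTransferred = 0L;

    void increment() {
        synchronized (lock) {
            numChunksBeingTransferred++;
        }
    }

    void decrement() {
        synchronized (lock) {
            numChunksBeingTransferred--;
        }
    }

    long get() {
        synchronized (lock) {
            return numChunksBeingTransferred;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChunkCounter counter = new ChunkCounter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.get()); // prints 200000
    }
}
```

   By contrast, `synchronized (someLongField)` on a boxed `Long` that is reassigned inside the block gives no mutual exclusion, because the monitor belongs to the object and not to the field.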






[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717023200


   > Sorry but your latest change doesn't actually lock properly. Long is 
immutable, and you always replace the object when you do the calculation and 
assign to the field, which is used as a lock. Your best try would be changing 
it to long, and have a separate Object field as locking purpose.
   
   A big mistake on my part, thanks for your suggestion.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


HeartSaVioR edited a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717022097


   Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so each calculation assigns a new object to the field that is used as the lock. Your best bet would be to change the field to a primitive `long` and add a separate `Object` field for locking.






[GitHub] [spark] HeartSaVioR commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


HeartSaVioR commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717022097


   Sorry, but your latest change doesn't actually lock properly. `Long` is immutable, so each calculation replaces the object that is used as the lock. Your best bet would be to change the field to a primitive `long` and add a separate `Object` field for locking.






[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717019830


   > Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for 
the lock on object `totalChunksBeingTransferred`
   
   We remove a lot of contention on `streams` and add only a very short lock on `numChunksBeingTransferred`.






[GitHub] [spark] SparkQA commented on pull request #29800: [SPARK-32934][SQL] Improve the performance for NTH_VALUE and reactor the OffsetWindowFunction

2020-10-27 Thread GitBox


SparkQA commented on pull request #29800:
URL: https://github.com/apache/spark/pull/29800#issuecomment-717017189


   **[Test build #130318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130318/testReport)** for PR 29800 at commit [`cde6170`](https://github.com/apache/spark/commit/cde61709ad60b5c8df294fe46c1957afb73fa57d).






[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu edited a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717016273


   > Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for 
the lock on object `totalChunksBeingTransferred` if we add the 
`synchronize(totalChunksBeingTransferred)`. This would increase the time for 
these operations. This would mean that to speed up `chunksBeingTransferred`, we 
are increasing the time of updates to `streamState`.
   
   I know that, but as @jiangxb1987 and @mridulm mentioned, we need to ensure that `streamState` and `totalChunksBeingTransferred` are updated together. Adding this lock gives a strong guarantee. The code executed while holding the lock is very fast, so the impact is not significant. It is also a huge leap in performance compared to what it was before.






[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717016273


   > Hmmm. Every update to `chunkSent` and `chunkBeingSent` will compete for 
the lock on object `totalChunksBeingTransferred` if we add the 
`synchronize(totalChunksBeingTransferred)`. This would increase the time for 
these operations. This would mean that to speed up `chunksBeingTransferred`, we 
are increasing the time of updates to `streamState`.
   
   I know that, but as @jiangxb1987 and @mridulm mentioned, we need to ensure that `streamState` and `totalChunksBeingTransferred` are updated together. Adding this lock gives a strong guarantee. The code executed while holding the lock is very fast, so the impact is not significant.






[GitHub] [spark] SparkQA commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


SparkQA commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717013857


   **[Test build #130316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130316/testReport)** for PR 30139 at commit [`a32d6f3`](https://github.com/apache/spark/commit/a32d6f31e285c767825545aa5c3033b08bd2f271).






[GitHub] [spark] SparkQA commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


SparkQA commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717013900


   **[Test build #130317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130317/testReport)** for PR 29882 at commit [`df9b54f`](https://github.com/apache/spark/commit/df9b54fa7a78bfb8f760e54fcc3e402c0a0bb490).






[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717013113


   > Yes we should ensure the streamState and the totalChunksBeingTransfered 
are updated synchronically. Other than that the PR looks good!
   
   How about the current change? It doesn't use `AtomicLong` but uses `synchronized` to keep strong consistency. The test result is:
   ```
   OneForOneStreamManager fetch data duration test:
   Stream Size   Max     Min    Avg
   1             1796    187    497.5
   5             4214    1295   2267.9
   10            10635   3643   5800.3
   
   Process finished with exit code 0
   
   ```
   
   






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on a change in pull request #30139:
URL: https://github.com/apache/spark/pull/30139#discussion_r512439581



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java
##
@@ -43,6 +43,7 @@
 
   private final AtomicLong nextStreamId;
   private final ConcurrentHashMap streams;
+  private final AtomicLong totalChunksBeingTransferred = new AtomicLong(0);

Review comment:
   > nit: maybe rename to `numChunksBeingTransferred`? Because it's not 
accumulating all the chunks that are transferred in history.
   
   Updated
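
   As a rough illustration of why a running counter helps here, the Java sketch below keeps an `AtomicLong` total next to per-stream counters, so reading the total does not require scanning every stream. It is a simplification under assumptions, not the actual `OneForOneStreamManager`: the class `StreamCounters`, the `perStream` map, and `chunksBeingTransferredByScan` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplification; per-stream state is reduced to a single counter.
final class StreamCounters {
    private final ConcurrentHashMap<Long, AtomicLong> perStream = new ConcurrentHashMap<>();
    private final AtomicLong numChunksBeingTransferred = new AtomicLong(0);

    // A chunk of the given stream starts being transferred.
    void chunkBeingSent(long streamId) {
        perStream.computeIfAbsent(streamId, id -> new AtomicLong()).incrementAndGet();
        numChunksBeingTransferred.incrementAndGet();
    }

    // A chunk of the given stream has finished being transferred.
    void chunkSent(long streamId) {
        AtomicLong count = perStream.get(streamId);
        if (count != null) {
            count.decrementAndGet();
            numChunksBeingTransferred.decrementAndGet();
        }
    }

    // Constant-time read of the running total.
    long chunksBeingTransferred() {
        return numChunksBeingTransferred.get();
    }

    // Alternative that walks every stream; cost grows with the number of streams.
    long chunksBeingTransferredByScan() {
        long sum = 0L;
        for (AtomicLong count : perStream.values()) {
            sum += count.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        StreamCounters counters = new StreamCounters();
        counters.chunkBeingSent(1L);
        counters.chunkBeingSent(1L);
        counters.chunkBeingSent(2L);
        counters.chunkSent(1L);
        System.out.println(counters.chunksBeingTransferred());       // prints 2
        System.out.println(counters.chunksBeingTransferredByScan()); // prints 2
    }
}
```

   Note that in this sketch the per-stream counter and the total are updated in two separate atomic steps, so a concurrent reader can briefly observe them out of step.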








[GitHub] [spark] otterc commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


otterc commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717008775


   > > This should have a considerable impact on the performance when there are 
multiple open streams because updates of different streams would lock on a 
single object `totalChunksBeingTransferred`. Isn't that the case?
   > 
   > Since `totalChunksBeingTransfereed` is atomic, when it update It has its 
own competition, add a lock at `totalChunksBeingTransfereed` won't have too 
much impact. And we can keep strong consistences between the streamState and 
the totalChunksBeingTransfered are updated synchronically.
   
   Hmm. Every call to `chunkSent` and `chunkBeingSent` would compete for the lock on the object `totalChunksBeingTransferred` if we add `synchronized (totalChunksBeingTransferred)`. That increases the time these operations take, so to speed up `chunksBeingTransferred` we would be slowing down updates to `streamState`.






[GitHub] [spark] AngersZhuuuu edited a comment on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu edited a comment on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717004690


   > This should have a considerable impact on the performance when there are 
multiple open streams because updates of different streams would lock on a 
single object `totalChunksBeingTransferred`. Isn't that the case?
   
   Since `totalChunksBeingTransferred` is atomic, its updates already contend with each other, so adding a lock on `totalChunksBeingTransferred` won't have much additional impact. It also keeps `streamState` and `totalChunksBeingTransferred` strongly consistent, because they are updated together. I also ran the test above and can't see any apparent effect.






[GitHub] [spark] AngersZhuuuu commented on pull request #30139: [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service

2020-10-27 Thread GitBox


AngersZhuuuu commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-717004690


   > This should have a considerable impact on the performance when there are 
multiple open streams because updates of different streams would lock on a 
single object `totalChunksBeingTransferred`. Isn't that the case?
   
   Since `totalChunksBeingTransferred` is atomic, its updates already contend with each other, so adding a lock on `totalChunksBeingTransferred` won't have much additional impact. It also keeps `streamState` and `totalChunksBeingTransferred` strongly consistent, because they are updated together.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717003726


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130310/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins removed a comment on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717003715


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29882: [SPARK-33008][SQL] Division by zero on divide-like operations returns incorrect result

2020-10-27 Thread GitBox


AmplabJenkins commented on pull request #29882:
URL: https://github.com/apache/spark/pull/29882#issuecomment-717003715









