[jira] [Updated] (SPARK-27221) Improve the assert error message in TreeNode
[ https://issues.apache.org/jira/browse/SPARK-27221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27221: - Description: TreeNode.parseToJson may throw an assert error without any error message when a TreeNode is not implemented properly, and it's hard to find the bad TreeNode implementation. It's better to improve the error message to indicate the type of TreeNode. > Improve the assert error message in TreeNode > > > Key: SPARK-27221 > URL: https://issues.apache.org/jira/browse/SPARK-27221 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > > TreeNode.parseToJson may throw an assert error without any error message > when a TreeNode is not implemented properly, and it's hard to find the bad > TreeNode implementation. > It's better to improve the error message to indicate the type of TreeNode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
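The difference the fix makes can be sketched in a few lines (a loose Python illustration with hypothetical class names, not Spark's actual Scala code): an assertion message that names the failing node class points straight at the bad TreeNode implementation, where a bare assert gives no clue.

```python
# Sketch of the improvement (hypothetical classes, not Spark's actual code):
# a bare `assert` yields no context, while including the node's class name
# in the message identifies the bad TreeNode implementation immediately.

class TreeNode:
    def json_fields(self):
        # Subclasses must return (name, value) pairs for JSON serialization.
        raise NotImplementedError

    def to_json(self):
        fields = self.json_fields()
        # Name the offending class instead of failing with a bare assert.
        assert all(len(f) == 2 for f in fields), (
            f"Invalid json fields in {type(self).__name__}: {fields!r}"
        )
        return dict(fields)

class GoodNode(TreeNode):
    def json_fields(self):
        return [("name", "good")]

class BadNode(TreeNode):
    def json_fields(self):
        return [("name",)]  # improperly implemented: value is missing

print(GoodNode().to_json())
try:
    BadNode().to_json()
except AssertionError as e:
    print(e)  # message now names BadNode
```

With the message in place, the stack trace alone is enough to locate the faulty node class.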
[jira] [Created] (SPARK-27221) Improve the assert error message in TreeNode
Shixiong Zhu created SPARK-27221: Summary: Improve the assert error message in TreeNode Key: SPARK-27221 URL: https://issues.apache.org/jira/browse/SPARK-27221 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791960#comment-16791960 ] Shixiong Zhu commented on SPARK-25449: -- I think this patch actually fixed a bug introduced by https://github.com/apache/spark/commit/0514e8d4b69615ba8918649e7e3c46b5713b6540. It didn't use the correct default timeout. Before this patch, using `spark.executor.heartbeatInterval 30` would send a heartbeat every 30 ms, but each heartbeat RPC message timeout was 30 seconds. This patch simply unifies the default time unit across all usages of "spark.executor.heartbeatInterval". > Don't send zero accumulators in heartbeats > -- > > Key: SPARK-25449 > URL: https://issues.apache.org/jira/browse/SPARK-25449 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Heartbeats sent from executors to the driver every 10 seconds contain metrics > and are generally on the order of a few KBs. However, for large jobs with > lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks > to die with heartbeat failures. We can mitigate this by not sending zero > metrics to the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
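The unit mismatch described in the comment can be illustrated with a small sketch (a hypothetical parser, not Spark's actual config code): the same literal "30" read with two different default units yields durations that differ by a factor of 1000, which is exactly why every reader of a setting must agree on one default unit.

```python
# Sketch of the time-unit pitfall (hypothetical parser, not Spark's code):
# a bare number like "30" is ambiguous, so two components assuming
# different default units silently disagree by orders of magnitude.
import re

_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}

def parse_duration_ms(value, default_unit="s"):
    """Parse strings like '30', '30s', '100ms' into milliseconds."""
    m = re.fullmatch(r"(\d+)([a-z]*)", value.strip())
    if not m:
        raise ValueError(f"bad duration: {value!r}")
    number, unit = int(m.group(1)), (m.group(2) or default_unit)
    return number * _UNITS_MS[unit]

# "30" with no unit: one reader assuming ms, another assuming seconds.
print(parse_duration_ms("30", default_unit="ms"))  # 30
print(parse_duration_ms("30", default_unit="s"))   # 30000
```

An explicit suffix ("30s") removes the ambiguity regardless of the default; unifying the default unit fixes the bare-number case.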
[jira] [Updated] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
[ https://issues.apache.org/jira/browse/SPARK-27111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27111: - Fix Version/s: 2.3.4 > A continuous query may fail with InterruptedException when kafka consumer > temporally 0 partitions temporally > > > Key: SPARK-27111 > URL: https://issues.apache.org/jira/browse/SPARK-27111 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 2.4.2, 3.0.0 > > > Before a Kafka consumer gets assigned with partitions, its offset will > contain 0 partitions. However, runContinuous will still run and launch a > Spark job having 0 partitions. In this case, there is a race that epoch may > interrupt the query execution thread after `lastExecution.toRdd`, and either > `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next > `runContinuous` will get interrupted unintentionally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
[ https://issues.apache.org/jira/browse/SPARK-27111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-27111. -- Resolution: Fixed > A continuous query may fail with InterruptedException when kafka consumer > temporally 0 partitions temporally > > > Key: SPARK-27111 > URL: https://issues.apache.org/jira/browse/SPARK-27111 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 2.4.2, 3.0.0 > > > Before a Kafka consumer gets assigned with partitions, its offset will > contain 0 partitions. However, runContinuous will still run and launch a > Spark job having 0 partitions. In this case, there is a race that epoch may > interrupt the query execution thread after `lastExecution.toRdd`, and either > `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next > `runContinuous` will get interrupted unintentionally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
[ https://issues.apache.org/jira/browse/SPARK-27111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27111: - Fix Version/s: 2.4.2 > A continuous query may fail with InterruptedException when kafka consumer > temporally 0 partitions temporally > > > Key: SPARK-27111 > URL: https://issues.apache.org/jira/browse/SPARK-27111 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.2, 3.0.0 > > > Before a Kafka consumer gets assigned with partitions, its offset will > contain 0 partitions. However, runContinuous will still run and launch a > Spark job having 0 partitions. In this case, there is a race that epoch may > interrupt the query execution thread after `lastExecution.toRdd`, and either > `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next > `runContinuous` will get interrupted unintentionally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
[ https://issues.apache.org/jira/browse/SPARK-27111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27111: - Fix Version/s: 3.0.0 > A continuous query may fail with InterruptedException when kafka consumer > temporally 0 partitions temporally > > > Key: SPARK-27111 > URL: https://issues.apache.org/jira/browse/SPARK-27111 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > Before a Kafka consumer gets assigned with partitions, its offset will > contain 0 partitions. However, runContinuous will still run and launch a > Spark job having 0 partitions. In this case, there is a race that epoch may > interrupt the query execution thread after `lastExecution.toRdd`, and either > `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next > `runContinuous` will get interrupted unintentionally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
[ https://issues.apache.org/jira/browse/SPARK-27111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-27111: - Affects Version/s: 2.4.1 2.4.0 > A continuous query may fail with InterruptedException when kafka consumer > temporally 0 partitions temporally > > > Key: SPARK-27111 > URL: https://issues.apache.org/jira/browse/SPARK-27111 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Before a Kafka consumer gets assigned with partitions, its offset will > contain 0 partitions. However, runContinuous will still run and launch a > Spark job having 0 partitions. In this case, there is a race that epoch may > interrupt the query execution thread after `lastExecution.toRdd`, and either > `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next > `runContinuous` will get interrupted unintentionally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27111) A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions
Shixiong Zhu created SPARK-27111: Summary: A continuous query may fail with InterruptedException when the Kafka consumer temporarily has 0 partitions Key: SPARK-27111 URL: https://issues.apache.org/jira/browse/SPARK-27111 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.3.3, 2.3.2, 2.3.1, 2.3.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Before a Kafka consumer is assigned partitions, its offset will contain 0 partitions. However, runContinuous will still run and launch a Spark job having 0 partitions. In this case, there is a race that epoch may interrupt the query execution thread after `lastExecution.toRdd`, and either `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next `runContinuous` will get interrupted unintentionally. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
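One way to picture the hazard (a loose sketch with hypothetical names, not the actual fix in Spark): launching an epoch for zero partitions does no useful work, yet it still opens a window in which an interrupt aimed at that empty run can leak into the next run; skipping the empty launch avoids entering that window at all.

```python
# Loose sketch (hypothetical names, not the actual fix) of the hazard above:
# running an epoch with zero partitions does nothing useful, yet it still
# creates a window in which an interrupt meant for that empty run can hit
# the next runContinuous call instead. Skipping the empty launch avoids
# entering that interruptible window at all.

def run_continuous(offsets, launch_job):
    """Launch one epoch only if the consumer has assigned partitions."""
    if not offsets:         # consumer not yet assigned any partitions
        return "skipped"    # nothing to execute; no interruptible window
    launch_job(offsets)
    return "ran"

launched = []
print(run_continuous({}, launched.append))               # skipped
print(run_continuous({"topic-0": 42}, launched.append))  # ran
```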
[jira] [Updated] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
[ https://issues.apache.org/jira/browse/SPARK-26824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26824: - Docs Text: Earlier versions of Spark incorrectly escaped paths when writing out checkpoints and the "_spark_metadata" directory for Structured Streaming. Queries affected by this issue will fail when running on Spark 3.0, and the failure message will include instructions on how to migrate them. > Streaming queries may store checkpoint data in a wrong directory > > > Key: SPARK-26824 > URL: https://issues.apache.org/jira/browse/SPARK-26824 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > When a user specifies a checkpoint location containing special chars that > need to be escaped in a path, the streaming query will store checkpoint in a > wrong place. For example, if you use "/chk chk", the metadata will be stored > in "/chk%20chk". File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
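The mis-escaping can be reproduced with ordinary URI percent-encoding (illustrative only; Spark's real path handling goes through the Hadoop Path API): a space in the checkpoint location becomes "%20", naming a literally different directory.

```python
# The mis-escaping described above, reproduced with standard URI encoding
# (illustrative only; Spark's path handling lives in its Hadoop Path layer).
from urllib.parse import quote, unquote

checkpoint = "/chk chk"
escaped = quote(checkpoint)   # what the buggy code effectively stored
print(escaped)                # /chk%20chk -- a *different* directory name

# Round-tripping shows the two names are distinct literal paths:
print(unquote(escaped) == checkpoint)  # True
print(escaped == checkpoint)           # False
```

Because "/chk chk" and "/chk%20chk" are both valid directory names, a query restarted after the fix would look in the un-escaped location and miss metadata written to the escaped one, which is why manual migration is needed.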
[jira] [Resolved] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
[ https://issues.apache.org/jira/browse/SPARK-26824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26824. -- Resolution: Fixed Fix Version/s: 3.0.0 > Streaming queries may store checkpoint data in a wrong directory > > > Key: SPARK-26824 > URL: https://issues.apache.org/jira/browse/SPARK-26824 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > When a user specifies a checkpoint location containing special chars that > need to be escaped in a path, the streaming query will store checkpoint in a > wrong place. For example, if you use "/chk chk", the metadata will be stored > in "/chk%20chk". File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-20977) NPE in CollectionAccumulator
[ https://issues.apache.org/jira/browse/SPARK-20977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-20977: -- Reopening this issue as I believe I understand the cause. An accumulator is escaped before it's fully constructed in "readObject": https://github.com/apache/spark/blob/b19a28dea098c7d6188f8540429c50f42952d678/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L195 "The object should not be made visible to other threads, nor should the final fields be read, until all updates to the final fields of the object are complete." (https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.5.3) > NPE in CollectionAccumulator > > > Key: SPARK-20977 > URL: https://issues.apache.org/jira/browse/SPARK-20977 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: sharkd tu >Priority: Major > > {code:java} > 17/06/03 13:39:31 ERROR Utils: Uncaught exception in thread > heartbeat-receiver-event-loop-thread > java.lang.NullPointerException > at > org.apache.spark.util.CollectionAccumulator.value(AccumulatorV2.scala:464) > at > org.apache.spark.util.CollectionAccumulator.value(AccumulatorV2.scala:439) > at > org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$6$$anonfun$7.apply(TaskSchedulerImpl.scala:408) > at > org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$6$$anonfun$7.apply(TaskSchedulerImpl.scala:408) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at 
scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$6.apply(TaskSchedulerImpl.scala:408) > at > org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$6.apply(TaskSchedulerImpl.scala:407) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) > at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186) > at > org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:407) > at > org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2$$anonfun$run$2.apply$mcV$sp(HeartbeatReceiver.scala:129) > at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283) > at > org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2.run(HeartbeatReceiver.scala:128) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Is that the bug of spark? Has anybody ever hit the problem? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
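The unsafe-publication hazard cited in the reopening comment can be mimicked loosely in Python (which has no JMM final-field semantics, so this is only an analogy with hypothetical names): an object that registers itself before its fields are initialized can be observed half-constructed, the moral equivalent of the NPE above.

```python
# Loose analogue (Python has no JMM final-field semantics) of the hazard in
# the comment above: an object publishes itself to another component
# *before* its own fields are initialized, so a concurrent reader can
# observe it half-constructed and hit the equivalent of the NPE.

REGISTRY = []  # stands in for the registry another thread polls

class LeakyAccumulator:
    def __init__(self):
        REGISTRY.append(self)   # BUG: `self` escapes here...
        self.values = []        # ...before this field exists

class SafeAccumulator:
    def __init__(self):
        self.values = []        # initialize every field first
        REGISTRY.append(self)   # then publish

# Emulate what a racing reader could observe mid-construction:
half_built = object.__new__(LeakyAccumulator)  # fields not yet assigned
try:
    half_built.values
except AttributeError:
    print("reader observed a half-constructed accumulator")
```

The fix pattern is the same in both worlds: finish initializing the object, then publish it.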
[jira] [Updated] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
[ https://issues.apache.org/jira/browse/SPARK-26824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26824: - Affects Version/s: 2.0.0 2.1.0 2.2.0 2.3.0 > Streaming queries may store checkpoint data in a wrong directory > > > Key: SPARK-26824 > URL: https://issues.apache.org/jira/browse/SPARK-26824 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > > When a user specifies a checkpoint location containing special chars that > need to be escaped in a path, the streaming query will store checkpoint in a > wrong place. For example, if you use "/chk chk", the metadata will be stored > in "/chk%20chk". File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
[ https://issues.apache.org/jira/browse/SPARK-26824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760071#comment-16760071 ] Shixiong Zhu commented on SPARK-26824: -- This will need a release note. After the fix, the paths to store metadata will be changed. The user needs to move the metadata files which are in a wrong place to the right location manually. Otherwise, their query will not pick up the old metadata. > Streaming queries may store checkpoint data in a wrong directory > > > Key: SPARK-26824 > URL: https://issues.apache.org/jira/browse/SPARK-26824 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > > When a user specifies a checkpoint location containing special chars that > need to be escaped in a path, the streaming query will store checkpoint in a > wrong place. For example, if you use "/chk chk", the metadata will be stored > in "/chk%20chk". File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
[ https://issues.apache.org/jira/browse/SPARK-26824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26824: - Description: When a user specifies a checkpoint location containing special chars that need to be escaped in a path, the streaming query will store checkpoint in a wrong place. For example, if you use "/chk chk", the metadata will be stored in "/chk%20chk". File sink's "_spark_metadata" directory has the same issue. (was: When a user specifies a checkpoint location containing special chars that need to be escaped in a path, the streaming query will store checkpoint in a wrong place. File sink's "_spark_metadata" directory has the same issue. ) > Streaming queries may store checkpoint data in a wrong directory > > > Key: SPARK-26824 > URL: https://issues.apache.org/jira/browse/SPARK-26824 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > > When a user specifies a checkpoint location containing special chars that > need to be escaped in a path, the streaming query will store checkpoint in a > wrong place. For example, if you use "/chk chk", the metadata will be stored > in "/chk%20chk". File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
Shixiong Zhu created SPARK-26824: Summary: Streaming queries may store checkpoint data in a wrong directory Key: SPARK-26824 URL: https://issues.apache.org/jira/browse/SPARK-26824 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu When a user specifies a checkpoint location containing special chars that need to be escaped in a path, the streaming query will store checkpoint in a wrong place. File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26824) Streaming queries may store checkpoint data in a wrong directory
[ https://issues.apache.org/jira/browse/SPARK-26824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26824: - Labels: release-notes (was: ) > Streaming queries may store checkpoint data in a wrong directory > > > Key: SPARK-26824 > URL: https://issues.apache.org/jira/browse/SPARK-26824 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: release-notes > > When a user specifies a checkpoint location containing special chars that > need to be escaped in a path, the streaming query will store checkpoint in a > wrong place. File sink's "_spark_metadata" directory has the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26806: - Affects Version/s: 2.3.3 > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0 >Reporter: liancheng >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.3, 2.4.1, 3.0.0, 2.2.4 > > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} > This issue was reported by [~liancheng] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26806: - Fix Version/s: (was: 2.3.3) 2.3.4 > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0 >Reporter: liancheng >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 2.4.1, 3.0.0, 2.2.4 > > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} > This issue was reported by [~liancheng] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26806: - Affects Version/s: 2.2.2 2.2.3 > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: liancheng >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.3, 2.4.1, 3.0.0, 2.2.4 > > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} > This issue was reported by [~liancheng] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26806. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 2.3.3 2.2.4 > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: liancheng >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.2.4, 2.3.3, 2.4.1, 3.0.0 > > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} > This issue was reported by [~liancheng] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26783) Kafka parameter documentation doesn't match with the reality (upper/lowercase)
[ https://issues.apache.org/jira/browse/SPARK-26783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757824#comment-16757824 ] Shixiong Zhu commented on SPARK-26783: -- [~gsomogyi] This seems just an API document issue. Right? If the user passes "failOnDataLoss", the Kafka source will pick it up correctly. > Kafka parameter documentation doesn't match with the reality (upper/lowercase) > -- > > Key: SPARK-26783 > URL: https://issues.apache.org/jira/browse/SPARK-26783 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Minor > > A good example for this is "failOnDataLoss" which is reported in SPARK-23685. > I've just checked and there are several other parameters which suffer from > the same issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26806: - Description: Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will make "avg" become "NaN". And whatever gets merged with the result of "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong will return "0" and the user will see the following incorrect report: {code} "eventTime" : { "avg" : "1970-01-01T00:00:00.000Z", "max" : "2019-01-31T12:57:00.000Z", "min" : "2019-01-30T18:44:04.000Z", "watermark" : "1970-01-01T00:00:00.000Z" } {code} This issue was reported by [~liancheng] was: Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will make "avg" become "NaN". And whatever gets merged with the result of "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong will return "0" and the user will see the following incorrect report: {code} "eventTime" : { "avg" : "1970-01-01T00:00:00.000Z", "max" : "2019-01-31T12:57:00.000Z", "min" : "2019-01-30T18:44:04.000Z", "watermark" : "1970-01-01T00:00:00.000Z" } {code} > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". 
Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} > This issue was reported by [~liancheng] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
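The NaN propagation described above can be sketched in a few lines. This is an illustrative Python model, not Spark's actual Scala code; the field names and the weighted-average merge formula are assumptions:

```python
import math

# Minimal model of EventTimeStats showing why zero.merge(zero) poisons "avg".
class EventTimeStats:
    def __init__(self, avg=0.0, count=0):
        self.avg = avg
        self.count = count

    def merge(self, that):
        total = self.count + that.count
        if total == 0:
            # JVM double arithmetic yields 0.0 / 0 == NaN; Python raises
            # ZeroDivisionError instead, so the NaN is made explicit here.
            merged_avg = float("nan")
        else:
            merged_avg = (self.avg * self.count + that.avg * that.count) / total
        return EventTimeStats(merged_avg, total)

zero = EventTimeStats()
poisoned = zero.merge(zero)                   # avg is now NaN
real = EventTimeStats(avg=1548961200000.0, count=10)
result = poisoned.merge(real)                 # NaN * 0 + ... / 10 is still NaN
print(math.isnan(result.avg))                 # prints True
```

In Scala, `Double.NaN.toLong` evaluates to 0, which is what renders as the `1970-01-01T00:00:00.000Z` timestamps in the report above; special-casing the zero-count merge avoids introducing the NaN in the first place.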
[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26806: - Reporter: liancheng (was: Shixiong Zhu) > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: liancheng >Assignee: Shixiong Zhu >Priority: Major > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} > This issue was reported by [~liancheng] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
[ https://issues.apache.org/jira/browse/SPARK-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26806: - Affects Version/s: 2.2.1 2.3.0 2.3.1 2.3.2 > EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly > > > Key: SPARK-26806 > URL: https://issues.apache.org/jira/browse/SPARK-26806 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.1, 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will > make "avg" become "NaN". And whatever gets merged with the result of > "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong > will return "0" and the user will see the following incorrect report: > {code} > "eventTime" : { > "avg" : "1970-01-01T00:00:00.000Z", > "max" : "2019-01-31T12:57:00.000Z", > "min" : "2019-01-30T18:44:04.000Z", > "watermark" : "1970-01-01T00:00:00.000Z" > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26806) EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly
Shixiong Zhu created SPARK-26806: Summary: EventTimeStats.merge doesn't handle "zero.merge(zero)" correctly Key: SPARK-26806 URL: https://issues.apache.org/jira/browse/SPARK-26806 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Right now, EventTimeStats.merge doesn't handle "zero.merge(zero)". This will make "avg" become "NaN". And whatever gets merged with the result of "zero.merge(zero)", "avg" will still be "NaN". Then finally, "NaN".toLong will return "0" and the user will see the following incorrect report: {code} "eventTime" : { "avg" : "1970-01-01T00:00:00.000Z", "max" : "2019-01-31T12:57:00.000Z", "min" : "2019-01-30T18:44:04.000Z", "watermark" : "1970-01-01T00:00:00.000Z" } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26682) Task attempt ID collision causes lost data
[ https://issues.apache.org/jira/browse/SPARK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751651#comment-16751651 ] Shixiong Zhu commented on SPARK-26682: -- For future reference, data loss could happen when one task modified the other task's temp output file. > Task attempt ID collision causes lost data > -- > > Key: SPARK-26682 > URL: https://issues.apache.org/jira/browse/SPARK-26682 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.3.2, 2.4.0 >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Blocker > Labels: data-loss > Fix For: 2.3.3, 2.4.1, 3.0.0 > > > We recently tracked missing data to a collision in the fake Hadoop task > attempt ID created when using Hadoop OutputCommitters. This is similar to > SPARK-24589. > A stage had one task fail to get one shard from a shuffle, causing a > FetchFailedException and Spark resubmitted the stage. Because only one task > was affected, the original stage attempt continued running tasks that had > been resubmitted. Another task ran two attempts concurrently on the same > executor, but had the same attempt number because they were from different > stage attempts. Because the attempt number was the same, the task used the > same temp locations. That caused one attempt to fail because a file path > already existed, and that attempt then removed the shared temp location and > deleted the other task's data. When the second attempt succeeded, it > committed partial data. > The problem was that both attempts had the same partition and attempt > numbers, despite being run in different stages, and that was used to create a > Hadoop task attempt ID on which the temp location was based. The fix is to > use Spark's global task attempt ID, which is a counter, instead of attempt > number because attempt number is reused in stage attempts. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26682) Task attempt ID collision causes lost data
[ https://issues.apache.org/jira/browse/SPARK-26682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750486#comment-16750486 ] Shixiong Zhu commented on SPARK-26682: -- IIUC, this issue will cause a file deletion (delete the temp file) and a file rename (move the temp file to the target file) to happen at the same time. Could you clarify why this will cause a task to commit partial data? I think the file rename should either move the whole file to the target file, or just fail, right? > Task attempt ID collision causes lost data > -- > > Key: SPARK-26682 > URL: https://issues.apache.org/jira/browse/SPARK-26682 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.3.2, 2.4.0 >Reporter: Ryan Blue >Priority: Blocker > Labels: data-loss > > We recently tracked missing data to a collision in the fake Hadoop task > attempt ID created when using Hadoop OutputCommitters. This is similar to > SPARK-24589. > A stage had one task fail to get one shard from a shuffle, causing a > FetchFailedException and Spark resubmitted the stage. Because only one task > was affected, the original stage attempt continued running tasks that had > been resubmitted. Another task ran two attempts concurrently on the same > executor, but had the same attempt number because they were from different > stage attempts. Because the attempt number was the same, the task used the > same temp locations. That caused one attempt to fail because a file path > already existed, and that attempt then removed the shared temp location and > deleted the other task's data. When the second attempt succeeded, it > committed partial data. > The problem was that both attempts had the same partition and attempt > numbers, despite being run in different stages, and that was used to create a > Hadoop task attempt ID on which the temp location was based. 
The fix is to > use Spark's global task attempt ID, which is a counter, instead of attempt > number because attempt number is reused in stage attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
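The collision mechanism described in the issue can be sketched as follows. The path layout and field names are illustrative, not the actual Hadoop committer format: `attemptNumber` restarts at 0 in every stage attempt, so two concurrent attempts of the same partition from different stage attempts map to the same temp location, while Spark's global task attempt ID (a driver-side counter) is unique per launched task:

```python
# Hypothetical temp-path scheme keyed the buggy way (partition + attemptNumber)
# versus the fixed way (partition + globally unique taskAttemptId).
def temp_path_buggy(partition, attempt_number):
    return f"_temporary/attempt_{partition}_{attempt_number}"

def temp_path_fixed(partition, task_attempt_id):
    return f"_temporary/attempt_{partition}_{task_attempt_id}"

# Stage attempt 0 runs (partition=7, attemptNumber=0); after the
# FetchFailedException, stage attempt 1 relaunches the same partition,
# again with attemptNumber=0 but a fresh global taskAttemptId.
first  = {"partition": 7, "attempt_number": 0, "task_attempt_id": 40}
second = {"partition": 7, "attempt_number": 0, "task_attempt_id": 85}

collision = temp_path_buggy(first["partition"], first["attempt_number"]) == \
            temp_path_buggy(second["partition"], second["attempt_number"])
print(collision)  # prints True: both attempts write to the same temp dir

distinct = temp_path_fixed(first["partition"], first["task_attempt_id"]) != \
           temp_path_fixed(second["partition"], second["task_attempt_id"])
print(distinct)   # prints True: the paths no longer collide
```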
[jira] [Updated] (SPARK-26665) BlockTransferService.fetchBlockSync may hang forever
[ https://issues.apache.org/jira/browse/SPARK-26665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26665: - Fix Version/s: 2.3.4 > BlockTransferService.fetchBlockSync may hang forever > > > Key: SPARK-26665 > URL: https://issues.apache.org/jira/browse/SPARK-26665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 2.4.1, 3.0.0 > > > `ByteBuffer.allocate` may throw OutOfMemoryError when the block is large but > not enough memory is available. However, when this happens, right now > BlockTransferService.fetchBlockSync will just hang forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26665) BlockTransferService.fetchBlockSync may hang forever
[ https://issues.apache.org/jira/browse/SPARK-26665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26665: - Affects Version/s: 2.3.0 2.3.1 > BlockTransferService.fetchBlockSync may hang forever > > > Key: SPARK-26665 > URL: https://issues.apache.org/jira/browse/SPARK-26665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 2.4.1, 3.0.0 > > > `ByteBuffer.allocate` may throw OutOfMemoryError when the block is large but > not enough memory is available. However, when this happens, right now > BlockTransferService.fetchBlockSync will just hang forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26665) BlockTransferService.fetchBlockSync may hang forever
[ https://issues.apache.org/jira/browse/SPARK-26665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26665: - Affects Version/s: 2.3.2 > BlockTransferService.fetchBlockSync may hang forever > > > Key: SPARK-26665 > URL: https://issues.apache.org/jira/browse/SPARK-26665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.3.4, 2.4.1, 3.0.0 > > > `ByteBuffer.allocate` may throw OutOfMemoryError when the block is large but > not enough memory is available. However, when this happens, right now > BlockTransferService.fetchBlockSync will just hang forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26665) BlockTransferService.fetchBlockSync may hang forever
[ https://issues.apache.org/jira/browse/SPARK-26665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26665. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 > BlockTransferService.fetchBlockSync may hang forever > > > Key: SPARK-26665 > URL: https://issues.apache.org/jira/browse/SPARK-26665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > `ByteBuffer.allocate` may throw OutOfMemoryError when the block is large but > not enough memory is available. However, when this happens, right now > BlockTransferService.fetchBlockSync will just hang forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26665) BlockTransferService.fetchBlockSync may hang forever
Shixiong Zhu created SPARK-26665: Summary: BlockTransferService.fetchBlockSync may hang forever Key: SPARK-26665 URL: https://issues.apache.org/jira/browse/SPARK-26665 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu `ByteBuffer.allocate` may throw OutOfMemoryError when the block is large but not enough memory is available. However, when this happens, right now BlockTransferService.fetchBlockSync will just hang forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
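The hang described in SPARK-26665 can be modeled in a few lines. This is an illustrative Python sketch, not Spark's actual code: the blocking caller waits on a future that a fetch callback is supposed to complete, and if the allocation inside the callback throws before the future is completed, the error is swallowed by the async layer and the caller waits forever. The names below (`allocate`, `fetch_block_buggy`, `fetch_block_fixed`) are hypothetical:

```python
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeout

def allocate(size):
    # Stands in for ByteBuffer.allocate failing on a very large block.
    raise MemoryError(f"cannot allocate {size} bytes")

def fetch_block_buggy(size):
    result = Future()
    def on_block_fetch_success(n):
        result.set_result(allocate(n))  # raises; the future is never completed
    try:
        on_block_fetch_success(size)
    except MemoryError:
        pass  # models the RPC layer dropping the callback's error
    return result

def fetch_block_fixed(size):
    result = Future()
    def on_block_fetch_success(n):
        try:
            result.set_result(allocate(n))
        except Exception as exc:
            result.set_exception(exc)  # surface the failure to the waiter
    try:
        on_block_fetch_success(size)
    except Exception:
        pass
    return result

# The buggy version times out (a stand-in for "hangs forever"); the fixed
# version propagates the allocation failure to the blocking caller promptly.
try:
    fetch_block_buggy(1 << 40).result(timeout=0.1)
except FutureTimeout:
    print("buggy fetch: caller is stuck waiting")
try:
    fetch_block_fixed(1 << 40).result(timeout=0.1)
except MemoryError as exc:
    print(f"fixed fetch: failure surfaced: {exc}")
```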
[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26629: - Fix Version/s: (was: 2.3.4) > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where X will always be > > Y. This calls internally {{HDFSMetadata.verifyBatchIds (X+1, Y)}} with > X+1-Y ids. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds > = Seq(5), start = 5, end = 5)}}. However, the invariant of X > Y is not true > when there are two file stream sources, as a batch may be planned even when > only one of the file streams has data. So one of the file streams may not have > data, which can call {{FileStreamSource.getBatch(X, X) -> verify(batchIds = > Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only when restarting a > query in a batch
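The failing invariant can be sketched as follows. The function below is loosely modeled on `HDFSMetadataLog.verifyBatchIds` and `FileStreamSource.getBatch`; the exact Scala signatures and checks differ, so treat this as an assumption-laden illustration of why an empty batch-id list must be accepted when the requested range is itself empty:

```python
def verify_batch_ids(batch_ids, start, end, allow_empty=False):
    # Unpatched behavior: an empty batchIds list is always an error.
    if not batch_ids:
        if allow_empty and start > end:
            return  # the fix: an empty (start > end) range is legal on restart
        raise AssertionError("batchIds is empty")
    if sorted(batch_ids) != list(range(start, end + 1)):
        raise AssertionError(f"got {sorted(batch_ids)}, expected [{start}, {end}]")

source_log = [1, 2, 3, 4, 5]  # batch ids recorded by one file stream source

# Single-source case, FileStreamSource.getBatch(4, 5): ids in (4, 5].
verify_batch_ids([b for b in source_log if 4 < b <= 5], 5, 5)  # passes

# Two-source restart: the source with no new data gets getBatch(5, 5),
# i.e. verify(Seq.empty, start = 6, end = 5) -- the unpatched check fails.
failed = False
try:
    verify_batch_ids([], 6, 5)
except AssertionError:
    failed = True
assert failed

verify_batch_ids([], 6, 5, allow_empty=True)  # patched behavior: accepted
```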
[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26629: - Fix Version/s: 3.0.0 2.4.1 2.3.4 > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0, 2.3.4 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where X will always be > > Y. This calls internally {{HDFSMetadata.verifyBatchIds (X+1, Y)}} with > X+1-Y ids. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds > = Seq(5), start = 5, end = 5)}}. However, the invariant of X > Y is not true > when there are two file stream sources, as a batch may be planned even when > only one of the file streams has data. So one of the file streams may not have > data, which can call {{FileStreamSource.getBatch(X, X) -> verify(batchIds = > Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only
[jira] [Resolved] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26629. -- Resolution: Fixed > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0, 2.3.4 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where X will always be > > Y. This calls internally {{HDFSMetadata.verifyBatchIds (X+1, Y)}} with > X+1-Y ids. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds > = Seq(5), start = 5, end = 5)}}. However, the invariant of X > Y is not true > when there are two file stream sources, as a batch may be planned even when > only one of the file streams has data. So one of the file streams may not have > data, which can call {{FileStreamSource.getBatch(X, X) -> verify(batchIds = > Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only when restarting a > query in a batch where
[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26629: - Affects Version/s: 2.3.3 > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0, 2.3.4 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where X will always be > > Y. This calls internally {{HDFSMetadata.verifyBatchIds (X+1, Y)}} with > X+1-Y ids. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds > = Seq(5), start = 5, end = 5)}}. However, the invariant of X > Y is not true > when there are two file stream sources, as a batch may be planned even when > only one of the file streams has data. So one of the file streams may not have > data, which can call {{FileStreamSource.getBatch(X, X) -> verify(batchIds = > Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only when restarting a > query in a batch
[jira] [Resolved] (SPARK-26350) Allow the user to override the group id of the Kafka's consumer
[ https://issues.apache.org/jira/browse/SPARK-26350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26350. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 3.0.0 > Allow the user to override the group id of the Kafka's consumer > --- > > Key: SPARK-26350 > URL: https://issues.apache.org/jira/browse/SPARK-26350 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > Attachments: Permalink.url > > > Sometimes the group id is used to identify the stream for "security". We > should give a flag that lets you override it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26586) Streaming queries should have isolated SparkSessions and confs
[ https://issues.apache.org/jira/browse/SPARK-26586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26586: - Affects Version/s: 2.2.1 2.2.2 2.3.1 2.3.2 > Streaming queries should have isolated SparkSessions and confs > -- > > Key: SPARK-26586 > URL: https://issues.apache.org/jira/browse/SPARK-26586 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > When a stream is started, the stream's config is supposed to be frozen and > all batches run with the config at start time. However, due to a race > condition in creating streams, updating a conf value in the active spark > session immediately after starting a stream can lead to the stream getting > that updated value. > > The problem is that when StreamingQueryManager creates a MicrobatchExecution > (or ContinuousExecution), it passes in the shared spark session, and the > spark session isn't cloned until StreamExecution.start() is called. > DataStreamWriter.start() should not return until the SparkSession is cloned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26586) Streaming queries should have isolated SparkSessions and confs
[ https://issues.apache.org/jira/browse/SPARK-26586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26586: - Affects Version/s: 2.2.0 > Streaming queries should have isolated SparkSessions and confs > -- > > Key: SPARK-26586 > URL: https://issues.apache.org/jira/browse/SPARK-26586 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > When a stream is started, the stream's config is supposed to be frozen and > all batches run with the config at start time. However, due to a race > condition in creating streams, updating a conf value in the active spark > session immediately after starting a stream can lead to the stream getting > that updated value. > > The problem is that when StreamingQueryManager creates a MicrobatchExecution > (or ContinuousExecution), it passes in the shared spark session, and the > spark session isn't cloned until StreamExecution.start() is called. > DataStreamWriter.start() should not return until the SparkSession is cloned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26586) Streaming queries should have isolated SparkSessions and confs
[ https://issues.apache.org/jira/browse/SPARK-26586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26586: - Target Version/s: 3.0.0 (was: 2.5.0, 3.0.0) > Streaming queries should have isolated SparkSessions and confs > -- > > Key: SPARK-26586 > URL: https://issues.apache.org/jira/browse/SPARK-26586 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.3.0, 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > When a stream is started, the stream's config is supposed to be frozen and > all batches run with the config at start time. However, due to a race > condition in creating streams, updating a conf value in the active spark > session immediately after starting a stream can lead to the stream getting > that updated value. > > The problem is that when StreamingQueryManager creates a MicrobatchExecution > (or ContinuousExecution), it passes in the shared spark session, and the > spark session isn't cloned until StreamExecution.start() is called. > DataStreamWriter.start() should not return until the SparkSession is cloned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26586) Streaming queries should have isolated SparkSessions and confs
[ https://issues.apache.org/jira/browse/SPARK-26586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26586. -- Resolution: Fixed Assignee: Mukul Murthy Fix Version/s: 3.0.0 2.4.1 > Streaming queries should have isolated SparkSessions and confs > -- > > Key: SPARK-26586 > URL: https://issues.apache.org/jira/browse/SPARK-26586 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.3.0, 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > When a stream is started, the stream's config is supposed to be frozen and > all batches run with the config at start time. However, due to a race > condition in creating streams, updating a conf value in the active spark > session immediately after starting a stream can lead to the stream getting > that updated value. > > The problem is that when StreamingQueryManager creates a MicrobatchExecution > (or ContinuousExecution), it passes in the shared spark session, and the > spark session isn't cloned until StreamExecution.start() is called. > DataStreamWriter.start() should not return until the SparkSession is cloned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
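The race described in this ticket comes down to when the conf snapshot is taken. The following Spark-free Java sketch (all class names are invented for illustration; this is not Spark code) shows why cloning the settings before start() returns freezes the query's config against later mutations of the shared session.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a shared SparkSession conf (illustrative only).
class SessionConf {
    final Map<String, String> settings = new HashMap<>();
}

// Stand-in for a streaming query. If it kept a reference to the shared
// settings map instead of copying it, a conf write racing with startup
// would be visible to the running stream -- the bug this ticket describes.
class StreamingQuerySketch {
    private final Map<String, String> frozenConf;

    // Cloning here, i.e. before "start()" returns to the caller,
    // freezes the config at start time.
    StreamingQuerySketch(SessionConf session) {
        this.frozenConf = new HashMap<>(session.settings);
    }

    String conf(String key) {
        return frozenConf.get(key);
    }
}
```

Mutating the session immediately after constructing the query no longer affects what the query sees, which is the guarantee DataStreamWriter.start() is supposed to provide.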
[jira] [Created] (SPARK-26594) DataSourceOptions.asMap should return CaseInsensitiveMap
Shixiong Zhu created SPARK-26594: Summary: DataSourceOptions.asMap should return CaseInsensitiveMap Key: SPARK-26594 URL: https://issues.apache.org/jira/browse/SPARK-26594 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Shixiong Zhu I'm pretty surprised that the following code fails. {code} import scala.collection.JavaConverters._ import org.apache.spark.sql.sources.v2.DataSourceOptions val map = new DataSourceOptions(Map("fooBar" -> "x").asJava).asMap assert(map.get("fooBar") == "x") {code} It's better to make DataSourceOptions.asMap return CaseInsensitiveMap. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
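The surprise comes from asMap exposing a map keyed by internally normalized (lower-cased) option names, so a mixed-case lookup misses. A minimal Java sketch of the case-insensitive lookup the ticket proposes (illustrative only; this is not Spark's CaseInsensitiveMap, and the class name is invented):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Minimal case-insensitive option lookup: keys are normalized once at
// construction, and every get() normalizes the requested key the same way.
class CaseInsensitiveOptions {
    private final Map<String, String> lowered = new HashMap<>();

    CaseInsensitiveOptions(Map<String, String> options) {
        options.forEach((k, v) -> lowered.put(k.toLowerCase(Locale.ROOT), v));
    }

    String get(String key) {
        return lowered.get(key.toLowerCase(Locale.ROOT));
    }
}
```

With this wrapper, the failing assertion from the ticket would pass regardless of whether the caller asks for "fooBar", "foobar", or "FOOBAR".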
[jira] [Resolved] (SPARK-26267) Kafka source may reprocess data
[ https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26267. -- Resolution: Fixed Fix Version/s: 2.4.1 > Kafka source may reprocess data > --- > > Key: SPARK-26267 > URL: https://issues.apache.org/jira/browse/SPARK-26267 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Labels: correctness > Fix For: 2.4.1, 3.0.0 > > > Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it > may get an earliest offset, and then it will reprocess messages that have > been processed when it gets the correct latest offset in the next batch. > This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26267) Kafka source may reprocess data
[ https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26267: - Fix Version/s: 3.0.0 > Kafka source may reprocess data > --- > > Key: SPARK-26267 > URL: https://issues.apache.org/jira/browse/SPARK-26267 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Labels: correctness > Fix For: 3.0.0 > > > Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it > may get an earliest offset, and then it will reprocess messages that have > been processed when it gets the correct latest offset in the next batch. > This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26267) Kafka source may reprocess data
[ https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-26267: Assignee: Shixiong Zhu > Kafka source may reprocess data > --- > > Key: SPARK-26267 > URL: https://issues.apache.org/jira/browse/SPARK-26267 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Labels: correctness > > Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it > may get an earliest offset, and then it will reprocess messages that have > been processed when it gets the correct latest offset in the next batch. > This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26350) Allow the user to override the group id of the Kafka's consumer
Shixiong Zhu created SPARK-26350: Summary: Allow the user to override the group id of the Kafka's consumer Key: SPARK-26350 URL: https://issues.apache.org/jira/browse/SPARK-26350 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu Sometimes the group id is used to identify the stream for "security". We should give a flag that lets you override it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
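A sketch of the behavior being requested: fall back to a unique generated consumer group id unless the user supplies one. This is an illustrative Java sketch; the option name "kafka.group.id" and the "spark-kafka-source-" prefix are assumptions modeled on how Spark names these, not a quote of its implementation.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical resolver for the Kafka consumer group id.
class GroupIdResolver {
    static String resolve(Map<String, String> options) {
        String userGroupId = options.get("kafka.group.id");
        if (userGroupId != null) {
            // User override, e.g. for clusters that enforce group-based ACLs.
            return userGroupId;
        }
        // Default: a unique id per query so concurrent queries from the same
        // application never share consumer-group state.
        return "spark-kafka-source-" + UUID.randomUUID();
    }
}
```

The trade-off worth noting in docs for such a flag: two running queries given the same fixed group id would interfere with each other, which is exactly why the default stays randomized.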
[jira] [Updated] (SPARK-26267) Kafka source may reprocess data
[ https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26267: - Priority: Blocker (was: Major) > Kafka source may reprocess data > --- > > Key: SPARK-26267 > URL: https://issues.apache.org/jira/browse/SPARK-26267 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: correctness > > Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it > may get an earliest offset, and then it will reprocess messages that have > been processed when it gets the correct latest offset in the next batch. > This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26267) Kafka source may reprocess data
[ https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26267: - Labels: correctness (was: ) > Kafka source may reprocess data > --- > > Key: SPARK-26267 > URL: https://issues.apache.org/jira/browse/SPARK-26267 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Blocker > Labels: correctness > > Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it > may get an earliest offset, and then it will reprocess messages that have > been processed when it gets the correct latest offset in the next batch. > This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26267) Kafka source may reprocess data
[ https://issues.apache.org/jira/browse/SPARK-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709169#comment-16709169 ] Shixiong Zhu commented on SPARK-26267: -- KAFKA-7703 only exists in Kafka 1.1.0 and above, so a possible workaround is using an old version that doesn't have this issue. This doesn't impact Spark 2.3.x and below as we use Kafka 0.10.0.1 by default. > Kafka source may reprocess data > --- > > Key: SPARK-26267 > URL: https://issues.apache.org/jira/browse/SPARK-26267 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Major > > Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it > may get an earliest offset, and then it will reprocess messages that have > been processed when it gets the correct latest offset in the next batch. > This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26267) Kafka source may reprocess data
Shixiong Zhu created SPARK-26267: Summary: Kafka source may reprocess data Key: SPARK-26267 URL: https://issues.apache.org/jira/browse/SPARK-26267 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu Due to KAFKA-7703, when the Kafka source tries to get the latest offset, it may get an earliest offset, and then it will reprocess messages that have been processed when it gets the correct latest offset in the next batch. This usually happens when restarting a streaming query. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
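The description implies the fix needs a monotonicity guard: a fetched "latest" offset that is smaller than one already seen should be treated as a stale answer from the broker (KAFKA-7703) rather than used for planning, since trusting it would replay already-processed records. A self-contained Java sketch of that guard (illustrative; not Spark's KafkaOffsetReader):

```java
// Tracks the largest "latest" offset seen so far and refuses to move backwards.
class OffsetTracker {
    private long lastSeenLatest = Long.MIN_VALUE;

    // Returns the offset to use when planning the next batch. A fetched value
    // smaller than what we have already seen is treated as stale and ignored,
    // so the planned range can never roll back and reprocess old records.
    long acceptLatest(long fetchedLatest) {
        if (fetchedLatest < lastSeenLatest) {
            return lastSeenLatest;
        }
        lastSeenLatest = fetchedLatest;
        return fetchedLatest;
    }
}
```

A real fix would likely also re-fetch rather than silently reuse the old value, but the invariant is the same: per-partition latest offsets must be non-decreasing across batches.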
[jira] [Created] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests
Shixiong Zhu created SPARK-26120: Summary: Fix a streaming query leak in Structured Streaming R tests Key: SPARK-26120 URL: https://issues.apache.org/jira/browse/SPARK-26120 Project: Spark Issue Type: Test Components: Tests Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu "Specify a schema by using a DDL-formatted string when reading" doesn't stop the streaming query before stopping Spark. It causes the following annoying logs. {code} Exception in thread "stream execution thread for [id = 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) at org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) at org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped. 
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) ... 7 more org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) at org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) at org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) ... 
7 more {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests
[ https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26120: - Priority: Minor (was: Major) > Fix a streaming query leak in Structured Streaming R tests > -- > > Key: SPARK-26120 > URL: https://issues.apache.org/jira/browse/SPARK-26120 > Project: Spark > Issue Type: Test > Components: SparkR, Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > > "Specify a schema by using a DDL-formatted string when reading" doesn't stop > the streaming query before stopping Spark. It causes the following annoying > logs. > {code} > Exception in thread "stream execution thread for [id = > 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = > 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution > thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = > 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: > Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. 
> at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests
[ https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26120: - Component/s: Structured Streaming SparkR > Fix a streaming query leak in Structured Streaming R tests > -- > > Key: SPARK-26120 > URL: https://issues.apache.org/jira/browse/SPARK-26120 > Project: Spark > Issue Type: Test > Components: SparkR, Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > "Specify a schema by using a DDL-formatted string when reading" doesn't stop > the streaming query before stopping Spark. It causes the following annoying > logs. > {code} > Exception in thread "stream execution thread for [id = > 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = > 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution > thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = > 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: > Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. 
> at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26092) Use CheckpointFileManager to write the streaming metadata file
[ https://issues.apache.org/jira/browse/SPARK-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26092. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 3.0.0 2.4.1 > Use CheckpointFileManager to write the streaming metadata file > -- > > Key: SPARK-26092 > URL: https://issues.apache.org/jira/browse/SPARK-26092 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > We should use CheckpointFileManager to write the streaming metadata file to > avoid potential partial file issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26092) Use CheckpointFileManager to write the streaming metadata file
[ https://issues.apache.org/jira/browse/SPARK-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26092: - Issue Type: Bug (was: Test) > Use CheckpointFileManager to write the streaming metadata file > -- > > Key: SPARK-26092 > URL: https://issues.apache.org/jira/browse/SPARK-26092 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Priority: Major > > We should use CheckpointFileManager to write the streaming metadata file to > avoid potential partial file issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26092) Use CheckpointFileManager to write the streaming metadata file
Shixiong Zhu created SPARK-26092: Summary: Use CheckpointFileManager to write the streaming metadata file Key: SPARK-26092 URL: https://issues.apache.org/jira/browse/SPARK-26092 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu We should use CheckpointFileManager to write the streaming metadata file to avoid potential partial file issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26069) Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
[ https://issues.apache.org/jira/browse/SPARK-26069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26069. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 > Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures > - > > Key: SPARK-26069 > URL: https://issues.apache.org/jira/browse/SPARK-26069 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > {code} > sbt.ForkMain$ForkError: java.lang.AssertionError: expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.spark.network.RpcIntegrationSuite.assertErrorAndClosed(RpcIntegrationSuite.java:386) > at > org.apache.spark.network.RpcIntegrationSuite.sendRpcWithStreamFailures(RpcIntegrationSuite.java:347) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > at com.novocode.junit.JUnitRunner$1.execute(JUnitRunner.java:132) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26069) Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures
Shixiong Zhu created SPARK-26069: Summary: Flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures Key: SPARK-26069 URL: https://issues.apache.org/jira/browse/SPARK-26069 Project: Spark Issue Type: Test Components: Tests Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu {code} sbt.ForkMain$ForkError: java.lang.AssertionError: expected:<1> but was:<2> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.spark.network.RpcIntegrationSuite.assertErrorAndClosed(RpcIntegrationSuite.java:386) at org.apache.spark.network.RpcIntegrationSuite.sendRpcWithStreamFailures(RpcIntegrationSuite.java:347) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.junit.runner.JUnitCore.run(JUnitCore.java:137) at org.junit.runner.JUnitCore.run(JUnitCore.java:115) at com.novocode.junit.JUnitRunner$1.execute(JUnitRunner.java:132) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26042) KafkaContinuousSourceTopicDeletionSuite may hang forever
[ https://issues.apache.org/jira/browse/SPARK-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26042. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 > KafkaContinuousSourceTopicDeletionSuite may hang forever > > > Key: SPARK-26042 > URL: https://issues.apache.org/jira/browse/SPARK-26042 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > Saw the following thread dump in some build: > {code} > "stream execution thread for [id = 1c13482e-1edf-4b5c-b63a-d652738c8a48, > runId = 10667ce9-7eac-4cef-a525-f1bd08eb50f1]" #4406 daemon prio=5 os_prio=0 > tid=0x7fab1d3c5000 nid=0x7f4b waiting on condition [0x7fa96efcb000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070a904cf8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > ... 
> at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:180) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:109) > - locked <0x00070a913ee8> (a > org.apache.spark.sql.execution.streaming.IncrementalExecution) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109) > at > org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:270) > at > org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:270) > ,,, > "pool-1-thread-1-ScalaTest-running-KafkaContinuousSourceTopicDeletionSuite" > #20 prio=5 os_prio=0 tid=0x7fabc4e78800 nid=0x23be waiting for monitor > entry [0x7fab3dbff000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100) > - waiting to lock <0x00070a913ee8> (a > org.apache.spark.sql.execution.streaming.IncrementalExecution) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:100) > at > org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite$$anonfun$3$$anonfun$apply$mcV$sp$12$$anonfun$apply$15.apply(KafkaContinuousSourceSuite.scala:210) > at > org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite$$anonfun$3$$anonfun$apply$mcV$sp$12$$anonfun$apply$15.apply(KafkaContinuousSourceSuite.scala:209) > ... > {code} > It hung forever because the test main thread was trying to access > `executedPlan` but the lock was held by the streaming thread. > This is a pretty common issue when using lazy vals as all lazy vals share the > same lock. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26042) KafkaContinuousSourceTopicDeletionSuite may hang forever
Shixiong Zhu created SPARK-26042: Summary: KafkaContinuousSourceTopicDeletionSuite may hang forever Key: SPARK-26042 URL: https://issues.apache.org/jira/browse/SPARK-26042 Project: Spark Issue Type: Test Components: Structured Streaming, Tests Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Saw the following thread dump in some build: {code} "stream execution thread for [id = 1c13482e-1edf-4b5c-b63a-d652738c8a48, runId = 10667ce9-7eac-4cef-a525-f1bd08eb50f1]" #4406 daemon prio=5 os_prio=0 tid=0x7fab1d3c5000 nid=0x7f4b waiting on condition [0x7fa96efcb000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00070a904cf8> (a scala.concurrent.impl.Promise$CompletionLatch) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ... 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:180) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:131) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:109) - locked <0x00070a913ee8> (a org.apache.spark.sql.execution.streaming.IncrementalExecution) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109) at org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:270) at org.apache.spark.sql.execution.streaming.continuous.ContinuousExecution$$anonfun$runContinuous$3$$anonfun$apply$1.apply(ContinuousExecution.scala:270) ,,, "pool-1-thread-1-ScalaTest-running-KafkaContinuousSourceTopicDeletionSuite" #20 prio=5 os_prio=0 tid=0x7fabc4e78800 nid=0x23be waiting for monitor entry [0x7fab3dbff000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100) - waiting to lock <0x00070a913ee8> (a org.apache.spark.sql.execution.streaming.IncrementalExecution) at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:100) at org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite$$anonfun$3$$anonfun$apply$mcV$sp$12$$anonfun$apply$15.apply(KafkaContinuousSourceSuite.scala:210) at org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite$$anonfun$3$$anonfun$apply$mcV$sp$12$$anonfun$apply$15.apply(KafkaContinuousSourceSuite.scala:209) ... {code} It hung forever because the test main thread was trying to access `executedPlan` but the lock was held by the streaming thread. This is a pretty common issue when using lazy vals as all lazy vals share the same lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
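The "all lazy vals share the same lock" hazard described above can be sketched outside Spark. The following standalone Java program is a hypothetical illustration (invented names, not Spark code): Scala 2.x compiles lazy vals of one object into initializers that synchronize on `this`, so two unrelated lazy fields end up guarded by the same monitor, modeled here as two `synchronized` getters.

```java
// Hypothetical sketch of the shared-lock hazard behind SPARK-26042.
// Two unrelated lazily initialized fields are guarded by the same object
// monitor, mimicking how Scala 2.x lazy vals all synchronize on `this`.
class LazyHolder {
    private String plan; // stands in for `lazy val executedPlan`
    private String rdd;  // stands in for `lazy val toRdd`

    synchronized String executedPlan() {
        if (plan == null) plan = "plan";
        return plan;
    }

    synchronized String toRdd() throws InterruptedException {
        if (rdd == null) {
            Thread.sleep(300); // stand-in for long-running work inside the initializer
            rdd = "rdd";
        }
        return rdd;
    }
}

public class SharedLazyLockDemo {
    public static void main(String[] args) throws Exception {
        LazyHolder holder = new LazyHolder();
        Thread streaming = new Thread(() -> {
            try { holder.toRdd(); } catch (InterruptedException ignored) { }
        });
        streaming.start();
        Thread.sleep(50); // let the "streaming" thread grab the shared monitor
        long start = System.nanoTime();
        holder.executedPlan(); // unrelated field, yet it blocks on the same lock
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        streaming.join();
        System.out.println("executedPlan() waited ~" + waitedMs + " ms");
    }
}
```

If the initializer inside toRdd() never returned (as in the hung test, where the streaming thread's lazy init was itself waiting on the main thread), executedPlan() would block forever, which is exactly the deadlock in the thread dump above.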
[jira] [Commented] (SPARK-25692) Flaky test: ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671900#comment-16671900 ] Shixiong Zhu commented on SPARK-25692: -- [~sanket991] You can download the unit test logs from https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/5554/artifact/common/network-common/target/ (It will be kept on Jenkins for several days) > Flaky test: ChunkFetchIntegrationSuite > -- > > Key: SPARK-25692 > URL: https://issues.apache.org/jira/browse/SPARK-25692 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Blocker > Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot > 2018-11-01 at 10.17.16 AM.png > > > Looks like the whole test suite is pretty flaky. See: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ > This may be a regression in 3.0 as this didn't happen in 2.4 branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25692) Flaky test: ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671895#comment-16671895 ] Shixiong Zhu commented on SPARK-25692: -- It's still flaky on Jenkins: [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/5554/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/fetchBothChunks/history/] !Screen Shot 2018-11-01 at 10.17.16 AM.png! You may need to run the whole test suite together. [https://github.com/apache/spark/pull/22173] added a global thread pool, so other tests may also impact this test suite. > Flaky test: ChunkFetchIntegrationSuite > -- > > Key: SPARK-25692 > URL: https://issues.apache.org/jira/browse/SPARK-25692 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Blocker > Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot > 2018-11-01 at 10.17.16 AM.png > > > Looks like the whole test suite is pretty flaky. See: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ > This may be a regression in 3.0 as this didn't happen in 2.4 branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25692) Flaky test: ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25692: - Attachment: Screen Shot 2018-11-01 at 10.17.16 AM.png > Flaky test: ChunkFetchIntegrationSuite > -- > > Key: SPARK-25692 > URL: https://issues.apache.org/jira/browse/SPARK-25692 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Blocker > Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot > 2018-11-01 at 10.17.16 AM.png > > > Looks like the whole test suite is pretty flaky. See: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ > This may be a regression in 3.0 as this didn't happen in 2.4 branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20568) Delete files after processing in structured streaming
[ https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670897#comment-16670897 ] Shixiong Zhu commented on SPARK-20568: -- [~kabhwan] I think this is pretty useful. Do you have time working on this? > Delete files after processing in structured streaming > - > > Key: SPARK-20568 > URL: https://issues.apache.org/jira/browse/SPARK-20568 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 2.1.0, 2.2.1 >Reporter: Saul Shanabrook >Priority: Major > > It would be great to be able to delete files after processing them with > structured streaming. > For example, I am reading in a bunch of JSON files and converting them into > Parquet. If the JSON files are not deleted after they are processed, it > quickly fills up my hard drive. I originally [posted this on Stack > Overflow|http://stackoverflow.com/q/43671757/907060] and was recommended to > make a feature request for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25899) Flaky test: CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched
[ https://issues.apache.org/jira/browse/SPARK-25899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25899: - Issue Type: Test (was: Documentation) > Flaky test: CoarseGrainedSchedulerBackendSuite.compute max number of > concurrent tasks can be launched > - > > Key: SPARK-25899 > URL: https://issues.apache.org/jira/browse/SPARK-25899 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > {code} > sbt.ForkMain$ForkError: > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 400 times over > 10.00982864399 seconds. Last failure message: ArrayBuffer("2", "0", "3") > had length 3 instead of expected length 4. > at > org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421) > at > org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439) > at > org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.eventually(CoarseGrainedSchedulerBackendSuite.scala:30) > at > org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:337) > at > org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.eventually(CoarseGrainedSchedulerBackendSuite.scala:30) > at > org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite$$anonfun$3.apply(CoarseGrainedSchedulerBackendSuite.scala:54) > at > org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite$$anonfun$3.apply(CoarseGrainedSchedulerBackendSuite.scala:49) > at > org.apache.spark.SparkFunSuite$$anonfun$test$1.apply(SparkFunSuite.scala:266) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:168) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221) > at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) > at scala.collection.immutable.List.foreach(List.scala:392) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) > at org.scalatest.Suite$class.run(Suite.scala:1147) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) > at org.scalatest.SuperEngine.runImpl(Engine.scala:521) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:62) > 
at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:62) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480) > at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[jira] [Created] (SPARK-25899) Flaky test: CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched
Shixiong Zhu created SPARK-25899: Summary: Flaky test: CoarseGrainedSchedulerBackendSuite.compute max number of concurrent tasks can be launched Key: SPARK-25899 URL: https://issues.apache.org/jira/browse/SPARK-25899 Project: Spark Issue Type: Documentation Components: Tests Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu {code} sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally. Attempted 400 times over 10.00982864399 seconds. Last failure message: ArrayBuffer("2", "0", "3") had length 3 instead of expected length 4. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439) at org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.eventually(CoarseGrainedSchedulerBackendSuite.scala:30) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:337) at org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite.eventually(CoarseGrainedSchedulerBackendSuite.scala:30) at org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite$$anonfun$3.apply(CoarseGrainedSchedulerBackendSuite.scala:54) at org.apache.spark.scheduler.CoarseGrainedSchedulerBackendSuite$$anonfun$3.apply(CoarseGrainedSchedulerBackendSuite.scala:49) at org.apache.spark.SparkFunSuite$$anonfun$test$1.apply(SparkFunSuite.scala:266) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:168) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62) at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221) at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite$class.run(Suite.scala:1147) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:521) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:62) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:62) at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at
[jira] [Resolved] (SPARK-25773) Cancel zombie tasks in a result stage when the job finishes
[ https://issues.apache.org/jira/browse/SPARK-25773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25773. -- Resolution: Fixed Fix Version/s: 3.0.0 > Cancel zombie tasks in a result stage when the job finishes > --- > > Key: SPARK-25773 > URL: https://issues.apache.org/jira/browse/SPARK-25773 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > > When a job finishes, there may be some zombie tasks still running due to > stage retry. Since a result stage will never be used by other jobs, running > these tasks just wastes cluster resources. This PR just asks > TaskScheduler to cancel the running tasks of a result stage when it's already > finished. Credits go to @srinathshankar who suggested this idea to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25849) Improve document for task cancellation.
Shixiong Zhu created SPARK-25849: Summary: Improve document for task cancellation. Key: SPARK-25849 URL: https://issues.apache.org/jira/browse/SPARK-25849 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.4.0 Reporter: Shixiong Zhu As suggested by [~markhamstra] in https://github.com/apache/spark/pull/22771#discussion_r228371144 , we should update the document to clarify how task cancellation works in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25822) Fix a race condition when releasing a Python worker
Shixiong Zhu created SPARK-25822: Summary: Fix a race condition when releasing a Python worker Key: SPARK-25822 URL: https://issues.apache.org/jira/browse/SPARK-25822 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.3.2 Reporter: Shixiong Zhu Assignee: Shixiong Zhu There is a race condition when releasing a Python worker. If "ReaderIterator.handleEndOfDataSection" is not running in the task thread, when a task is terminated early (such as "take(N)"), the task completion listener may close the worker but "handleEndOfDataSection" can still put the worker into the worker pool to reuse. https://github.com/zsxwing/spark/commit/0e07b483d2e7c68f3b5c3c118d0bf58c501041b7 is a patch to reproduce this issue. I also found that a user reported this on the mailing list: http://mail-archives.apache.org/mod_mbox/spark-user/201610.mbox/%3CCAAUq=h+yluepd23nwvq13ms5hostkhx3ao4f4zqv6sgo5zm...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
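The class of bug reported in SPARK-25822 can be sketched generically. The following Java program is a hypothetical illustration with invented names (Worker, WorkerPool), not Spark's actual internals: when "close" and "release back to the pool" are not coordinated, an already-closed worker can be handed back to the pool and given to the next borrower. The interleaving from the bug report is replayed sequentially to make the outcome deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a close-vs-release race; names are invented and
// do not correspond to Spark's actual PySpark worker code.
class Worker {
    private boolean closed;
    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

class WorkerPool {
    private final Deque<Worker> idle = new ArrayDeque<>();

    // Buggy: does not check whether the worker was closed concurrently.
    void releaseBuggy(Worker w) { idle.push(w); }

    // Safer: refuse to pool a worker that has already been closed.
    void releaseChecked(Worker w) {
        if (!w.isClosed()) idle.push(w);
    }

    Worker borrow() { return idle.isEmpty() ? new Worker() : idle.pop(); }
}

public class ReleaseRaceDemo {
    public static void main(String[] args) {
        WorkerPool pool = new WorkerPool();
        Worker w = new Worker();

        // 1. A completion listener closes the worker (task terminated early)...
        w.close();
        // 2. ...but the end-of-data path still returns it to the pool for reuse.
        pool.releaseBuggy(w);

        Worker reused = pool.borrow();
        System.out.println("borrowed a closed worker: " + reused.isClosed());
        // prints "borrowed a closed worker: true"
    }
}
```

With releaseChecked the closed worker is discarded instead of pooled; the actual fix in Spark additionally has to make the close/release decision atomic, since in the real bug the two steps run on different threads.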
[jira] [Resolved] (SPARK-25771) Fix improper synchronization in PythonWorkerFactory
[ https://issues.apache.org/jira/browse/SPARK-25771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25771. -- Resolution: Fixed Fix Version/s: 3.0.0 > Fix improper synchronization in PythonWorkerFactory > --- > > Key: SPARK-25771 > URL: https://issues.apache.org/jira/browse/SPARK-25771 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25773) Cancel zombie tasks in a result stage when the job finishes
[ https://issues.apache.org/jira/browse/SPARK-25773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25773: - Description: When a job finishes, there may be some zombie tasks still running due to stage retry. Since a result stage will never be used by other jobs, running these tasks just wastes cluster resources. This PR just asks TaskScheduler to cancel the running tasks of a result stage when it's already finished. Credits go to @srinathshankar who suggested this idea to me. > Cancel zombie tasks in a result stage when the job finishes > --- > > Key: SPARK-25773 > URL: https://issues.apache.org/jira/browse/SPARK-25773 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > When a job finishes, there may be some zombie tasks still running due to > stage retry. Since a result stage will never be used by other jobs, running > these tasks just wastes cluster resources. This PR just asks > TaskScheduler to cancel the running tasks of a result stage when it's already > finished. Credits go to @srinathshankar who suggested this idea to me. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25773) Cancel zombie tasks in a result stage when the job finishes
Shixiong Zhu created SPARK-25773: Summary: Cancel zombie tasks in a result stage when the job finishes Key: SPARK-25773 URL: https://issues.apache.org/jira/browse/SPARK-25773 Project: Spark Issue Type: Improvement Components: Scheduler Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25771) Fix improper synchronization in PythonWorkerFactory
Shixiong Zhu created SPARK-25771: Summary: Fix improper synchronization in PythonWorkerFactory Key: SPARK-25771 URL: https://issues.apache.org/jira/browse/SPARK-25771 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.3.2 Reporter: Shixiong Zhu Assignee: Shixiong Zhu -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25738) LOAD DATA INPATH doesn't work if hdfs conf includes port
[ https://issues.apache.org/jira/browse/SPARK-25738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650796#comment-16650796 ] Shixiong Zhu commented on SPARK-25738: -- Marked as a blocker since this is a regression > LOAD DATA INPATH doesn't work if hdfs conf includes port > > > Key: SPARK-25738 > URL: https://issues.apache.org/jira/browse/SPARK-25738 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Blocker > > LOAD DATA INPATH throws {{java.net.URISyntaxException: Malformed IPv6 address > at index 8}} if your hdfs conf includes a port for the namenode. > This is because the URI is passing in the value of the hdfs conf > "fs.defaultFS" in for the host. Note that variable is called {{authority}}, > but the 4-arg URI constructor actually expects a host: > https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String) > {code} > val defaultFSConf = > sparkSession.sessionState.newHadoopConf().get("fs.defaultFS") > ... > val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment) > {code} > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386 > This was introduced by SPARK-23425. > *Workaround*: specify a fully qualified path, eg. instead of > {noformat} > LOAD DATA INPATH '/some/path/on/hdfs' > {noformat} > use > {noformat} > LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs' > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
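The failure described above can be reproduced with nothing but the JDK: the 4-arg `java.net.URI` constructor expects a bare host, and when the value passed in the host slot contains a colon it gets wrapped in square brackets as if it were an IPv6 literal, so the internal re-parse fails. A minimal sketch, reusing the placeholder hostname from the workaround above:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriHostBug {
    // Rebuild a URI the way the buggy code path does: the "fs.defaultFS"
    // authority (which may be "host:port") is handed to the *host* slot of
    // the 4-arg constructor URI(scheme, host, path, fragment).
    static URI rebuild(String scheme, String hostSlot, String path) throws URISyntaxException {
        return new URI(scheme, hostSlot, path, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // A bare hostname works fine.
        System.out.println(rebuild("hdfs", "fizz.buzz.com", "/some/path/on/hdfs"));

        // "host:port" is bracketed as if it were an IPv6 literal, and the
        // constructor's re-parse throws URISyntaxException with the
        // "Malformed IPv6 address" message reported in this ticket.
        try {
            rebuild("hdfs", "fizz.buzz.com:8020", "/some/path/on/hdfs");
        } catch (URISyntaxException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The fix direction is implied by the Javadoc: an authority string belongs in the multi-argument constructor's authority parameter, not its host parameter.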
[jira] [Updated] (SPARK-25738) LOAD DATA INPATH doesn't work if hdfs conf includes port
[ https://issues.apache.org/jira/browse/SPARK-25738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25738: - Priority: Blocker (was: Critical) > LOAD DATA INPATH doesn't work if hdfs conf includes port > > > Key: SPARK-25738 > URL: https://issues.apache.org/jira/browse/SPARK-25738 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Blocker > > LOAD DATA INPATH throws {{java.net.URISyntaxException: Malformed IPv6 address > at index 8}} if your hdfs conf includes a port for the namenode. > This is because the URI is passing in the value of the hdfs conf > "fs.defaultFS" in for the host. Note that variable is called {{authority}}, > but the 4-arg URI constructor actually expects a host: > https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String) > {code} > val defaultFSConf = > sparkSession.sessionState.newHadoopConf().get("fs.defaultFS") > ... > val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment) > {code} > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386 > This was introduced by SPARK-23425. > *Workaround*: specify a fully qualified path, eg. instead of > {noformat} > LOAD DATA INPATH '/some/path/on/hdfs' > {noformat} > use > {noformat} > LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs' > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23390) Flaky test: FileBasedDataSourceSuite
[ https://issues.apache.org/jira/browse/SPARK-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645399#comment-16645399 ] Shixiong Zhu commented on SPARK-23390: -- [~dongjoon] when Spark cancels a task, the task thread gets interrupted. I think this is what we need to test in Spark; I don't think Spark needs great test coverage for code inside third-party libraries. It's unlikely we can reproduce this issue consistently without changing the third-party libraries, since that would require cancelling a task exactly while it's running specific code in a third-party library. > Flaky test: FileBasedDataSourceSuite > > > Key: SPARK-23390 > URL: https://issues.apache.org/jira/browse/SPARK-23390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Sameer Agarwal >Assignee: Wenchen Fan >Priority: Critical > > *RECENT HISTORY* > [http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.FileBasedDataSourceSuite_name=%28It+is+not+a+test+it+is+a+sbt.testing.SuiteSelector%29] > > > We're seeing multiple failures in {{FileBasedDataSourceSuite}} in > {{spark-branch-2.3-test-sbt-hadoop-2.7}}: > {code:java} > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 15 times over > 10.01215805999 seconds. Last failure message: There are 1 possibly leaked > file streams..
> {code} > Here's the full history: > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/history/] > From a very quick look, these failures seem to be correlated with > [https://github.com/apache/spark/pull/20479] (cc [~dongjoon]) as evident from > the following stack trace (full logs > [here|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/console]): > {code:java} > [info] - Enabling/disabling ignoreMissingFiles using orc (648 milliseconds) > 15:55:58.673 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in > stage 61.0 (TID 85, localhost, executor driver): TaskKilled (Stage cancelled) > 15:55:58.674 WARN org.apache.spark.DebugFilesystem: Leaked filesystem > connection created at: > java.lang.Throwable > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.open(RecordReaderUtils.java:173) > at > org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:254) > at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:633) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:138) > {code} > Also, while this might be just a false correlation but the frequency of these > test failures have increased considerably in > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/] > after [https://github.com/apache/spark/pull/20562] (cc > [~feng...@databricks.com]) was merged. > The following is Parquet leakage. 
> {code:java} > Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:538) > at > org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:125) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:106) >
[jira] [Commented] (SPARK-10816) EventTime based sessionization
[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645370#comment-16645370 ] Shixiong Zhu commented on SPARK-10816: -- Thanks a lot for the design docs and prototypes. I had a long discussion with [~tdas] and we think the design doc should also discuss the alternative approaches. We came up with 3 possible implementations: [1] Put into a state store. In each batch, for each key, scan the sorted event lists and use the watermark to find the finalized sessions and output them. [2] Put <(key, timestamp), event> into a state store. Here the key used in the state store is a tuple of the user key and the event timestamp. In each batch, sort each partition by (key, timestamp) and scan the whole sorted partition to find the finalized sessions and output them. [3] Use two state stores, as stream-stream join does. The first state store will store , the second one will store . When we insert an event into the second state store, we should use insertion sort to keep the events ordered by timestamp, i.e., find the proper index for this event and update the indices that follow it. Then we can just scan all keys and their values in the state store to find the finalized sessions and output them. [1] is easy to implement and can be done directly using `flatMapGroupsWithState`, but it may fail when a key has too many events. [2] and [3] will scale well, but the performance may be worse. If I read the code correctly, [https://github.com/apache/spark/pull/22583] is [1], and [https://github.com/apache/spark/pull/22482] is a combination of [2] and [3] but still needs to load all values of a key into memory at the same time. [~kabhwan] [~XuanYuan] could you work together to update your design docs to add these alternative approaches and discuss their pros and cons? It would be great if you could put the design docs into a Google doc so that it's easy to leave comments.
In addition, it's better to also discuss compatibility: if we decide to use a new approach to implement session windows and need to change the state format in the state store, do we have enough version information to identify the old and new formats? > EventTime based sessionization > -- > > Key: SPARK-10816 > URL: https://issues.apache.org/jira/browse/SPARK-10816 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Reporter: Reynold Xin >Priority: Major > Attachments: SPARK-10816 Support session window natively.pdf, Session > Window Support For Structure Streaming.pdf > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
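Approach [1] above can be sketched without the Spark APIs. The following self-contained illustration shows the per-key scan: sort a key's buffered event timestamps, merge events separated by less than the session gap, and emit only the sessions the watermark has already finalized. The gap and watermark semantics, and all names here, are assumptions for illustration, not Spark's implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SessionScan {
    // A finalized session window, [start, end] in event-time milliseconds.
    public record Session(long start, long end) {}

    // Approach [1] in miniature, for one key: sort the buffered timestamps,
    // merge events closer than `gapMs` into one session, and emit only
    // sessions that can no longer grow, i.e. whose last event is at least
    // `gapMs` older than the watermark.
    public static List<Session> finalizedSessions(List<Long> events, long gapMs, long watermarkMs) {
        List<Long> sorted = new ArrayList<>(events);
        Collections.sort(sorted);
        List<Session> out = new ArrayList<>();
        long start = Long.MIN_VALUE, end = Long.MIN_VALUE;
        for (long t : sorted) {
            if (start == Long.MIN_VALUE) {      // first event opens a session
                start = t;
                end = t;
            } else if (t - end < gapMs) {       // close enough: extend the session
                end = t;
            } else {                            // gap exceeded: previous session is complete
                maybeEmit(out, start, end, gapMs, watermarkMs);
                start = t;
                end = t;
            }
        }
        if (start != Long.MIN_VALUE) {
            maybeEmit(out, start, end, gapMs, watermarkMs);
        }
        return out;
    }

    // Only sessions the watermark has passed are final; in the real
    // implementation the rest would stay buffered in the state store,
    // since a later event could still join them.
    private static void maybeEmit(List<Session> out, long start, long end, long gapMs, long wm) {
        if (end + gapMs <= wm) {
            out.add(new Session(start, end));
        }
    }
}
```

The drawback called out in the comment is visible here: the whole event list for a key is materialized and sorted in memory, which is what makes [1] fail for keys with too many events.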
[jira] [Commented] (SPARK-25692) Flaky test: ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644162#comment-16644162 ] Shixiong Zhu commented on SPARK-25692: -- It may be caused by https://github.com/apache/spark/pull/22173 > Flaky test: ChunkFetchIntegrationSuite > -- > > Key: SPARK-25692 > URL: https://issues.apache.org/jira/browse/SPARK-25692 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Blocker > > Looks like the whole test suite is pretty flaky. See: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ > This may be a regression in 3.0 as this didn't happen in 2.4 branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25692) Flaky test: ChunkFetchIntegrationSuite
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25692: - Description: Looks like the whole test suite is pretty flaky. See: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ This may be a regression in 3.0 as this didn't happen in 2.4 branch. was: Looks like the whole test suite is pretty flaky. See: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ This may be a regression in 2.4 as this didn't happen before. > Flaky test: ChunkFetchIntegrationSuite > -- > > Key: SPARK-25692 > URL: https://issues.apache.org/jira/browse/SPARK-25692 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Priority: Blocker > > Looks like the whole test suite is pretty flaky. See: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ > This may be a regression in 3.0 as this didn't happen in 2.4 branch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25692) Flaky test: ChunkFetchIntegrationSuite
Shixiong Zhu created SPARK-25692: Summary: Flaky test: ChunkFetchIntegrationSuite Key: SPARK-25692 URL: https://issues.apache.org/jira/browse/SPARK-25692 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: Shixiong Zhu Looks like the whole test suite is pretty flaky. See: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/ This may be a regression in 2.4 as this didn't happen before. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23390) Flaky test: FileBasedDataSourceSuite
[ https://issues.apache.org/jira/browse/SPARK-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644036#comment-16644036 ] Shixiong Zhu commented on SPARK-23390: -- I didn't look at parquet. It may have a similar issue. > Flaky test: FileBasedDataSourceSuite > > > Key: SPARK-23390 > URL: https://issues.apache.org/jira/browse/SPARK-23390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Sameer Agarwal >Assignee: Wenchen Fan >Priority: Critical > > *RECENT HISTORY* > [http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.FileBasedDataSourceSuite_name=%28It+is+not+a+test+it+is+a+sbt.testing.SuiteSelector%29] > > > We're seeing multiple failures in {{FileBasedDataSourceSuite}} in > {{spark-branch-2.3-test-sbt-hadoop-2.7}}: > {code:java} > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 15 times over > 10.01215805999 seconds. Last failure message: There are 1 possibly leaked > file streams.. 
> {code} > Here's the full history: > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/history/] > From a very quick look, these failures seem to be correlated with > [https://github.com/apache/spark/pull/20479] (cc [~dongjoon]) as evident from > the following stack trace (full logs > [here|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/console]): > {code:java} > [info] - Enabling/disabling ignoreMissingFiles using orc (648 milliseconds) > 15:55:58.673 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in > stage 61.0 (TID 85, localhost, executor driver): TaskKilled (Stage cancelled) > 15:55:58.674 WARN org.apache.spark.DebugFilesystem: Leaked filesystem > connection created at: > java.lang.Throwable > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.open(RecordReaderUtils.java:173) > at > org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:254) > at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:633) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:138) > {code} > Also, while this might be just a false correlation but the frequency of these > test failures have increased considerably in > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/] > after [https://github.com/apache/spark/pull/20562] (cc > [~feng...@databricks.com]) was merged. > The following is Parquet leakage. 
> {code:java} > Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:538) > at > org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:125) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:106) > {code} > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/322/] > (May 3rd) > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/331/] > (May 9th) > -
[jira] [Commented] (SPARK-23390) Flaky test: FileBasedDataSourceSuite
[ https://issues.apache.org/jira/browse/SPARK-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644034#comment-16644034 ] Shixiong Zhu commented on SPARK-23390: -- I think the issue is probably in ORC. Any exception thrown between https://github.com/apache/orc/blob/b21b5ffcc1efcbd4aef337fa6faae4d25262f8f1/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java#L252 and https://github.com/apache/orc/blob/b21b5ffcc1efcbd4aef337fa6faae4d25262f8f1/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java#L273 will leak `dataReader`. For example, cancelling a Spark task may cause https://github.com/apache/orc/blob/b21b5ffcc1efcbd4aef337fa6faae4d25262f8f1/java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java#L273 to throw an exception. > Flaky test: FileBasedDataSourceSuite > > > Key: SPARK-23390 > URL: https://issues.apache.org/jira/browse/SPARK-23390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Sameer Agarwal >Assignee: Wenchen Fan >Priority: Critical > > *RECENT HISTORY* > [http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.FileBasedDataSourceSuite_name=%28It+is+not+a+test+it+is+a+sbt.testing.SuiteSelector%29] > > > We're seeing multiple failures in {{FileBasedDataSourceSuite}} in > {{spark-branch-2.3-test-sbt-hadoop-2.7}}: > {code:java} > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > eventually never returned normally. Attempted 15 times over > 10.01215805999 seconds. Last failure message: There are 1 possibly leaked > file streams..
> {code} > Here's the full history: > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/history/] > From a very quick look, these failures seem to be correlated with > [https://github.com/apache/spark/pull/20479] (cc [~dongjoon]) as evident from > the following stack trace (full logs > [here|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/189/console]): > {code:java} > [info] - Enabling/disabling ignoreMissingFiles using orc (648 milliseconds) > 15:55:58.673 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in > stage 61.0 (TID 85, localhost, executor driver): TaskKilled (Stage cancelled) > 15:55:58.674 WARN org.apache.spark.DebugFilesystem: Leaked filesystem > connection created at: > java.lang.Throwable > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.open(RecordReaderUtils.java:173) > at > org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:254) > at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:633) > at > org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:138) > {code} > Also, while this might be just a false correlation but the frequency of these > test failures have increased considerably in > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/] > after [https://github.com/apache/spark/pull/20562] (cc > [~feng...@databricks.com]) was merged. > The following is Parquet leakage. 
> {code:java} > Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769) > at > org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:538) > at > org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149) > at > org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:125) > at >
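The leak pattern diagnosed in the comment above (a resource opened early in a constructor, followed by more initialization that may throw, for instance because the task thread was interrupted) can be illustrated generically. This is a sketch of the shape of the bug and of one conventional fix, with made-up names, not ORC's actual code:

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstructorLeak {
    // Counts open resources, like DebugFilesystem's leaked-stream tracking.
    public static final AtomicInteger openStreams = new AtomicInteger();

    // Stand-in for the data reader opened near the top of the constructor.
    static class TrackedReader implements Closeable {
        TrackedReader() { openStreams.incrementAndGet(); }
        @Override public void close() { openStreams.decrementAndGet(); }
    }

    // Mirrors the problematic shape: open a resource, then run further
    // initialization that may throw; the already-open reader is never closed.
    public static TrackedReader leakyInit(boolean failLater) {
        TrackedReader reader = new TrackedReader();
        if (failLater) throw new RuntimeException("interrupted mid-init");
        return reader;
    }

    // The fix pattern: on any failure after the resource is opened, close it
    // before re-throwing, so nothing leaks.
    public static TrackedReader safeInit(boolean failLater) {
        TrackedReader reader = new TrackedReader();
        try {
            if (failLater) throw new RuntimeException("interrupted mid-init");
            return reader;
        } catch (RuntimeException e) {
            reader.close();
            throw e;
        }
    }
}
```

Task cancellation makes this easy to hit in practice: interruption can surface as an exception at almost any statement between opening the reader and the end of the constructor.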
[jira] [Resolved] (SPARK-25644) Fix java foreachBatch API
[ https://issues.apache.org/jira/browse/SPARK-25644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25644. -- Resolution: Fixed Fix Version/s: 2.4.0 > Fix java foreachBatch API > - > > Key: SPARK-25644 > URL: https://issues.apache.org/jira/browse/SPARK-25644 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Fix For: 2.4.0 > > > The java foreachBatch API in DataStreamWriter should accept java.lang.Long > rather than scala.Long. It's better to fix the new API before the release gets > out, so I marked this ticket as a blocker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25644) Fix java foreachBatch API
[ https://issues.apache.org/jira/browse/SPARK-25644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25644: - Target Version/s: 2.4.0 > Fix java foreachBatch API > - > > Key: SPARK-25644 > URL: https://issues.apache.org/jira/browse/SPARK-25644 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > > The java foreachBatch API in DataStreamWriter should accept java.lang.Long > rather than scala.Long. It's better to fix the new API before the release gets > out, so I marked this ticket as a blocker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25644) Fix java foreachBatch API
Shixiong Zhu created SPARK-25644: Summary: Fix java foreachBatch API Key: SPARK-25644 URL: https://issues.apache.org/jira/browse/SPARK-25644 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu The java foreachBatch API in DataStreamWriter should accept java.lang.Long rather than scala.Long. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25644) Fix java foreachBatch API
[ https://issues.apache.org/jira/browse/SPARK-25644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25644: - Description: The java foreachBatch API in DataStreamWriter should accept java.lang.Long rather than scala.Long. It's better to fix the new API before the release gets out, so I marked this ticket as a blocker. (was: The java foreachBatch API in DataStreamWriter should accept java.lang.Long rather than scala.Long.) > Fix java foreachBatch API > - > > Key: SPARK-25644 > URL: https://issues.apache.org/jira/browse/SPARK-25644 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > > The java foreachBatch API in DataStreamWriter should accept java.lang.Long > rather than scala.Long. It's better to fix the new API before the release gets > out, so I marked this ticket as a blocker. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25005) Structured streaming doesn't support kafka transaction (creating empty offset with abort & markers)
[ https://issues.apache.org/jira/browse/SPARK-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637564#comment-16637564 ] Shixiong Zhu commented on SPARK-25005: -- [~qambard] I'm not sure about your question. If a Kafka consumer fetches nothing, it will not update the position. And yes, if a partition is full of invisible messages, we have to wait for the timeout. I don't see any API to avoid this. > Structured streaming doesn't support kafka transaction (creating empty offset > with abort & markers) > --- > > Key: SPARK-25005 > URL: https://issues.apache.org/jira/browse/SPARK-25005 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.1 >Reporter: Quentin Ambard >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.0 > > > Structured streaming can't consume kafka transaction. > We could try to apply SPARK-24720 (DStream) logic to Structured Streaming > source -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25005) Structured streaming doesn't support kafka transaction (creating empty offset with abort & markers)
[ https://issues.apache.org/jira/browse/SPARK-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637541#comment-16637541 ] Shixiong Zhu commented on SPARK-25005: -- [~qambard] If `poll` returns and the offset has changed, it means the Kafka consumer fetched something but all of the messages are invisible, so the consumer returned an empty result. If `poll` returns but the offset doesn't change, it means Kafka fetched nothing before the timeout. In this case, we just throw "TimeoutException", and Spark will retry the task or fail the job. A large GC pause can cause a timeout; the user should tune the configs to avoid that happening. We cannot do much about it in Spark. > Structured streaming doesn't support kafka transaction (creating empty offset > with abort & markers) > --- > > Key: SPARK-25005 > URL: https://issues.apache.org/jira/browse/SPARK-25005 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.1 >Reporter: Quentin Ambard >Assignee: Shixiong Zhu >Priority: Major > Fix For: 2.4.0 > > > Structured streaming can't consume kafka transaction. > We could try to apply SPARK-24720 (DStream) logic to Structured Streaming > source -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
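The two cases in the comment above reduce to a small decision rule: after an empty `poll`, compare the consumer's position before and after. A hedged sketch of just that rule (the class and method names are made up for illustration, and the positions are plain longs rather than real consumer state):

```java
import java.util.concurrent.TimeoutException;

public class EmptyPollClassifier {
    // Distinguishes the two ways an empty poll() can happen:
    //  - position moved: only invisible records (e.g. aborted-transaction
    //    data and commit/abort markers) were in range, which is real
    //    progress, so continue from the new position;
    //  - position unchanged: the fetch timed out, so fail the attempt and
    //    let the engine retry the task or fail the job.
    static long classify(long positionBeforePoll, long positionAfterPoll) throws TimeoutException {
        if (positionAfterPoll > positionBeforePoll) {
            return positionAfterPoll;   // skip over the invisible records
        }
        throw new TimeoutException("no data fetched before timeout");
    }
}
```

As the comment notes, the timeout branch is indistinguishable from a slow broker or a long GC pause, which is why the remedy is config tuning rather than anything the engine can decide on its own.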
[jira] [Resolved] (SPARK-25315) setting "auto.offset.reset" to "earliest" has no effect in Structured Streaming with Spark 2.3.1 and Kafka 1.0
[ https://issues.apache.org/jira/browse/SPARK-25315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25315. -- Resolution: Not A Bug > setting "auto.offset.reset" to "earliest" has no effect in Structured > Streaming with Spark 2.3.1 and Kafka 1.0 > -- > > Key: SPARK-25315 > URL: https://issues.apache.org/jira/browse/SPARK-25315 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.1 > Environment: Standalone; running in IDEA >Reporter: Zhenhao Li >Priority: Major > > The following code won't read from the beginning of the topic > ``` > {code:java} > val kafkaOptions = Map[String, String]( > "kafka.bootstrap.servers" -> KAFKA_BOOTSTRAP_SERVERS, > "subscribe" -> TOPIC, > "group.id" -> GROUP_ID, > "auto.offset.reset" -> "earliest" > ) > val myStream = sparkSession > .readStream > .format("kafka") > .options(kafkaOptions) > .load() > .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") > myStream > .writeStream > .format("console") > .start() > .awaitTermination() > {code} > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25315) setting "auto.offset.reset" to "earliest" has no effect in Structured Streaming with Spark 2.3.1 and Kafka 1.0
[ https://issues.apache.org/jira/browse/SPARK-25315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634423#comment-16634423 ] Shixiong Zhu commented on SPARK-25315: -- Kafka’s own configurations should be set with the "kafka." prefix; without it, "group.id" and "auto.offset.reset" will be silently ignored. In addition, once you add the "kafka." prefix, you will see error messages, because "group.id" and "auto.offset.reset" are not supported by the Kafka source. They are documented here: http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#kafka-specific-configurations > setting "auto.offset.reset" to "earliest" has no effect in Structured > Streaming with Spark 2.3.1 and Kafka 1.0 > -- > > Key: SPARK-25315 > URL: https://issues.apache.org/jira/browse/SPARK-25315 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.1 > Environment: Standalone; running in IDEA >Reporter: Zhenhao Li >Priority: Major > > The following code won't read from the beginning of the topic > ``` > {code:java} > val kafkaOptions = Map[String, String]( > "kafka.bootstrap.servers" -> KAFKA_BOOTSTRAP_SERVERS, > "subscribe" -> TOPIC, > "group.id" -> GROUP_ID, > "auto.offset.reset" -> "earliest" > ) > val myStream = sparkSession > .readStream > .format("kafka") > .options(kafkaOptions) > .load() > .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") > myStream > .writeStream > .format("console") > .start() > .awaitTermination() > {code} > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
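Following the comment above, a corrected options map would drop `group.id` and `auto.offset.reset` and use the source's own `startingOffsets` option to read from the beginning (documented in the Structured Streaming Kafka integration guide). Server and topic names below are placeholders:

```python
# Corrected options for the Structured Streaming Kafka source.
# Kafka's own settings take the "kafka." prefix; "group.id" and
# "auto.offset.reset" are not supported by the source and must be left
# out. The source option "startingOffsets" reads from the beginning.
kafka_options = {
    "kafka.bootstrap.servers": "localhost:9092",  # placeholder
    "subscribe": "my-topic",                      # placeholder
    "startingOffsets": "earliest",  # replaces auto.offset.reset
}
```

These options would then be passed via `spark.readStream.format("kafka").options(**kafka_options).load()`.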
[jira] [Resolved] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25449. -- Resolution: Fixed Assignee: Mukul Murthy Fix Version/s: 2.5.0 > Don't send zero accumulators in heartbeats > -- > > Key: SPARK-25449 > URL: https://issues.apache.org/jira/browse/SPARK-25449 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Fix For: 2.5.0 > > > Heartbeats sent from executors to the driver every 10 seconds contain metrics > and are generally on the order of a few KBs. However, for large jobs with > lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks > to die with heartbeat failures. We can mitigate this by not sending zero > metrics to the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
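The mitigation in SPARK-25449 (don't send zero metrics in heartbeats) can be sketched roughly as below. This is a simplified model, not Spark's actual heartbeat code; `accum_updates` as a task-id-to-metrics mapping is an illustrative assumption:

```python
def filter_zero_metrics(accum_updates):
    """Drop zero-valued metrics from a heartbeat payload.

    accum_updates maps task id -> {metric name: value}. Zero-valued
    metrics carry no information, so omitting them shrinks heartbeats
    for large jobs with many tasks, avoiding multi-MB RPC messages.
    """
    return {
        task_id: {name: v for name, v in metrics.items() if v != 0}
        for task_id, metrics in accum_updates.items()
    }
```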
[jira] [Created] (SPARK-25569) Failing a Spark job when an accumulator cannot be updated
Shixiong Zhu created SPARK-25569: Summary: Failing a Spark job when an accumulator cannot be updated Key: SPARK-25569 URL: https://issues.apache.org/jira/browse/SPARK-25569 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0 Reporter: Shixiong Zhu Currently, when Spark fails to merge an accumulator update from a task, it will not fail the task. (See https://github.com/apache/spark/blob/b7d80349b0e367d78cab238e62c2ec353f0f12b3/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1266) So an accumulator update failure may be ignored silently. Some users may want to use accumulators for business-critical things and would like to fail a job when an accumulator is broken. We can add a flag to always fail a Spark job when hitting an accumulator failure, or we can add a new property to an accumulator and only fail a Spark job when such an accumulator fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
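The flag proposed in SPARK-25569 could look roughly like this. Everything here is a hypothetical sketch of the proposal, not Spark's code; the flag name `fail_job_on_error` and the toy accumulator are invented for illustration:

```python
class BrokenAccumulator:
    """Toy accumulator whose merge always fails (for illustration)."""
    def merge(self, update):
        raise ValueError("incompatible accumulator update")

def merge_accumulator(acc, update, fail_job_on_error=False):
    """Merge one accumulator update from a task.

    Today Spark logs and ignores a merge failure; with the proposed
    flag set, the failure would propagate and fail the job instead.
    """
    try:
        acc.merge(update)
        return True
    except Exception:
        if fail_job_on_error:
            raise  # proposed behaviour: surface the failure
        return False  # current behaviour: failure silently ignored
```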
[jira] [Created] (SPARK-25568) Continue to update the remaining accumulators when failing to update one accumulator
Shixiong Zhu created SPARK-25568: Summary: Continue to update the remaining accumulators when failing to update one accumulator Key: SPARK-25568 URL: https://issues.apache.org/jira/browse/SPARK-25568 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.2, 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu Currently when failing to update an accumulator, DAGScheduler.updateAccumulators will skip the remaining accumulators. We should try to update the remaining accumulators if possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25568) Continue to update the remaining accumulators when failing to update one accumulator
[ https://issues.apache.org/jira/browse/SPARK-25568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25568: - Description: Currently when failing to update an accumulator, DAGScheduler.updateAccumulators will skip the remaining accumulators. We should try to update the remaining accumulators if possible so that they can still report correct values. was:Currently when failing to update an accumulator, DAGScheduler.updateAccumulators will skip the remaining accumulators. We should try to update the remaining accumulators if possible. > Continue to update the remaining accumulators when failing to update one > accumulator > > > Key: SPARK-25568 > URL: https://issues.apache.org/jira/browse/SPARK-25568 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > Currently when failing to update an accumulator, > DAGScheduler.updateAccumulators will skip the remaining accumulators. We > should try to update the remaining accumulators if possible so that they can > still report correct values. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
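The fix described for SPARK-25568 amounts to wrapping each update individually so one failure cannot abort the loop. A simplified sketch (not the actual DAGScheduler.updateAccumulators code; the toy accumulator class is invented for illustration):

```python
class ListAccumulator:
    """Toy accumulator collecting values (for illustration)."""
    def __init__(self, fail=False):
        self.values = []
        self.fail = fail
    def merge(self, update):
        if self.fail:
            raise ValueError("bad accumulator")
        self.values.append(update)

def update_accumulators(accums, updates):
    """Apply each update independently so one failure doesn't skip the rest."""
    failures = []
    for acc, update in zip(accums, updates):
        try:
            acc.merge(update)
        except Exception as exc:
            # Record the failure but keep going, so the remaining
            # accumulators still report correct values.
            failures.append((acc, exc))
    return failures
```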
[jira] [Resolved] (SPARK-25495) FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll
[ https://issues.apache.org/jira/browse/SPARK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25495. -- Resolution: Fixed Fix Version/s: 2.4.0 > FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll > - > > Key: SPARK-25495 > URL: https://issues.apache.org/jira/browse/SPARK-25495 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll, > which causes inconsistent cached data and may make the Kafka connector return wrong > results. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25495) FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll
Shixiong Zhu created SPARK-25495: Summary: FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll Key: SPARK-25495 URL: https://issues.apache.org/jira/browse/SPARK-25495 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.0 Reporter: Shixiong Zhu Assignee: Shixiong Zhu FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll, which causes inconsistent cached data and may make the Kafka connector return wrong results. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
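The bug in SPARK-25495 was that `reset` cleared the fetched records but left `_nextOffsetInFetchedData` and `_offsetAfterPoll` stale. A minimal model of the fixed shape follows; the field names come from the issue, while the class body, record buffer, and sentinel value are illustrative, not Spark's actual implementation:

```python
UNKNOWN_OFFSET = -2  # illustrative sentinel meaning "no cached position"

class FetchedData:
    """Minimal model of the cached-fetch state described in the issue."""
    def __init__(self):
        self._records = []
        self._nextOffsetInFetchedData = UNKNOWN_OFFSET
        self._offsetAfterPoll = UNKNOWN_OFFSET

    def reset(self):
        # The fix: clear ALL cached state, not just the record buffer,
        # so stale offsets cannot make the connector serve wrong data.
        self._records = []
        self._nextOffsetInFetchedData = UNKNOWN_OFFSET
        self._offsetAfterPoll = UNKNOWN_OFFSET
```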
[jira] [Updated] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25449: - Issue Type: Improvement (was: Task) > Don't send zero accumulators in heartbeats > -- > > Key: SPARK-25449 > URL: https://issues.apache.org/jira/browse/SPARK-25449 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Mukul Murthy >Priority: Major > > Heartbeats sent from executors to the driver every 10 seconds contain metrics > and are generally on the order of a few KBs. However, for large jobs with > lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks > to die with heartbeat failures. We can mitigate this by not sending zero > metrics to the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25449: - Target Version/s: (was: 2.5.0) > Don't send zero accumulators in heartbeats > -- > > Key: SPARK-25449 > URL: https://issues.apache.org/jira/browse/SPARK-25449 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Mukul Murthy >Priority: Major > > Heartbeats sent from executors to the driver every 10 seconds contain metrics > and are generally on the order of a few KBs. However, for large jobs with > lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks > to die with heartbeat failures. We can mitigate this by not sending zero > metrics to the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19903) Watermark metadata is lost when using resolved attributes
[ https://issues.apache.org/jira/browse/SPARK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-19903: - Target Version/s: (was: 2.4.0) > Watermark metadata is lost when using resolved attributes > - > > Key: SPARK-19903 > URL: https://issues.apache.org/jira/browse/SPARK-19903 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.0 > Environment: Ubuntu Linux >Reporter: Piotr Nestorow >Priority: Major > > PySpark example reads a Kafka stream. There is watermarking set when handling > the data window. The defined query uses output Append mode. > The PySpark engine reports the error: > 'Append output mode not supported when there are streaming aggregations on > streaming DataFrames/DataSets' > The Python example: > --- > {code} > import sys > from pyspark.sql import SparkSession > from pyspark.sql.functions import explode, split, window > if __name__ == "__main__": > if len(sys.argv) != 4: > print(""" > Usage: structured_kafka_wordcount.py > > """, file=sys.stderr) > exit(-1) > bootstrapServers = sys.argv[1] > subscribeType = sys.argv[2] > topics = sys.argv[3] > spark = SparkSession\ > .builder\ > .appName("StructuredKafkaWordCount")\ > .getOrCreate() > # Create DataSet representing the stream of input lines from kafka > lines = spark\ > .readStream\ > .format("kafka")\ > .option("kafka.bootstrap.servers", bootstrapServers)\ > .option(subscribeType, topics)\ > .load()\ > .selectExpr("CAST(value AS STRING)", "CAST(timestamp AS TIMESTAMP)") > # Split the lines into words, retaining timestamps > # split() splits each line into an array, and explode() turns the array > into multiple rows > words = lines.select( > explode(split(lines.value, ' ')).alias('word'), > lines.timestamp > ) > # Group the data by window and word and compute the count of each group > windowedCounts = words.withWatermark("timestamp", "30 seconds").groupBy( > window(words.timestamp, "30 seconds", "30 seconds"), > words.word > ).count() > # Start running the query that prints the running counts to the console > query = windowedCounts\ > .writeStream\ > .outputMode('append')\ > .format('console')\ > .option("truncate", "false")\ > .start() > query.awaitTermination() > {code} > The corresponding example in Zeppelin notebook: > {code} > %spark.pyspark > from pyspark.sql.functions import explode, split, window > # Create DataSet representing the stream of input lines from kafka > lines = spark\ > .readStream\ > .format("kafka")\ > .option("kafka.bootstrap.servers", "localhost:9092")\ > .option("subscribe", "words")\ > .load()\ > .selectExpr("CAST(value AS STRING)", "CAST(timestamp AS TIMESTAMP)") > # Split the lines into words, retaining timestamps > # split() splits each line into an array, and explode() turns the array into > multiple rows > words = lines.select( > explode(split(lines.value, ' ')).alias('word'), > lines.timestamp > ) > # Group the data by window and word and compute the count of each group > windowedCounts = words.withWatermark("timestamp", "30 seconds").groupBy( > window(words.timestamp, "30 seconds", "30 seconds"), words.word > ).count() > # Start running the query that prints the running counts to the console > query = windowedCounts\ > .writeStream\ > .outputMode('append')\ > .format('console')\ > .option("truncate", "false")\ > .start() > query.awaitTermination() > -- > Note that the Scala version of the same example in Zeppelin notebook works > fine: > > import java.sql.Timestamp > import org.apache.spark.sql.streaming.ProcessingTime > import org.apache.spark.sql.functions._ > // Create DataSet representing the stream of input lines from kafka > val lines = spark > .readStream > .format("kafka") > .option("kafka.bootstrap.servers", "localhost:9092") > .option("subscribe", "words") > .load() > // Split the lines into words, retaining timestamps > val words = lines > .selectExpr("CAST(value AS STRING)", "CAST(timestamp AS > TIMESTAMP)") > .as[(String, Timestamp)] > .flatMap(line => line._1.split(" ").map(word => (word, line._2))) > .toDF("word", "timestamp") > // Group the data by