[ 
https://issues.apache.org/jira/browse/SPARK-23565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick McGloin updated SPARK-23565:
------------------------------------
    Description: 
If you change the number of sources for a Structured Streaming query then you 
will get an assertion error as the number of sources in the checkpoint does not 
match the number of sources in the query that is starting.  This can happen if, 
for example, you add a union to the input of the query.  This is of course 
correct but the error is a bit cryptic and requires investigation.

Suggestion for a more informative error message =>

The number of sources for this query has changed.  There are [x] sources in the 
checkpoint offsets and now there are [y] sources requested by the query.  
Cannot continue.

This is the current message.

02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to Kafka 
[id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = 
d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error 
java.lang.AssertionError: assertion failed at 
scala.Predef$.assert(Predef.scala:156) at 
org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
 at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)

  was:
If you change the number of sources for a Structured Streaming query then you 
will get an assertion error as the number of sources in the checkpoint does not 
match the number of sources in the query that is starting.  This can happen if, 
for example, you add a union to the input of the query.  This is of course 
correct but the error is a bit cryptic and requires investigation.

Suggestion for a more informative error message =>

The number of sources for this query has changed.  There was [x] sources in the 
checkpoint offsets and now there are [y] sources requested by the query.  
Cannot continue.

This is the current message.

02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to Kafka 
[id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = 
d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error 
java.lang.AssertionError: assertion failed at 
scala.Predef$.assert(Predef.scala:156) at 
org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
 at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)


> Improved error message for when the number of sources for a query changes
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23565
>                 URL: https://issues.apache.org/jira/browse/SPARK-23565
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Patrick McGloin
>            Priority: Minor
>
> If you change the number of sources for a Structured Streaming query then you 
> will get an assertion error as the number of sources in the checkpoint does 
> not match the number of sources in the query that is starting.  This can 
> happen if, for example, you add a union to the input of the query.  This is 
> of course correct but the error is a bit cryptic and requires investigation.
> Suggestion for a more informative error message =>
> The number of sources for this query has changed.  There are [x] sources in 
> the checkpoint offsets and now there are [y] sources requested by the query.  
> Cannot continue.
> This is the current message.
> 02-03-2018 13:14:22 ERROR StreamExecution:91 - Query ORPositionsState to 
> Kafka [id = 35f71e63-dbd0-49e9-98b2-a4c72a7da80e, runId = 
> d4439aca-549c-4ef6-872e-29fbfde1df78] terminated with error 
> java.lang.AssertionError: assertion failed at 
> scala.Predef$.assert(Predef.scala:156) at 
> org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>  at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
>  at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to