[jira] [Commented] (SPARK-18805) InternalMapWithStateDStream make java.lang.StackOverflowError

2019-01-02 Thread Joost Verdoorn (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731924#comment-16731924
 ] 

Joost Verdoorn commented on SPARK-18805:


This issue occurs relatively often in our application when resuming from a 
checkpoint. Is there any progress on this?

> InternalMapWithStateDStream make java.lang.StackOverflowError 
> --
>
> Key: SPARK-18805
> URL: https://issues.apache.org/jira/browse/SPARK-18805
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.6.3, 2.0.2
> Environment: mesos
> Reporter: etienne
> Priority: Major
>
> When InternalMapWithStateDStream is loaded from a checkpoint, if isValidTime 
> is true and there is no generatedRDD at the given time, there is an infinite 
> loop:
> 1) compute is called on InternalMapWithStateDStream.
> 2) InternalMapWithStateDStream tries to generate the previousRDD.
> 3) The stream looks in generatedRDD to see whether the RDD has already been 
> generated for the given time.
> 4) It does not find the RDD, so it checks whether the time is valid.
> 5) If the time is valid, it calls compute on InternalMapWithStateDStream.
> 6) The process restarts from 1).
> Here is the exception that illustrates this error:
> {code}
> Exception in thread "streaming-start" java.lang.StackOverflowError
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>   at scala.Option.orElse(Option.scala:289)
>   at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
>   at org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>   at scala.Option.orElse(Option.scala:289)
>   at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
>   at org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134)
>   at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
> {code}
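
For readers following the six steps quoted above, here is a minimal sketch of 
the getOrCompute/compute cycle in plain Scala. It is a toy model, not Spark's 
code: ToyStateStream, lookupDepth, zeroTime and slide are hypothetical names 
made up for illustration, and the "RDDs" are just strings. One assumption to 
flag loudly: the toy stops recursing at zeroTime so that it can run to 
completion, whereas the report describes the real recursion as never ending.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a stateful stream; none of this is Spark's real code.
final class ToyStateStream(zeroTime: Long, slide: Long) {
  // Batches generated so far, keyed by batch time (the role generatedRDD plays in the report).
  val generated = mutable.Map.empty[Long, String]

  // Deepest getOrCompute nesting seen during the current lookup.
  private var deepest = 0

  // Assumption of this sketch: a time is valid if it lies after zeroTime on a slide boundary.
  def isValidTime(t: Long): Boolean = t > zeroTime && (t - zeroTime) % slide == 0

  // Steps 3 to 5: look in the cache; on a miss, call compute if the time is valid.
  def getOrCompute(t: Long, depth: Int = 1): Option[String] = {
    deepest = math.max(deepest, depth)
    generated.get(t).orElse {
      if (!isValidTime(t)) None
      else {
        val rdd = compute(t, depth)
        rdd.foreach(r => generated(t) = r)
        rdd
      }
    }
  }

  // Steps 1 and 2: computing batch t first asks for the previous batch.
  def compute(t: Long, depth: Int): Option[String] = {
    val previous = getOrCompute(t - slide, depth + 1).getOrElse("empty")
    Some(s"state@$t<-$previous")
  }

  // Runs one lookup and reports how deeply getOrCompute ended up nested.
  def lookupDepth(t: Long): Int = {
    deepest = 0
    getOrCompute(t)
    deepest
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val restored = new ToyStateStream(zeroTime = 0L, slide = 1L)
    // Empty cache, as in the report: one nested lookup per batch back to zeroTime.
    println(restored.lookupDepth(100L)) // 101
    // Everything is cached now, so the same lookup no longer recurses.
    println(restored.lookupDepth(100L)) // 1
  }
}
```

In this toy, the nesting depth grows with the number of batches missing from 
the cache, and a cached previous batch cuts it short. Whether that is what 
happens in the real InternalMapWithStateDStream is not something this sketch 
can confirm.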



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16854) mapWithState Support for Python

2018-07-19 Thread Joost Verdoorn (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549209#comment-16549209
 ] 

Joost Verdoorn commented on SPARK-16854:


mapWithState would be extremely helpful in Python. Are there any plans to 
support this soon?

> mapWithState Support for Python
> ---
>
> Key: SPARK-16854
> URL: https://issues.apache.org/jira/browse/SPARK-16854
> Project: Spark
> Issue Type: Task
> Components: PySpark
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Boaz
> Priority: Major
>







[jira] [Comment Edited] (SPARK-15428) Disable support for multiple streaming aggregations

2018-07-19 Thread Joost Verdoorn (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549174#comment-16549174
 ] 

Joost Verdoorn edited comment on SPARK-15428 at 7/19/18 11:56 AM:
--

I was wondering the same. Being able to do only one aggregation within 
structured streaming is extremely limiting. Any idea on when (if ever) multiple 
aggregations could be supported? [~tdas]


was (Author: joostverdoorn):
I was wondering the same. Being able to do only one aggregation within 
structured streaming is extremely limiting. Any idea on when (if ever) multiple 
aggregations could be supported?

> Disable support for multiple streaming aggregations
> ---
>
> Key: SPARK-15428
> URL: https://issues.apache.org/jira/browse/SPARK-15428
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Major
> Fix For: 2.0.0
>
>
> Incrementalizing plans with multiple streaming aggregations is tricky, and 
> we don't have the necessary support for "delta" to implement it correctly, so 
> support for multiple streaming aggregations is being disabled.






[jira] [Commented] (SPARK-15428) Disable support for multiple streaming aggregations

2018-07-19 Thread Joost Verdoorn (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549174#comment-16549174
 ] 

Joost Verdoorn commented on SPARK-15428:


I was wondering the same. Being able to do only one aggregation within 
structured streaming is extremely limiting. Any idea on when (if ever) multiple 
aggregations could be supported?

> Disable support for multiple streaming aggregations
> ---
>
> Key: SPARK-15428
> URL: https://issues.apache.org/jira/browse/SPARK-15428
> Project: Spark
> Issue Type: Sub-task
> Components: Structured Streaming
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Major
> Fix For: 2.0.0
>
>
> Incrementalizing plans with multiple streaming aggregations is tricky, and 
> we don't have the necessary support for "delta" to implement it correctly, so 
> support for multiple streaming aggregations is being disabled.


