Hi Prashant!
That's very interesting. So this appears to be a general issue with ActorStream, then? With all DStreams, I suppose.
OK, so that seems like a bug independent of the issue around having no ActorSystem in scope for runJob(), then.
I have not spent enough time on spray-io yet, but if it is implemented as an ActorExtension, then using actorStream is the best fit. For example, I have used akka-camel/akka-zeromq to receive streams from a variety of endpoints. I am not suggesting you should use it, although it has SSL support etc. That was the original motivation while designing it. But if it can somehow be used in a different and more useful way, I'd encourage it.
I'd like to understand your ultimate intention a bit better, with a bit more detail on the architecture at a higher level. I am probably hinting at a redesign.
That's certainly a valid option!
Without discussing things I'm not at liberty to, all I can really say is that I'm trying to build a real-time streaming server that takes a connection from a client that expects to be able to then send arbitrary bytes to the server. On the server, I want a DStream per socket connection to process the incoming data. There's other stuff, such as this needing to support TLS mutual authentication, but I'll get there when I have data flowing. :-)
It's really quite simple, or at least it should be, and again, I believe it will be, when the bugs we've identified are fixed.
In your case, is the code still too complicated to spot exactly where we are leaking an ActorRef in the test case?
If it helps, the only ActorRef in my code now is in the SpraySslClient, i.e. nowhere that's being serialized.
or in our Handler.
I have to respectfully suggest it is indeed this. OTOH, is it really a "leak?" I am, in fact, closing over objects that, sooner or later, construct ActorRefs. The whole point of my task is to do something with data coming from an ActorRef that represents a TCP connection, then send the result back to that same ActorRef. I guess I'm having a little trouble understanding how many actorStream use cases there are that wouldn't involve ultimately closing over at least one ActorRef. :-)
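To make the closing-over point concrete, here is a minimal sketch in plain Java, with no Akka or Spark involved. Everything in it is a hypothetical stand-in: `FakeRef` plays the role of an ActorRef, `CURRENT_SYSTEM` plays the role of an ActorSystem being in scope, and `SendBack` plays the role of a closure shipped to a worker. It assumes the failure mode is that the reference can be written out fine but can only be resolved again when a system is in scope on the deserializing thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class ClosureCaptureDemo {
    // Hypothetical stand-in for "an ActorSystem is in scope"; NOT Akka's API.
    static final ThreadLocal<String> CURRENT_SYSTEM = new ThreadLocal<>();

    // Hypothetical stand-in for an ActorRef: it serializes as a plain path,
    // but refuses to be resolved unless a system is in scope when it is read back.
    static final class FakeRef implements Serializable {
        private static final long serialVersionUID = 1L;
        final String path;

        FakeRef(String path) { this.path = path; }

        private Object readResolve() throws ObjectStreamException {
            if (CURRENT_SYSTEM.get() == null) {
                throw new InvalidObjectException("no system in scope for " + path);
            }
            return this;
        }
    }

    // Stand-in for a task (closure) that gets serialized and run elsewhere.
    interface Task extends Serializable {
        String run(String input);
    }

    // A task that "closes over" the reference it will reply to.
    static final class SendBack implements Task {
        private static final long serialVersionUID = 1L;
        private final FakeRef replyTo;

        SendBack(FakeRef replyTo) { this.replyTo = replyTo; }

        @Override
        public String run(String input) { return replyTo.path + " <- " + input; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Serializes and deserializes the task with the given system (or none) in scope.
    static boolean roundTrips(Object task, String systemOrNull) {
        CURRENT_SYSTEM.set(systemOrNull);
        try {
            deserialize(serialize(task));
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        } finally {
            CURRENT_SYSTEM.remove();
        }
    }

    public static void main(String[] args) {
        Task task = new SendBack(new FakeRef("connection-1"));
        System.out.println("without system in scope: " + roundTrips(task, null));
        System.out.println("with system in scope:    " + roundTrips(task, "demo-system"));
    }
}
```

In this model the task itself is perfectly serializable; the round trip only fails on the reading side when nothing is in scope to resolve the captured reference. Whether that matches what actually happens inside runJob() is an assumption on my part.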
It is also possible to start multiple actorStream receivers in advance and then use their supervisor to start more actors as receivers for you, for example one on each connection (if that makes any sense, take a look at it).
I'm afraid I didn't quite follow that, but it sounds interesting!
At this point, though, I have to suggest that the path forward, simply for Spark Streaming to work correctly out of the box, is clear. I have created an issue against the missing ActorSystem in runJob() in JIRA, but it sounds like we need another one for the foreach ... foreach ... issue. I'm afraid I don't know exactly how to characterize that issue. Could I possibly impose upon you to create that ticket?
Thanks again for digging into this!
Paul
