Hello all, I am having some problems with my custom Java-based receiver. I am running Spark 1.5.0 and I used the template on the Spark website (http://spark.apache.org/docs/1.0.0/streaming-custom-receivers.html). Basically, my receiver listens to a JMS queue (Solace) and then, based on the size or the number of received messages, stores them by calling store(). It doesn't ack the messages until after store() has been called.
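For context, the batching logic inside the receiver looks roughly like this (a simplified sketch with illustrative names and thresholds, not the actual code; it treats characters as bytes for simplicity):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the receiver's batching: messages are buffered
// until either a message-count or a byte-size threshold is reached,
// then the whole batch is handed to store() and only acked afterwards.
public class MessageBatch {
    private final int maxMessages;
    private final long maxBytes;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    public MessageBatch(int maxMessages, long maxBytes) {
        this.maxMessages = maxMessages;
        this.maxBytes = maxBytes;
    }

    /** Add one message; returns true once the batch should be flushed. */
    public boolean add(String payload) {
        buffer.add(payload);
        bufferedBytes += payload.length();
        return buffer.size() >= maxMessages || bufferedBytes >= maxBytes;
    }

    /** Drain the batch. In the real receiver this is where store(batch)
     *  is called, and the JMS messages are acked only after it returns. */
    public List<String> drain() {
        List<String> out = new ArrayList<>(buffer);
        buffer.clear();
        bufferedBytes = 0;
        return out;
    }
}
```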
The problems I am having are:

1. I am not able to use the internal backpressure system in Spark Streaming to control my receiver, so it overloads my executors. Is there something extra I need to implement in order to make the driver pause the receiver so my system stays stable? I tried doing it myself with a JobListener, stopping the ReceiverTracker once I noticed the number of queued batches had reached a set limit. This works, but every time I restarted the ReceiverTracker (by calling start), the processing time for each batch kept increasing in the Spark UI. I think stopping it might be skewing the metrics. Is there any other way to do this?

2. The other problem I have is the WAL. If I ask Spark to unpersist my RDDs (spark.streaming.unpersist = true), I get a lot of WAL exceptions saying "Could not read from WriteAheadLog". This is a problem because, even though I am calling persist on my RDD after processing it, YARN sometimes kills my job because the containers have gone above the allocated memory limit. Does anyone have any idea how to get around this? If I set unpersist to false, this problem goes away.

3. Finally, in order to avoid some of the issues detailed above, I tried running a version of my application without streaming. I connect to the queue in the driver, then create an RDD from the received messages and process them on the executors. I do this in an infinite loop until a stop file is placed on HDFS, at which point my Spark application exits. To my surprise this works fine, but I am not sure whether it is a more stable and correct solution than Spark Streaming.

I would be really grateful if anyone who has come across similar problems could shed some light on a solution. Thanks in advance.

----
Charles Bajomo
Operations Director
www.cloudxtiny.co.uk | Precision Technology Consulting Ltd
Registered England & Wales: 07397178
VAT No.: 124 4354 38 GB
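On problem 1, for anyone following along, these are the settings I have been trying (available since 1.5.0). My understanding, and I may well be wrong here, is that the receiver rate limit is only enforced for records pushed one at a time through store(single record), which goes via the BlockGenerator, so a receiver that batches and calls store(ArrayBuffer) bypasses the limiter entirely; that could be why backpressure appears to do nothing for me:

```
spark.streaming.backpressure.enabled=true
# Hard upper bound as a fallback, in records/sec per receiver;
# as far as I can tell, only enforced on single-record store() calls
spark.streaming.receiver.maxRate=1000
```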
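For anyone curious about point 3, the shape of the non-streaming driver loop is roughly this (a stripped-down sketch: the JMS and Spark calls are only comments, java.nio.file stands in for the HDFS FileSystem check, and the iteration cap exists only so the sketch terminates):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Skeleton of the non-streaming driver loop: keep draining the queue
// and processing batches until a stop file appears. The real job uses
// the HDFS FileSystem API for the existence check and exits the
// application instead of returning.
public class PollLoop {
    public static int run(Path stopFile, int maxIterations) throws Exception {
        int batchesProcessed = 0;
        while (!Files.exists(stopFile) && batchesProcessed < maxIterations) {
            // In the real job: drain a batch of messages from the JMS
            // queue here, then sc.parallelize(batch) and process it on
            // the executors, acking only after the Spark action finishes.
            batchesProcessed++;
        }
        return batchesProcessed;
    }
}
```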