Yes, there is. I do it every day. It's very useful for trying things out and
seeing whether they behave according to plan.

Create the streaming context from the sc SparkContext. (I'm using @transient
for everything because I run code outside of any enclosing scope, like an
object; otherwise things get "sucked up" into the closure and have to be
serialized, which causes lots of errors.)

@transient val ssc = new StreamingContext(sc, windowDuration)
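
(Here windowDuration is whatever batch interval you picked; a minimal,
assumed setup just so the snippet stands on its own:)

import org.apache.spark.streaming.{Seconds, StreamingContext}

@transient val windowDuration = Seconds(10)  // hypothetical: 10-second batches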


Here is an example of the kind of thing you can do (I'm consuming Kinesis
streams; you might use something else):
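
(The snippet below uses KinesisUtils from the spark-streaming-kinesis-asl
artifact; for completeness, the imports it assumes:)

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream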

@transient val streams = (0 until 1).map { i =>
  KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
    InitialPositionInStream.LATEST, windowDuration, StorageLevel.MEMORY_ONLY)
}
@transient val events = ssc
  .union(streams)
  .map(byteArray => new String(byteArray))
  .map(str => Funcs.parseLogEvent(str))
  .filter(tryLogEvent => tryLogEvent.isSuccess)
  .map(tryLogEvent => tryLogEvent.get)
  .filter(logEvent => logEvent.DataType == "Event" &&
    logEvent.RequestType != "UIInputError" &&
    logEvent.RequestType != "UIEmptyInput")


You need an action like print(), otherwise nothing happens; you probably
already know that:

events.print(10)
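
(If print() is too coarse, foreachRDD lets you peek at each batch yourself;
for example:)

events.foreachRDD(rdd => rdd.take(5).foreach(println))  // show 5 records per batch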


Now you start the context:

ssc.start()


Now you need to wait for as long as it takes for a window to "close", and
then you stop the streaming context, *but not the underlying SparkContext*,
otherwise you need to restart the interpreter:

ssc.stop(stopSparkContext = false, stopGracefully = true)
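
(Putting the last two steps together: rather than watching the clock, you can
let awaitTerminationOrTimeout do the waiting; a sketch, assuming two batch
intervals is enough to see some output:)

ssc.start()
ssc.awaitTerminationOrTimeout(2 * windowDuration.milliseconds)  // block for ~2 batches
ssc.stop(stopSparkContext = false, stopGracefully = true)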


Everything should work, I guess. Let me know if you run into other problems.

*FA*
http://queirozf.com

On 15 February 2016 at 23:15, Michael Gummelt <mgumm...@mesosphere.io>
wrote:

> Hi,
>
> I've seen this JIRA regarding Spark Streaming:
> https://issues.apache.org/jira/browse/ZEPPELIN-274, but I have a more
> basic question.  Is there a way to iterate on a Spark Streaming job in
> Zeppelin without restarting the interpreter?  Specifically, I'd like to
> start a streaming job, see some results, rewrite the job, then start it
> again.  Spark has some limitations that make this difficult:
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#points-to-remember
>
> But I'm wondering if there's a workaround.  Thanks.
>
> --
> Michael Gummelt
> Software Engineer
> Mesosphere
>
