Dear all,

I am running Spark on one host ("local[2]"), doing calculations like the
following on a socket stream:
    mainStream = socketStream.filter(lambda msg: msg['header'].startswith('test')) \
                             .map(lambda x: (x['host'], x))
    s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
    s2 = mainStream.updateStateByKey(updateSecond,
                                     initialRDD=initialMachineStates).map(lambda x: (2, x))
    s1.join(s2).foreachRDD(no_out)
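For reference, updateFirst and updateSecond follow the usual updateStateByKey
contract (called once per key and batch with the new values and the previous
state). The counting logic below is only a hypothetical sketch to show the
shape, not my actual update functions:

```python
# Hypothetical sketch of an updateStateByKey update function.
# The name and the counting logic are illustrative assumptions only.
def updateFirst(new_values, last_state):
    # new_values: list of values that arrived for this key in the current batch
    # last_state: previous state for this key, or None the first time the key is seen
    return (last_state or 0) + len(new_values)

# Simulating two consecutive batches for one key:
state = None
state = updateFirst([{'host': 'a'}, {'host': 'a'}], state)  # 2 messages in batch 1
state = updateFirst([{'host': 'a'}], state)                 # 1 message in batch 2
print(state)  # -> 3
```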

I evaluated each calculation alone: each has a processing time of about
400 ms, but the processing time of the code above is over 3 s on average.

I know there are a lot of unknown parameters, but does anybody have hints
on how to tune this code / system? I have already changed a lot of
parameters, such as the number of executors, the number of cores, and so on.

Thanks in advance and best regards,
on
