Dear all,

I am running Spark on one host ("local[2]") and doing calculations like the following on a socket stream:

    mainStream = socketStream.filter(lambda msg: msg['header'].startswith('test')) \
                             .map(lambda x: (x['host'], x))
    s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
    s2 = mainStream.updateStateByKey(updateSecond,
                                     initialRDD=initialMachineStates).map(lambda x: (2, x))
    s1.join(s2).foreachRDD(no_out)
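One detail of the shape of this job can be shown without Spark: mainStream is consumed by two branches (s1 and s2), so unless the parent DStream is cached, its filter/map lineage may be recomputed once per branch for every batch. The plain-Python toy below only illustrates that branching pattern; the update functions and sink from the snippet above are not modelled, and this is an assumption about where time could go, not a diagnosis:

```python
# Plain-Python toy (no Spark required) of the branching shape above.
# updateFirst / updateSecond / no_out are not modelled; the point is only
# that the shared parent work can run once per downstream branch.
calls = {"filter_map": 0}

def main_stream(batch):
    # corresponds to socketStream.filter(...).map(...)
    calls["filter_map"] += 1
    return [(m["host"], m) for m in batch if m["header"].startswith("test")]

def branch(batch, tag):
    # stand-in for updateStateByKey(...).map(lambda x: (tag, x))
    return [(tag, kv) for kv in main_stream(batch)]

batch = [{"header": "test-a", "host": "h1"},
         {"header": "other", "host": "h2"}]
s1 = branch(batch, 1)
s2 = branch(batch, 2)

# the shared parent lineage was executed twice for a single batch
assert calls["filter_map"] == 2
```

In the real job, calling mainStream.cache() before defining s1 and s2 would let both stateful branches reuse the same per-batch RDDs; whether that explains the observed slowdown is an open question.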
I evaluated each calculation alone: each one has a processing time of about 400 ms, but the processing time of the code above is over 3 seconds on average. I know there are a lot of unknown parameters, but does anybody have hints on how to tune this code / system? I have already changed a lot of parameters, such as the number of executors, the number of cores, and so on.

Thanks in advance and best regards,
on