There are 4 cores on my system. Running Spark with setMaster("local[2]") results in the following top output:

  PID USER  PR  NI    VIRT    RES    SHR S %CPU %MEM   TIME+  COMMAND
    7 root  20   0 4748836 563400  29064 S 24.6  7.0  1:16.54 /usr/jdk1.8.0_101/bin/java -cp /conf/:/usr/spark-2.0.0-preview-bin-hadoop2.6/jars/* -Xmx1g org.apache.spark.de+
  114 root  20   0  114208  31956   7028 S 15.7  0.4  0:16.35 python -m pyspark.daemon
  117 root  20   0  114404  32116   7028 S 15.7  0.4  0:17.28 python -m pyspark.daemon
   41 root  20   0  443548  60920  10416 S  0.0  0.8  0:10.84 python /test.py
  111 root  20   0  101272  31740   9356 S  0.0  0.4  0:00.29 python -m pyspark.daemon

The processing time is over 3 seconds when running the code quoted below. There must be a lot of overhead somewhere, as the code does nearly nothing, i.e., no expensive calculations on a socket stream that receives one message per second. How can I reduce this overhead?
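For reference, here is a minimal, self-contained sketch of this kind of job, together with the settings that seem relevant to per-batch overhead in local mode (batch interval, spark.default.parallelism, and the checkpoint interval of the stateful streams). The host/port, checkpoint directory, update functions and the final output step are simplified placeholders, not the exact code:

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    # NOTE: host/port, checkpoint dir and the update logic are placeholders.
    conf = (SparkConf()
            .setMaster("local[2]")
            .setAppName("stream-overhead-test")
            # one message per second -> keep the number of partitions (and tasks per batch) small
            .set("spark.default.parallelism", "2"))
    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 1)            # 1-second batch interval
    ssc.checkpoint("/tmp/spark-checkpoint")  # required by updateStateByKey

    def update_first(new_values, state):
        # placeholder state update: just count messages per key
        return (state or 0) + len(new_values)

    def update_second(new_values, state):
        return (state or 0) + len(new_values)

    lines = ssc.socketTextStream("localhost", 9999)
    main_stream = lines.map(lambda line: (line.split(" ")[0], line))  # key by first token

    s1 = main_stream.updateStateByKey(update_first)
    s2 = main_stream.updateStateByKey(update_second)

    # stateful streams are checkpointed roughly every 10 seconds by default;
    # stretching the interval reduces per-batch checkpoint I/O
    s1.checkpoint(30)
    s2.checkpoint(30)

    # join the two state streams on the shared key and force the batch to compute
    s1.join(s2).foreachRDD(lambda rdd: rdd.count())

    ssc.start()
    ssc.awaitTermination()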
On 25.07.2016 20:19, on wrote:
> OK, sorry, I am running in local mode.
> Just a very small setup...
>
> (changed the subject)
>
> On 25.07.2016 20:01, Mich Talebzadeh wrote:
>> Hi,
>>
>> From your reference I can see that you are running in local mode with
>> two cores. But that is not standalone.
>>
>> Can you please clarify whether you start the master and slave processes?
>> Those are for standalone mode:
>>
>> sbin/start-master.sh
>> sbin/start-slaves.sh
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which
>> may arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary
>> damages arising from such loss, damage or destruction.
>>
>> On 25 July 2016 at 18:21, on <schueler_1...@web.de> wrote:
>>
>> Dear all,
>>
>> I am running Spark on one host ("local[2]") doing calculations
>> like this on a socket stream:
>>
>> mainStream = socketStream.filter(lambda msg:
>>     msg['header'].startswith('test')).map(lambda x: (x['host'], x))
>> s1 = mainStream.updateStateByKey(updateFirst).map(lambda x: (1, x))
>> s2 = mainStream.updateStateByKey(updateSecond,
>>     initialRDD=initialMachineStates).map(lambda x: (2, x))
>> out.join(bla2).foreachRDD(no_out)
>>
>> I evaluated that each calculation alone has a processing time of about 400 ms,
>> but the processing time of the code above is over 3 seconds on average.
>>
>> I know there are a lot of unknown parameters, but does anybody have hints
>> on how to tune this code / system? I already changed a lot of parameters,
>> such as #executors, #cores and so on.
>>
>> Thanks in advance and best regards,
>> on
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org