Hi, I was writing some docs on Spark P&T and came across this.
It is about the terminology used in the Spark docs, or rather how to interpret it.

This is my understanding of cores and threads. Cores are physical cores. Threads are virtual cores. A core that runs 2 threads is using hyper-threading technology, so 2 threads per core lets the core work on two loads at the same time. In other words, every thread takes care of one load. Each core has its own memory. So if you have a dual core with hyper-threading, each core works on 2 loads at the same time because of the 2 threads per core, but these 2 threads share the memory in that core.

Some vendors, as I am sure most of you are aware, charge licensing per core. For example, on the same host where I have Spark, I have an SAP product that checks the licensing and shuts the application down if the license does not agree with the cores specified. This is what it says:

    ./cpuinfo
    License hostid: 00e04c69159a 0050b60fd1e7
    Detected 12 logical processor(s), 6 core(s), in 1 chip(s)

So here I have 12 logical processors, 6 cores and 1 chip. I call logical processors threads, so I have 12 threads?

Now if I go and start the worker processes with ${SPARK_HOME}/sbin/start-slaves.sh, I see this in the GUI page:

[image: Inline images 1]

It says 12 cores, but I gather it means threads?

The Spark documentation <http://spark.apache.org/docs/latest/submitting-applications.html> states, and I quote:

[image: Inline images 2]

OK, the line for local[k] adds "... set this to the number of cores on your machine". But I know that it means threads, because if I set it to 6, I would get only 6 threads as opposed to 12. The next line, local[*], seems to state it correctly, as it refers to "logical cores", which in my understanding are threads.

I trust that I am not nitpicking here!

Cheers,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
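
P.S. For anyone who wants to compare the numbers on their own host, here is a minimal sketch in Python. It is not taken from the Spark docs. It relies on os.cpu_count(), which reports logical CPUs (so hyper-threads are counted), and local_master() is a hypothetical helper I made up purely to show how the count maps onto the local[k] and local[*] master strings.

```python
import os


def local_master(threads=None):
    """Build a Spark-style local master string (hypothetical helper).

    threads=None gives "local[*]"; an integer k gives "local[k]".
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return f"local[{threads}]"


# os.cpu_count() counts logical CPUs, not physical cores.
# It can return None if the count cannot be determined.
logical = os.cpu_count()

if logical is None:
    print("Could not determine the number of logical CPUs")
else:
    print(f"Logical CPUs reported by the OS: {logical}")
    print(f"Master string using all of them: {local_master(logical)}")
```

On the host described above I would expect this to report the logical processor count rather than the physical core count, but I have not run it there, so treat that as an assumption.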