Hi Tom,

That's an outdated document from 4-5 years ago.
Spark currently uses a BitTorrent-like mechanism that's been tuned for datacenter environments.

Mosharaf

-----Original Message-----
From: "Tom" <thubregt...@gmail.com>
Sent: 3/11/2015 4:58 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Which strategy is used for broadcast variables?

In "Performance and Scalability of Broadcast in Spark" by Mosharaf Chowdhury I read that Spark uses HDFS for its broadcast variables. This seems highly inefficient. In the same paper alternatives are proposed, among which "BitTorrent Broadcast (BTB)".

While studying "Learning Spark," page 105, second paragraph about Broadcast Variables, I read "The value is sent to each node only once, using an efficient, BitTorrent-like communication mechanism."

- Is the book talking about the proposed BTB from the paper?
- Is this currently the default?
- If not, what is?

Thanks,
Tom
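
To make the "BitTorrent-like" idea from the reply a bit more concrete, here is a toy simulation in plain Python. It is not Spark's implementation and uses no Spark API; the function names, the sequential executor loop, and the random choice of source are all assumptions made for illustration. The only point it tries to show is that once a value is split into blocks, executors can fetch blocks from peers that already hold them, so the driver does not have to serve every copy.

```python
import random


def split_into_blocks(data, block_size):
    """Cut a bytes value into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def torrent_like_broadcast(data, num_executors, block_size, seed=0):
    """Hypothetical sketch: each executor fetches every block from a random holder.

    Returns (received, served), where received maps executor name to the
    reassembled value and served maps node name to the number of blocks
    that node handed out.
    """
    rng = random.Random(seed)
    blocks = split_into_blocks(data, block_size)

    # holders[i] lists the nodes that currently hold block i.
    # Initially only the driver has the data.
    holders = [["driver"] for _ in blocks]
    served = {"driver": 0}
    received = {}

    for e in range(num_executors):
        name = "executor-%d" % e
        served[name] = 0
        fetched = [None] * len(blocks)

        # Fetch blocks in random order so load spreads across holders.
        order = list(range(len(blocks)))
        rng.shuffle(order)
        for i in order:
            source = rng.choice(holders[i])
            served[source] += 1
            fetched[i] = blocks[i]
            # After fetching, this executor can serve block i to later peers.
            holders[i].append(name)

        received[name] = b"".join(fetched)

    return received, served


if __name__ == "__main__":
    value = bytes(range(256)) * 64  # 16 KiB of sample data
    received, served = torrent_like_broadcast(value, num_executors=8, block_size=1024)
    total = sum(served.values())
    print("blocks transferred:", total)
    print("served by driver:  ", served["driver"])
```

In this sketch the first executor necessarily gets all of its blocks from the driver, and later executors increasingly pull from peers. A scheme where every executor reads the whole value from a single source would instead put all transfers on that one source.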