I've looked a bit into what DataFrames are, and most posts on the subject relate them to SQL, but they do seem to be very efficient. My main question is: are DataFrames also beneficial for non-SQL computations?
For instance I want to:

- sort k/v pairs (in particular with the naive versus cache-aware layout in mind)
- perform some arbitrary map-reduce instructions

I am wondering this because I read about the *naive vs cache-aware layout*, and also read the following on the Databricks blog:

"The first pieces will land in Spark 1.4, which includes explicitly managed memory for aggregation operations *in Spark's DataFrame API* as well as customized serializers. Expanded coverage of binary memory management and cache-aware data structures will appear in Spark 1.5."

This leads me to believe that the cache-aware layout, which also seems beneficial for regular computation and sorting, is (currently?) only implemented for DataFrames, and it makes me wonder whether I should just use DataFrames for my "regular" computation as well. A small sketch of the kind of computation I mean is below.

Thanks in advance,
Tom

P.S. I am currently using the master branch from GitHub.
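To make the question concrete, here is a minimal sketch (Spark 1.4-era API; the object name and column names are just for illustration) of the kind of non-SQL work I mean: sorting k/v pairs with a plain RDD versus doing the same sort through a DataFrame.

    // Minimal sketch: RDD sort vs. DataFrame sort on the same k/v pairs.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object KvSortSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("kv-sort-sketch"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Some arbitrary k/v pairs.
        val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))

        // "Regular" RDD way: sortByKey plus an arbitrary map step.
        val sortedRdd = pairs.sortByKey().map { case (k, v) => (k, v * 10) }

        // DataFrame way: is this the path that gets the managed-memory /
        // cache-aware layout mentioned in the blog post?
        val df = pairs.toDF("key", "value")
        val sortedDf = df.orderBy("key")

        sortedRdd.collect().foreach(println)
        sortedDf.show()

        sc.stop()
      }
    }

Essentially I want to know whether the second form is expected to benefit from the new memory management while the first does not.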