I've looked a bit into what DataFrames are, and most posts on the subject
relate to SQL, where they do seem very efficient. My main question is: are
DataFrames also beneficial for non-SQL computations?

For instance, I want to:
- sort k/v pairs (in particular, is the naive vs. cache-aware layout relevant here?)
- perform some arbitrary map-reduce operations (rough sketches of both below)
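Here is roughly what I have in mind for the sort, as a minimal sketch (the
data and column names are made up, and I'm assuming the SQLContext/toDF API
from the 1.4 line):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("df-question"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // some toy k/v pairs
    val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))

    // RDD style: sortByKey on a pair RDD
    val sortedRdd = pairs.sortByKey()

    // DataFrame style: the same sort expressed on named columns
    val sortedDf = pairs.toDF("key", "value").sort("key")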

I'm wondering because I read about the *naive vs. cache-aware layout*, and
also came across the following on the Databricks blog:
"The first pieces will land in Spark 1.4, which includes explicitly managed
memory for aggregation operations *in Spark’s DataFrame API* as well as
customized serializers. Expanded coverage of binary memory management and
cache-aware data structures will appear in Spark 1.5."
This leads me to believe that the cache-aware layout, which also seems
beneficial for regular computation and sorting, is (currently?) implemented
only for DataFrames, which makes me wonder whether I should simply use
DataFrames for my "regular" computation as well.

Thanks in advance,

Tom

P.S. I'm currently using the master branch from GitHub.



