Hi,

I am new to Spark. I have begun reading up on Spark's RDDs as well as
Spark SQL. My question is more about how to build out RDDs and best
practices. I have data that is broken down by hour into files on HDFS in
Avro format. Do I need to create a separate RDD for each file? Or, using
Spark SQL, a separate SchemaRDD?

I want to be able to pull, let's say, an entire day of data into Spark and
run some analytics on it. Then possibly a week, a month, etc.
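
For concreteness, here is a rough sketch of what I have in mind, assuming
a directory layout like /data/events/YYYY/MM/DD/HH.avro (the paths and the
app name below are just made-up examples, not my real setup):

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.{SparkConf, SparkContext}

    object LoadDayOfAvro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("load-day-of-avro"))

        // One glob over the hourly files -- does this give me a single RDD
        // for the whole day, or should each hourly file be its own RDD?
        val dayPath = "hdfs:///data/events/2015/01/15/*.avro"

        val day = sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
            AvroKeyInputFormat[GenericRecord]](dayPath)
          .map(pair => pair._1.datum())  // unwrap the Avro records

        println(day.count())  // placeholder for the real analytics
      }
    }

Presumably widening the glob (e.g. .../2015/01/*/*.avro) would pull in a
month the same way?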


If there is documentation on this procedure or best practices for building
RDDs, please point me to it.

Thanks for your time,
   Sam