Shark's in-memory columnar code was ported to Spark SQL and is used by
default when you call .cache() on a SchemaRDD or run CACHE TABLE.
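
For example, with the Spark 1.2-era API (the names and input path below
are just placeholders):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.SQLContext

  val sc = new SparkContext(new SparkConf().setAppName("columnar-cache"))
  val sqlContext = new SQLContext(sc)

  val people = sqlContext.jsonFile("people.json")  // a SchemaRDD
  people.registerTempTable("people")

  // Either of these stores the table in the in-memory columnar format:
  people.cache()
  sqlContext.sql("CACHE TABLE people")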

I'd also look at Parquet, which is more efficient and handles nested data
better.
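
Roughly, with the Spark 1.x API (paths are placeholders):

  // Write a SchemaRDD out as Parquet and read it back.
  people.saveAsParquetFile("people.parquet")
  val fromParquet = sqlContext.parquetFile("people.parquet")
  fromParquet.registerTempTable("people_parquet")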

On Fri, Feb 13, 2015 at 7:36 AM, Night Wolf <nightwolf...@gmail.com> wrote:

> Hi all,
>
> I'd like to build/use column-oriented RDDs in some of my Spark code. A
> normal Spark RDD stores row-oriented objects, if I understand correctly.
>
> I'd like to leverage some of the advantages of a columnar memory format.
> Shark used to, and Spark SQL does, use a columnar storage format with
> primitive arrays for each column.
>
> I'd be interested to know more about this approach and how I could build
> my own custom columnar-oriented RDD which I can use outside of Spark SQL.
>
> Could anyone give me some pointers on where to look to do something like
> this, either from scratch or using what's there in the Spark SQL libs or
> elsewhere? I know Evan Chan mentioned building a custom RDD of
> column-oriented blocks of data in one of his presentations.
>
> Cheers,
> ~N
>
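
As for rolling your own outside Spark SQL, the basic idea is an RDD whose
partitions each hold a column-oriented block with one primitive array per
column. A minimal sketch (the two-column schema here is made up; this is
not Spark SQL's actual implementation):

  import org.apache.spark.rdd.RDD

  // One block per partition: a primitive array per column.
  case class ColumnBlock(ids: Array[Int], scores: Array[Double])

  def toColumnar(rows: RDD[(Int, Double)]): RDD[ColumnBlock] =
    rows.mapPartitions { iter =>
      val (ids, scores) = iter.toArray.unzip
      Iterator(ColumnBlock(ids, scores))
    }

A column-wise scan then touches only the array it needs, e.g.
toColumnar(rows).map(_.scores.sum).sum().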
