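Hi,

Have you tried partitioning your data and storing it in Parquet format? In my experience it is very fast in Spark, because a query that filters on the partition column only has to read the matching partitions rather than the whole data set.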
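To make the idea concrete, here is a rough, Spark-free sketch of why a partitioned layout behaves a bit like a coarse index. In Spark itself the layout would come from `DataFrameWriter.partitionBy(...)` followed by `.parquet(...)`, which writes one `column=value` directory per partition value. The toy below only mimics that directory layout with the Python standard library. The helper names (`write_partitioned`, `read_with_pruning`), the sample columns, and the use of JSON-lines files as a stand-in for Parquet are all assumptions made for illustration, not Spark APIs.

```python
import json
import tempfile
from pathlib import Path


def write_partitioned(rows, base_dir, partition_col):
    """Write rows into Hive-style `col=value` directories, one file each."""
    base = Path(base_dir)
    buckets = {}
    for row in rows:
        buckets.setdefault(row[partition_col], []).append(row)
    for value, bucket in buckets.items():
        part_dir = base / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # JSON lines stand in for a Parquet file in this toy model.
        with open(part_dir / "part-00000.jsonl", "w") as fh:
            for row in bucket:
                fh.write(json.dumps(row) + "\n")


def read_with_pruning(base_dir, partition_col, wanted):
    """Read only the directory whose partition value equals `wanted`."""
    part_dir = Path(base_dir) / f"{partition_col}={wanted}"
    files_opened = 0
    rows = []
    if part_dir.is_dir():
        for path in sorted(part_dir.glob("*.jsonl")):
            files_opened += 1
            with open(path) as fh:
                rows.extend(json.loads(line) for line in fh)
    return rows, files_opened


# Hypothetical sample data; the column names are made up for illustration.
events = [
    {"event_date": "2016-05-28", "user": "a", "clicks": 3},
    {"event_date": "2016-05-29", "user": "b", "clicks": 5},
    {"event_date": "2016-05-30", "user": "a", "clicks": 7},
    {"event_date": "2016-05-30", "user": "c", "clicks": 1},
]

with tempfile.TemporaryDirectory() as tmp:
    write_partitioned(events, tmp, "event_date")
    partition_dirs = sorted(p.name for p in Path(tmp).iterdir())
    # A filter on the partition column touches one directory, not three.
    matched, files_opened = read_with_pruning(tmp, "event_date", "2016-05-30")
```

The lookup on `event_date` opens a single file and never looks at the other two directories, which is the effect you get from partition pruning. It only helps for predicates on the partition column, so it is not a general replacement for a secondary index on arbitrary columns.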
Regards,
Gourav

On Mon, May 30, 2016 at 5:08 PM, Michael Segel <msegel_had...@hotmail.com> wrote:

> I’m not sure where to post this since its a bit of a philosophical
> question in terms of design and vision for spark.
>
> If we look at SparkSQL and performance… where does Secondary indexing fit
> in?
>
> The reason this is a bit awkward is that if you view Spark as querying
> RDDs which are temporary, indexing doesn’t make sense until you consider
> your use case and how long is ‘temporary’.
> Then if you consider your RDD result set could be based on querying
> tables… and you could end up with an inverted table as an index… then
> indexing could make sense.
>
> Does it make sense to discuss this in user or dev email lists? Has anyone
> given this any thought in the past?
>
> Thx
>
> -Mike