I am not entirely certain I understand your questions, but let me assume you are mostly interested in SparkSQL and are thinking about your problem in terms of SQL-like tables.
1. Shuo Xiang mentioned Spark partitioning strategies, but if you are talking about data partitioning or sharding as it exists in Hive, SparkSQL does not currently support this, though it is on the roadmap. We can, however, read from partitioned Hive tables.

2. If by entries/records you mean something like columns/rows, SparkSQL does allow you to project out only the columns you want, or to select all columns. The efficiency of such a projection depends on how the data is stored, however. If your data is stored in an inherently row-based format, the projection will be no faster than doing an initial map() over the data to select only the desired columns. If it is stored in something like Parquet, or cached in memory, we avoid ever looking at the unused columns.

3. Spark has a very general data source API, so it is capable of interacting with virtually any data source. However, I don't think we currently have any SparkSQL connectors to RDBMSes that support column pruning or other push-downs. All of this is very much viable, though.

On Fri, Jul 11, 2014 at 1:35 PM, Gonzalo Zarza <gonzalo.za...@globant.com> wrote:
> Hi all,
>
> We've been evaluating Spark for a long-term project. Although we've been
> reading several topics in the forum, any hints on the following topics will be
> extremely welcome:
>
> 1. Which data partitioning strategies are available in Spark? How
> configurable are these strategies?
>
> 2. What would be the best way to use Spark if queries touch only 3-5
> entries/records? Which strategy is best if they want to perform a full
> scan of the entries?
>
> 3. Is Spark capable of interacting with RDBMS?
>
> Thanks a lot!
>
> Best regards,
>
> --
> *Gonzalo Zarza* | PhD in High-Performance Computing | Big-Data Specialist
> | *GLOBANT*
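To make the storage-format point in answer 2 concrete, here is a toy model in plain Python. It is not Spark code, and `RowStore` and `ColumnStore` are hypothetical names used only for illustration. It counts how many stored values each layout has to touch to answer the same two-column projection. The row layout must read every field of every record, while the column layout reads only the requested columns. That is roughly why data in Parquet, or cached in memory, can skip unused columns.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[Any, ...]


class RowStore:
    """Toy row-oriented layout (hypothetical, not Spark code): each record is one tuple."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> None:
        self.columns = list(columns)
        self.rows: List[Row] = [tuple(r) for r in rows]
        self.values_read = 0  # number of stored values touched by scans

    def project(self, wanted: Sequence[str]) -> List[Row]:
        idx = [self.columns.index(c) for c in wanted]
        out = []
        for row in self.rows:
            # A row format has to read the whole record before the wanted
            # fields can be picked out, much like a map() over the data.
            self.values_read += len(row)
            out.append(tuple(row[i] for i in idx))
        return out


class ColumnStore:
    """Toy column-oriented layout (hypothetical): each column is stored separately."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> None:
        self.columns = list(columns)
        materialized = [tuple(r) for r in rows]
        self.data: Dict[str, List[Any]] = {
            name: [r[i] for r in materialized] for i, name in enumerate(self.columns)
        }
        self.values_read = 0  # number of stored values touched by scans

    def project(self, wanted: Sequence[str]) -> List[Row]:
        selected = []
        for name in wanted:
            # Only the requested columns are read; the others are never touched.
            column = self.data[name]
            self.values_read += len(column)
            selected.append(column)
        # Re-assemble row tuples from the selected columns.
        return list(zip(*selected))


# Usage: project the same two columns out of a small four-column table.
columns = ["id", "name", "city", "score"]
rows = [(1, "a", "x", 10), (2, "b", "y", 20), (3, "c", "z", 30)]
row_store = RowStore(columns, rows)
col_store = ColumnStore(columns, rows)
print(row_store.project(["id", "score"]), row_store.values_read)  # → [(1, 10), (2, 20), (3, 30)] 12
print(col_store.project(["id", "score"]), col_store.values_read)  # → [(1, 10), (2, 20), (3, 30)] 6
```

Both layouts return the same rows, but the row layout touches all 12 stored values while the column layout touches only 6. The gap grows with the number of unused columns. This sketch only counts values touched and ignores the other benefits a real columnar format may add, such as compression.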