Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Michael Armbrust
You can create a partitioned Hive table using Spark SQL: http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
On Mon, Jan 26, 2015 at 5:40 AM, Danny Yates da...@codeaholics.org wrote:
> Hi, I've got a bunch of data stored in S3 under directories like this:

Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Danny Yates
Hi, I've got a bunch of data stored in S3 under directories like this: s3n://blah/y=2015/m=01/d=25/lots-of-files.csv In Hive, if I issue a query WHERE y=2015 AND m=01, I get the benefit that it only scans the necessary directories for files to read. As far as I can tell from searching and
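For readers who want the same directory-level pruning without Hive, one possible approach is to compute the wanted partition directories up front and hand only those paths to Spark. The sketch below is plain Python and only builds the path strings. The helper name `partition_paths` is hypothetical, and the `y=/m=/d=` layout and `s3n://blah` base are taken from the example in the message above.

```python
from itertools import product


def partition_paths(base, years, months, days=None):
    """Build one path per wanted partition under a y=/m=/d= directory layout.

    Hypothetical helper for illustration only; it is not part of Spark.
    When `days` is None, a `d=*` glob selects every day in the month.
    """
    day_parts = ["d=*"] if days is None else [f"d={d:02d}" for d in days]
    return [
        f"{base}/y={y:04d}/m={m:02d}/{d}"
        for y, m, d in product(years, months, day_parts)
    ]


# Equivalent of "WHERE y=2015 AND m=01" expressed as a path selection.
paths = partition_paths("s3n://blah", years=[2015], months=[1])
print(",".join(paths))
```

The comma-joined string could then be passed to `sc.textFile(...)`, which accepts a comma-separated list of paths, so only the selected directories are listed and read.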

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Cheng Lian
Currently no, not if you don't want to use Spark SQL's HiveContext. But we're working on adding partitioning support to the external data sources API, with which you can create, for example, partitioned Parquet tables without using Hive.
Cheng
On 1/26/15 8:47 AM, Danny Yates wrote:
> Thanks

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Chris Gore
Good to hear there will be partitioning support. I’ve had some success loading partitioned data specified with Unix globbing format, e.g.: sc.textFile("s3:/bucket/directory/dt=2014-11-{2[4-9],30}T00-00-00") would load dates 2014-11-24 through 2014-11-30. Not the most ideal solution, but it
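To check which directory names a glob like the one above selects before pointing Spark at it, a small stand-alone Python sketch can help. It is an approximation, not Hadoop's glob implementation: brace groups are expanded by a hypothetical helper (`expand_braces`, assuming no nested braces), and the remaining pattern is matched with the standard-library `fnmatch` module. The candidate directory names are made up for illustration.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups into separate patterns (no nested braces assumed)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def glob_matches(pattern, name):
    """Return True if `name` matches any brace-expanded form of `pattern`."""
    return any(fnmatch.fnmatchcase(name, p) for p in expand_braces(pattern))


pattern = "dt=2014-11-{2[4-9],30}T00-00-00"
# Illustrative directory names for 2014-11-20 through 2014-11-30.
names = [f"dt=2014-11-{day:02d}T00-00-00" for day in range(20, 31)]
matched = [n for n in names if glob_matches(pattern, n)]
print(matched)
```

Under these assumptions the pattern selects the seven directories for 2014-11-24 through 2014-11-30, which agrees with the date range described in the message.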

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Danny Yates
Thanks Michael. I'm not actually using Hive at the moment - in fact, I'm trying to avoid it if I can. I'm just wondering whether Spark has anything similar I can leverage? Thanks

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Michael Armbrust
> I'm not actually using Hive at the moment - in fact, I'm trying to avoid it if I can. I'm just wondering whether Spark has anything similar I can leverage?
Let me clarify: you do not need to have Hive installed, and what I'm suggesting is completely self-contained in Spark SQL. We support

Re: Can Spark benefit from Hive-like partitions?

2015-01-26 Thread Danny Yates
Ah, well that is interesting. I'll experiment further tomorrow. Thank you for the info!