Re: Writing an RDD to Hive

2013-12-09 Thread Philip Ogren
Any chance you could sketch out the Shark APIs that you use for this? Matei's response suggests that the preferred API is coming in the next release (i.e. RDDTable class in 0.8.1). Are you building Shark from the latest in the repo and using that? Or have you figured out other API calls

Re: Writing an RDD to Hive

2013-12-07 Thread Matei Zaharia
Hi Philip,

There are a few things you can do:

- If you want to avoid the data copy with a CREATE TABLE statement, you can use CREATE EXTERNAL TABLE, which points to an existing file or directory.
- If you always reuse the same table, you could CREATE TABLE only once and then simply place
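
For readers following the thread, the CREATE EXTERNAL TABLE suggestion can be sketched roughly as below. This is an illustrative sketch only, not code from the thread: the helper names (`format_row`, `write_rows`, `external_table_ddl`), the table name, the column types, and the tab delimiter are all assumptions. It uses a local directory as a stand-in; in an actual Spark job the delimited lines would presumably be produced by mapping over the RDD and written to HDFS with `saveAsTextFile`.

```python
import os
import tempfile


def format_row(fields, delimiter="\t"):
    """Join one record's fields into a single delimited text line."""
    return delimiter.join(str(f) for f in fields)


def write_rows(rows, directory, filename="part-00000", delimiter="\t"):
    """Write records as delimited lines into `directory`; return the file path.

    Local stand-in (assumption) for writing an RDD out as text files.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(format_row(row, delimiter) + "\n")
    return path


def external_table_ddl(table, columns, location):
    """Build a HiveQL statement for an external, tab-delimited text table.

    `columns` is a list of (name, hive_type) pairs. The table only points
    at `location`, so no data is copied when the table is created.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        "STORED AS TEXTFILE "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical data and table name, for illustration only.
    out_dir = os.path.join(tempfile.mkdtemp(), "log_summary")
    write_rows([("host-a", 12), ("host-b", 7)], out_dir)
    print(external_table_ddl(
        "log_summary", [("host", "STRING"), ("hits", "INT")], out_dir))
```

The printed statement would then be run from Hive or Shark. Because the table is external, dropping it later removes only the table metadata and leaves the files in place.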

Writing an RDD to Hive

2013-12-06 Thread Philip Ogren
I have a simple scenario that I'm struggling to implement. I would like to take a fairly simple RDD generated from a large log file, perform some transformations on it, and write the results out such that I can perform a Hive query either from Hive (via Hue) or Shark. I'm having troubles