Re: Using a Database to persist and load data from

2014-10-31 Thread Kamal Banga
You can also use PairRDDFunctions' saveAsNewAPIHadoopFile that takes an OutputFormat class. So you will have to write a custom OutputFormat class that extends OutputFormat. In this class, you will have to implement a getRecordWriter which returns a custom RecordWriter. So you will also have to

Re: Using a Database to persist and load data from

2014-10-31 Thread Sonal Goyal
I think you can try to use the Hadoop DBOutputFormat Best Regards, Sonal Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Oct 31, 2014 at 1:00 PM, Kamal Banga ka...@sigmoidanalytics.com wrote: You can also use PairRDDFunctions' saveAsNewAPIHadoopFile

Using a Database to persist and load data from

2014-10-30 Thread Asaf Lahav
Hi Ladies and Gents, I would like to know what are the options I have if I would like to leverage Spark code I already have written to use a DB (Vertica) as its store/datasource. The data is of tabular nature. So any relational DB can essentially be used. Do I need to develop a context? If yes,

Re: Using a Database to persist and load data from

2014-10-30 Thread Yanbo Liang
AFAIK, you can read data from DB with JdbcRDD, but there is no interface for writing to DB. JdbcRDD has some restrict such as SQL must with where clause. For writing to DB, you can use mapPartitions or foreachPartition to implement. You can refer this example: