AFAIK RDDs can only be created on the driver, not on the executors. Likewise, `saveAsTextFile(...)` is an action, so it too can only be invoked from the driver.
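One way to sidestep a custom RDD entirely is to parallelize the table list and, from inside the function that runs on the executors, stream each table's rows straight to HDFS with the Hadoop FileSystem API — the "dump every few rows" idea from the thread. Below is a minimal, untested sketch (class name, output path, and the JDBC URL are placeholders, not from the thread; it assumes Spark and Hadoop on the classpath):

```java
// Sketch only: paths and the JDBC URL are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class TableDump {
    public static void main(String[] args) {
        List<String> tables = Arrays.asList("db.table1", "db.table2"); // placeholder names
        JavaSparkContext sc = new JavaSparkContext("local[2]", "table-dump");
        // One partition per table, so each table is fetched by its own task.
        JavaRDD<String> rdd = sc.parallelize(tables, tables.size());
        rdd.foreach(table -> {
            // Runs on an executor: stream rows from JDBC straight to HDFS,
            // never holding the whole table in memory.
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataOutputStream out = fs.create(new Path("/dumps/" + table));
                 Connection c = DriverManager.getConnection("jdbc:sqlserver://..."); // placeholder
                 Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery("SELECT * FROM " + table)) {
                while (rs.next()) {
                    out.write((rs.getString(1) + "\n").getBytes("UTF-8"));
                }
            }
        });
        sc.stop();
    }
}
```

Each task opens its own connection and output stream, so nothing large is shipped back to the driver and `saveAsTextFile` never needs to run outside it.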
As Silvio already mentioned, Sqoop may be a good option.

On Wed, Jul 1, 2015 at 12:46 PM, Shushant Arora <shushantaror...@gmail.com> wrote:

> The list of tables is not large. The RDD is created on the table list to
> parallelise the work of fetching tables in multiple mappers at the same
> time. Since the time taken to fetch a table is significant, I can't run
> that sequentially.
>
> The content of a table fetched by a map job is large, so one option is to
> dump the content to HDFS using the FileSystem API from inside the map
> function for every few rows of the table fetched.
>
> I cannot keep the complete table in memory and then dump it to HDFS using
> a map function like the one below:
>
>     JavaRDD<String> tablecontent = tablelistrdd.map(
>         new Function<String, Iterable<String>>() {
>             public Iterable<String> call(String tablename) {
>                 // ...make a JDBC connection, fetch the table data,
>                 // populate a list and return it...
>             }
>         });
>     tablecontent.saveAsTextFile("hdfspath");
>
> Here I wanted to create a custom RDD whose partitions would be in memory
> on multiple executors and contain parts of the table data, and I would
> have called saveAsTextFile on that custom RDD directly to save to HDFS.
>
> On Thu, Jul 2, 2015 at 12:59 AM, Feynman Liang <fli...@databricks.com> wrote:
>
>> On Wed, Jul 1, 2015 at 7:19 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> JavaRDD<String> rdd = javasparkcontext.parallelize(tables);
>>
>> You are already creating an RDD in Java here ;)
>>
>> However, it's not clear to me why you'd want to make this an RDD. Is the
>> list of tables so large that it doesn't fit on a single machine? If not,
>> you may be better off spinning up one Spark job for dumping each table in
>> tables using a JDBC datasource
>> <https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>.
>>
>> On Wed, Jul 1, 2015 at 12:00 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>
>>> Sure, you can create custom RDDs. Haven’t done so in Java, but in
>>> Scala absolutely.
>>> From: Shushant Arora
>>> Date: Wednesday, July 1, 2015 at 1:44 PM
>>> To: Silvio Fiorito
>>> Cc: user
>>> Subject: Re: custom RDD in java
>>>
>>> OK, I will evaluate these options, but is it possible to create an RDD
>>> in Java?
>>>
>>> On Wed, Jul 1, 2015 at 8:29 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>>
>>>> If all you’re doing is just dumping tables from SQL Server to HDFS,
>>>> have you looked at Sqoop?
>>>>
>>>> Otherwise, if you need to run this in Spark, could you just use the
>>>> existing JdbcRDD?
>>>>
>>>> From: Shushant Arora
>>>> Date: Wednesday, July 1, 2015 at 10:19 AM
>>>> To: user
>>>> Subject: custom RDD in java
>>>>
>>>> Hi,
>>>>
>>>> Is it possible to write a custom RDD in Java?
>>>>
>>>> The requirement is: I have a list of SQL Server tables that need to be
>>>> dumped into HDFS.
>>>>
>>>> So I have a
>>>>
>>>>     List<String> tables = {dbname.tablename, dbname.tablename2, ...};
>>>>
>>>> then
>>>>
>>>>     JavaRDD<String> rdd = javasparkcontext.parallelize(tables);
>>>>
>>>>     JavaRDD<String> tablecontent = rdd.map(
>>>>         new Function<String, Iterable<String>>() {
>>>>             // fetch the table and return a populated Iterable
>>>>         });
>>>>
>>>>     tablecontent.saveAsTextFile("hdfs path");
>>>>
>>>> In rdd.map(new Function<String, ...>) I cannot keep the complete table
>>>> content in memory, so I want to create my own RDD to handle it.
>>>>
>>>> Thanks
>>>> Shushant
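P.S. The "dump every few rows" idea discussed above boils down to a streaming pattern that can be shown without Spark at all: push rows through a small buffer and flush it every N rows, so the full table never sits in memory. A small illustrative sketch — the class and method names are mine, not from the thread:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;

/** Streams rows from an iterator to a writer, flushing every batchSize
 *  rows so the whole table never has to be held in memory at once. */
public class RowStreamer {
    public static long streamRows(Iterator<String> rows, Writer out, int batchSize)
            throws IOException {
        long written = 0;
        StringBuilder batch = new StringBuilder();
        while (rows.hasNext()) {
            batch.append(rows.next()).append('\n');
            written++;
            if (written % batchSize == 0) {
                out.write(batch.toString()); // flush a small batch, then drop it
                out.flush();
                batch.setLength(0);
            }
        }
        out.write(batch.toString());         // trailing partial batch
        out.flush();
        return written;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        long n = streamRows(Arrays.asList("a", "b", "c").iterator(), sw, 2);
        System.out.println(n + " rows:");    // prints: 3 rows:
        System.out.print(sw);                // prints the rows a, b, c on separate lines
    }
}
```

Inside a Spark map or foreach function, the same loop would wrap a JDBC `ResultSet` and an HDFS output stream instead of an in-memory iterator and `StringWriter`.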