Re: custom RDD in java
AFAIK RDDs can only be created on the driver, not on the executors. Also, `saveAsTextFile(...)` is an action and hence can likewise only be invoked from the driver. As Silvio already mentioned, Sqoop may be a good option.

On Wed, Jul 1, 2015 at 12:46 PM, Shushant Arora wrote:
> [quoted reply trimmed; see the messages below]
Re: custom RDD in java
The list of tables is not large; the RDD is created over the table list to parallelize the work of fetching tables in multiple mappers at the same time. Since the time taken to fetch a table is significant, I can't run the fetches sequentially.

The content of a table fetched by a map job is large, so one option is to dump the content to HDFS using the FileSystem API from inside the map function, every few rows of the table fetched.

I cannot keep a complete table in memory and then dump it to HDFS using the map function below:

    JavaRDD<Iterable<String>> tablecontent = tablelistrdd.map(
        new Function<String, Iterable<String>>() {
          public Iterable<String> call(String tablename) {
            // make a JDBC connection, fetch the table data,
            // populate a list, and return it
          }
        });
    tablecontent.saveAsTextFile("hdfspath");

Here I wanted to create a custom RDD whose partitions would be held in memory on multiple executors and contain parts of the table data, and I would have called saveAsTextFile on that custom RDD directly to save it to HDFS.

On Thu, Jul 2, 2015 at 12:59 AM, Feynman Liang wrote:
> [quoted reply trimmed; see the messages below]
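The streaming approach described above can be sketched as follows. This is a hypothetical sketch, not code from the thread: the JDBC URL, the `/dumps/` output directory, and the tab-separated row format are all placeholder assumptions, and error handling is minimal.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;

public class TableDumper {
  // Dump every table named in tablelistrdd to HDFS, streaming rows as they
  // arrive from JDBC so no table is ever fully materialized in executor memory.
  static void dumpTables(JavaRDD<String> tablelistrdd, final String jdbcUrl) {
    tablelistrdd.foreach(new VoidFunction<String>() {
      public void call(String tablename) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        PrintWriter out = new PrintWriter(fs.create(new Path("/dumps/" + tablename)));
        Connection conn = DriverManager.getConnection(jdbcUrl);
        try {
          Statement st = conn.createStatement();
          st.setFetchSize(1000); // stream in small batches instead of buffering
          ResultSet rs = st.executeQuery("SELECT * FROM " + tablename);
          ResultSetMetaData md = rs.getMetaData();
          while (rs.next()) {
            StringBuilder row = new StringBuilder();
            for (int i = 1; i <= md.getColumnCount(); i++) {
              if (i > 1) row.append('\t');
              row.append(rs.getString(i));
            }
            out.println(row); // each row goes straight to HDFS as it arrives
          }
        } finally {
          conn.close();
          out.close();
        }
      }
    });
  }
}
```

Because the write happens inside the task via `foreach` (an action), nothing is collected back to the driver, which sidesteps the need for a custom RDD for this particular job.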
Re: custom RDD in java
On Wed, Jul 1, 2015 at 7:19 AM, Shushant Arora wrote:
> JavaRDD<String> rdd = javasparkcontext.parallelize(tables);

You are already creating an RDD in Java here ;)

However, it's not clear to me why you'd want to make this an RDD. Is the list of tables so large that it doesn't fit on a single machine? If not, you may be better off spinning up one Spark job for dumping each table in tables, using a JDBC data source
<https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases>.

On Wed, Jul 1, 2015 at 12:00 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> [quoted reply trimmed; see the messages below]
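The JDBC data source route suggested above might look roughly like this in Java (Spark 1.4-era DataFrame API). The connection URL, credentials, and output paths are illustrative placeholders, not values from the thread:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class JdbcDump {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("jdbc-dump"));
    SQLContext sqlContext = new SQLContext(sc);

    List<String> tables = Arrays.asList("dbname.tablename", "dbname.tablename2");
    Properties props = new Properties();
    props.setProperty("user", "dbuser");       // placeholder credentials
    props.setProperty("password", "secret");

    for (String table : tables) {
      // Each table becomes its own DataFrame; Spark pulls rows through JDBC
      // on the executors, so nothing large is held on the driver.
      DataFrame df = sqlContext.read().jdbc(
          "jdbc:sqlserver://host:1433;databaseName=db", table, props);
      df.write().json("/dumps/" + table);      // or .parquet(...), etc.
    }
    sc.stop();
  }
}
```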
Re: custom RDD in java
Sure, you can create custom RDDs. Haven’t done so in Java, but in Scala absolutely.

From: Shushant Arora
Date: Wednesday, July 1, 2015 at 1:44 PM
To: Silvio Fiorito
Cc: user
Subject: Re: custom RDD in java

> [quoted reply trimmed; see the messages below]
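For reference, a custom RDD can be written in Java as well, by extending Spark's Scala `RDD` class directly. The sketch below is hypothetical (the class names, the one-partition-per-table layout, and the stubbed `compute` body are all assumptions), but it shows the three pieces a custom RDD needs: a `Partition` type, `getPartitions`, and `compute`:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.Dependency;
import org.apache.spark.Partition;
import org.apache.spark.SparkContext;
import org.apache.spark.TaskContext;
import org.apache.spark.rdd.RDD;
import scala.collection.Iterator;
import scala.collection.JavaConversions;
import scala.collection.mutable.ArrayBuffer;
import scala.reflect.ClassTag$;

// One partition per table name.
class TablePartition implements Partition {
  final int idx;
  final String table;
  TablePartition(int idx, String table) { this.idx = idx; this.table = table; }
  public int index() { return idx; }
}

public class TableRDD extends RDD<String> {
  private final List<String> tables;

  public TableRDD(SparkContext sc, List<String> tables) {
    // No parent RDDs, so the dependency list is empty; the ClassTag tells
    // Spark the element type, since Java has no Scala implicits.
    super(sc, new ArrayBuffer<Dependency<?>>(), ClassTag$.MODULE$.apply(String.class));
    this.tables = tables;
  }

  @Override
  public Partition[] getPartitions() {
    Partition[] parts = new Partition[tables.size()];
    for (int i = 0; i < tables.size(); i++)
      parts[i] = new TablePartition(i, tables.get(i));
    return parts;
  }

  @Override
  public Iterator<String> compute(Partition split, TaskContext ctx) {
    String table = ((TablePartition) split).table;
    // A real implementation would open a JDBC cursor here and return a lazy
    // iterator over its rows; this stub just yields the table name.
    return JavaConversions.asScalaIterator(Arrays.asList(table).iterator());
  }
}
```

In practice, though, the existing JdbcRDD or the plain map/foreach approaches discussed elsewhere in this thread cover the table-dump use case without writing a new RDD subclass.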
Re: custom RDD in java
ok.. will evaluate these options, but is it possible to create an RDD in Java?

On Wed, Jul 1, 2015 at 8:29 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> [quoted reply trimmed; see the message below]
Re: custom RDD in java
If all you’re doing is just dumping tables from SQLServer to HDFS, have you looked at Sqoop?

Otherwise, if you need to run this in Spark, could you just use the existing JdbcRDD?

From: Shushant Arora
Date: Wednesday, July 1, 2015 at 10:19 AM
To: user
Subject: custom RDD in java

Hi

Is it possible to write a custom RDD in Java?

The requirement is: I have a list of SQLServer tables that need to be dumped into HDFS.

So I have a

    List<String> tables = {dbname.tablename, dbname.tablename2..};

then

    JavaRDD<String> rdd = javasparkcontext.parallelize(tables);

    JavaRDD<Iterable<String>> tablecontent = rdd.map(
        new Function<String, Iterable<String>>() {
          // fetch table and return populated iterable
        });

    tablecontent.saveAsTextFile("hdfs path");

In rdd.map(new Function...) I cannot keep the complete table content in memory, so I want to create my own RDD to handle it.

Thanks
Shushant
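The JdbcRDD mentioned above can be used from Java via its static `create` helper. A rough sketch, assuming a numeric key column to range-partition on; the URL, query, key bounds, and row-to-string mapping are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.rdd.JdbcRDD;

public class JdbcRddExample {
  static JavaRDD<String> readTable(JavaSparkContext sc, final String url) {
    return JdbcRDD.create(
        sc,
        new JdbcRDD.ConnectionFactory() {
          public Connection getConnection() throws Exception {
            return DriverManager.getConnection(url);
          }
        },
        // The query must contain exactly two '?' placeholders, which JdbcRDD
        // fills with each partition's lower and upper key bounds.
        "SELECT * FROM dbname.tablename WHERE id >= ? AND id <= ?",
        1L, 1000000L,  // overall key range (placeholder values)
        10,            // number of partitions
        new Function<ResultSet, String>() {
          public String call(ResultSet rs) throws Exception {
            return rs.getString(1); // map each row to one line of output
          }
        });
  }
}
```

The resulting JavaRDD<String> can then be written out with saveAsTextFile, with rows streamed through the executors rather than collected on the driver.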