> load the data from edge node to hdfs

Does the loading involve accessing sqlserver?
Please take a look at https://spark.apache.org/docs/latest/sql-programming-guide.html

On Tue, Jun 7, 2016 at 7:19 AM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Hi,
> how about:
>
> 1. have a process that reads the data from your sqlserver and dumps it as
> a file into a directory on your hd
> 2. use spark-streaming to read data from that directory and store it
> into hdfs
>
> Perhaps there is some sort of spark 'connector' that allows you to read
> data from a db directly, so you don't need to go via spark streaming?
>
> hth
>
> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
>> Hi Spark users,
>>
>> Right now we are using spark for everything (loading the data from
>> sqlserver, applying transformations, saving it as permanent tables in
>> hive) in our environment. Everything is being done in one spark application.
>>
>> The only thing we do before we launch our spark application through
>> oozie is to load the data from edge node to hdfs (it is triggered
>> through an ssh action from oozie that runs a shell script on the edge node).
>>
>> My question is: is there any way we can accomplish the edge-to-hdfs copy
>> through spark, so that everything is done in one spark DAG and lineage
>> graph?
>>
>> Any pointers are highly appreciated. Thanks
>>
>> Regards,
>> Aj
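Marco's question about a spark "connector" that reads a database directly is answered by Spark's built-in JDBC data source, which the SQL programming guide linked in the reply covers. The following is a minimal sketch, not code from this thread: it assumes Microsoft's SQL Server JDBC driver jar is on the Spark classpath, and the helper names (`sqlserver_jdbc_url`, `jdbc_read_options`) and every host, database, table, credential and split-column value are hypothetical placeholders. It only builds the option map; the PySpark call that would consume it is left as a comment so the block runs without a cluster.

```python
"""Sketch: build options for Spark's JDBC data source to read a SQL Server table.

Hypothetical helpers; host, database, table, credentials and the numeric
split column are placeholders, not values from the original thread.
"""


def sqlserver_jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Return a Microsoft JDBC driver URL, e.g. jdbc:sqlserver://h:1433;databaseName=db."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"


def jdbc_read_options(host, database, table, user, password,
                      split_column=None, lower=None, upper=None,
                      num_partitions=None):
    """Build the keyword options for spark.read.format("jdbc").options(...)."""
    opts = {
        "url": sqlserver_jdbc_url(host, database),
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class from Microsoft's SQL Server JDBC jar (assumed on the classpath).
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    partition_args = (split_column, lower, upper, num_partitions)
    if any(a is not None for a in partition_args):
        # Spark requires all four partitioning options whenever any one is given.
        if any(a is None for a in partition_args):
            raise ValueError(
                "partitionColumn, lowerBound, upperBound and numPartitions "
                "must be supplied together")
        opts.update({
            "partitionColumn": split_column,
            "lowerBound": str(lower),
            "upperBound": str(upper),
            "numPartitions": str(num_partitions),
        })
    return opts


# Hypothetical use inside the existing Spark application (needs pyspark, a
# SparkSession with Hive support, and the JDBC driver jar; values are placeholders):
#
#   df = (spark.read.format("jdbc")
#         .options(**jdbc_read_options("dbhost", "sales", "dbo.orders",
#                                      "etl_user", "secret",
#                                      "order_id", 1, 1000000, 8))
#         .load())
#   df.write.mode("overwrite").saveAsTable("staging.orders")
```

Supplying the four partitioning options lets Spark issue several range queries on the split column in parallel instead of reading the whole table over one connection; the bounds only set the partition stride and do not filter out rows outside them.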