I use Spark rather than Sqoop to import data from an Oracle table into a Hive ORC table.
It uses JDBC for this purpose. Everything is done in Scala itself. Hive also runs on the Spark engine, an order of magnitude faster than on map-reduce. Pretty simple.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 7 June 2016 at 15:38, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. load the data from edge node to hdfs
>
> Does the loading involve accessing sqlserver ?
>
> Please take a look at
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> On Tue, Jun 7, 2016 at 7:19 AM, Marco Mistroni <mmistr...@gmail.com>
> wrote:
>
>> Hi
>> how about
>>
>> 1. have a process that reads the data from your sqlserver and dumps it as
>> a file into a directory on your hd
>> 2. use spark-streaming to read data from that directory and store it
>> into hdfs
>>
>> perhaps there is some sort of spark 'connector' that allows you to read
>> data from a db directly so you don't need to go via spark streaming?
>>
>> hth
>>
>> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com> wrote:
>>
>>> Hi Spark users,
>>>
>>> Right now we are using spark for everything (loading the data from
>>> sqlserver, applying transformations, saving it as permanent tables in
>>> hive) in our environment. Everything is being done in one spark application.
>>>
>>> The only thing we do before we launch our spark application through
>>> oozie is to load the data from edge node to hdfs (it is being triggered
>>> through an ssh action from oozie to run a shell script on the edge node).
>>>
>>> My question is, is there any way we can accomplish the edge-to-hdfs copy
>>> through spark? So that everything is done in one spark DAG and lineage
>>> graph?
>>>
>>> Any pointers are highly appreciated. Thanks
>>>
>>> Regards,
>>> Aj
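
For reference, the Oracle-to-Hive import mentioned at the top of this message can be sketched roughly as below. This is only an illustration, not the exact job I run: it assumes Spark 2.x or later (the SparkSession API), the Oracle JDBC driver jar on the classpath, and Hive support enabled. The host, service name, schema, table names and environment variables are all made-up placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object OracleToHiveOrc {

  // Builds the option map for Spark's built-in JDBC data source.
  // The caller supplies the connection details; nothing here is specific
  // to a real environment.
  def jdbcOptions(url: String, table: String, user: String, password: String): Map[String, String] =
    Map(
      "url"      -> url,
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password,
      "driver"   -> "oracle.jdbc.OracleDriver"
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OracleToHiveOrc")
      .enableHiveSupport() // needed so saveAsTable writes to the Hive metastore
      .getOrCreate()

    // Hypothetical connection details and table names, for illustration only.
    val opts = jdbcOptions(
      url      = "jdbc:oracle:thin:@//dbhost:1521/ORCL",
      table    = "SCOTT.SALES",
      user     = sys.env.getOrElse("ORACLE_USER", "scott"),
      password = sys.env.getOrElse("ORACLE_PASSWORD", "")
    )

    // Read the Oracle table over JDBC into a DataFrame.
    val df = spark.read.format("jdbc").options(opts).load()

    // Any transformations would go here, in the same application.

    // Write the result as an ORC-backed Hive table (hypothetical name).
    df.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("staging.sales_orc")

    spark.stop()
  }
}
```

The same read pattern should apply to sqlserver by swapping the JDBC URL and driver class for the ones your driver documents, which keeps the load, the transformations and the Hive write in one Spark application.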