Hi Mich, thanks for your input. I used Sqoop to get the data from MySQL; now I am using Spark to do the same. Right now I am trying to implement incremental updates while loading from MySQL through Spark. Can you suggest any best practices for this? Thank you.
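A common pattern for incremental loads is a high-water mark, the same idea behind Sqoop's `--incremental lastmodified` mode with `--check-column` and `--last-value`. Each run stores the largest change timestamp seen so far. The next run fetches only rows newer than that value, merges them into the target by primary key, and then advances the mark.

What follows is a minimal sketch under these assumptions, not a definitive implementation:

- The MySQL table has a primary key and an `updated_at` column that is set on every insert and update. The schema `customers(id, name, updated_at)` and the function names are hypothetical.
- Python's built-in `sqlite3` stands in for both MySQL and the Hive target, so the logic runs on its own.
- In Spark, the extract step would be a JDBC read whose `dbtable` option is a parenthesized subquery carrying the `WHERE updated_at > ...` filter, so that MySQL does the filtering.

```python
import sqlite3

# Hypothetical schema: customers(id PRIMARY KEY, name, updated_at).
# Assumes updated_at is set on every insert/update and that the ISO-8601
# strings used here compare in time order. sqlite3 stands in for MySQL
# (source) and for the Hive table (target) so this sketch runs on its own.


def make_db():
    """Create an in-memory database with the hypothetical customers table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    return db


def incremental_sync(source, target, last_watermark):
    """Copy rows changed after last_watermark from source into target.

    Returns the new high-water mark: the largest updated_at copied,
    or last_watermark unchanged if no rows were newer.
    """
    # Extract only the delta. In Spark this filter would be pushed down to
    # MySQL by passing a subquery as the JDBC "dbtable" option.
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()

    # Merge by primary key: a changed row replaces its older version.
    target.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()

    # Advance the mark only after the merge is committed, so a failed
    # run is simply retried from the old mark.
    return max((row[2] for row in rows), default=last_watermark)


if __name__ == "__main__":
    src, dst = make_db(), make_db()
    src.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "alice", "2016-06-01 10:00:00"), (2, "bob", "2016-06-02 10:00:00")],
    )
    mark = incremental_sync(src, dst, "1970-01-01 00:00:00")  # first run: full load
    src.execute(
        "UPDATE customers SET name = 'bobby', "
        "updated_at = '2016-06-08 09:00:00' WHERE id = 2"
    )
    mark = incremental_sync(src, dst, mark)  # second run: only id 2 is copied
    print(mark, dst.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

Some caveats:

- The mark itself has to be stored durably between runs, for example in a small control table.
- A strict `>` on a timestamp can miss rows committed later with a timestamp equal to the stored mark. Some setups re-read a small overlap window and rely on the key-based merge to deduplicate.
- Deletes are not captured at all.
- Spark's DataFrame writer offers only append/overwrite-style save modes for Hive tables, with no upsert. With Spark, the merge step is therefore often done by appending the delta and keeping the latest row per key.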
On Tuesday, June 7, 2016, Mich Talebzadeh <[email protected]> wrote:
> I use Spark rather than Sqoop to import data from an Oracle table into a
> Hive ORC table.
>
> It uses JDBC for this purpose. All inclusive in Scala itself.
>
> Also, Hive runs on the Spark engine, an order of magnitude faster than
> Hive on MapReduce.
>
> Pretty simple.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 7 June 2016 at 15:38, Ted Yu <[email protected]> wrote:
>
>> bq. load the data from edge node to hdfs
>>
>> Does the loading involve accessing SQL Server?
>>
>> Please take a look at
>> https://spark.apache.org/docs/latest/sql-programming-guide.html
>>
>> On Tue, Jun 7, 2016 at 7:19 AM, Marco Mistroni <[email protected]> wrote:
>>
>>> Hi,
>>> how about:
>>>
>>> 1. have a process that reads the data from your SQL Server and dumps it
>>> as a file into a directory on your HD
>>> 2. use Spark Streaming to read data from that directory and store it
>>> into HDFS
>>>
>>> Perhaps there is some sort of Spark 'connector' that allows you to read
>>> data from a DB directly, so you don't need to go via Spark Streaming?
>>>
>>> hth
>>>
>>> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <[email protected]> wrote:
>>>
>>>> Hi Spark users,
>>>>
>>>> Right now we are using Spark for everything in our environment (loading
>>>> the data from SQL Server, applying transformations, saving it as
>>>> permanent tables in Hive). Everything is being done in one Spark
>>>> application.
>>>>
>>>> The only thing we do before launching our Spark application through
>>>> Oozie is load the data from the edge node to HDFS (this is triggered
>>>> through an SSH action from Oozie that runs a shell script on the edge
>>>> node).
>>>>
>>>> My question is: is there any way we can accomplish the edge-to-HDFS
>>>> copy through Spark, so that everything is done in one Spark DAG and
>>>> lineage graph?
>>>>
>>>> Any pointers are highly appreciated. Thanks
>>>>
>>>> Regards,
>>>> Aj
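Marco's step 1 has a detail that matters if the directory is watched by Spark Streaming. The Spark Streaming programming guide says files must be placed in the monitored directory by atomically moving or renaming them into it, not by writing them in place.

Below is a minimal Python sketch of such an export, under these assumptions:

- A staging directory exists on the same filesystem as the landing directory, so `os.replace` is an atomic rename.
- `sqlite3` stands in for SQL Server.
- `export_to_landing_dir`, the table, and the file names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

# Sketch of "dump query results as a file into a watched directory".
# The file is fully written in a separate staging directory, then renamed
# into the landing directory, so a file-based streaming source never sees
# a half-written file. Both directories must be on the same filesystem
# for os.replace to be an atomic rename. sqlite3 stands in for SQL Server.


def export_to_landing_dir(conn, query, staging_dir, landing_dir, name):
    """Write the result of query to landing_dir/name as CSV, atomically."""
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".csv.tmp")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            cursor = conn.execute(query)
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor)
        final_path = os.path.join(landing_dir, name)
        os.replace(tmp_path, final_path)  # the file appears in one step
        return final_path
    except Exception:
        # Do not leave partial files behind in the staging directory.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    with tempfile.TemporaryDirectory() as staging, tempfile.TemporaryDirectory() as landing:
        path = export_to_landing_dir(
            db, "SELECT * FROM orders", staging, landing, "orders_0001.csv"
        )
        with open(path, newline="") as f:
            print(f.read())
```

This only covers getting files into the directory. As Ted's link suggests, Spark SQL's JDBC data source can also read the database directly, which avoids the intermediate files altogether.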
