Re: Spark_Usecase

Ted Yu Tue, 07 Jun 2016 07:39:11 -0700

bq. load the data from edge node to hdfs

Does the loading involve accessing SQL Server?


Please take a look at
https://spark.apache.org/docs/latest/sql-programming-guide.html
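
As a rough illustration of the JDBC data source described in that guide, here is a minimal sketch in PySpark. It assumes the Microsoft JDBC driver jar is on the Spark classpath and that `spark` is a SparkSession (on 1.x, a HiveContext exposes the same `.read`). The helper names, host, database, table and credentials are all hypothetical, not taken from this thread.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build the option map for Spark's JDBC data source.

    All argument values are placeholders supplied by the caller;
    nothing here is specific to the environment discussed in the thread.
    """
    return {
        # Connection URL format used by the Microsoft JDBC driver
        "url": "jdbc:sqlserver://{0}:{1};databaseName={2}".format(host, port, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def copy_table_to_hive(spark, options, hive_table):
    """Read one table over JDBC and save it as a Hive table.

    `spark` is assumed to be a SparkSession with Hive support enabled
    (or a HiveContext on Spark 1.x); `hive_table` is a hypothetical name.
    """
    df = spark.read.format("jdbc").options(**options).load()
    # Any transformations would go here, in the same application.
    df.write.mode("overwrite").saveAsTable(hive_table)
    return df
```

Under spark-submit this would be called along the lines of `copy_table_to_hive(spark, sqlserver_jdbc_options("dbhost", "sales", "dbo.orders", "etl", "secret"), "staging.orders")`. Reading straight from the database this way would keep the load inside the Spark application, if the source data is in fact reachable over JDBC.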

On Tue, Jun 7, 2016 at 7:19 AM, Marco Mistroni <[email protected]> wrote:

> Hi
> how about
>
> 1. have a process that reads the data from your SQL Server and dumps it as
> a file into a directory on your hard disk
> 2. use Spark Streaming to read data from that directory and store it
> into HDFS
>
> perhaps there is some sort of Spark 'connector' that allows you to read
> data from a db directly, so you don't need to go via Spark Streaming?
>
>
> hth
>
> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <[email protected]> wrote:
>
>> Hi Spark users,
>>
>> Right now we are using Spark for everything (loading the data from
>> SQL Server, applying transformations, saving it as permanent tables in
>> Hive) in our environment. Everything is done in one Spark application.
>>
>> The only thing we do before we launch our Spark application through
>> Oozie is load the data from the edge node to HDFS (this is triggered
>> through an SSH action from Oozie that runs a shell script on the edge node).
>>
>> My question is: is there any way we can accomplish the edge-to-HDFS copy
>> through Spark, so that everything is done in one Spark DAG and lineage
>> graph?
>>
>> Any pointers are highly appreciated. Thanks
>>
>> Regards,
>> Aj
>>
>
>
