Hi Mich, thanks for your input. I used Sqoop to get the data from MySQL.
Now I am using Spark to do the same. Right now, I am trying
to implement incremental updates while loading from MySQL through Spark.
Can you suggest any best practices for this? Thank you.
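
One pattern often used for incremental loads is a high-water mark: remember
the largest key already loaded and ask MySQL only for rows above it. Below is
a minimal sketch of that idea in plain Python. The table `orders`, the column
`id`, and both function names are hypothetical, chosen only for illustration.
The string it builds is meant to be passed as the `dbtable` option of Spark's
JDBC data source, which accepts a parenthesised subquery in place of a table
name.

```python
from typing import Iterable, Optional


def incremental_dbtable(table: str, key_column: str, last_value: Optional[int]) -> str:
    """Build a subquery string for the JDBC `dbtable` option.

    Only rows whose key is above the stored high-water mark are selected.
    On the first run (no mark stored yet) the whole table is selected.
    """
    if last_value is None:
        return f"(SELECT * FROM {table}) AS incr"
    # int() ensures only a plain number is interpolated into the SQL text.
    return f"(SELECT * FROM {table} WHERE {key_column} > {int(last_value)}) AS incr"


def next_watermark(previous: Optional[int], loaded_keys: Iterable[int]) -> Optional[int]:
    """Return the high-water mark to store after a batch has been written."""
    keys = list(loaded_keys)
    if not keys:
        return previous  # nothing new arrived; keep the old mark
    newest = max(keys)
    return newest if previous is None else max(previous, newest)


# Hypothetical usage: the previous run stopped at id 1000.
query = incremental_dbtable("orders", "id", 1000)
```

Note that a key-based mark like this only picks up newly inserted rows. To
catch rows that were updated in place, the same approach would need a
last-modified timestamp column instead of the key.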


On Tuesday, June 7, 2016, Mich Talebzadeh <[email protected]> wrote:

> I use Spark rather than Sqoop to import data from an Oracle table into a
> Hive ORC table.
>
> It uses JDBC for this purpose, all within Scala itself.
>
> Also, Hive runs on the Spark engine. It is an order of magnitude faster than
> on map-reduce.
>
> pretty simple.
>
> HTH
>
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 7 June 2016 at 15:38, Ted Yu <[email protected]> wrote:
>
>> bq. load the data from edge node to hdfs
>>
>> Does the loading involve accessing SQL Server?
>>
>> Please take a look at
>> https://spark.apache.org/docs/latest/sql-programming-guide.html
>>
>> On Tue, Jun 7, 2016 at 7:19 AM, Marco Mistroni <[email protected]> wrote:
>>
>>> Hi,
>>> How about:
>>>
>>> 1. Have a process that reads the data from your SQL Server and dumps it
>>> as a file into a directory on your hard disk.
>>> 2. Use Spark Streaming to read data from that directory and store it
>>> into HDFS.
>>>
>>> Perhaps there is some sort of Spark 'connector' that allows you to read
>>> data from a DB directly, so you don't need to go via Spark Streaming?
>>>
>>>
>>> hth
>>>
>>> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <[email protected]> wrote:
>>>
>>>> Hi Spark users,
>>>>
>>>> Right now we are using Spark for everything (loading the data from
>>>> SQL Server, applying transformations, saving it as permanent tables in
>>>> Hive) in our environment. Everything is done in one Spark
>>>> application.
>>>>
>>>> The only thing we do before we launch our Spark application through
>>>> Oozie is load the data from the edge node to HDFS (this is triggered
>>>> through an SSH action from Oozie that runs a shell script on the edge node).
>>>>
>>>> My question is: is there any way we can accomplish the edge-to-HDFS copy
>>>> through Spark, so that everything is done in one Spark DAG and lineage
>>>> graph?
>>>>
>>>> Any pointers are highly appreciated. Thanks
>>>>
>>>> Regards,
>>>> Aj
>>>>
>>>
>>>
>>
>
