Hi Mich,

A lot of people say that Spark does not have the proven track record in migrating data from Oracle that Sqoop has, at least in production.
Please correct me if I am wrong, and suggest how to deal with the shuffling that comes with groupBy?

Thanks,
Shyam

On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Spark is an excellent ETL tool to lift data from a source and put it in a
> target. Spark uses a JDBC connection similar to Sqoop. I don't see the
> need for Sqoop with Spark here.
>
> Where are the source (Oracle, MSSQL, etc.) and the target (Hive?) here?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri <chetan.opensou...@gmail.com>
> wrote:
>
>> Hi Users,
>> I am launching a Sqoop job from a Spark job and would like to FAIL the
>> Spark job if the Sqoop job fails.
>>
>> def executeSqoopOriginal(serverName: String, schemaName: String,
>>     username: String, password: String, query: String, splitBy: String,
>>     fetchSize: Int, numMappers: Int, targetDir: String, jobName: String,
>>     dateColumns: String) = {
>>
>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" +
>>     "databaseName=" + schemaName
>>   var parameters = Array("import")
>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>   parameters = parameters :+ "--connect"
>>   parameters = parameters :+ connectionString
>>   parameters = parameters :+ "--mapreduce-job-name"
>>   parameters = parameters :+ jobName
>>   parameters = parameters :+ "--username"
>>   parameters = parameters :+ username
>>   parameters = parameters :+ "--password"
>>   parameters = parameters :+ password
>>   parameters = parameters :+ "--hadoop-mapred-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>   parameters = parameters :+ "--hadoop-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>   parameters = parameters :+ "--query"
>>   parameters = parameters :+ query
>>   parameters = parameters :+ "--split-by"
>>   parameters = parameters :+ splitBy
>>   parameters = parameters :+ "--fetch-size"
>>   parameters = parameters :+ fetchSize.toString
>>   parameters = parameters :+ "--num-mappers"
>>   parameters = parameters :+ numMappers.toString
>>   if (dateColumns.length() > 0) {
>>     parameters = parameters :+ "--map-column-java"
>>     parameters = parameters :+ dateColumns
>>   }
>>   parameters = parameters :+ "--target-dir"
>>   parameters = parameters :+ targetDir
>>   parameters = parameters :+ "--delete-target-dir"
>>   parameters = parameters :+ "--as-avrodatafile"
>> }
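To make Mich's point concrete, below is a minimal sketch of reading an Oracle table in parallel through Spark's JDBC source and then aggregating it, assuming the Oracle JDBC driver is on the classpath. The host, table, and column names (dbhost, ORCLPDB, SCOTT.SALES, ID, REGION, AMOUNT) and the Hive target table are hypothetical. The partitioning options play the role of Sqoop's --split-by and --num-mappers:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("OracleJdbcEtl").getOrCreate()

// Each of the 16 partitions issues its own bounded query against Oracle,
// parallelising the read the way Sqoop does with --num-mappers.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  // hypothetical host/service
  .option("dbtable", "SCOTT.SALES")                          // hypothetical table
  .option("user", "scott")
  .option("password", "tiger")
  .option("partitionColumn", "ID")                           // numeric split column, like --split-by
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "16")
  .load()

// A DataFrame groupBy does a map-side partial aggregation before the
// shuffle, so only pre-aggregated rows move across the network; sizing
// spark.sql.shuffle.partitions to the data volume is the main tuning knob.
spark.conf.set("spark.sql.shuffle.partitions", "200")
df.groupBy("REGION")
  .sum("AMOUNT")                                             // hypothetical columns
  .write
  .mode("overwrite")
  .saveAsTable("analytics.sales_by_region")                  // hypothetical Hive target

On the shuffle question itself: the shuffle in a DataFrame groupBy cannot be avoided, only kept cheap, because the partial aggregation already shrinks the data before it moves; this is the same reason reduceByKey is preferred over groupByKey on RDDs.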
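On Chetan's question: executeSqoopOriginal builds the argument array but never runs it, and Sqoop's command-line entry point calls System.exit, which would kill the driver. A minimal sketch, assuming Sqoop 1.x is on the driver classpath, is to invoke org.apache.sqoop.Sqoop.runTool, which returns the exit code instead of exiting, and to throw when it is non-zero (runSqoopOrFail is a hypothetical helper name):

import org.apache.hadoop.conf.Configuration
import org.apache.sqoop.Sqoop

// Sqoop.runTool returns the tool's exit code rather than calling
// System.exit the way Sqoop's main() does, so the result can be checked
// in-process and turned into a failure of the surrounding Spark job.
def runSqoopOrFail(parameters: Array[String]): Unit = {
  val exitCode = Sqoop.runTool(parameters, new Configuration())
  if (exitCode != 0) {
    // The exception propagates out of the driver's main method, so the
    // Spark application is marked FAILED rather than finishing cleanly.
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
  }
}

Calling runSqoopOrFail(parameters) as the last statement of executeSqoopOriginal would wire the two together: a failed Sqoop import throws, the exception reaches the driver's main method, and the Spark job fails as desired.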