Re: Control Sqoop job from Spark job

2019-10-17 Thread Chetan Khatri
Shyam, as Mich said, if we boost the parallelism with Spark we can reach the performance of Sqoop or better.
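
For reference, a minimal sketch (not from the thread) of what boosting parallelism with Spark's built-in JDBC source can look like: partition the read on a numeric column so Spark opens several connections at once. The URL, table, column, bounds and output path below are placeholders, and the Oracle JDBC driver is assumed to be on the classpath.

import org.apache.spark.sql.SparkSession

object JdbcParallelReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-parallel-read").getOrCreate()

    // Partitioned JDBC read: Spark opens numPartitions connections, one per
    // split of the partition column's [lowerBound, upperBound] range.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder URL
      .option("dbtable", "scratchpad.dummy")                 // placeholder table
      .option("user", sys.env.getOrElse("DB_USER", "user"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", "password"))
      .option("partitionColumn", "ID")   // numeric, ideally indexed column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")      // 8 parallel JDBC connections
      .option("fetchsize", "10000")      // larger fetch size cuts round trips
      .load()

    df.write.mode("overwrite").parquet("/tmp/scratchpad_dummy") // placeholder path
    spark.stop()
  }
}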

On Tue, Sep 3, 2019 at 6:35 PM Shyam P  wrote:

> J Franke,
>  Leave alone sqoop , I am just asking about spark in ETL of Oracle ...?
>
> Thanks,
> Shyam
>
>>


Re: Control Sqoop job from Spark job

2019-09-03 Thread Shyam P
J Franke,
 Leaving Sqoop aside, I am just asking about Spark for ETL from Oracle.

Thanks,
Shyam

>


Re: Control Sqoop job from Spark job

2019-09-03 Thread Jörn Franke
I would not say that. The only “issue” with Spark is that you need to build some functionality on top that is available in Sqoop out of the box, especially for import processes and when you need to define a lot of them.

> Am 03.09.2019 um 09:30 schrieb Shyam P :
> 
> Hi Mich,
>Lot of people say that Spark does not have proven record in migrating data 
> from oracle as sqoop has.
> At list in production.
> 
> Please correct me if I am wrong and suggest how to deal with shuffling when 
> dealing with groupBy ?
> 
> Thanks,
> Shyam
> 
>> On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh  
>> wrote:
>> Spark is an excellent ETL tool to lift data from source and put it in 
>> target. Spark uses JDBC connection similar to Sqoop. I don't see the need 
>> for Sqoop with Spark here.
>> 
>> Where is the source (Oracle MSSQL, etc) and target (Hive?) here
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>> 
>> Disclaimer: Use it at your own risk. Any and all responsibility for any 
>> loss, damage or destruction of data or any other property which may arise 
>> from relying on this email's technical content is explicitly disclaimed. The 
>> author will in no case be liable for any monetary damages arising from such 
>> loss, damage or destruction.
>>  
>> 
>> 
>>> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri  
>>> wrote:
>>> Hi Users,
>>> I am launching a Sqoop job from Spark job and would like to FAIL Spark job 
>>> if Sqoop job fails.
>>> 
>>> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
>>> String, password: String,
>>>  query: String, splitBy: String, fetchSize: Int, 
>>> numMappers: Int, targetDir: String, jobName: String, dateColumns: String) = 
>>> {
>>> 
>>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
>>> "databaseName=" + schemaName
>>>   var parameters = Array("import")
>>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>>   parameters = parameters :+ "--connect"
>>>   parameters = parameters :+ connectionString
>>>   parameters = parameters :+ "--mapreduce-job-name"
>>>   parameters = parameters :+ jobName
>>>   parameters = parameters :+ "--username"
>>>   parameters = parameters :+ username
>>>   parameters = parameters :+ "--password"
>>>   parameters = parameters :+ password
>>>   parameters = parameters :+ "--hadoop-mapred-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>>   parameters = parameters :+ "--hadoop-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>>   parameters = parameters :+ "--query"
>>>   parameters = parameters :+ query
>>>   parameters = parameters :+ "--split-by"
>>>   parameters = parameters :+ splitBy
>>>   parameters = parameters :+ "--fetch-size"
>>>   parameters = parameters :+ fetchSize.toString
>>>   parameters = parameters :+ "--num-mappers"
>>>   parameters = parameters :+ numMappers.toString
>>>   if (dateColumns.length() > 0) {
>>> parameters = parameters :+ "--map-column-java"
>>> parameters = parameters :+ dateColumns
>>>   }
>>>   parameters = parameters :+ "--target-dir"
>>>   parameters = parameters :+ targetDir
>>>   parameters = parameters :+ "--delete-target-dir"
>>>   parameters = parameters :+ "--as-avrodatafile"
>>> 
>>> }


Re: Control Sqoop job from Spark job

2019-09-03 Thread Shyam P
Hi Mich,
   A lot of people say that Spark does not have a proven record of migrating data from Oracle the way Sqoop does, at least in production.

Please correct me if I am wrong, and suggest how to deal with shuffling when using groupBy?

Thanks,
Shyam

On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh 
wrote:

> Spark is an excellent ETL tool to lift data from source and put it in
> target. Spark uses JDBC connection similar to Sqoop. I don't see the need
> for Sqoop with Spark here.
>
> Where is the source (Oracle MSSQL, etc) and target (Hive?) here
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri 
> wrote:
>
>> Hi Users,
>> I am launching a Sqoop job from Spark job and would like to FAIL Spark
>> job if Sqoop job fails.
>>
>> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
>> String, password: String,
>>  query: String, splitBy: String, fetchSize: Int, numMappers: 
>> Int, targetDir: String, jobName: String, dateColumns: String) = {
>>
>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
>> "databaseName=" + schemaName
>>   var parameters = Array("import")
>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>   parameters = parameters :+ "--connect"
>>   parameters = parameters :+ connectionString
>>   parameters = parameters :+ "--mapreduce-job-name"
>>   parameters = parameters :+ jobName
>>   parameters = parameters :+ "--username"
>>   parameters = parameters :+ username
>>   parameters = parameters :+ "--password"
>>   parameters = parameters :+ password
>>   parameters = parameters :+ "--hadoop-mapred-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>   parameters = parameters :+ "--hadoop-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>   parameters = parameters :+ "--query"
>>   parameters = parameters :+ query
>>   parameters = parameters :+ "--split-by"
>>   parameters = parameters :+ splitBy
>>   parameters = parameters :+ "--fetch-size"
>>   parameters = parameters :+ fetchSize.toString
>>   parameters = parameters :+ "--num-mappers"
>>   parameters = parameters :+ numMappers.toString
>>   if (dateColumns.length() > 0) {
>> parameters = parameters :+ "--map-column-java"
>> parameters = parameters :+ dateColumns
>>   }
>>   parameters = parameters :+ "--target-dir"
>>   parameters = parameters :+ targetDir
>>   parameters = parameters :+ "--delete-target-dir"
>>   parameters = parameters :+ "--as-avrodatafile"
>>
>> }
>>
>>


Re: Control Sqoop job from Spark job

2019-09-02 Thread Chris Teoh
Hey Chetan,

How many database connections are you anticipating in this job? Is this for
every row in the dataframe?

Kind regards
Chris


On Mon., 2 Sep. 2019, 9:11 pm Chetan Khatri, 
wrote:

> Hi Chris, Thanks for the email. You're right. but it's like Sqoop job gets
> launched based on dataframe values in spark job. Certainly it can be
> isolated and broken.
>
> On Sat, Aug 31, 2019 at 8:07 AM Chris Teoh  wrote:
>
>> I'd say this is an uncommon approach, could you use a workflow/scheduling
>> system to call Sqoop outside of Spark? Spark is usually multiprocess
>> distributed so putting in this Sqoop job in the Spark code seems to imply
>> you want to run Sqoop first, then Spark. If you're really insistent on
>> this, call it from the driver using Sqoop Java APIs.
>>
>> On Fri, 30 Aug 2019 at 06:02, Chetan Khatri 
>> wrote:
>>
>>> Sorry,
>>> I call sqoop job from above function. Can you help me to resolve this.
>>>
>>> Thanks
>>>
>>> On Fri, Aug 30, 2019 at 1:31 AM Chetan Khatri <
>>> chetan.opensou...@gmail.com> wrote:
>>>
 Hi Users,
 I am launching a Sqoop job from Spark job and would like to FAIL Spark
 job if Sqoop job fails.

 def executeSqoopOriginal(serverName: String, schemaName: String, username: 
 String, password: String,
  query: String, splitBy: String, fetchSize: Int, 
 numMappers: Int, targetDir: String, jobName: String, dateColumns: String) 
 = {

   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
 "databaseName=" + schemaName
   var parameters = Array("import")
   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
   parameters = parameters :+ "--connect"
   parameters = parameters :+ connectionString
   parameters = parameters :+ "--mapreduce-job-name"
   parameters = parameters :+ jobName
   parameters = parameters :+ "--username"
   parameters = parameters :+ username
   parameters = parameters :+ "--password"
   parameters = parameters :+ password
   parameters = parameters :+ "--hadoop-mapred-home"
   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
   parameters = parameters :+ "--hadoop-home"
   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
   parameters = parameters :+ "--query"
   parameters = parameters :+ query
   parameters = parameters :+ "--split-by"
   parameters = parameters :+ splitBy
   parameters = parameters :+ "--fetch-size"
   parameters = parameters :+ fetchSize.toString
   parameters = parameters :+ "--num-mappers"
   parameters = parameters :+ numMappers.toString
   if (dateColumns.length() > 0) {
 parameters = parameters :+ "--map-column-java"
 parameters = parameters :+ dateColumns
   }
   parameters = parameters :+ "--target-dir"
   parameters = parameters :+ targetDir
   parameters = parameters :+ "--delete-target-dir"
   parameters = parameters :+ "--as-avrodatafile"

 }


>>
>> --
>> Chris
>>
>


Re: Control Sqoop job from Spark job

2019-09-02 Thread Mich Talebzadeh
Hi,

Just to clarify, is the JDBC connection to the RDBMS from Spark slow?

This example reads from an Oracle table with 4 connections in parallel, assuming there is a primary key on the Oracle table:

//
// Get the min and max IDs first
//
val minID = HiveContext.read.format("jdbc").options(Map(
  "url" -> _ORACLEserver,
  "dbtable" -> "(SELECT cast(MIN(ID) AS INT) AS minID FROM scratchpad.dummy)",
  "user" -> _username,
  "password" -> _password)).load().collect.apply(0).getDecimal(0).toString

val maxID = HiveContext.read.format("jdbc").options(Map(
  "url" -> _ORACLEserver,
  "dbtable" -> "(SELECT cast(MAX(ID) AS INT) AS maxID FROM scratchpad.dummy)",
  "user" -> _username,
  "password" -> _password)).load().collect.apply(0).getDecimal(0).toString

val s = HiveContext.read.format("jdbc").options(Map(
  "url" -> _ORACLEserver,
  "dbtable" -> "(SELECT ID, CLUSTERED, SCATTERED, RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
  "partitionColumn" -> "ID",
  "lowerBound" -> minID,
  "upperBound" -> maxID,
  "numPartitions" -> "4",
  "user" -> _username,
  "password" -> _password)).load

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 2 Sep 2019 at 12:12, Chetan Khatri 
wrote:

> Hi Mich, JDBC Connection which is similar to Sqoop takes time and could
> not do parallelism.
>
> On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Spark is an excellent ETL tool to lift data from source and put it in
>> target. Spark uses JDBC connection similar to Sqoop. I don't see the need
>> for Sqoop with Spark here.
>>
>> Where is the source (Oracle MSSQL, etc) and target (Hive?) here
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri 
>> wrote:
>>
>>> Hi Users,
>>> I am launching a Sqoop job from Spark job and would like to FAIL Spark
>>> job if Sqoop job fails.
>>>
>>> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
>>> String, password: String,
>>>  query: String, splitBy: String, fetchSize: Int, 
>>> numMappers: Int, targetDir: String, jobName: String, dateColumns: String) = 
>>> {
>>>
>>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
>>> "databaseName=" + schemaName
>>>   var parameters = Array("import")
>>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>>   parameters = parameters :+ "--connect"
>>>   parameters = parameters :+ connectionString
>>>   parameters = parameters :+ "--mapreduce-job-name"
>>>   parameters = parameters :+ jobName
>>>   parameters = parameters :+ "--username"
>>>   parameters = parameters :+ username
>>>   parameters = parameters :+ "--password"
>>>   parameters = parameters :+ password
>>>   parameters = parameters :+ "--hadoop-mapred-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>>   parameters = parameters :+ "--hadoop-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>>   parameters = parameters :+ "--query"
>>>   parameters = parameters :+ query
>>>   parameters = parameters :+ "--split-by"
>>>   parameters = parameters :+ splitBy
>>>   parameters = parameters :+ "--fetch-size"
>>>   parameters = parameters :+ fetchSize.toString
>>>   parameters = parameters :+ "--num-mappers"
>>>   parameters = parameters :+ numMappers.toString
>>>   if (dateColumns.length() > 0) {
>>> parameters = parameters :+ "--map-column-java"
>>> parameters = parameters :+ dateColumns
>>>   }
>>>   parameters = parameters :+ "--target-dir"
>>>   parameters = parameters :+ targetDir
>>>   parameters = parameters :+ "--delete-target-dir"
>>>   parameters = parameters :+ "--as-avrodatafile"
>>>
>>> }
>>>
>>>


Re: Control Sqoop job from Spark job

2019-09-02 Thread Chetan Khatri
Hi Mich, a JDBC connection, which is similar to what Sqoop uses, takes time and could not achieve parallelism.

On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh 
wrote:

> Spark is an excellent ETL tool to lift data from source and put it in
> target. Spark uses JDBC connection similar to Sqoop. I don't see the need
> for Sqoop with Spark here.
>
> Where is the source (Oracle MSSQL, etc) and target (Hive?) here
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri 
> wrote:
>
>> Hi Users,
>> I am launching a Sqoop job from Spark job and would like to FAIL Spark
>> job if Sqoop job fails.
>>
>> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
>> String, password: String,
>>  query: String, splitBy: String, fetchSize: Int, numMappers: 
>> Int, targetDir: String, jobName: String, dateColumns: String) = {
>>
>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
>> "databaseName=" + schemaName
>>   var parameters = Array("import")
>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>   parameters = parameters :+ "--connect"
>>   parameters = parameters :+ connectionString
>>   parameters = parameters :+ "--mapreduce-job-name"
>>   parameters = parameters :+ jobName
>>   parameters = parameters :+ "--username"
>>   parameters = parameters :+ username
>>   parameters = parameters :+ "--password"
>>   parameters = parameters :+ password
>>   parameters = parameters :+ "--hadoop-mapred-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>   parameters = parameters :+ "--hadoop-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>   parameters = parameters :+ "--query"
>>   parameters = parameters :+ query
>>   parameters = parameters :+ "--split-by"
>>   parameters = parameters :+ splitBy
>>   parameters = parameters :+ "--fetch-size"
>>   parameters = parameters :+ fetchSize.toString
>>   parameters = parameters :+ "--num-mappers"
>>   parameters = parameters :+ numMappers.toString
>>   if (dateColumns.length() > 0) {
>> parameters = parameters :+ "--map-column-java"
>> parameters = parameters :+ dateColumns
>>   }
>>   parameters = parameters :+ "--target-dir"
>>   parameters = parameters :+ targetDir
>>   parameters = parameters :+ "--delete-target-dir"
>>   parameters = parameters :+ "--as-avrodatafile"
>>
>> }
>>
>>


Re: Control Sqoop job from Spark job

2019-09-02 Thread Chetan Khatri
Hi Chris, thanks for the email. You're right, but the Sqoop job gets launched based on DataFrame values in the Spark job. Certainly it can be isolated and broken out.

On Sat, Aug 31, 2019 at 8:07 AM Chris Teoh  wrote:

> I'd say this is an uncommon approach, could you use a workflow/scheduling
> system to call Sqoop outside of Spark? Spark is usually multiprocess
> distributed so putting in this Sqoop job in the Spark code seems to imply
> you want to run Sqoop first, then Spark. If you're really insistent on
> this, call it from the driver using Sqoop Java APIs.
>
> On Fri, 30 Aug 2019 at 06:02, Chetan Khatri 
> wrote:
>
>> Sorry,
>> I call sqoop job from above function. Can you help me to resolve this.
>>
>> Thanks
>>
>> On Fri, Aug 30, 2019 at 1:31 AM Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hi Users,
>>> I am launching a Sqoop job from Spark job and would like to FAIL Spark
>>> job if Sqoop job fails.
>>>
>>> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
>>> String, password: String,
>>>  query: String, splitBy: String, fetchSize: Int, 
>>> numMappers: Int, targetDir: String, jobName: String, dateColumns: String) = 
>>> {
>>>
>>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
>>> "databaseName=" + schemaName
>>>   var parameters = Array("import")
>>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>>   parameters = parameters :+ "--connect"
>>>   parameters = parameters :+ connectionString
>>>   parameters = parameters :+ "--mapreduce-job-name"
>>>   parameters = parameters :+ jobName
>>>   parameters = parameters :+ "--username"
>>>   parameters = parameters :+ username
>>>   parameters = parameters :+ "--password"
>>>   parameters = parameters :+ password
>>>   parameters = parameters :+ "--hadoop-mapred-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>>   parameters = parameters :+ "--hadoop-home"
>>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>>   parameters = parameters :+ "--query"
>>>   parameters = parameters :+ query
>>>   parameters = parameters :+ "--split-by"
>>>   parameters = parameters :+ splitBy
>>>   parameters = parameters :+ "--fetch-size"
>>>   parameters = parameters :+ fetchSize.toString
>>>   parameters = parameters :+ "--num-mappers"
>>>   parameters = parameters :+ numMappers.toString
>>>   if (dateColumns.length() > 0) {
>>> parameters = parameters :+ "--map-column-java"
>>> parameters = parameters :+ dateColumns
>>>   }
>>>   parameters = parameters :+ "--target-dir"
>>>   parameters = parameters :+ targetDir
>>>   parameters = parameters :+ "--delete-target-dir"
>>>   parameters = parameters :+ "--as-avrodatafile"
>>>
>>> }
>>>
>>>
>
> --
> Chris
>


Re: Control Sqoop job from Spark job

2019-08-31 Thread Mich Talebzadeh
Spark is an excellent ETL tool to lift data from a source and put it in a target. Spark uses a JDBC connection similar to Sqoop's. I don't see the need for Sqoop alongside Spark here.

What are the source (Oracle, MSSQL, etc.) and target (Hive?) here?

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 29 Aug 2019 at 21:01, Chetan Khatri 
wrote:

> Hi Users,
> I am launching a Sqoop job from Spark job and would like to FAIL Spark job
> if Sqoop job fails.
>
> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
> String, password: String,
>  query: String, splitBy: String, fetchSize: Int, numMappers: 
> Int, targetDir: String, jobName: String, dateColumns: String) = {
>
>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
> "databaseName=" + schemaName
>   var parameters = Array("import")
>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>   parameters = parameters :+ "--connect"
>   parameters = parameters :+ connectionString
>   parameters = parameters :+ "--mapreduce-job-name"
>   parameters = parameters :+ jobName
>   parameters = parameters :+ "--username"
>   parameters = parameters :+ username
>   parameters = parameters :+ "--password"
>   parameters = parameters :+ password
>   parameters = parameters :+ "--hadoop-mapred-home"
>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>   parameters = parameters :+ "--hadoop-home"
>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>   parameters = parameters :+ "--query"
>   parameters = parameters :+ query
>   parameters = parameters :+ "--split-by"
>   parameters = parameters :+ splitBy
>   parameters = parameters :+ "--fetch-size"
>   parameters = parameters :+ fetchSize.toString
>   parameters = parameters :+ "--num-mappers"
>   parameters = parameters :+ numMappers.toString
>   if (dateColumns.length() > 0) {
> parameters = parameters :+ "--map-column-java"
> parameters = parameters :+ dateColumns
>   }
>   parameters = parameters :+ "--target-dir"
>   parameters = parameters :+ targetDir
>   parameters = parameters :+ "--delete-target-dir"
>   parameters = parameters :+ "--as-avrodatafile"
>
> }
>
>


Re: Control Sqoop job from Spark job

2019-08-30 Thread Chris Teoh
I'd say this is an uncommon approach; could you use a workflow/scheduling system to call Sqoop outside of Spark? Spark is usually multi-process and distributed, so putting this Sqoop job in the Spark code seems to imply you want to run Sqoop first, then Spark. If you're really insistent on this, call it from the driver using the Sqoop Java APIs.
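
As a rough sketch of the "Sqoop Java APIs from the driver" route, assuming the Sqoop 1.4.x client jars and the JDBC driver are on the driver classpath, the import can be run in-process and its exit code checked, throwing so the surrounding Spark application fails:

import org.apache.hadoop.conf.Configuration
import org.apache.sqoop.Sqoop

// Run Sqoop in-process on the driver; a non-zero exit code is turned into an
// exception, which fails the Spark application.
def runSqoopOrFail(parameters: Array[String]): Unit = {
  val exitCode = Sqoop.runTool(parameters, new Configuration())
  if (exitCode != 0) {
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
  }
}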

On Fri, 30 Aug 2019 at 06:02, Chetan Khatri 
wrote:

> Sorry,
> I call sqoop job from above function. Can you help me to resolve this.
>
> Thanks
>
> On Fri, Aug 30, 2019 at 1:31 AM Chetan Khatri 
> wrote:
>
>> Hi Users,
>> I am launching a Sqoop job from Spark job and would like to FAIL Spark
>> job if Sqoop job fails.
>>
>> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
>> String, password: String,
>>  query: String, splitBy: String, fetchSize: Int, numMappers: 
>> Int, targetDir: String, jobName: String, dateColumns: String) = {
>>
>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
>> "databaseName=" + schemaName
>>   var parameters = Array("import")
>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>   parameters = parameters :+ "--connect"
>>   parameters = parameters :+ connectionString
>>   parameters = parameters :+ "--mapreduce-job-name"
>>   parameters = parameters :+ jobName
>>   parameters = parameters :+ "--username"
>>   parameters = parameters :+ username
>>   parameters = parameters :+ "--password"
>>   parameters = parameters :+ password
>>   parameters = parameters :+ "--hadoop-mapred-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>   parameters = parameters :+ "--hadoop-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>   parameters = parameters :+ "--query"
>>   parameters = parameters :+ query
>>   parameters = parameters :+ "--split-by"
>>   parameters = parameters :+ splitBy
>>   parameters = parameters :+ "--fetch-size"
>>   parameters = parameters :+ fetchSize.toString
>>   parameters = parameters :+ "--num-mappers"
>>   parameters = parameters :+ numMappers.toString
>>   if (dateColumns.length() > 0) {
>> parameters = parameters :+ "--map-column-java"
>> parameters = parameters :+ dateColumns
>>   }
>>   parameters = parameters :+ "--target-dir"
>>   parameters = parameters :+ targetDir
>>   parameters = parameters :+ "--delete-target-dir"
>>   parameters = parameters :+ "--as-avrodatafile"
>>
>> }
>>
>>

-- 
Chris


Re: Control Sqoop job from Spark job

2019-08-29 Thread Chetan Khatri
Sorry, I call the Sqoop job from the above function. Can you help me resolve this?

Thanks

On Fri, Aug 30, 2019 at 1:31 AM Chetan Khatri 
wrote:

> Hi Users,
> I am launching a Sqoop job from Spark job and would like to FAIL Spark job
> if Sqoop job fails.
>
> def executeSqoopOriginal(serverName: String, schemaName: String, username: 
> String, password: String,
>  query: String, splitBy: String, fetchSize: Int, numMappers: 
> Int, targetDir: String, jobName: String, dateColumns: String) = {
>
>   val connectionString = "jdbc:sqlserver://" + serverName + ";" + 
> "databaseName=" + schemaName
>   var parameters = Array("import")
>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>   parameters = parameters :+ "--connect"
>   parameters = parameters :+ connectionString
>   parameters = parameters :+ "--mapreduce-job-name"
>   parameters = parameters :+ jobName
>   parameters = parameters :+ "--username"
>   parameters = parameters :+ username
>   parameters = parameters :+ "--password"
>   parameters = parameters :+ password
>   parameters = parameters :+ "--hadoop-mapred-home"
>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>   parameters = parameters :+ "--hadoop-home"
>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>   parameters = parameters :+ "--query"
>   parameters = parameters :+ query
>   parameters = parameters :+ "--split-by"
>   parameters = parameters :+ splitBy
>   parameters = parameters :+ "--fetch-size"
>   parameters = parameters :+ fetchSize.toString
>   parameters = parameters :+ "--num-mappers"
>   parameters = parameters :+ numMappers.toString
>   if (dateColumns.length() > 0) {
> parameters = parameters :+ "--map-column-java"
> parameters = parameters :+ dateColumns
>   }
>   parameters = parameters :+ "--target-dir"
>   parameters = parameters :+ targetDir
>   parameters = parameters :+ "--delete-target-dir"
>   parameters = parameters :+ "--as-avrodatafile"
>
> }
>
>


Control Sqoop job from Spark job

2019-08-29 Thread Chetan Khatri
Hi Users,
I am launching a Sqoop job from a Spark job and would like to FAIL the Spark job if the Sqoop job fails.

def executeSqoopOriginal(serverName: String, schemaName: String, username: String, password: String,
                         query: String, splitBy: String, fetchSize: Int, numMappers: Int,
                         targetDir: String, jobName: String, dateColumns: String) = {

  val connectionString = "jdbc:sqlserver://" + serverName + ";" + "databaseName=" + schemaName
  var parameters = Array("import")
  parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
  parameters = parameters :+ "--connect"
  parameters = parameters :+ connectionString
  parameters = parameters :+ "--mapreduce-job-name"
  parameters = parameters :+ jobName
  parameters = parameters :+ "--username"
  parameters = parameters :+ username
  parameters = parameters :+ "--password"
  parameters = parameters :+ password
  parameters = parameters :+ "--hadoop-mapred-home"
  parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
  parameters = parameters :+ "--hadoop-home"
  parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
  parameters = parameters :+ "--query"
  parameters = parameters :+ query
  parameters = parameters :+ "--split-by"
  parameters = parameters :+ splitBy
  parameters = parameters :+ "--fetch-size"
  parameters = parameters :+ fetchSize.toString
  parameters = parameters :+ "--num-mappers"
  parameters = parameters :+ numMappers.toString
  if (dateColumns.length() > 0) {
    parameters = parameters :+ "--map-column-java"
    parameters = parameters :+ dateColumns
  }
  parameters = parameters :+ "--target-dir"
  parameters = parameters :+ targetDir
  parameters = parameters :+ "--delete-target-dir"
  parameters = parameters :+ "--as-avrodatafile"

}
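
One way to wire the parameter array above into a pass/fail signal, sketched under the assumptions that the sqoop CLI is installed on the driver host and that executeSqoopOriginal is changed to end with `parameters` so the built array is returned, is to launch the sqoop process and throw when its exit value is non-zero:

import scala.sys.process._

// Hypothetical caller: assumes executeSqoopOriginal returns the parameter
// array and that the "sqoop" command is on the PATH of the driver host.
def runSqoopOrFail(parameters: Array[String]): Unit = {
  val exitCode = Process("sqoop" +: parameters.toSeq).!
  if (exitCode != 0)
    throw new IllegalStateException(s"Sqoop exited with code $exitCode; failing the Spark job")
}

The in-process Sqoop.runTool route sketched earlier in the thread is an alternative that avoids depending on the CLI being installed where the driver runs.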