Hi Mich,

A lot of people say that Spark does not have the proven track record in migrating data from Oracle that Sqoop has, at least in production.
Please correct me if I am wrong, and suggest how to deal with the shuffling that comes with groupBy?

Thanks,
Shyam

On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Spark is an excellent ETL tool to lift data from a source and put it in a
> target. Spark uses a JDBC connection similar to Sqoop. I don't see the
> need for Sqoop with Spark here.
>
> Where are the source (Oracle, MSSQL, etc.) and the target (Hive?) here?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
> On Thu, 29 Aug 2019 at 21:01, Chetan Khatri <chetan.opensou...@gmail.com>
> wrote:
>
>> Hi Users,
>> I am launching a Sqoop job from a Spark job and would like to FAIL the
>> Spark job if the Sqoop job fails.
>>
>> def executeSqoopOriginal(serverName: String, schemaName: String,
>>     username: String, password: String, query: String, splitBy: String,
>>     fetchSize: Int, numMappers: Int, targetDir: String, jobName: String,
>>     dateColumns: String) = {
>>
>>   val connectionString = "jdbc:sqlserver://" + serverName + ";" +
>>     "databaseName=" + schemaName
>>   var parameters = Array("import")
>>   parameters = parameters :+ "-Dmapreduce.job.user.classpath.first=true"
>>   parameters = parameters :+ "--connect"
>>   parameters = parameters :+ connectionString
>>   parameters = parameters :+ "--mapreduce-job-name"
>>   parameters = parameters :+ jobName
>>   parameters = parameters :+ "--username"
>>   parameters = parameters :+ username
>>   parameters = parameters :+ "--password"
>>   parameters = parameters :+ password
>>   parameters = parameters :+ "--hadoop-mapred-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/"
>>   parameters = parameters :+ "--hadoop-home"
>>   parameters = parameters :+ "/usr/hdp/2.6.5.0-292/hadoop/"
>>   parameters = parameters :+ "--query"
>>   parameters = parameters :+ query
>>   parameters = parameters :+ "--split-by"
>>   parameters = parameters :+ splitBy
>>   parameters = parameters :+ "--fetch-size"
>>   parameters = parameters :+ fetchSize.toString
>>   parameters = parameters :+ "--num-mappers"
>>   parameters = parameters :+ numMappers.toString
>>   if (dateColumns.length() > 0) {
>>     parameters = parameters :+ "--map-column-java"
>>     parameters = parameters :+ dateColumns
>>   }
>>   parameters = parameters :+ "--target-dir"
>>   parameters = parameters :+ targetDir
>>   parameters = parameters :+ "--delete-target-dir"
>>   parameters = parameters :+ "--as-avrodatafile"
>> }
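To make Mich's point concrete, below is a minimal sketch of reading an Oracle table in parallel through Spark's JDBC source and then aggregating it, assuming the Oracle JDBC driver is on the classpath. The host, table, and column names (dbhost, ORCLPDB, SCOTT.SALES, ID, REGION, AMOUNT) and the Hive target table are hypothetical. The partitioning options play the role of Sqoop's --split-by and --num-mappers:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("OracleJdbcEtl").getOrCreate()

// Each of the 16 partitions issues its own bounded query against Oracle,
// parallelising the read the way Sqoop does with --num-mappers.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  // hypothetical host/service
  .option("dbtable", "SCOTT.SALES")                          // hypothetical table
  .option("user", "scott")
  .option("password", "tiger")
  .option("partitionColumn", "ID")                           // numeric split column, like --split-by
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "16")
  .load()

// A DataFrame groupBy does a map-side partial aggregation before the
// shuffle, so only pre-aggregated rows move across the network; sizing
// spark.sql.shuffle.partitions to the data volume is the main tuning knob.
spark.conf.set("spark.sql.shuffle.partitions", "200")
df.groupBy("REGION")
  .sum("AMOUNT")                                             // hypothetical columns
  .write
  .mode("overwrite")
  .saveAsTable("analytics.sales_by_region")                  // hypothetical Hive target

On the shuffle question itself: the shuffle in a DataFrame groupBy cannot be avoided, only kept cheap, because the partial aggregation already shrinks the data before it moves; this is the same reason reduceByKey is preferred over groupByKey on RDDs.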
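On Chetan's question: executeSqoopOriginal builds the argument array but never runs it, and Sqoop's command-line entry point calls System.exit, which would kill the driver. A minimal sketch, assuming Sqoop 1.x is on the driver classpath, is to invoke org.apache.sqoop.Sqoop.runTool, which returns the exit code instead of exiting, and to throw when it is non-zero (runSqoopOrFail is a hypothetical helper name):

import org.apache.hadoop.conf.Configuration
import org.apache.sqoop.Sqoop

// Sqoop.runTool returns the tool's exit code rather than calling
// System.exit the way Sqoop's main() does, so the result can be checked
// in-process and turned into a failure of the surrounding Spark job.
def runSqoopOrFail(parameters: Array[String]): Unit = {
  val exitCode = Sqoop.runTool(parameters, new Configuration())
  if (exitCode != 0) {
    // The exception propagates out of the driver's main method, so the
    // Spark application is marked FAILED rather than finishing cleanly.
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
  }
}

Calling runSqoopOrFail(parameters) as the last statement of executeSqoopOriginal would wire the two together: a failed Sqoop import throws, the exception reaches the driver's main method, and the Spark job fails as desired.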