Or use Falcon ... I would try to avoid the Spark JDBC route. JDBC is not designed for these big-data bulk operations: data has to be transferred uncompressed, and there is serialization/deserialization overhead at every step (query result -> protocol -> Java objects -> writing to the specific storage format, etc.). This costs more time than you may think.
> On 25 May 2016, at 18:05, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> There are multiple ways of doing this without relying on any vendor's release.
>
> 1) Use the Hive EXPORT/IMPORT utility:
>
>    EXPORT TABLE table_or_partition TO hdfs_path;
>    IMPORT [[EXTERNAL] TABLE table_or_partition] FROM hdfs_path [LOCATION [table_location]];
>
> 2) That works for individual tables, but you can easily write a generic script
> to pick up the names of tables for a given database from the Hive metastore, for
> example:
>
>    SELECT
>        t.owner    AS Owner
>      , d.NAME     AS DBName
>      , t.TBL_NAME AS Tablename
>      , TBL_TYPE
>    FROM tbls t, dbs d
>    WHERE
>        t.DB_ID = d.DB_ID
>    AND TBL_TYPE IN ('MANAGED_TABLE','EXTERNAL_TABLE')
>    ORDER BY 1,2
>
> A Linux shell script around this will take 5 minutes max to create, and you have
> full control of the code. You can even run multiple EXPORT/IMPORT jobs at the
> same time.
>
> 3) Easier still: create a shared NFS mount between PROD and UAT so you can put
> the tables' data and metadata on this NFS.
>
> 4) Use a Spark shell script to get the data via JDBC from the source database
> and push the schema and data into the new env. This is no different from
> getting the underlying data from an Oracle or Sybase database and putting it
> in Hive.
>
> 5) Use a vendor's product to do the same. I am not sure vendors parallelise
> this sort of thing.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>> On 25 May 2016 at 14:50, Suresh Kumar Sethuramaswamy <rock...@gmail.com> wrote:
>> Hi
>>
>> If you are using CDH, you could do inter-cluster Hive data transfer,
>> including metadata, via CM (Backup -> Replications).
>>
>> Regards
>> Suresh
>>
>>> On Wednesday, May 25, 2016, mahender bigdata <mahender.bigd...@outlook.com> wrote:
>>> Any document on it?
>>>
>>>> On 4/8/2016 6:28 PM, Will Du wrote:
>>>> did you try the export and import statements in HQL?
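The generic wrapper script suggested above can be sketched roughly as follows. The database name, staging path and table list here are illustrative assumptions; in practice the table list would come out of the metastore query rather than being hard-coded:

```shell
#!/bin/sh
# Sketch: generate paired EXPORT and IMPORT scripts for a list of Hive tables,
# so they can be run with "hive -f" on the source and target clusters.
DB="prod_db"              # assumed database name
STAGE="/tmp/hive_export"  # assumed HDFS staging directory

: > export_all.hql
: > import_all.hql
for TBL in sales customers orders; do    # stand-ins for real table names
    echo "EXPORT TABLE ${DB}.${TBL} TO '${STAGE}/${TBL}';"   >> export_all.hql
    echo "IMPORT TABLE ${DB}.${TBL} FROM '${STAGE}/${TBL}';" >> import_all.hql
done

cat export_all.hql
```

You would then run export_all.hql on PROD, copy the staging directory across to the UAT cluster (e.g. with distcp), and run import_all.hql there.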
>>>>
>>>>> On Apr 8, 2016, at 6:24 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Anyone has suggestions how to create and copy Hive and Spark tables from
>>>>> Production to UAT?
>>>>>
>>>>> One way would be to copy the table data to external files and then move the
>>>>> external files to a local target directory and populate the tables in the
>>>>> target Hive with that data.
>>>>>
>>>>> Is there an easier way of doing so?
>>>>>
>>>>> thanks
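The manual path described in the original question (copy the table data out to files, then rebuild and reload on the target) can be sketched as a dry run; the database, table and warehouse path below are illustrative assumptions, and the script only prints the commands it would issue:

```shell
#!/bin/sh
# Dry-run sketch of the manual copy path: capture the table DDL on PROD, pull
# the data files out of HDFS, then recreate and reload the table on UAT.
# Commands are printed (and recorded in copy_plan.txt) rather than executed.
DB="prod_db"
TBL="sales"
WAREHOUSE="/user/hive/warehouse"
STAGE="/tmp/${DB}_${TBL}_copy"

: > copy_plan.txt
plan() { echo "$@" | tee -a copy_plan.txt; }   # swap echo for "$@" to execute

plan "hive -e \"SHOW CREATE TABLE ${DB}.${TBL};\" > ${DB}.${TBL}.ddl"  # DDL on PROD
plan "hadoop fs -get ${WAREHOUSE}/${DB}.db/${TBL} ${STAGE}"            # pull data files
plan "hive -f ${DB}.${TBL}.ddl"                                        # recreate on UAT
plan "hadoop fs -put ${STAGE} ${WAREHOUSE}/${DB}.db/${TBL}"            # push data files
plan "hive -e \"MSCK REPAIR TABLE ${DB}.${TBL};\""                     # re-register partitions
```

As the thread notes, EXPORT/IMPORT does the same job in fewer moving parts, since it carries the table metadata along with the data.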