Hi, I looked around the net but could not find a satisfactory answer to this issue. I have imported a MySQL table in Parquet format into a directory in HDFS. Now I would like to both append new rows and merge these files with any updates that occur in the MySQL table. To do the appends, I created the following job:
sqoop job --create job_fact_application_event -- import \
  --connect jdbc:mysql://165.1.2.46:3306/warehouse \
  --username admin \
  --incremental append \
  --check-column id \
  --last-value 20372582 \
  --table fact_table \
  --fetch-size -2147483648 \
  --warehouse-dir hdfs://165.23.22.78/datawarehouse/dw/ \
  --as-parquetfile \
  --password-file /sqoop.pwd

The job finishes, but I get the following warning:

15/02/02 16:09:46 INFO mapreduce.ImportJobBase: Transferred 2.3228 MB in 28.5317 seconds (83.3668 KB/sec)
15/02/02 16:09:46 INFO mapreduce.ImportJobBase: Retrieved 141111 records.
15/02/02 16:09:46 WARN util.AppendUtils: Cannot append files to target dir; no such directory: _sqoop/02160916000000217_7663_ip_address_fact_application_event
15/02/02 16:09:46 INFO tool.ImportTool: Saving incremental import state to the metastore
15/02/02 16:09:46 INFO tool.ImportTool: Updated data for job: job_fact_application_event

When I look in that particular directory, I do not see the Parquet files.

Is there a way of doing append with Parquet files?

Chirag
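P.S. For reference, I run the saved job and inspect its stored incremental state with the standard sqoop job subcommands (nothing unusual here, just the usual exec/show usage against the job created above):

  # run the saved incremental job; sqoop reads the stored --last-value from the metastore
  sqoop job --exec job_fact_application_event

  # display the saved job definition, including the updated last-value
  sqoop job --show job_fact_application_event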
