Thanks for the responses, Markus and Grzegorz. Grzegorz, I’m trying to copy entire tables from Postgres to Google Cloud. Did you mean the approach should be the following?
* Use Sqoop to copy the data onto the Cloud in text format.
* Run a Spark job on the Cloud (Hadoop) cluster that converts the data from text to Parquet (a rough sketch of this step is at the end of this message).

Thanks
Preethi

From: Grzegorz Solecki <gsolec...@gmail.com>
Reply-To: "user@sqoop.apache.org" <user@sqoop.apache.org>
Date: Monday, February 25, 2019 at 11:23 AM
To: "user@sqoop.apache.org" <user@sqoop.apache.org>
Subject: Re: Parquet Format Sqoop

Based on our experience, it's better not to use Sqoop to create Parquet files. Even if you manage to create a Parquet file, you will then have ridiculous data type problems when it comes to working with the Hive metastore. I recommend Spark SQL when it comes to creating Parquet files. It works flawlessly.

On Mon, Feb 25, 2019 at 12:54 PM Markus Kemper <mar...@cloudera.com> wrote:

To the best of my knowledge the only way to use Sqoop export with Parquet is via the --hcat options, sample below:

sqoop export --connect $MYSQL_CONN --username $MYSQL_USER --password $MYSQL_PSWD \
  --table t2 --num-mappers 1 \
  --hcatalog-database default --hcatalog-table t1_parquet_table

Markus Kemper
Cloudera Support

On Mon, Feb 25, 2019 at 12:36 PM Preethi Krishnan <pkrish...@pandora.com> wrote:

Hi,

I’m using the Sqoop Hadoop jar to sqoop the data from Postgres to Google Cloud Storage (GCS). It is working fine for text format, but I’m unable to load it in Parquet format. It does not fail, but it does not load the data either. The jar file I’m using is sqoop-1.4.7-hadoop260.jar. Is there any specific way I should be loading the data in Parquet format using Sqoop?

Thanks
Preethi
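For the second step, a minimal PySpark sketch of the text-to-Parquet conversion could look roughly like the following. The GCS bucket, paths, and CSV options are placeholders rather than anything from this thread, and the cluster is assumed to have the GCS connector available (as on Dataproc); they would need to match the actual Sqoop output.

# Minimal sketch: convert the text files produced by Sqoop into Parquet with Spark.
# The bucket and paths below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("text-to-parquet").getOrCreate()

# Sqoop's default text output is comma-delimited with no header line.
df = (spark.read
      .option("header", "false")
      .option("inferSchema", "true")
      .csv("gs://my-bucket/sqoop/my_table/"))

# Rewrite the same rows as Parquet.
df.write.mode("overwrite").parquet("gs://my-bucket/parquet/my_table/")

Run with spark-submit on the cluster, this reads the Sqoop text output and rewrites it as Parquet; supplying an explicit schema instead of inferSchema is one way to avoid the data type problems with the Hive metastore that Grzegorz mentions.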