Hi Markus,

Thank you!

I tried this (the --hcatalog options) myself a few minutes after I hit
"send" on the e-mail. It worked fine: Sqoop was able to read the Parquet
structure. Only my MapReduce job crashed, and that was due to my unstable
environment.

It looks like I'm on my way to a solution.
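For the archives, here is a sketch of the export command following Markus's
suggestion, filled in with the connection string and table names from my
original message. Note that the Hive database name ("default") is an
assumption; adjust it to wherever your table actually lives:

```shell
# Sketch only: export a Parquet-backed Hive table via the HCatalog options
# instead of pointing --export-dir at the HDFS directory (which fails the
# Kite .metadata descriptor check).
# Assumes the Hive table "teste1" is in the "default" database.
sqoop export \
  --connect jdbc:postgresql://localhost/postgres \
  --table test1 \
  --hcatalog-database default \
  --hcatalog-table teste1
```

With --hcatalog-table, Sqoop reads the table metadata (including the Parquet
storage format) from the Hive metastore rather than expecting a Kite dataset
layout on HDFS, which is why this sidesteps the DatasetNotFoundException.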

Cheers,

Douglas

On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <[email protected]> wrote:

> Hello Douglas,
>
> The only workaround that I am aware of is to use the Sqoop --hcatalog
> options, for example:
>
> sqoop export --connect <jdbc_connection_string> --table <rdbms_table>
> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
>
>
>
> Markus Kemper
> Customer Operations Engineer
> http://www.cloudera.com
>
>
> On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <[email protected]>
> wrote:
>
>> Hello everyone,
>>
>> Over the past few months I have seen quite a few messages about Parquet
>> support in Sqoop, all about importing, and some of them worked well.
>>
>> For exporting, however, I'm getting the error below when trying to export
>> a Hive table stored as Parquet to PostgreSQL:
>>
>> [cloudera@quickstart ~]$ sqoop export --connect
>> jdbc:postgresql://localhost/postgres --table test1  --export-dir
>> /user/hive/warehouse/teste1
>> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will
>> fail.
>> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
>> 16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
>> 16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
>> 16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
>> 16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement:
>> SELECT t.* FROM "test1" AS t LIMIT 1
>> 16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>> /usr/lib/hadoop-mapreduce
>> Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java
>> uses or overrides a deprecated API.
>> Note: Recompile with -Xlint:deprecation for details.
>> 16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file:
>> /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
>> 16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
>> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is
>> deprecated. Instead, use mapreduce.job.jar
>> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts
>> is deprecated. Instead, use mapreduce.map.maxattempts
>> 16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement:
>> SELECT t.* FROM "test1" AS t LIMIT 1
>> 16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
>> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
>> at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
>> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
>> at org.kitesdk.data.Datasets.load(Datasets.java:108)
>> at org.kitesdk.data.Datasets.load(Datasets.java:140)
>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
>> at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
>> at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
>> at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
>> at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
>> at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>> at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>>
>> I saw that a JIRA was recently opened about this,
>> https://issues.apache.org/jira/browse/SQOOP-2907, and am wondering
>> whether there is any workaround.
>>
>> Thanks in advance,
>>
>> Douglas
>>
>> --
>> Visit: http://canseidesercowboy.wordpress.com/
>> Follow: @dougspadotto or @excowboys
>> -----
>> Frodo: "I wish none of this had happened."
>> Gandalf: "So do all who live to see such times, but that is not for them
>> to decide. All we have to decide is what to do with the time that is given
>> to us."
>> -- Lord of the Rings: The Fellowship of the Ring (2001)
>>
>
>

