Unfortunately, Venkat is correct about import + --hcatalog with Parquet; however, in case others following this thread are not aware, it is possible to import Parquet data into Hive.
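For reference, the unsupported combination Venkat describes in his reply below would look roughly like this (a sketch only, reusing the connection variables and table names from the working example that follows; per HIVE-7502, HCatalog cannot write Parquet, so this command is expected to fail rather than run):

# Sketch of the combination HIVE-7502 blocks; t1_parquet is assumed to be an
# existing Hive table stored as Parquet.
$ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1_parquet --num-mappers 1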
A working example, by contrast, first imports into a new Parquet-backed Hive table and then appends a second copy of the rows:

$ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/<hdfs_user>/t1 --delete-target-dir --hive-import --hive-database default --hive-table t1_parquet --num-mappers 1 --as-parquetfile

$ sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --target-dir /user/hive/warehouse/t1_parquet --num-mappers 1 --as-parquetfile --append

$ impala-shell -i "<impala_daemon_host>:21000" -q "use default; select c_int, from_unixtime(cast(c_date/1000 as bigint), 'yyyy-MM-dd') as c_date, from_unixtime(cast(c_timestamp/1000 as bigint), 'yyyy-MM-dd HH:mm:ss.S') as c_timestamp from t1_parquet"
+-------+------------+-----------------------+
| c_int | c_date     | c_timestamp           |
+-------+------------+-----------------------+
| 1     | 2016-10-26 | 2016-10-26 21:30:33.0 |
| 1     | 2016-10-26 | 2016-10-26 21:30:33.0 |
+-------+------------+-----------------------+
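To confirm that the resulting table really is Parquet-backed, Hive's DESCRIBE FORMATTED shows the SerDe in use (a sketch; output trimmed to the relevant line, which matches the ParquetHiveSerDe visible in the DESCRIBE EXTENDED output later in this thread):

# Verify the storage format of the imported table (output abbreviated).
$ hive -e "describe formatted default.t1_parquet"
...
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
...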
Markus Kemper
Customer Operations Engineer

On Tue, Oct 25, 2016 at 4:35 PM, Venkat Ranganathan <[email protected]> wrote:

> Thanks for testing and answering that the export works. One thing to note is that imports are not allowed into Parquet tables with HCatalog (see HIVE-7502). It has been a long-standing issue since we created the Sqoop HCatalog integration a few years ago.
>
> Thanks
>
> Venkat
>
> *From: *Markus Kemper <[email protected]>
> *Reply-To: *"[email protected]" <[email protected]>
> *Date: *Tuesday, October 25, 2016 at 12:01 PM
> *To: *"[email protected]" <[email protected]>
> *Subject: *Re: Sqoop export from Hive table stored as Parquet
>
> Awesomeness, and thank you for helping with the other forums.
>
> Markus Kemper
> Customer Operations Engineer
>
> On Tue, Oct 25, 2016 at 2:56 PM, Douglas Spadotto <[email protected]> wrote:
>
> Hi Markus,
>
> It worked fine end to end. Here is the edited output:
>
> *SOURCE:*
>
> hive> describe extended teste1;
> OK
> id      int
> nome    string
>
> Detailed Table Information  Table(tableName:teste1, dbName:default, owner:cloudera, createTime:1477408342, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:nome, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1, *inputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,* compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=2, transient_lastDdlTime=1477408508, COLUMN_STATS_ACCURATE=true, totalSize=645, numRows=3, rawDataSize=6}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
> Time taken: 0.08 seconds, Fetched: 4 row(s)
>
> hive> select * from teste1;
> OK
> ...
> 1       Douglas
> 2       Spadotto
> 3       Doug
> Time taken: 0.323 seconds, Fetched: 3 row(s)
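Note that sqoop export does not create the destination table. The DDL is not shown in this thread, but judging from the column metadata Sqoop logs below (JDBC type 4 = INTEGER; type 12 = VARCHAR with precision 50), the PostgreSQL table was presumably created along these lines (an assumption, for illustration only):

# Hypothetical DDL for the export target, inferred from the logged column types.
$ psql -d postgres -c "create table test1 (id integer, nome varchar(50))"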
>
> *SQOOP:*
>
> [cloudera@quickstart ~]$ sqoop export --connect jdbc:postgresql://localhost/postgres --table test1 --hcatalog-table teste1
> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
> 16/10/25 11:42:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
> 16/10/25 11:42:19 INFO manager.SqlManager: Using default fetchSize of 1000
> 16/10/25 11:42:19 INFO tool.CodeGenTool: Beginning code generation
> 16/10/25 11:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
> 16/10/25 11:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> Note: /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 16/10/25 11:42:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/50329ee88534a5c3b6df8b37aadf2131/test1.jar
> 16/10/25 11:42:20 INFO mapreduce.ExportJobBase: Beginning export of test1
> 16/10/25 11:42:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 16/10/25 11:42:21 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
> 16/10/25 11:42:21 INFO mapreduce.ExportJobBase: Configuring HCatalog for export job
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for job
> 16/10/25 11:42:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, nome]
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: Database column name - info map :
> id : [Type : 4,Precision : 10,Scale : 0]
> nome : [Type : 12,Precision : 50,Scale : 0] *<-- Got the structure here!*
>
> 16/10/25 11:42:21 INFO hive.metastore: Trying to connect to metastore with URI thrift://quickstart.cloudera:9083
> 16/10/25 11:42:21 INFO hive.metastore: Opened a connection to metastore, current connections: 1
> 16/10/25 11:42:21 INFO hive.metastore: Connected to metastore.
> 16/10/25 11:42:21 INFO hcat.SqoopHCatUtilities: HCatalog full table schema fields = [id, nome]
> ...
> 16/10/25 11:42:22 WARN hcat.SqoopHCatUtilities: No files under /usr/lib/hive-hcatalog/share/hcatalog/storage-handlers to add to distributed cache for hcatalog job
> 16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for export job
> 16/10/25 11:42:22 INFO hcat.SqoopHCatUtilities: Ignoring configuration request for HCatalog info
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
> 16/10/25 11:42:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 16/10/25 11:42:22 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/192.168.26.129:8032
> 16/10/25 11:42:49 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
> 16/10/25 11:42:49 INFO mapred.FileInputFormat: Total input paths to process : 2
> 16/10/25 11:42:50 INFO mapreduce.JobSubmitter: number of splits:2
> ...
> 16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Transferred 15.9258 KB in 45.1568 seconds (361.1415 bytes/sec)
> 16/10/25 11:43:07 INFO mapreduce.ExportJobBase: Exported 3 records.
> 16/10/25 11:43:07 INFO hive.metastore: Closed a connection to metastore, current connections: 0
>
> *DESTINATION:*
>
> [cloudera@quickstart ~]$ psql -d postgres -c "select * from test1"
>  id |   nome
> ----+----------
>   1 | Douglas
>   2 | Spadotto
>   3 | Doug
> (3 rows)
>
> Now I'll go off into the Internet and answer everyone who posted this same question on other forums. :)
>
> Regards,
>
> Douglas
>
> On Tue, Oct 25, 2016 at 4:00 PM, Markus Kemper <[email protected]> wrote:
>
> Glad to hear it. Let us know how it goes.
>
> Markus Kemper
> Customer Operations Engineer
>
> On Tue, Oct 25, 2016 at 1:55 PM, Douglas Spadotto <[email protected]> wrote:
>
> Hi Markus,
>
> Thank you!
>
> I tried this (the hcatalog options) myself a few minutes after I hit "send" on the e-mail. It worked fine; Sqoop was able to read the Parquet structure. My MapReduce job crashed, but that was due to my unstable environment.
>
> It looks like I'm on the way to the solution.
>
> Cheers,
>
> Douglas
>
> On Tue, Oct 25, 2016 at 3:32 PM, Markus Kemper <[email protected]> wrote:
>
> Hello Douglas,
>
> The only workaround that I am aware of is to use the Sqoop --hcatalog options, for example:
>
> sqoop export --connect <jdbc_connection_string> --table <rdbms_table> --hcatalog-database <hive_database> --hcatalog-table <hive_table>
>
> Markus Kemper
> Customer Operations Engineer
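Filled in with the names used in this thread, that generic command becomes the one Douglas ran above. The --hcatalog-database flag is included here for completeness; when it is omitted, as in Douglas's command, Sqoop falls back to the Hive database named "default":

# The workaround with this thread's table names; --hcatalog-database default is implied if omitted.
$ sqoop export --connect jdbc:postgresql://localhost/postgres --table test1 --hcatalog-database default --hcatalog-table teste1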
>
> On Tue, Oct 25, 2016 at 1:21 PM, Douglas Spadotto <[email protected]> wrote:
>
> Hello everyone,
>
> I saw in the past few months quite a few messages about Parquet support in Sqoop, all about importing. Some of them worked well.
>
> But for exporting, I'm receiving this error when trying to export from a Hive table stored as Parquet to PostgreSQL:
>
> [cloudera@quickstart ~]$ sqoop export --connect jdbc:postgresql://localhost/postgres --table test1 --export-dir /user/hive/warehouse/teste1
> Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
> Please set $ACCUMULO_HOME to the root of your Accumulo installation.
> 16/10/25 09:19:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
> 16/10/25 09:19:09 INFO manager.SqlManager: Using default fetchSize of 1000
> 16/10/25 09:19:09 INFO tool.CodeGenTool: Beginning code generation
> 16/10/25 09:19:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
> 16/10/25 09:19:10 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> Note: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 16/10/25 09:19:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/019c3435216213411e2de14c483af692/test1.jar
> 16/10/25 09:19:11 INFO mapreduce.ExportJobBase: Beginning export of test1
> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
> 16/10/25 09:19:12 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
> 16/10/25 09:19:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "test1" AS t LIMIT 1
> 16/10/25 09:19:13 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist: hdfs://quickstart.cloudera:8020/user/hive/warehouse/teste1/.metadata
>     at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.checkExists(FileSystemMetadataProvider.java:562)
>     at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.find(FileSystemMetadataProvider.java:605)
>     at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:114)
>     at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
>     at org.kitesdk.data.Datasets.load(Datasets.java:108)
>     at org.kitesdk.data.Datasets.load(Datasets.java:140)
>     at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
>     at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
>     at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
>     at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:432)
>     at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
>     at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
>     at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
>     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
>
> I saw a recent JIRA opened about this (https://issues.apache.org/jira/browse/SQOOP-2907) and am wondering if there is any workaround for it.
>
> Thanks in advance,
>
> Douglas
>
> --
> Visite: http://canseidesercowboy.wordpress.com/
> Siga: @dougspadotto ou @excowboys
> -----
> Frodo: "I wish none of this had happened."
> Gandalf: "So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us."
> -- Lord of the Rings: The Fellowship of the Ring (2001)
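A closing note on the root cause: with --export-dir, Sqoop's Parquet handling goes through the Kite SDK (visible in the org.kitesdk frames above), and Kite expects a dataset descriptor in a .metadata subdirectory next to the data files. A table created with plain Hive DDL has no such descriptor, hence the DatasetNotFoundException. Listing the table directory makes this easy to spot (a sketch; for a Hive-created table, expect only Parquet data files and no .metadata entry):

# Check for the Kite descriptor the failing export is looking for.
$ hdfs dfs -ls /user/hive/warehouse/teste1

The --hcatalog options avoid Kite altogether by reading the table through the Hive metastore, which is why the workaround above succeeds.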
