RE: Import Partitions from Oracle to Hive Partitions

Venkat, Ankam Tue, 19 Aug 2014 08:31:24 -0700

David/Gwen,

I have an issue with importing Oracle partitions to match Hadoop Files.

Sqoop command:

sqoop import  \
-Doraoop.disabled=false  \
-Doraoop.import.partitions='CUSTOMER_P01,CUSTOMER_P02,CUSTOMER_P03,CUSTOMER_P04,CUSTOMER_P05,CUSTOMER_P06,CUSTOMER_P07,CUSTOMER_P08,CUSTOMER_P09,CUSTOMER_P10,CUSTOMER_P11,CUSTOMER_P12,CUSTOMER_P13,CUSTOMER_P14,CUSTOMER_P15,CUSTOMER_P16,CUSTOMER_P17,CUSTOMER_P18,CUSTOMER_P19,CUSTOMER_P20,CUSTOMER_P21,CUSTOMER_P22,CUSTOMER_P23,CUSTOMER_P24,CUSTOMER_P25,CUSTOMER_P26,CUSTOMER_P27,CUSTOMER_P28,CUSTOMER_P29,CUSTOMER_P30,CUSTOMER_P31,CUSTOMER_P32,CUSTOMER_P33,CUSTOMER_P34,CUSTOMER_P35,CUSTOMER_P36,CUSTOMER_P37,CUSTOMER_P38,CUSTOMER_P39,CUSTOMER_P40,CUSTOMER_P41,CUSTOMER_P42,CUSTOMER_P43,CUSTOMER_P44,CUSTOMER_P45,CUSTOMER_P46,CUSTOMER_P47,CUSTOMER_P48,CUSTOMER_P49,CUSTOMER_P50,CUSTOMER_P51,CUSTOMER_P52,CUSTOMER_P53,CUSTOMER_P54,CUSTOMER_P55,CUSTOMER_P56,CUSTOMER_P57,CUSTOMER_P58,CUSTOMER_P59,CUSTOMER_P60,CUSTOMER_P61,CUSTOMER_P62,CUSTOMER_P63,CUSTOMER_P64,CUSTOMER_P65,CUSTOMER_P66,CUSTOMER_P67,CUSTOMER_P68,CUSTOMER_P69,CUSTOMER_P70,CUSTOMER_P71,CUSTOMER_P72,CUSTOMER_P73,CUSTOMER_P74,CUSTOMER_P75'
 \
-Doraoop.chunk.method=PARTITION \
-Doraoop.import.consistent.read=true \
--connect jdbc:oracle:thin:@XXXXXXXXXX:1521/YYYYYY \
--table Schema.TableName \
--username MyID \
--password MyPWD \
-m 75

Issues:
1) There are 75 partitions on Oracle Table.  But, 61 are being imported because 
14 partitions are empty. I have no way to identify the file belongs to which 
partition. 
2) All the partitions are imported with names as PART-00001, PART-00002 and so 
on.  I thought -Doraoop.chunk.method=PARTITION option will match the hadoop 
files to Oracle Partions.  I mean import as HDFS files with same name. 

Any workaround for this?

Regards,
Venkat

-----Original Message-----
From: David Robson [mailto:[email protected]] 
Sent: Tuesday, August 05, 2014 6:08 PM
To: [email protected]
Subject: RE: Import Partitions from Oracle to Hive Partitions

Yes now that you mention Sqoop is limited to one partition in Hive I do 
remember that! I would think we could modify Sqoop to create subfolders for 
each partition - instead of how it now creates a separate file for each 
partition? This would probably be limited to the direct (OraOop) connector as 
it is aware of partitions (existing connector doesn't read data dictionary 
directly).

In the meantime Venkat - you could look at the option I mentioned - then 
manually move the files into separate folders - at least you'll have each 
partition in a separate file rather than spread throughout all files. The other 
thing you could look at is the option below - you could run one Sqoop job per 
partition:

Specify The Partitions To Import

-Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName

Imports PartitionA and PartitionB of OracleTableName.

Notes:
You can enclose an individual partition name in double quotes to retain the 
letter case or if the name has special characters.
-Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName If 
the partition name is not double quoted then its name will be automatically 
converted to upper case, PARTITIONB for above.
When using double quotes the entire list of partition names must be enclosed in 
single quotes.
If the last partition name in the list is double quoted then there must be a 
comma at the end of the list. 
-Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName

Name each partition to be included. There is no facility to provide a range of 
partition names.

There is no facility to define sub partitions. The entire partition is 
included/excluded as per the filter.

-----Original Message-----
From: Gwen Shapira [mailto:[email protected]]
Sent: Wednesday, 6 August 2014 8:44 AM
To: [email protected]
Subject: Re: Import Partitions from Oracle to Hive Partitions

Hive expects a directory for each partition, so getting data with OraOop will 
require some post-processing - copy files into properly named directories and 
adding the new partitions to a hive table.

Sqoop has the --hive-partition-key and --hive-partition-value, but this assumes 
that all the data sqooped will fit into a single partition.

On Tue, Aug 5, 2014 at 3:40 PM, David Robson <[email protected]> 
wrote:
> Hi Venkat,
>
>
>
> I’m not sure what this will do in regards to Hive partitions – I’ll 
> test it out when I get into the office and get back to you. But this 
> option will make it so there is one file for each Oracle partition – 
> which might be of interest to you.
>
>
>
> Match Hadoop Files to Oracle Table Partitions
>
>
>
> -Doraoop.chunk.method={ROWID|PARTITION}
>
>
>
> To import data from a partitioned table in such a way that the 
> resulting HDFS folder structure in
>
> Hadoop will match the table’s partitions, set the chunk method to PARTITION.
> The alternative
>
> (default) chunk method is ROWID.
>
>
>
> Notes:
>
> l For the number of Hadoop files to match the number of Oracle 
> partitions, set the number
>
> of mappers to be greater than or equal to the number of partitions.
>
> l If the table is not partitioned then value PARTITION will lead to an 
> error.
>
>
>
> David
>
>
>
>
>
> From: Venkat, Ankam [mailto:[email protected]]
> Sent: Wednesday, 6 August 2014 3:56 AM
> To: '[email protected]'
> Subject: Import Partitions from Oracle to Hive Partitions
>
>
>
> I am trying to import  partitions from Oracle table to Hive partitions.
>
>
>
> Can somebody provide the syntax using regular JDBC connector and 
> Oraoop connector?
>
>
>
> Thanks in advance.
>
>
>
> Regards,
>
> Venkat
>
>
>
>

RE: Import Partitions from Oracle to Hive Partitions

Reply via email to