Venkat,

Running one Sqoop job and moving the files to different directories should be faster than one Sqoop job per partition (at least it was for my customers).
If you are interested in a new OraOop feature, why not open a JIRA at issues.apache.org? You can even contribute a patch if you are so inclined :)

Gwen

On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam <[email protected]> wrote:
> Thanks for the response.
>
> I was thinking of using OraOop to automatically import Oracle partitions to
> Hive partitions. But, based on the conversation below, I just learned it's
> not possible.
>
> From an automation perspective, I think running one Sqoop job per partition
> and creating the same partition in Hive is the better option.
>
> Gwen/David: Yes, it would be a good feature to have Oracle partitions map to
> Hive partitions. Any idea why there have been no commits to OraOop since 2012?
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:[email protected]]
> Sent: Tuesday, August 05, 2014 6:24 PM
> To: [email protected]
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Having OraOop automatically handle partitions in Hive would be a cool feature.
> I agree that this will be limited to OraOop for now.
>
> On Tue, Aug 5, 2014 at 5:08 PM, David Robson <[email protected]> wrote:
>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do
>> remember that! I would think we could modify Sqoop to create subfolders for
>> each partition, instead of how it now creates a separate file for each
>> partition. This would probably be limited to the direct (OraOop) connector,
>> as it is aware of partitions (the existing connector doesn't read the data
>> dictionary directly).
>>
>> In the meantime, Venkat, you could look at the option I mentioned and then
>> manually move the files into separate folders. At least you'll have each
>> partition in a separate file rather than spread throughout all files.
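[Editor's note] David's suggestion to manually move the per-partition files into separate folders could be scripted roughly as below. This is a dry-run sketch that only prints the commands it would run, so nothing touches HDFS or Hive; the warehouse path, table name (`sales`), partition column (`part_name`), import directory, and partition names are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands that would move OraOop's per-partition
# output files into Hive's directory-per-partition layout and register each
# partition. Table name, column, paths, and partition names are assumptions.
SRC=/user/venkat/sales_import
WAREHOUSE=/user/hive/warehouse/sales
PARTITIONS="P2014Q1 P2014Q2"

CMDS=""
for PART in $PARTITIONS; do
  DIR="$WAREHOUSE/part_name=$PART"
  CMDS+="hdfs dfs -mkdir -p $DIR"$'\n'
  CMDS+="hdfs dfs -mv $SRC/${PART}* $DIR/"$'\n'
  CMDS+="hive -e \"ALTER TABLE sales ADD IF NOT EXISTS PARTITION (part_name='$PART')\""$'\n'
done
printf '%s' "$CMDS"
```

Once you are happy with the printed commands, you could execute them (e.g. pipe through `bash`); the `ALTER TABLE ... ADD PARTITION` step is what makes Hive aware of the new directories.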
>> The other thing you could look at is the option below: you could run one
>> Sqoop job per partition.
>>
>> Specify The Partitions To Import
>>
>> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>
>> Imports PartitionA and PartitionB of OracleTableName.
>>
>> Notes:
>> You can enclose an individual partition name in double quotes to retain
>> the letter case or if the name has special characters.
>>
>> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>
>> If a partition name is not double-quoted, it is automatically converted to
>> upper case (PARTITIONB in the example above).
>> When using double quotes, the entire list of partition names must be
>> enclosed in single quotes.
>> If the last partition name in the list is double-quoted, there must be a
>> comma at the end of the list.
>>
>> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>
>> Name each partition to be included. There is no facility to provide a range
>> of partition names.
>>
>> There is no facility to define subpartitions. The entire partition is
>> included/excluded as per the filter.
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:[email protected]]
>> Sent: Wednesday, 6 August 2014 8:44 AM
>> To: [email protected]
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Hive expects a directory for each partition, so getting data with OraOop
>> will require some post-processing: copying files into properly named
>> directories and adding the new partitions to a Hive table.
>>
>> Sqoop has --hive-partition-key and --hive-partition-value, but this assumes
>> that all the data sqooped will fit into a single partition.
>>
>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <[email protected]> wrote:
>>> Hi Venkat,
>>>
>>> I’m not sure what this will do in regards to Hive partitions – I’ll
>>> test it out when I get into the office and get back to you.
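[Editor's note] The one-Sqoop-job-per-partition approach quoted above, combined with `--hive-partition-key`/`--hive-partition-value`, could look roughly like the sketch below. It is a dry run that only prints the commands it would execute; the connection string, credentials, table, and partition names are placeholders, not a tested configuration. Partition names are given in upper case because, per the notes above, unquoted names are upper-cased by OraOop.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one sqoop command per Oracle partition, each loading
# a matching Hive partition. Connection string, credentials, table, and
# partition names are placeholders (assumptions).
PARTITIONS="P2014Q1 P2014Q2 P2014Q3"

CMDS=""
for PART in $PARTITIONS; do
  # One import job per partition; the whole job lands in one Hive partition.
  CMDS+="sqoop import -Doraoop.import.partitions=$PART --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username scott -P --table SALES --hive-import --hive-table sales --hive-partition-key part_name --hive-partition-value $PART"$'\n'
done
printf '%s' "$CMDS"
```

This fits Sqoop's single-partition assumption because each job's data goes to exactly one Hive partition, at the cost of one JDBC session and one MapReduce job per partition.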
>>> But this option will make it so there is one file for each Oracle
>>> partition – which might be of interest to you.
>>>
>>> Match Hadoop Files to Oracle Table Partitions
>>>
>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>
>>> To import data from a partitioned table in such a way that the resulting
>>> HDFS folder structure in Hadoop will match the table’s partitions, set the
>>> chunk method to PARTITION. The alternative (default) chunk method is ROWID.
>>>
>>> Notes:
>>> - For the number of Hadoop files to match the number of Oracle partitions,
>>> set the number of mappers to be greater than or equal to the number of
>>> partitions.
>>> - If the table is not partitioned, the value PARTITION will lead to an
>>> error.
>>>
>>> David
>>>
>>> From: Venkat, Ankam [mailto:[email protected]]
>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>> To: '[email protected]'
>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>
>>> I am trying to import partitions from an Oracle table to Hive partitions.
>>>
>>> Can somebody provide the syntax using the regular JDBC connector and the
>>> OraOop connector?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Venkat
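[Editor's note] For reference, the `-Doraoop.chunk.method=PARTITION` option described above might be invoked as in the sketch below. This is an untested command fragment: the connection string, credentials, table name, mapper count, and target directory are all placeholders, and per the notes above the mapper count should be at least the number of Oracle partitions.

```shell
# Sketch only: one HDFS file per Oracle partition via the PARTITION chunk
# method. All names, paths, and counts here are placeholders (assumptions).
sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SALES \
  --num-mappers 8 \
  --target-dir /user/venkat/sales_import
```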
