Go for it, Venkat! If you have questions about writing a patch, feel free to ask on the [email protected] mailing list.
On Thu, Aug 7, 2014 at 9:56 AM, Venkat, Ankam <[email protected]> wrote:
> Gwen,
>
> Created a jira at https://issues.apache.org/jira/browse/SQOOP-1415.
>
> Yes, I would love to contribute a patch for this.
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:[email protected]]
> Sent: Wednesday, August 06, 2014 9:45 AM
> To: [email protected]
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Venkat,
>
> Running one Sqoop job and moving files to different directories should be
> faster than a Sqoop job per partition (at least it was for my customers).
>
> If you are interested in a new OraOop feature, why not open a Jira at
> issues.apache.org?
> You can even contribute a patch if you are so inclined :)
>
> Gwen
>
> On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam <[email protected]> wrote:
>> Thanks for the response.
>>
>> I was thinking of using OraOop to automatically import Oracle partitions to
>> Hive partitions. But, based on the conversation below, I just learned it's
>> not possible.
>>
>> From an automation perspective, I think running one Sqoop job per partition
>> and creating the same partition in Hive is the better option.
>>
>> Gwen/David: Yes, importing Oracle partitions to Hive partitions would be a
>> good feature to have. Any idea why there have been no commits to OraOop
>> since 2012?
>>
>> Regards,
>> Venkat
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:[email protected]]
>> Sent: Tuesday, August 05, 2014 6:24 PM
>> To: [email protected]
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Having OraOop automatically handle partitions in Hive would be a cool
>> feature. I agree that this will be limited to OraOop for now.
>>
>> On Tue, Aug 5, 2014 at 5:08 PM, David Robson
>> <[email protected]> wrote:
>>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do
>>> remember that!
>>> I would think we could modify Sqoop to create subfolders for
>>> each partition, instead of how it now creates a separate file for each
>>> partition? This would probably be limited to the direct (OraOop) connector,
>>> as it is aware of partitions (the existing connector doesn't read the data
>>> dictionary directly).
>>>
>>> In the meantime, Venkat, you could look at the option I mentioned, then
>>> manually move the files into separate folders - at least you'll have each
>>> partition in a separate file rather than spread throughout all files. The
>>> other thing you could look at is the option below - you could run one Sqoop
>>> job per partition:
>>>
>>> Specify The Partitions To Import
>>>
>>> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>>
>>> Imports PartitionA and PartitionB of OracleTableName.
>>>
>>> Notes:
>>> You can enclose an individual partition name in double quotes to
>>> retain the letter case or if the name has special characters.
>>> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>> If a partition name is not double quoted then its name will be
>>> automatically converted to upper case (PARTITIONB in the example above).
>>> When using double quotes, the entire list of partition names must be
>>> enclosed in single quotes.
>>> If the last partition name in the list is double quoted then there
>>> must be a comma at the end of the list.
>>> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>>
>>> Name each partition to be included. There is no facility to provide a range
>>> of partition names.
>>>
>>> There is no facility to define sub-partitions. The entire partition is
>>> included/excluded as per the filter.
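The one-job-per-partition workaround described above can be scripted. The sketch below only builds and prints the commands; the table name SALES, the JDBC URL, and the part_name Hive partition column are made-up placeholders, and whether the OraOop partition filter and Sqoop's --hive-partition-* options combine cleanly in practice is exactly the open question in this thread, so treat it as a sketch, not a confirmed recipe. The quoting follows the OraOop rules above (double quotes preserve case; a double-quoted final name needs a trailing comma).

```shell
# Sketch: build (not run) one Sqoop import command per Oracle partition,
# pairing each run with a matching Hive partition value.
build_sqoop_cmd() {
  p=$1  # Oracle partition name
  # Double-quote the partition name to preserve letter case; per the
  # OraOop notes above, a double-quoted last name needs a trailing comma.
  echo "sqoop import" \
       "-Doraoop.import.partitions='\"$p\",'" \
       "--connect jdbc:oracle:thin:@//dbhost:1521/ORCL" \
       "--table SALES" \
       "--hive-import --hive-partition-key part_name" \
       "--hive-partition-value $p"
}

# One job per partition:
for p in SALES_2014Q1 SALES_2014Q2; do
  build_sqoop_cmd "$p"
done
```

In a real run you would execute each printed command (or drop the echo), and each import would land in its own Hive partition because every job pulls exactly one Oracle partition.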
>>>
>>> -----Original Message-----
>>> From: Gwen Shapira [mailto:[email protected]]
>>> Sent: Wednesday, 6 August 2014 8:44 AM
>>> To: [email protected]
>>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>>
>>> Hive expects a directory for each partition, so getting the data in with
>>> OraOop will require some post-processing: copying the files into properly
>>> named directories and adding the new partitions to a Hive table.
>>>
>>> Sqoop has --hive-partition-key and --hive-partition-value, but this
>>> assumes that all the data sqooped will fit into a single partition.
>>>
>>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson
>>> <[email protected]> wrote:
>>>> Hi Venkat,
>>>>
>>>> I'm not sure what this will do in regards to Hive partitions - I'll
>>>> test it out when I get into the office and get back to you. But this
>>>> option will make it so there is one file for each Oracle partition,
>>>> which might be of interest to you.
>>>>
>>>> Match Hadoop Files to Oracle Table Partitions
>>>>
>>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>>
>>>> To import data from a partitioned table in such a way that the
>>>> resulting HDFS folder structure in Hadoop will match the table's
>>>> partitions, set the chunk method to PARTITION. The alternative
>>>> (default) chunk method is ROWID.
>>>>
>>>> Notes:
>>>> - For the number of Hadoop files to match the number of Oracle
>>>>   partitions, set the number of mappers to be greater than or equal
>>>>   to the number of partitions.
>>>> - If the table is not partitioned then the value PARTITION will lead
>>>>   to an error.
>>>>
>>>> David
>>>>
>>>> From: Venkat, Ankam [mailto:[email protected]]
>>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>>> To: '[email protected]'
>>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>>
>>>> I am trying to import partitions from an Oracle table to Hive partitions.
>>>>
>>>> Can somebody provide the syntax using the regular JDBC connector and
>>>> the OraOop connector?
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards,
>>>> Venkat
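The post-processing Gwen describes above (after a -Doraoop.chunk.method=PARTITION import produces one file per Oracle partition: move each file into a Hive-style partition directory, then register the partitions) can be sketched roughly as below. The table name sales, the partition column part_name, and the assumption that each output file is named after its Oracle partition are all illustrative, not something the thread confirms.

```shell
# Hypothetical post-processing sketch: given a directory containing one
# file per Oracle partition (named after the partition), rearrange the
# files into Hive-style partition directories and print the DDL that
# registers each partition with Hive.
partition_files_to_dirs() {
  dir=$1
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                 # skip anything that isn't a plain file
    p=$(basename "$f")                      # partition name taken from the file name
    mkdir -p "$dir/part_name=$p"            # Hive expects one directory per partition
    mv "$f" "$dir/part_name=$p/data"
    echo "ALTER TABLE sales ADD IF NOT EXISTS PARTITION (part_name='$p');"
  done
}

# Usage (local sketch; on a real cluster this would be hdfs dfs -mkdir/-mv,
# with the printed DDL piped to hive -e):
#   partition_files_to_dirs /tmp/sales_import
```

This is the "copy files into properly named directories and add the new partitions" step done by hand; if OraOop ever grows the feature discussed in SQOOP-1415, this script would be unnecessary.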
