Go for it, Venkat!

If you have questions about writing a patch, feel free to ask on the
[email protected] mailing list.



On Thu, Aug 7, 2014 at 9:56 AM, Venkat, Ankam
<[email protected]> wrote:
> Gwen,
>
> Created a jira at https://issues.apache.org/jira/browse/SQOOP-1415.
>
> Yes, I would love to contribute a patch for this.
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:[email protected]]
> Sent: Wednesday, August 06, 2014 9:45 AM
> To: [email protected]
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Venkat,
>
> Running one sqoop job and moving files to different directories should be 
> faster than a sqoop job per partition (at least it was for my customers).
>
> If you are interested in a new OraOop feature, why not open a Jira in 
> issues.apache.org?
> You can even contribute a patch if you are so inclined :)
>
> Gwen
>
> On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam <[email protected]> 
> wrote:
>> Thanks for the response.
>>
>> I was thinking of using Oraoop to automatically import Oracle partitions into 
>> Hive partitions.  But, based on the conversation below, I just learned it's not 
>> possible.
>>
>> From an automation perspective, I think running one Sqoop job per partition and 
>> creating the same partition in Hive is the better option.
>>
>> Gwen/David:  Yes, importing Oracle partitions to Hive partitions would be a 
>> good feature to have.  Any idea why there have been no commits to Oraoop since 2012?
>>
>> Regards,
>> Venkat
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:[email protected]]
>> Sent: Tuesday, August 05, 2014 6:24 PM
>> To: [email protected]
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Having OraOop automatically handle partitions in Hive would be a cool 
>> feature. I agree that this will be limited to OraOop for now.
>>
>> On Tue, Aug 5, 2014 at 5:08 PM, David Robson 
>> <[email protected]> wrote:
>>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do 
>>> remember that! I would think we could modify Sqoop to create subfolders for 
>>> each partition - instead of how it now creates a separate file for each 
>>> partition? This would probably be limited to the direct (OraOop) connector, 
>>> as it is aware of partitions (the existing connector doesn't read the data 
>>> dictionary directly).
>>>
>>> In the meantime Venkat - you could look at the option I mentioned - then 
>>> manually move the files into separate folders - at least you'll have each 
>>> partition in a separate file rather than spread throughout all files. The 
>>> other thing you could look at is the option below - you could run one Sqoop 
>>> job per partition:
>>>
>>> Specify The Partitions To Import
>>>
>>> -Doraoop.import.partitions=PartitionA,PartitionB --table
>>> OracleTableName
>>>
>>> Imports PartitionA and PartitionB of OracleTableName.
>>>
>>> Notes:
>>> You can enclose an individual partition name in double quotes to
>>> retain the letter case or if the name has special characters.
>>>
>>>   -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>>
>>> If a partition name is not double quoted then its name will be
>>> automatically converted to upper case (PARTITIONB in the example above).
>>> When using double quotes, the entire list of partition names must be
>>> enclosed in single quotes. If the last partition name in the list is
>>> double quoted then there must be a comma at the end of the list.
>>>
>>>   -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>>
>>> Name each partition to be included. There is no facility to provide a range 
>>> of partition names.
>>>
>>> There is no facility to define sub partitions. The entire partition is 
>>> included/excluded as per the filter.
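The one-job-per-partition approach discussed above can be scripted around this option. A hedged sketch follows; the connection string, table, Hive table, and partition names are all placeholder assumptions, and DRY_RUN=echo prints the commands instead of executing them so the loop can be inspected first:

```shell
# One Sqoop job per Oracle partition, each landing in its own Hive
# partition. All names here are hypothetical placeholders.
# Set DRY_RUN="" to actually run the jobs.
DRY_RUN=echo
for PART in PARTITION_2014_01 PARTITION_2014_02; do
  $DRY_RUN sqoop import \
    -Doraoop.import.partitions="$PART" \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username scott -P \
    --table SALES \
    --hive-import --hive-table sales \
    --hive-partition-key part_name \
    --hive-partition-value "$PART"
done
```

Note the `-D` generic option has to come before the tool-specific arguments, and `--hive-partition-value` pins each job's output to a single Hive partition, which is exactly the single-partition limitation Gwen mentions.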
>>>
>>>
>>> -----Original Message-----
>>> From: Gwen Shapira [mailto:[email protected]]
>>> Sent: Wednesday, 6 August 2014 8:44 AM
>>> To: [email protected]
>>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>>
>>> Hive expects a directory for each partition, so getting data with OraOop 
>>> will require some post-processing: copying the files into properly named 
>>> directories and adding the new partitions to the Hive table.
>>>
>>> Sqoop has the --hive-partition-key and --hive-partition-value options, but 
>>> this assumes that all the data sqooped will fit into a single partition.
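The post-processing described above might look roughly like the following sketch. The staging path, warehouse path, table name, and partition column are all assumptions, and DRY_RUN=echo prints the commands rather than running them:

```shell
# Move each per-partition file into a Hive-style directory and register
# the partition. All paths and names are hypothetical placeholders.
DRY_RUN=echo
SRC=/user/venkat/sales_import          # where Sqoop wrote the per-partition files
WAREHOUSE=/user/hive/warehouse/sales   # Hive table location
for PART in PARTITION_2014_01 PARTITION_2014_02; do
  $DRY_RUN hdfs dfs -mkdir -p "$WAREHOUSE/part_name=$PART"
  $DRY_RUN hdfs dfs -mv "$SRC/$PART*" "$WAREHOUSE/part_name=$PART/"
  $DRY_RUN hive -e "ALTER TABLE sales ADD IF NOT EXISTS PARTITION (part_name='$PART')"
done
```

The glob is left quoted because the HDFS shell expands globs itself; `ALTER TABLE ... ADD PARTITION` then makes each directory visible to Hive.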
>>>
>>>
>>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson 
>>> <[email protected]> wrote:
>>>> Hi Venkat,
>>>>
>>>>
>>>>
>>>> I’m not sure what this will do in regards to Hive partitions – I’ll
>>>> test it out when I get into the office and get back to you. But this
>>>> option will make it so there is one file for each Oracle partition –
>>>> which might be of interest to you.
>>>>
>>>>
>>>>
>>>> Match Hadoop Files to Oracle Table Partitions
>>>>
>>>>
>>>>
>>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>>
>>>>
>>>>
>>>> To import data from a partitioned table in such a way that the
>>>> resulting HDFS folder structure in Hadoop will match the table’s
>>>> partitions, set the chunk method to PARTITION. The alternative
>>>> (default) chunk method is ROWID.
>>>>
>>>>
>>>>
>>>> Notes:
>>>>
>>>> - For the number of Hadoop files to match the number of Oracle
>>>>   partitions, set the number of mappers to be greater than or equal
>>>>   to the number of partitions.
>>>> - If the table is not partitioned then the value PARTITION will lead
>>>>   to an error.
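A hedged sketch of invoking this option; the connection details, table name, mapper count, and target directory are placeholder assumptions, and DRY_RUN=echo prints the command instead of running it:

```shell
# Import a partitioned Oracle table so the HDFS layout mirrors its
# partitions. Names are hypothetical; per the note above, -m should be
# >= the number of Oracle partitions so each gets its own file.
DRY_RUN=echo
$DRY_RUN sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SALES \
  -m 8 \
  --target-dir /user/venkat/sales_import
```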
>>>>
>>>>
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Venkat, Ankam [mailto:[email protected]]
>>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>>> To: '[email protected]'
>>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>>
>>>>
>>>>
>>>> I am trying to import partitions from an Oracle table to Hive partitions.
>>>>
>>>>
>>>>
>>>> Can somebody provide the syntax using regular JDBC connector and
>>>> Oraoop connector?
>>>>
>>>>
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Venkat
>>>>
>>>>
>>>>
>>>>
