Venkat,

Running one Sqoop job and moving the files to different directories should be faster than one Sqoop job per partition (at least it was for my customers).
If you are interested in a new OraOop feature, why not open a JIRA at issues.apache.org? You can even contribute a patch if you are so inclined :)

Gwen

On Wed, Aug 6, 2014 at 7:56 AM, Venkat, Ankam <[email protected]> wrote:
> Thanks for the response.
>
> I was thinking of using OraOop to automatically import Oracle partitions to
> Hive partitions. But, based on the conversation below, I just learned it's
> not possible.
>
> From an automation perspective, I think running one Sqoop job per partition
> and creating the same partition in Hive is the better option.
>
> Gwen/David: Yes, it would be a good feature to have Oracle partitions map to
> Hive partitions. Any idea why there have been no commits to OraOop since 2012?
>
> Regards,
> Venkat
>
> -----Original Message-----
> From: Gwen Shapira [mailto:[email protected]]
> Sent: Tuesday, August 05, 2014 6:24 PM
> To: [email protected]
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Having OraOop automatically handle partitions in Hive would be a cool feature.
> I agree that this will be limited to OraOop for now.
>
> On Tue, Aug 5, 2014 at 5:08 PM, David Robson <[email protected]> wrote:
>> Yes, now that you mention Sqoop is limited to one partition in Hive, I do
>> remember that! I would think we could modify Sqoop to create subfolders for
>> each partition, instead of how it now creates a separate file for each
>> partition. This would probably be limited to the direct (OraOop) connector,
>> as it is aware of partitions (the existing connector doesn't read the data
>> dictionary directly).
>>
>> In the meantime, Venkat, you could look at the option I mentioned and then
>> manually move the files into separate folders. At least you'll have each
>> partition in a separate file rather than spread throughout all files.
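[Editor's note] David's suggestion to manually move the per-partition files into separate folders could be scripted roughly as below. This is a dry-run sketch that only prints the commands it would run, so nothing touches HDFS or Hive; the warehouse path, table name (`sales`), partition column (`part_name`), import directory, and partition names are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands that would move OraOop's per-partition
# output files into Hive's directory-per-partition layout and register each
# partition. Table name, column, paths, and partition names are assumptions.
SRC=/user/venkat/sales_import
WAREHOUSE=/user/hive/warehouse/sales
PARTITIONS="P2014Q1 P2014Q2"

CMDS=""
for PART in $PARTITIONS; do
  DIR="$WAREHOUSE/part_name=$PART"
  CMDS+="hdfs dfs -mkdir -p $DIR"$'\n'
  CMDS+="hdfs dfs -mv $SRC/${PART}* $DIR/"$'\n'
  CMDS+="hive -e \"ALTER TABLE sales ADD IF NOT EXISTS PARTITION (part_name='$PART')\""$'\n'
done
printf '%s' "$CMDS"
```

Once you are happy with the printed commands, you could execute them (e.g. pipe through `bash`); the `ALTER TABLE ... ADD PARTITION` step is what makes Hive aware of the new directories.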
>> The other thing you could look at is the option below: you could run one
>> Sqoop job per partition.
>>
>> Specify The Partitions To Import
>>
>> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>>
>> Imports PartitionA and PartitionB of OracleTableName.
>>
>> Notes:
>> You can enclose an individual partition name in double quotes to retain
>> the letter case or if the name has special characters.
>>
>> -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>>
>> If a partition name is not double-quoted, it is automatically converted to
>> upper case (PARTITIONB in the example above).
>> When using double quotes, the entire list of partition names must be
>> enclosed in single quotes.
>> If the last partition name in the list is double-quoted, there must be a
>> comma at the end of the list.
>>
>> -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>>
>> Name each partition to be included. There is no facility to provide a range
>> of partition names.
>>
>> There is no facility to define subpartitions. The entire partition is
>> included/excluded as per the filter.
>>
>> -----Original Message-----
>> From: Gwen Shapira [mailto:[email protected]]
>> Sent: Wednesday, 6 August 2014 8:44 AM
>> To: [email protected]
>> Subject: Re: Import Partitions from Oracle to Hive Partitions
>>
>> Hive expects a directory for each partition, so getting data with OraOop
>> will require some post-processing: copying files into properly named
>> directories and adding the new partitions to a Hive table.
>>
>> Sqoop has --hive-partition-key and --hive-partition-value, but this assumes
>> that all the data sqooped will fit into a single partition.
>>
>> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <[email protected]> wrote:
>>> Hi Venkat,
>>>
>>> I’m not sure what this will do in regards to Hive partitions – I’ll
>>> test it out when I get into the office and get back to you.
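[Editor's note] The one-Sqoop-job-per-partition approach quoted above, combined with `--hive-partition-key`/`--hive-partition-value`, could look roughly like the sketch below. It is a dry run that only prints the commands it would execute; the connection string, credentials, table, and partition names are placeholders, not a tested configuration. Partition names are given in upper case because, per the notes above, unquoted names are upper-cased by OraOop.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one sqoop command per Oracle partition, each loading
# a matching Hive partition. Connection string, credentials, table, and
# partition names are placeholders (assumptions).
PARTITIONS="P2014Q1 P2014Q2 P2014Q3"

CMDS=""
for PART in $PARTITIONS; do
  # One import job per partition; the whole job lands in one Hive partition.
  CMDS+="sqoop import -Doraoop.import.partitions=$PART --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username scott -P --table SALES --hive-import --hive-table sales --hive-partition-key part_name --hive-partition-value $PART"$'\n'
done
printf '%s' "$CMDS"
```

This fits Sqoop's single-partition assumption because each job's data goes to exactly one Hive partition, at the cost of one JDBC session and one MapReduce job per partition.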
>>> But this option will make it so there is one file for each Oracle
>>> partition – which might be of interest to you.
>>>
>>> Match Hadoop Files to Oracle Table Partitions
>>>
>>> -Doraoop.chunk.method={ROWID|PARTITION}
>>>
>>> To import data from a partitioned table in such a way that the resulting
>>> HDFS folder structure in Hadoop will match the table’s partitions, set the
>>> chunk method to PARTITION. The alternative (default) chunk method is ROWID.
>>>
>>> Notes:
>>> - For the number of Hadoop files to match the number of Oracle partitions,
>>> set the number of mappers to be greater than or equal to the number of
>>> partitions.
>>> - If the table is not partitioned, the value PARTITION will lead to an
>>> error.
>>>
>>> David
>>>
>>> From: Venkat, Ankam [mailto:[email protected]]
>>> Sent: Wednesday, 6 August 2014 3:56 AM
>>> To: '[email protected]'
>>> Subject: Import Partitions from Oracle to Hive Partitions
>>>
>>> I am trying to import partitions from an Oracle table to Hive partitions.
>>>
>>> Can somebody provide the syntax using the regular JDBC connector and the
>>> OraOop connector?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Venkat
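[Editor's note] For reference, the `-Doraoop.chunk.method=PARTITION` option described above might be invoked as in the sketch below. This is an untested command fragment: the connection string, credentials, table name, mapper count, and target directory are all placeholders, and per the notes above the mapper count should be at least the number of Oracle partitions.

```shell
# Sketch only: one HDFS file per Oracle partition via the PARTITION chunk
# method. All names, paths, and counts here are placeholders (assumptions).
sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SALES \
  --num-mappers 8 \
  --target-dir /user/venkat/sales_import
```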
