Having OraOop automatically handle partitions in Hive will be a cool
feature. I agree that this will be limited to OraOop for now.

On Tue, Aug 5, 2014 at 5:08 PM, David Robson
<[email protected]> wrote:
> Yes, now that you mention Sqoop is limited to one partition in Hive, I do 
> remember that! I would think we could modify Sqoop to create a subfolder for 
> each partition, instead of the separate file per partition it creates now. 
> This would probably be limited to the direct (OraOop) connector, as it is 
> aware of partitions (the existing connector doesn't read the data dictionary 
> directly).
>
> In the meantime, Venkat, you could look at the option I mentioned earlier and 
> then manually move the files into separate folders; at least you'll have each 
> partition in a separate file rather than spread throughout all the files. The 
> other thing you could look at is the option below, which lets you run one 
> Sqoop job per partition:
>
> Specify The Partitions To Import
>
> -Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName
>
> Imports PartitionA and PartitionB of OracleTableName.
>
> Notes:
>
> - You can enclose an individual partition name in double quotes to retain
>   the letter case or if the name has special characters:
>
>   -Doraoop.import.partitions='"PartitionA",PartitionB' --table OracleTableName
>
>   If a partition name is not double quoted, it is automatically converted
>   to upper case (PARTITIONB in the example above).
>
> - When using double quotes, the entire list of partition names must be
>   enclosed in single quotes. If the last partition name in the list is
>   double quoted, there must be a comma at the end of the list:
>
>   -Doraoop.import.partitions='"PartitionA","PartitionB",' --table OracleTableName
>
> Name each partition to be included. There is no facility to provide a range 
> of partition names.
>
> There is no facility to define subpartitions. The entire partition is 
> included or excluded as per the filter.
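> A complete command using this option might look like the sketch below. The
> connection details (host, service, credentials) and target directory are
> placeholders, not values from this thread:

```shell
# One Sqoop job per partition (or per small set of partitions);
# adjust the partition list for each run.
# --direct enables the OraOop (direct) Oracle connector.
sqoop import \
  -Doraoop.import.partitions='"PartitionA",PartitionB' \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table OracleTableName \
  --target-dir /data/OracleTableName/partA_partB
```

> Running one job per partition with a distinct --target-dir each time gives
> you one HDFS directory per partition, which can then be attached to a Hive
> table as described elsewhere in this thread.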
>
>
> -----Original Message-----
> From: Gwen Shapira [mailto:[email protected]]
> Sent: Wednesday, 6 August 2014 8:44 AM
> To: [email protected]
> Subject: Re: Import Partitions from Oracle to Hive Partitions
>
> Hive expects a directory for each partition, so getting data in with OraOop will 
> require some post-processing: copying the files into properly named directories 
> and adding the new partitions to the Hive table.
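> The post-processing described above could look roughly like this for a
> single partition; the paths, table name, and partition column are
> illustrative only:

```shell
# Create a Hive-style partition directory and move the imported file(s) into it
hdfs dfs -mkdir -p /warehouse/mytable/part_col=PARTITIONA
hdfs dfs -mv /data/OracleTableName/PARTITIONA* /warehouse/mytable/part_col=PARTITIONA/

# Register the directory as a partition of the (external) Hive table
hive -e "ALTER TABLE mytable ADD IF NOT EXISTS PARTITION (part_col='PARTITIONA') \
         LOCATION '/warehouse/mytable/part_col=PARTITIONA'"
```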
>
> Sqoop has the --hive-partition-key and --hive-partition-value options, but 
> these assume that all the data sqooped belongs in a single partition.
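> For that single-partition case, an invocation might look like the sketch
> below; the connection details, table, and partition column are placeholders:

```shell
# All rows matching the WHERE clause land in one Hive partition
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table OracleTableName \
  --where "SALE_DATE >= DATE '2014-01-01' AND SALE_DATE < DATE '2014-02-01'" \
  --hive-import \
  --hive-table sales \
  --hive-partition-key sale_month \
  --hive-partition-value 2014-01
```

> One such job per partition value would populate the table one Hive
> partition at a time.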
>
>
> On Tue, Aug 5, 2014 at 3:40 PM, David Robson <[email protected]> 
> wrote:
>> Hi Venkat,
>>
>>
>>
>> I’m not sure what this will do in regards to Hive partitions – I’ll
>> test it out when I get into the office and get back to you. But this
>> option will make it so there is one file for each Oracle partition –
>> which might be of interest to you.
>>
>>
>>
>> Match Hadoop Files to Oracle Table Partitions
>>
>>
>>
>> -Doraoop.chunk.method={ROWID|PARTITION}
>>
>>
>>
>> To import data from a partitioned table in such a way that the resulting
>> HDFS folder structure in Hadoop will match the table's partitions, set the
>> chunk method to PARTITION. The alternative (default) chunk method is ROWID.
>>
>>
>>
>> Notes:
>>
>> - For the number of Hadoop files to match the number of Oracle partitions,
>>   set the number of mappers to be greater than or equal to the number of
>>   partitions.
>>
>> - If the table is not partitioned, the value PARTITION will lead to an
>>   error.
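>> A full command with this chunk method might look like the sketch below;
>> the connection details are placeholders, and --num-mappers should be set
>> to at least the number of partitions in the table:

```shell
# PARTITION chunking: each mapper reads whole Oracle partitions,
# so output files line up with the table's partitions.
sqoop import \
  -Doraoop.chunk.method=PARTITION \
  --direct \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table OracleTableName \
  --num-mappers 8 \
  --target-dir /data/OracleTableName
```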
>>
>>
>>
>> David
>>
>>
>>
>>
>>
>> From: Venkat, Ankam [mailto:[email protected]]
>> Sent: Wednesday, 6 August 2014 3:56 AM
>> To: '[email protected]'
>> Subject: Import Partitions from Oracle to Hive Partitions
>>
>>
>>
>> I am trying to import partitions from an Oracle table into Hive partitions.
>>
>>
>>
>> Can somebody provide the syntax using the regular JDBC connector and the
>> OraOop connector?
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> Regards,
>>
>> Venkat
>>
>>
>>
>>
