Hi,

We have the following use case.

We have data in a relational database (Oracle).
We need to export this data to HBase and run our analysis on it there.
We need to repeat this export/import of about 500 GB periodically, say once a month.

Below are the approaches I can think of.
Before testing them and finding the best one myself, I wanted to hear
from the experts here.

Approach #1
===========
1) Export from Oracle to a raw text file using the Oracle export utility
(faster, and involves no transactional overhead).

2) Upload the text file to HDFS.

3) Run the bulk load job (HFileOutputFormat.configureIncrementalLoad())
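Step 3 depends on HFiles holding cells in sorted row-key order. configureIncrementalLoad() handles this by setting up a total-order partitioner over the table's current region boundaries, with one reducer per region and a sorting reducer. The pure-Python sketch below only simulates that partition-and-sort step on lines from a delimited export. It calls no HBase API, and the pipe delimiter, the "first field is the row key" layout, the column names and the region start keys are all hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical export layout: pipe-delimited, first field is the row key,
# remaining fields map to these (hypothetical) column qualifiers.
DELIMITER = "|"
COLUMNS = ["name", "dept"]


def parse_line(line):
    """Turn one exported text line into (row_key, [(qualifier, value), ...])."""
    fields = line.rstrip("\n").split(DELIMITER)
    row_key, values = fields[0], fields[1:]
    return row_key, list(zip(COLUMNS, values))


def partition_for(row_key, region_start_keys):
    """Index of the region whose [start, next_start) range contains row_key.

    region_start_keys must be sorted and begin with "" (the first region
    starts at the empty key). For ASCII keys, Python string order matches
    HBase's byte order.
    """
    return bisect.bisect_right(region_start_keys, row_key) - 1


def bulk_load_plan(lines, region_start_keys):
    """Group parsed rows by region, then sort each group by row key.

    This only imitates what the total-order partitioner and sorting reducer
    arrange, so that each reducer can write sorted HFiles for one region.
    """
    groups = defaultdict(list)
    for line in lines:
        row_key, cells = parse_line(line)
        groups[partition_for(row_key, region_start_keys)].append((row_key, cells))
    return {region: sorted(rows) for region, rows in sorted(groups.items())}
```

For example, with region start keys `["", "emp500"]`, the rows `emp042` and `emp007` land in region 0 (sorted as `emp007`, `emp042`) and `emp913` lands in region 1. In the real job, the HFiles written by the reducers are then handed to HBase with the completebulkload tool (LoadIncrementalHFiles).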

Approach #2
===========
1) Write a custom MapReduce job that uses DBInputFormat to read directly from the database.
        - The idea is to avoid the multiple hops (Oracle to local FS, local FS
to HDFS, HDFS to HBase) involved in Approach #1.

2) Use the HBase bulk load tool to load this data into
HBase (HFileOutputFormat.configureIncrementalLoad()).
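For step 1, it may help to know roughly how DBInputFormat divides the work. As I understand it, it runs a count query, cuts the row count into one contiguous slice per map task (the last slice takes the remainder), and each mapper pages through its slice of the ordered query. For Oracle this paging is done with ROWNUM. The Python sketch below reproduces only that arithmetic and the general shape of a ROWNUM-paged query. The base query and the query text are illustrative assumptions, not the exact SQL Hadoop generates.

```python
def db_input_splits(row_count, num_map_tasks):
    """Split rows [0, row_count) into contiguous (start, end) ranges.

    Each map task gets row_count // num_map_tasks rows; the last task's
    range runs to row_count so the remainder is not lost.
    """
    chunk = row_count // num_map_tasks
    splits = []
    for i in range(num_map_tasks):
        start = i * chunk
        end = row_count if i == num_map_tasks - 1 else start + chunk
        splits.append((start, end))
    return splits


def oracle_paged_query(base_query, start, end):
    """Wrap an ordered query so it returns rows start+1 .. end (ROWNUM is 1-based).

    Illustrative only: the query Hadoop actually builds may differ in detail.
    """
    return (
        f"SELECT * FROM (SELECT a.*, ROWNUM rno FROM ({base_query}) a "
        f"WHERE ROWNUM <= {end}) WHERE rno > {start}"
    )
```

For example, 10 rows over 3 map tasks give the ranges (0, 3), (3, 6) and (6, 10). Since every mapper re-executes the ordered inner query, Oracle may end up sorting the whole table once per split, which is worth measuring at 500 GB before choosing this approach over a flat-file export.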

Approach #3
===========
1) Use Apache Sqoop (currently in the Apache Incubator) to do the transfer.
        - I don't know how stable it is yet.
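Sqoop splits an import differently from DBInputFormat. It queries the MIN and MAX of a split column (the primary key by default, or the column given with `--split-by`) and gives each of the `-m` map tasks one sub-range, filtered with a WHERE clause. The Python sketch below shows that range-splitting idea for an integer key. It is my own simplified version, not Sqoop's code, and the column name in the example is hypothetical.

```python
def range_splits(min_val, max_val, num_mappers):
    """Divide the inclusive integer range [min_val, max_val] into up to
    num_mappers non-empty half-open ranges [lo, hi) that cover it exactly."""
    if max_val < min_val:
        return []
    span = max_val - min_val + 1
    num = min(num_mappers, span)  # never create empty ranges
    # bounds[-1] == max_val + 1, so the last half-open range includes max_val
    bounds = [min_val + (span * i) // num for i in range(num + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num)]


def where_clause(column, lo, hi):
    """Filter one mapper would apply to the source table (illustrative)."""
    return f"{column} >= {lo} AND {column} < {hi}"
```

For example, `range_splits(1, 10, 3)` yields (1, 4), (4, 7) and (7, 11), and `where_clause("EMP_ID", 1, 4)` gives `EMP_ID >= 1 AND EMP_ID < 4`. Unlike row-count paging, each mapper's filter on an indexed column can be served by a range scan, but the ranges are only balanced if the key values are evenly distributed. Sqoop's import tool also has HBase target options (`--hbase-table`, `--column-family`, `--hbase-row-key`) in the releases I know of; check them against the version you install.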

Also, please let me know if there is a better approach than the ones above.
