Hi,
We have the following use-case.
We have data in a relational database (Oracle).
We need to export this data to HBase and perform analysis on this data.
We need to perform this export-import of 500 GB periodically, say every month.
These are the approaches I am aware of.
Before testing them to find the best one myself, I wanted to hear
from the experts here.
Approach #1
===========
1) Export from Oracle to a raw text file (using the Oracle export utility:
faster, involves no transactional overhead)
2) Upload text file to HDFS
3) Run the bulk load job (HFileOutputFormat.configureIncrementalLoad())
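To make step 3 a bit more concrete, here is a rough sketch of the parsing I expect the map phase of the bulk load job to do before it builds the HBase writes. It is plain Java with no HBase classes. It assumes the export is tab-delimited, that the first field is the row key, and that empty fields mean NULL; the class name ExportLineParser and the column names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExportLineParser {

    /** Row key plus the non-empty column values parsed from one exported line. */
    public static final class Row {
        public final String rowKey;
        public final Map<String, String> columns;

        Row(String rowKey, Map<String, String> columns) {
            this.rowKey = rowKey;
            this.columns = columns;
        }
    }

    /**
     * Splits one tab-delimited line (assumed export format).
     * The first field becomes the row key; the remaining fields are
     * paired with columnNames in order.
     */
    public static Row parse(String line, String[] columnNames) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length != columnNames.length + 1) {
            throw new IllegalArgumentException(
                "expected " + (columnNames.length + 1) + " fields, got " + fields.length);
        }
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.length; i++) {
            String value = fields[i + 1];
            // Empty fields are treated as NULL and skipped, so no cell is written for them.
            if (!value.isEmpty()) {
                columns.put(columnNames[i], value);
            }
        }
        return new Row(fields[0], columns);
    }

    public static void main(String[] args) {
        // Hypothetical sample line: id, name, city (empty), age
        Row row = parse("1001\tAlice\t\t42", new String[] {"name", "city", "age"});
        System.out.println(row.rowKey + " " + row.columns);
    }
}
```

In the real job, each parsed Row would be turned into one write against the target table, and the job would be configured with HFileOutputFormat.configureIncrementalLoad() as in step 3.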
Approach #2
===========
1) Write a custom job using DBInputFormat to read directly from the database.
- Just a thought, to avoid the multiple hops (Oracle to local FS, local FS
to HDFS, HDFS to HBase) involved in approach #1.
2) Use the HBase bulk load tool to load this data into
HBase (HFileOutputFormat.configureIncrementalLoad()).
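One thing I would need to sort out for approach #2 is how the map tasks divide the table among themselves so that each one reads its own slice. Below is a minimal sketch of one way to do it, assuming the table has a numeric primary key (a hypothetical ID column) and each mapper runs a range query over its slice. This is plain Java written for illustration, not DBInputFormat's own splitting logic, and the class name KeyRangeSplitter is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplitter {

    /**
     * Divides the inclusive key range [minId, maxId] into at most numSplits
     * contiguous, non-overlapping inclusive ranges of near-equal size.
     */
    public static List<long[]> split(long minId, long maxId, int numSplits) {
        if (numSplits < 1 || maxId < minId) {
            throw new IllegalArgumentException("bad range or split count");
        }
        long total = maxId - minId + 1;
        long base = total / numSplits;
        long extra = total % numSplits; // the first 'extra' ranges get one more key
        List<long[]> ranges = new ArrayList<>();
        long lo = minId;
        for (int i = 0; i < numSplits; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // more splits requested than keys available
            }
            long hi = lo + size - 1;
            ranges.add(new long[] {lo, hi});
            lo = hi + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Each range would back one map task's query (ID is a hypothetical column).
        for (long[] r : split(1, 10, 3)) {
            System.out.println("WHERE ID BETWEEN " + r[0] + " AND " + r[1]);
        }
    }
}
```

If the keys are unevenly distributed, equal-width ranges like these would give some mappers much more work than others, so this only makes sense for reasonably dense keys.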
Approach #3
===========
1) Use Apache Sqoop (currently in incubation) to achieve my requirement.
- I'm not sure how stable it is.
Also, please let me know if there is a better approach than the ones above.