Saurabh,
I am also trying to figure out how best to use Sqoop on a non-CDH
cluster. From what I have learned, the Sqoop developers are trying to get
Sqoop 1.4.x stable on Apache Hadoop 0.23.x for now, which is still in alpha.
Sqoop doesn't work at all with anything before 0.21 (except for
Cloudera distributions). So we are more or less on our own, as far as I can
tell, until Hadoop 0.23 is released and we upgrade to it, or until they
resolve https://issues.apache.org/jira/browse/SQOOP-384
--- wad
On 12/05/2011 07:18 PM, Saurabh Sehgal wrote:
Hi,
I am evaluating Sqoop to do DB extracts from our relational stores.
Our production Hadoop cluster runs Hadoop 0.20.append.
According to the Sqoop introduction page on GitHub:
"Sqoop relies on advanced features of Apache Hadoop. As such, it
requires the latest beta of Cloudera’s Distribution for Hadoop (CDH3
beta 2). Sqoop may be compatible with the Apache 0.21.0 release, but
this is considered experimental and should not be used in production.
The COMPILING.txt file describes how to select a Hadoop distribution
to target at compilation time."
Does this still hold true? All I want to do is incrementally import
tables from an Oracle database. Can someone explain what features are
missing from the non-Cloudera distributions and why it is unsafe to
use them in production?
Thank you.