Yes, it can be done and is a standard practice. I would suggest a mixed
approach: use Informatica to create files in HDFS and define the Hive staging
tables as external tables on those directories. Then from that point onwards use
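
For illustration only, a minimal sketch of what the external staging table definition over such a directory could look like, generated from Python. The table name, columns, and HDFS path are made-up placeholders, and a comma-delimited text file layout is assumed; adjust to whatever Informatica actually writes.

```python
def staging_table_ddl(table, columns, hdfs_dir, delimiter=","):
    """Build a Hive CREATE EXTERNAL TABLE statement over an HDFS directory.

    table     -- hypothetical staging table name, e.g. "staging.customers"
    columns   -- list of (column_name, hive_type) pairs
    hdfs_dir  -- directory where the delimited files are landed (assumed)
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}'"
    )


# Placeholder names; nothing here comes from a real system.
ddl = staging_table_ddl(
    "staging.customers",
    [("id", "INT"), ("name", "STRING"), ("created", "STRING")],
    "/data/landing/customers",
)
print(ddl)
```

Because the table is external, dropping it in Hive leaves the landed files in place, which suits a staging area that another tool writes to.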

On 10 Nov 2016 04:00, "Mich Talebzadeh" <> wrote:

> Thanks Mike for insight.
> This is a request that landed on us, which is rather unusual.
> As I understand it, Informatica is an ETL tool. Most of these are glorified
> Sqoop with a GUI where you define your source and target.
> On a normal day Informatica takes data out of an RDBMS such as an Oracle table
> and lands it on Teradata or Sybase IQ (DW).
> So in our case we really need to redefine the map. The customer does not want
> the Informatica plug-in for Hive etc., which admittedly would make
> life far easier. They want us to come up with a solution.
> Given that we cannot use JDBC for Hive etc. as a target
> (?), the easiest option is to dump the data into a landing zone and then do
> whatever we want with it.
> Also, I am not sure whether we can use Flume for it. That was a thought in my mind.
> So we are sort of stuck between a rock and a hard place here. In short, we want a plug-in
> to be a consumer of Informatica.
> cheers
> Mich
> Dr Mich Talebzadeh
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
> On 9 November 2016 at 16:14, Michael Segel <>
> wrote:
>> Mich,
>> You could do that. But really?
>> Putting on my solutions architect hat…
>> You or your client is spending $$$ for product licensing and you’re not
>> really using the product to its fullest.
>> Yes, you can use Informatica to pull data from the source systems and
>> provide some data cleansing and transformations before you drop it on your
>> landing zone.
>> If you’re going to bypass Hive, then you have to capture the schema,
>> including data types.  You’re also going to have to manage schema evolution
>> as the schemas change over time. (I believe the ETL tools will do this for you or
>> help in the process.)
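
As a rough illustration of what "capturing the schema, including data types" means once Hive is bypassed, here is a small sketch that applies a hand-declared schema to a headed CSV extract. The column names and converters are hypothetical; a real pipeline would derive them from the source RDBMS catalogue rather than hard-code them.

```python
import csv
import io
from datetime import date
from decimal import Decimal

# Hypothetical schema for one extracted table: column name -> converter.
SCHEMA = {
    "id": int,
    "amount": Decimal,
    "booked_on": date.fromisoformat,
}


def read_typed_rows(text_stream, schema):
    """Yield dict rows from a headed CSV, converting each field per schema."""
    reader = csv.DictReader(text_stream)
    # Fail early if the file no longer matches the declared schema,
    # e.g. after a column was dropped or renamed upstream.
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from file: {sorted(missing)}")
    for row in reader:
        yield {name: convert(row[name]) for name, convert in schema.items()}


sample = io.StringIO("id,amount,booked_on\n1,10.50,2016-11-09\n")
rows = list(read_typed_rows(sample, SCHEMA))
print(rows)
```

The missing-column check is the simplest form of the schema-evolution problem: the declared schema and the files drift apart, and something has to notice.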
>> But if you’re already working on the consumption process for ingestion on
>> your own… what is the value that you derive from using Informatica?  Is the
>> unloading and ingestion process so difficult that you can’t write that as
>> well?
>> My point is that if you’re going to use the tool, use it as the vendor
>> recommends (and they may offer options…) or skip it.
>> I mean heck… you may want to take the flat files (CSV, etc) that are
>> dropped in the landing zone, and then ingest and spit out parquet files via
>> spark. You just need to know the Schema(s) of ingestion and output if they
>> are not the same. ;-)
>> Of course you may decide that using Informatica to pull and transform the
>> data and drop it on to the landing zone provides enough value to justify
>> its expense.  ;-) YMMV
>> Just my $0.02 worth.
>> Take it with a grain of Kosher Sea Salt.  (The grains are larger and the
>> salt tastes better) ;-)
>> -Mike
>> On Nov 9, 2016, at 7:56 AM, Mich Talebzadeh <>
>> wrote:
>> Hi,
>> I am exploring the idea of flexibly importing multiple RDBMS
>> tables into HDFS using the Informatica installation that the customer has.
>> I don't want to use connectivity tools from Informatica to Hive etc.
>> So this is what I have in mind
>>    1. If possible, get the table data out using Informatica and use the
>>    Informatica UI to convert the RDBMS data into some form of CSV or TSV file (can
>>    Informatica do it? I guess yes).
>>    2. Put the flat files on an edge node where an HDFS node can see them.
>>    3. Assuming that a directory can be created by Informatica daily,
>>    periodically run a cron job that ingests the data from those directories into
>>    equivalent daily directories in HDFS.
>>    4. Once the data is in HDFS, one can use Spark CSV, Hive, etc. to query
>>    the data.
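
Step 3 above could be sketched roughly as follows: a small script, run from cron, that builds the `hdfs dfs` calls for one day's directory. The local and HDFS root paths and the `YYYY-MM-DD` directory naming are assumptions, not anything Informatica prescribes.

```python
from datetime import date
import posixpath


def daily_ingest_commands(local_root, hdfs_root, day):
    """Return the hdfs CLI calls that copy one day's local directory into HDFS.

    Assumes Informatica writes each day's files to <local_root>/<YYYY-MM-DD>.
    """
    stamp = day.strftime("%Y-%m-%d")
    local_dir = posixpath.join(local_root, stamp)
    return [
        # Make sure the parent directory exists on HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_root],
        # Copying a directory into an existing directory should produce
        # <hdfs_root>/<YYYY-MM-DD>, the equivalent daily directory.
        ["hdfs", "dfs", "-put", local_dir, hdfs_root],
    ]


# Placeholder paths for illustration.
cmds = daily_ingest_commands("/landing/informatica", "/data/landing", date(2016, 11, 10))
for cmd in cmds:
    print(" ".join(cmd))
```

In the cron job itself each command list could be passed to `subprocess.run(cmd, check=True)` so that a failed copy stops the run instead of passing silently.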
>> The problem I have is to see if someone has done such a thing before.
>> Specifically, can Informatica create target flat files in normal directories?
>> Any other generic alternative?
>> Thanks
>> Dr Mich Talebzadeh
