Re: best solution for data ingestion

Chris Mattmann Mon, 04 Nov 2013 09:48:13 -0800

Hi Guys,

Depending on the *type* of ingestion you are trying to do into HDFS,
the combination of Apache OODT (http://oodt.apache.org/) and Apache
Tika (http://tika.apache.org/) may do the trick.


Cheers,
Chris



-----Original Message-----
From: Bing Jiang <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, November 4, 2013 2:34 AM
To: "[email protected]" <[email protected]>
Subject: Re: best solution for data ingestion

>Apache Pig is also a solution for data ingest; it offers more flexibility
>in functionality and makes development more efficient.
>
>
>Regards.
>Bing
>
>
>2013/11/2 Marcel Mitsuto F. S. <[email protected]>
>
>I've done some testing with Flume, but ended up using syslog-ng: it is more
>flexible, more reliable, and has a smaller footprint.
>
>
>On Fri, Nov 1, 2013 at 3:57 PM, Mirko Kämpf
><[email protected]> wrote:
>
>Have a look at Sqoop for data from an RDBMS, or at Flume if the data is
>streaming and multiple sources have to be used.
>Best wishes
>Mirko
>
>
>
>2013/11/1 Siddharth Tiwari <[email protected]>
>
>Hi team,
>
>I am seeking your advice on the best way to ingest a lot of data into
>Hadoop. Also, what are your views on FUSE?
>
>
>*------------------------*
>Cheers !!!
>SiddharthTiwari
>Have a refreshing day !!!
>"Every duty is holy, and devotion to duty is the highest form of worship
>of God."
>
>"Maybe other people will try to limit me but I don't limit myself"
>
>-- 
>Bing Jiang
>Tel：(86)134-2619-1361
>weibo: http://weibo.com/jiangbinglover
>BLOG: www.binospace.com <http://www.binospace.com>
>BLOG: http://blog.sina.com.cn/jiangbinglover
>
>Focus on distributed computing, HDFS/HBase
>
>
>
