Hi Apurva,

I would use a data ingestion tool like Apache Flume to make the task easier and keep human intervention to a minimum. Create a source for each of your systems and the rest will be taken care of by Flume. It is not a must to use something like Flume, but it will definitely make your life easier and help you build a more sophisticated system, IMHO. A minimal config sketch is below.
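For illustration only, here is a minimal sketch of a Flume agent config, assuming an agent named a1 that tails an Apache access log into HDFS. All names, paths, and sizes here are hypothetical, not something from your setup:

    # Sketch: one agent (a1) tailing a web-server log into HDFS.
    # Every component name and path below is hypothetical.
    a1.sources  = weblog
    a1.channels = mem
    a1.sinks    = hdfssink

    a1.sources.weblog.type     = exec
    a1.sources.weblog.command  = tail -F /var/log/httpd/access_log
    a1.sources.weblog.channels = mem

    a1.channels.mem.type     = memory
    a1.channels.mem.capacity = 10000

    a1.sinks.hdfssink.type                   = hdfs
    a1.sinks.hdfssink.channel                = mem
    a1.sinks.hdfssink.hdfs.path              = hdfs://namenode/flume/weblogs/%Y-%m-%d
    a1.sinks.hdfssink.hdfs.fileType          = DataStream
    a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true

You would then start it with something like: flume-ng agent --conf conf --conf-file weblog.conf --name a1. You would add one source per upstream system (web log, portal, mobile, POS) in the same way.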
You need HBase when you need real-time random read/write access to your data: basically, when you intend to have low-latency access to small amounts of data from within a large data set, and you have a flexible schema.

As for the last part of your question, use Apache Hive. It provides warehousing capabilities on top of an existing Hadoop cluster, with an SQLish interface to query the stored data. It will also be of help when you move to Impala, since Impala can query tables defined in the Hive metastore (a small sketch follows after the quoted thread below).

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com

On Tue, Mar 25, 2014 at 1:41 AM, Geoffry Roberts <[email protected]> wrote:

> Based on what you have said, it sounds as if you want to append records to
> a file(s) in HDFS. I was able to do this with WebHDFS and with the Hadoop
> client. But you asked about architecture. Would a POST to a URL satisfy
> you as to architecture? If so, set up WebHDFS and POST to it.
>
>
> On Mon, Mar 24, 2014 at 1:00 PM, [email protected] <[email protected]> wrote:
>
>> Hello Team,
>>
>> I am doing a POC in Hadoop and want to understand the recommended
>> architecture to ingest data from different data streams (web logs,
>> portal, mobile, POS systems) into the Hadoop system. Also, what are the
>> use cases where we need to have HBase on top of HDFS? Can't we have only
>> HDFS and no HBase, and if we have only HDFS, can we create tables
>> directly on HDFS which Impala can query?
>>
>> Kindly advise!
>>
>> Regards,
>> Apurva
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
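To make the Hive/Impala part concrete, here is a minimal sketch, assuming the Flume sink above is writing tab-separated records under /flume/weblogs (the table name and columns are made up for illustration):

    -- Hypothetical external table over files already sitting in HDFS.
    CREATE EXTERNAL TABLE web_logs (
      ip      STRING,
      ts      STRING,
      request STRING,
      status  INT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LOCATION '/flume/weblogs';

Because Impala shares the Hive metastore, the same table becomes queryable from impala-shell after an INVALIDATE METADATA, with no separate copy of the data.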

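Similarly, a sketch of the WebHDFS append route Geoffry mentions: APPEND is a two-step POST, first to the NameNode (which answers with a 307 redirect) and then to the DataNode it points at. Host names and paths below are placeholders, and the ports are just the classic defaults; on older clusters dfs.support.append must also be enabled:

    # Step 1: ask the NameNode where to append (returns a 307 redirect)
    curl -i -X POST "http://namenode:50070/webhdfs/v1/flume/weblogs/events.log?op=APPEND"

    # Step 2: POST the records to the DataNode URL returned in the
    # Location header of the step-1 response
    curl -i -X POST -T batch.log "<Location header from step 1>"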