Hi Apurva,

I would use a data ingestion tool like Apache Flume to make the task easier and keep human intervention to a minimum. Create a source for each of your systems and the rest will be taken care of by Flume. It is not a must to use something like Flume, but it will definitely make your life easier and help you build a more sophisticated system, IMHO. A minimal config sketch is below.
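For illustration only, here is a minimal sketch of a Flume agent config, assuming an agent named a1 that tails an Apache access log into HDFS. All names, paths, and sizes here are hypothetical, not something from your setup:

    # Sketch: one agent (a1) tailing a web-server log into HDFS.
    # Every component name and path below is hypothetical.
    a1.sources  = weblog
    a1.channels = mem
    a1.sinks    = hdfssink

    a1.sources.weblog.type     = exec
    a1.sources.weblog.command  = tail -F /var/log/httpd/access_log
    a1.sources.weblog.channels = mem

    a1.channels.mem.type     = memory
    a1.channels.mem.capacity = 10000

    a1.sinks.hdfssink.type                   = hdfs
    a1.sinks.hdfssink.channel                = mem
    a1.sinks.hdfssink.hdfs.path              = hdfs://namenode/flume/weblogs/%Y-%m-%d
    a1.sinks.hdfssink.hdfs.fileType          = DataStream
    a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true

You would then start it with something like: flume-ng agent --conf conf --conf-file weblog.conf --name a1. You would add one source per upstream system (web log, portal, mobile, POS) in the same way.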
You need HBase when you need real-time random read/write access to your data: basically, when you intend to have low-latency access to small amounts of data from within a large data set, and you have a flexible schema.

As for the last part of your question, use Apache Hive. It provides warehousing capabilities on top of an existing Hadoop cluster, with an SQLish interface to query the stored data. It will also be of help when you move to Impala, since Impala can query tables defined in the Hive metastore (a small sketch follows after the quoted thread below).

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com

On Tue, Mar 25, 2014 at 1:41 AM, Geoffry Roberts <[email protected]> wrote:

> Based on what you have said, it sounds as if you want to append records to
> a file(s) in HDFS. I was able to do this with WebHDFS and with the Hadoop
> client. But you asked about architecture. Would a POST to a URL satisfy
> you as to architecture? If so, set up WebHDFS and POST to it.
>
>
> On Mon, Mar 24, 2014 at 1:00 PM, [email protected] <[email protected]> wrote:
>
>> Hello Team,
>>
>> I am doing a POC in Hadoop and want to understand the recommended
>> architecture to ingest data from different data streams (web logs,
>> portal, mobile, POS systems) into the Hadoop system. Also, what are the
>> use cases where we need to have HBase on top of HDFS? Can't we have only
>> HDFS and no HBase, and if we have only HDFS, can we create tables
>> directly on HDFS which Impala can query?
>>
>> Kindly advise!
>>
>> Regards,
>> Apurva
>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
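To make the Hive/Impala part concrete, here is a minimal sketch, assuming the Flume sink above is writing tab-separated records under /flume/weblogs (the table name and columns are made up for illustration):

    -- Hypothetical external table over files already sitting in HDFS.
    CREATE EXTERNAL TABLE web_logs (
      ip      STRING,
      ts      STRING,
      request STRING,
      status  INT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LOCATION '/flume/weblogs';

Because Impala shares the Hive metastore, the same table becomes queryable from impala-shell after an INVALIDATE METADATA, with no separate copy of the data.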

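Similarly, a sketch of the WebHDFS append route Geoffry mentions: APPEND is a two-step POST, first to the NameNode (which answers with a 307 redirect) and then to the DataNode it points at. Host names and paths below are placeholders, and the ports are just the classic defaults; on older clusters dfs.support.append must also be enabled:

    # Step 1: ask the NameNode where to append (returns a 307 redirect)
    curl -i -X POST "http://namenode:50070/webhdfs/v1/flume/weblogs/events.log?op=APPEND"

    # Step 2: POST the records to the DataNode URL returned in the
    # Location header of the step-1 response
    curl -i -X POST -T batch.log "<Location header from step 1>"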