Re: Kafka or Flume

2017-06-29 Thread daemeon reiydelle
For fairly simple transformations, Flume is great, and works fine subscribing to some pretty high volumes of messages from Kafka (I think we hit 50M/second at one point). If you need to do complex transformations, e.g. database lookups for the Kafka-to-Hadoop ETL, then you will start having

RE: Kafka or Flume

2017-06-29 Thread Mallanagouda Patil
Kafka is capable of processing billions of events per second, and you can scale it horizontally by adding Kafka broker servers. You can try these steps: 1. Create a topic in Kafka to receive all your data; you have to use a Kafka producer to ingest data into Kafka. 2. If you are going to write your own HDFS
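The two steps above can be sketched with the stock Kafka CLI tools. A minimal sketch: the broker/ZooKeeper addresses, topic name, and partition/replication counts are assumptions, not from the thread, and the flag names match the Kafka 0.x/1.x-era tooling current when this thread was written.

```shell
# 1. Create a topic to receive all incoming data.
#    (ZooKeeper address, topic name, and counts are illustrative assumptions.)
bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic transactions \
  --partitions 6 \
  --replication-factor 3

# 2. Ingest data via a producer; the console producer is the quickest smoke test.
#    A real pipeline would use the Kafka producer API instead.
bin/kafka-console-producer.sh \
  --broker-list localhost:9092 \
  --topic transactions
```

Note that newer Kafka releases replaced `--zookeeper` and `--broker-list` with `--bootstrap-server` on these tools.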

RE: Kafka or Flume

2017-06-29 Thread Sidharth Kumar
Thanks! What about Kafka with Flume? I would also like to mention that the everyday data intake is in the millions, and we can't afford to lose even a single piece of data, which makes high availability a necessity. Warm Regards Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 | LinkedIn: www.linkedin.co

RE: Kafka or Flume

2017-06-29 Thread JP gupta
The ideal sequence would be: 1. Ingest using Kafka -> 2. Validate and process using Spark -> 3. Write into any NoSQL DB or Hive. From my recent experience, writing directly to HDFS can be slow depending on the data format. Thanks JP From: Sudeep Singh Thakur [mailto:sudeepth
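The validate-and-process step in this sequence is essentially a filter that routes each record to either the main sink or a reject store. A minimal sketch of that logic in plain Python (the record fields and validation rules are hypothetical; in practice this would run inside the Spark stage of the pipeline):

```python
# Hypothetical transactional-record validation: records failing any rule
# are rejected rather than written to the main store.

def validate(record):
    """Return True if the record passes all validation rules."""
    return (
        isinstance(record.get("txn_id"), str) and record["txn_id"] != ""
        and isinstance(record.get("amount"), (int, float))
        and record["amount"] > 0
    )

def route(records):
    """Split records into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for r in records:
        (accepted if validate(r) else rejected).append(r)
    return accepted, rejected

batch = [
    {"txn_id": "t1", "amount": 250.0},
    {"txn_id": "", "amount": 10.0},   # missing id -> rejected
    {"txn_id": "t3", "amount": -5},   # non-positive amount -> rejected
]
accepted, rejected = route(batch)
print(len(accepted), len(rejected))  # 1 accepted, 2 rejected
```

The same pass/reject split maps directly onto a Spark `filter` (or a pair of filters) between the Kafka source and the NoSQL/Hive sink.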

Re: Kafka or Flume

2017-06-29 Thread Sudeep Singh Thakur
In your use case, Kafka would be better because you want some transformations and validations. Kind regards, Sudeep Singh Thakur On Jun 30, 2017 8:57 AM, "Sidharth Kumar" wrote: > Hi, > > I have a requirement where I have all transactional data ingestion into > hadoop in real time and before storing

Kafka or Flume

2017-06-29 Thread Sidharth Kumar
Hi, I have a requirement where all transactional data is ingested into Hadoop in real time, and before the data is stored into Hadoop, it is processed to validate it. If the data fails the validation process, it will not be stored into Hadoop. The validation process also makes use of histo

Re: Lots of warning messages and exception in namenode logs

2017-06-29 Thread Ravi Prakash
Hi Omprakash! If both datanodes die at the same time, then yes, data will be lost. In that case, you should increase dfs.replication to 3 (so that there will be 3 copies). This obviously adversely affects the total amount of data you can store on HDFS. However, if only 1 datanode dies, the namenod
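The capacity trade-off Ravi mentions is easy to quantify: each HDFS block is stored dfs.replication times, so usable capacity is roughly raw capacity divided by the replication factor. A quick back-of-the-envelope sketch (the cluster size is illustrative, not from the thread):

```python
def usable_capacity_tb(raw_tb, replication_factor):
    """Approximate usable HDFS capacity: every block is stored
    replication_factor times across the datanodes."""
    return raw_tb / replication_factor

raw = 100.0  # hypothetical total raw datanode capacity, in TB
print(usable_capacity_tb(raw, 2))  # 50.0 TB usable at replication 2
print(usable_capacity_tb(raw, 3))  # ~33.3 TB usable at replication 3
```

So moving from replication 2 to 3 buys tolerance of two simultaneous datanode failures at the cost of about a third of the remaining usable space.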

Re: Lots of warning messages and exception in namenode logs

2017-06-29 Thread Atul Rajan
unsubscribe On 29 June 2017 at 17:20, omprakash wrote: > Hi Sidharth, > > > > Thanks a lot for the clarification. Could you suggest parameters that can > improve re-replication in case of failure? > > > > Regards > > Om > > > > *From:* Sidharth Kumar [mailto:sidharthkumar2...@gmail.com] > *Sen

RE: Lots of warning messages and exception in namenode logs

2017-06-29 Thread omprakash
Hi Sidharth, Thanks a lot for the clarification. Could you suggest parameters that can improve re-replication in case of failure? Regards Om From: Sidharth Kumar [mailto:sidharthkumar2...@gmail.com] Sent: 29 June 2017 16:06 To: omprakash Cc: Arpit Agarwal ; common-u...@hadoop.apa

RE: Lots of warning messages and exception in namenode logs

2017-06-29 Thread Sidharth Kumar
Hi, No, as no other copy of that file will exist. You can increase the replication factor to 3 so that 3 copies are created; then, even if 2 datanodes go down, you will still have one copy available, which the namenode will re-replicate back to 3 copies in due course of time. Warm Rega
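Raising the default replication factor as suggested is a one-property change in hdfs-site.xml (a sketch; note it applies to newly written files, while existing files keep their old factor unless changed with `hdfs dfs -setrep`):

```xml
<!-- hdfs-site.xml: default replication for newly written files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```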

RE: Lots of warning messages and exception in namenode logs

2017-06-29 Thread omprakash
Hi Ravi, I have 5 nodes in the Hadoop cluster, all with the same configuration. After setting dfs.replication=2, I did a clean start of HDFS. As per your suggestion, I added 2 more datanodes and cleaned all the data and metadata. The performance of the cluster has dramatically improved. I can