Re: Anatomy of read in hdfs

2017-04-10 Thread Sidharth Kumar
Thanks Philippe, but your answers raised another set of questions for me. Please help me to understand: 1) If we read the anatomy of an HDFS read in the Hadoop Definitive Guide, it says the data queue is consumed by the streamer. So, can you tell me, will there be only one streamer in a cluster which

Re: Anatomy of read in hdfs

2017-04-10 Thread Philippe Kernévez
On Mon, Apr 10, 2017 at 11:46 AM, Sidharth Kumar < sidharthkumar2...@gmail.com> wrote: > Thanks Philippe, > > I am looking for an answer restricted only to HDFS. Because we can do read > and write operations from the CLI using commands like "hadoop fs > -copyFromLocal /(local disk location) /(hdfs

Re: Anatomy of read in hdfs

2017-04-10 Thread Sidharth Kumar
Thanks Philippe, I am looking for an answer restricted only to HDFS. Because we can do read and write operations from the CLI using commands like "hadoop fs -copyFromLocal /(local disk location) /(hdfs path)" and read using "hadoop fs -text /(hdfs file)" as well. So my questions are: 1) when I write

Re: Anatomy of read in hdfs

2017-04-10 Thread Philippe Kernévez
Hi Sidharth, As has been explained, HDFS is not just a file system. It's a part of the Hadoop platform. To take advantage of HDFS you have to understand how Hadoop storage (HDFS) AND YARN processing (say, MapReduce) work together to implement jobs and parallel processing. That says that

Re: Anatomy of read in hdfs

2017-04-09 Thread daemeon reiydelle
Readers ARE parallel processes, one per map task. There are defaults in the map phase for how many readers there are for the input file(s). The default is one mapper task per block (or per file, where a file is smaller than the HDFS block size). There is no Java framework per se for splitting up a file

Re: Anatomy of read in hdfs

2017-04-09 Thread Mohammad Tariq
Hi Sidharth, I'm sorry, I didn't quite get the first part of your question. What do you mean by real time? Could you please elaborate a bit? That'll help me answer your question better. And for your second question, this is how a write happens - Suppose your file resides in your

Re: Anatomy of read in hdfs

2017-04-09 Thread Sidharth Kumar
Thanks Tariq, It really helped me understand, but just one more doubt: if reading is not a parallel process, then reading a complete file of 100 GB with an HDFS block size of 128 MB would take a very long time, but that's not the scenario in real time. And the second question is: write
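A quick back-of-the-envelope check of the numbers in this question (a sketch only; it assumes the default of one map task per HDFS block, as described elsewhere in the thread):

```java
public class BlockCountSketch {
    /** Ceiling division: number of HDFS blocks (and default map tasks) for a file. */
    public static long numBlocks(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long mb = 1024L * 1024;
        // 100 GB file, 128 MB blocks -> 800 blocks, so up to 800 parallel readers
        System.out.println(numBlocks(100 * gb, 128 * mb)); // prints 800
    }
}
```

So a MapReduce job over that 100 GB file could have up to 800 mappers reading blocks concurrently, which is why the end-to-end time is far shorter than a single sequential scan.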

Re: Anatomy of read in hdfs

2017-04-08 Thread Mohammad Tariq
Hi Sidharth, When you read data from HDFS using a framework like MapReduce, blocks of an HDFS file are read in parallel by multiple mappers created in that particular program. Input splits, to be precise. On the other hand, if you have a standalone Java program, then it's just a single thread
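The input splits mentioned above can be sketched roughly like this (a simplified model; the real split computation in Hadoop's FileInputFormat also accounts for configured min/max split sizes and block locations):

```java
import java.util.ArrayList;
import java.util.List;

public class InputSplitSketch {
    /** Returns {offset, length} pairs: one split per full or partial block. */
    public static List<long[]> splitsFor(long fileSize, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long off = 0; off < fileSize; off += splitSize) {
            splits.add(new long[] { off, Math.min(splitSize, fileSize - off) });
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // A 300 MB file with 128 MB blocks -> 3 splits: 128 MB, 128 MB, 44 MB
        for (long[] s : splitsFor(300 * mb, 128 * mb)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Each split is then handed to one mapper, and each mapper reads its own byte range sequentially; the parallelism comes from running many such mappers at once.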

Re: Anatomy of read in hdfs

2017-04-07 Thread Sidharth Kumar
Thanks for your response. But I didn't understand it yet; if you don't mind, can you tell me what you mean by "With Hadoop, the idea is to parallelize the readers (one per block for the mapper) with processing framework like MapReduce." And also how the concept of parallelizing the readers will work

Re: Anatomy of read in hdfs

2017-04-07 Thread Philippe Kernévez
Hi Sidharth, The reads are sequential. With Hadoop, the idea is to parallelize the readers (one per block for the mapper) with a processing framework like MapReduce. Regards, Philippe On Thu, Apr 6, 2017 at 9:55 PM, Sidharth Kumar wrote: > Hi Genies, > > I have a

Anatomy of read in hdfs

2017-04-06 Thread Sidharth Kumar
Hi Genies, I have a small doubt: is the HDFS read operation a parallel or a sequential process? From my understanding it should be parallel, but if I read "Hadoop: The Definitive Guide, 4th ed." in the anatomy of a read, it says "Data is streamed from the datanode back to the client, which calls read()