This is why memory-mapped files were invented.

On Thu, Oct 11, 2012 at 9:34 PM, Gaurav Sharma <gaurav.gs.sha...@gmail.com> wrote:
> If you don't mind sharing, what hard drive do you have with these
> properties:
> - "performance of RAM"
> - "can accommodate very many threads"
>
>
> On Oct 11, 2012, at 21:27, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
>
> Harsh,
>
> I agree with you about many small files, and I was giving this only by
> way of example. However, the hard drive I am talking about can be 1-2 TB
> in size, and that's pretty good; you can't easily get that much memory.
> In addition, it would be more resistant to power failures than RAM. And
> yes, it has the performance of RAM, and can accommodate very many
> threads.
>
> Mark
>
> On Thu, Oct 11, 2012 at 11:16 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Mark,
>>
>> Note that the NameNode does random memory access to serve back any
>> information or mutate request you send to it, and that there can be
>> several concurrent clients. So do you mean a 'very fast hard drive'
>> that's faster than the RAM for random access itself? The NameNode
>> does persist its block information onto disk for various purposes,
>> but to actually make the NameNode use disk storage completely (and
>> not specific parts of it disk-cached instead) wouldn't make too much
>> sense to me. That'd feel like trying to communicate with a process
>> that's swapping, performance-wise.
>>
>> The too-many-files issue is bloated up to sound like it's a NameNode
>> issue, but it isn't in reality. HDFS allows you to process lots of
>> files really fast, aside from helping store them for long periods,
>> and a lot of tiny files only slows you down in such operations, with
>> the overhead of opening and closing files getting in the way of
>> reading them all at a time. With a single or a few large files, all
>> you do is block (data) reads, and very few NameNode communications -
>> ending up going much faster. This is the same for local filesystems
>> as well, but not many think of that.
>>
>> On Fri, Oct 12, 2012 at 9:29 AM, Mark Kerzner <mark.kerz...@shmsoft.com>
>> wrote:
>> > Hi,
>> >
>> > Imagine I have a very fast hard drive that I want to use for the
>> > NameNode. That is, I want the NameNode to store its block
>> > information on this hard drive instead of in memory.
>> >
>> > Why would I do it? Scalability (no federation needed), many files
>> > are not a problem, and warm fail-over is automatic. What would I
>> > need to change in the NameNode to tell it to use the hard drive?
>> >
>> > Thank you,
>> > Mark
>>
>>
>>
>> --
>> Harsh J
>
>
--
Lance Norskog
goks...@gmail.com
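
For readers unfamiliar with the memory-mapped files mentioned at the top of this thread, here is a minimal sketch in Java (the language Hadoop is written in). It is not NameNode code and not something the NameNode supports; the class name `MappedStoreSketch`, the methods `put` and `get`, and the fixed 8-byte record layout are all assumptions made purely for illustration. It only shows the general idea: a file on a fast device is mapped into the process address space, so reads and writes look like memory accesses while the OS pages data to and from the device.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedStoreSketch {
    // Hypothetical layout: each slot holds one 8-byte long.
    static final int RECORD_BYTES = Long.BYTES;

    /** Stores value in the given slot through a read-write mapping of the file. */
    public static void put(Path file, int slots, int slot, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current end of file extends the file.
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) slots * RECORD_BYTES);
            buf.putLong(slot * RECORD_BYTES, value);
            buf.force(); // ask the OS to write the changed pages to the device
        }
    }

    /** Reads the value in the given slot through a read-only mapping of the file. */
    public static long get(Path file, int slots, int slot) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumes the file is already at least slots * RECORD_BYTES long.
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, (long) slots * RECORD_BYTES);
            return buf.getLong(slot * RECORD_BYTES);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-sketch", ".bin");
        file.toFile().deleteOnExit();
        put(file, 1024, 7, 42L);
        System.out.println(get(file, 1024, 7)); // prints 42
    }
}
```

Whether accesses through such a mapping approach RAM speed depends on how much of the file stays in the OS page cache, which is the same concern Harsh raises above about a process that is swapping.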