Re: Memory mapped resources

2011-04-13 Thread Lance Norskog
There are systems for plumbing file systems out to user processes; FUSE does this on Linux, and there is a FUSE package for Hadoop. However, pretending a remote resource is local holds a place of honor in the system-design antipattern hall of fame. On Wed, Apr 13, 2011 at 7:35 AM, Benson Margulies

Re: Memory mapped resources

2011-04-13 Thread Benson Margulies
Point taken. On Wed, Apr 13, 2011 at 10:33 AM, M. C. Srivas wrote: > Sorry, don't mean to say you don't know mmap or didn't do cool things in the > past. > But you will see why anyone would've interpreted this original post, given > the title of the posting and the following wording, to mean "can

Re: Memory mapped resources

2011-04-13 Thread M. C. Srivas
Sorry, don't mean to say you don't know mmap or didn't do cool things in the past. But you will see why anyone would've interpreted this original post, given the title of the posting and the following wording, to mean "can I mmap files that are in hdfs" On Mon, Apr 11, 2011 at 3:57 PM, Benson Mar

Re: Memory mapped resources

2011-04-13 Thread Benson Margulies
Guys, I'm not the one who said 'HDFS' unless I had a brain bubble in my original message. I asked for a distribution mechanism for code+mappable data. I appreciate the arrival of some suggestions. Ted is correct that I know quite a bit about mmap; I had a lot to do with the code in ObjectStore tha

Re: Memory mapped resources

2011-04-13 Thread Luca Pireddu
On April 12, 2011 21:50:07 Luke Lu wrote: > You can use distributed cache for memory mapped files (they're local > to the node the tasks run on.) > > http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata We adopted this solution for a similar problem. For a program we developed each m

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Benson is actually a pretty sophisticated guy who knows a lot about mmap. I engaged with him yesterday on this since I know him from Apache. On Tue, Apr 12, 2011 at 7:16 PM, M. C. Srivas wrote: > I am not sure if you realize, but HDFS is not VM integrated.

Re: Memory mapped resources

2011-04-12 Thread M. C. Srivas
I am not sure if you realize, but HDFS is not VM-integrated. What you are asking for is support *inside* the Linux kernel for HDFS file systems. I don't see that happening for the next few years, and probably never at all. (HDFS is all Java today, and Java certainly is not going to go inside the

Re: Memory mapped resources

2011-04-12 Thread Luke Lu
You can use distributed cache for memory mapped files (they're local to the node the tasks run on.) http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies wrote: > Here's the OP again. > > I want to make it clear that my question here h
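Mechanically, shipping such a resource through the distributed cache can be as simple as the generic `-files` option on a 0.20-era job launch (a sketch; the jar, class, and paths here are illustrative):

```
# -files copies resource.bin into the distributed cache; every task
# then finds it as ./resource.bin in its local working directory,
# where it can be opened and mmap'ed like any ordinary local file.
hadoop jar analysis.jar com.example.AnalysisDriver \
    -files /local/path/resource.bin input/ output/
```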

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Actually, it doesn't become trivial. It just becomes total fail or total win instead of almost always being a partial win. It doesn't meet Benson's need. On Tue, Apr 12, 2011 at 11:09 AM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > To get around the chunks or blocks problem, I've been

Re: Memory mapped resources

2011-04-12 Thread Jason Rutherglen
To get around the chunks or blocks problem, I've been implementing a system that simply sets a max block size that is too large for a file to reach. In this way there will only be one block per HDFS file, and so MMap'ing or other single-file ops become trivial. On Tue, Apr 12, 2011 at 10:40 AM, B
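With the 0.20-era configuration names, that policy is a one-property change (the value here, 64 GB, is illustrative; pick anything larger than your largest file):

```
<!-- hdfs-site.xml -->
<property>
  <name>dfs.block.size</name>
  <!-- 64 GB in bytes: no file smaller than this ever splits into
       a second block -->
  <value>68719476736</value>
</property>
```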

Re: Memory mapped resources

2011-04-12 Thread Benson Margulies
Here's the OP again. I want to make it clear that my question here has to do with the problem of distributing 'the program' around the cluster, not 'the data'. In the case at hand, the issue is a system that has a large data resource that it needs to do its work. Every instance of the code needs the

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Blocks live where they land when first created. They can be moved due to node failure or rebalancing, but it is typically pretty expensive to do this. It certainly is slower than just reading the file. If you really, really want mmap to work, then you need to set up some native code that builds

Re: Memory mapped resources

2011-04-12 Thread Jason Rutherglen
> The others you will have to read more conventionally True. I think there are emergent use cases that demand data locality, e.g., an optimized HBase system, search, and MMap'ing. > If all blocks are guaranteed local, this would work. I don't think that > guarantee is possible > on a non-trivia

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
Well, no. You could mmap all the blocks that are local to the node your program is on. The others you will have to read more conventionally. If all blocks are guaranteed local, this would work. I don't think that guarantee is possible on a non-trivial cluster. On Tue, Apr 12, 2011 at 6:32 AM,

Re: Memory mapped resources

2011-04-12 Thread Ted Dunning
-Original Message- > From: Ted Dunning [mailto:tdunn...@maprtech.com] > Sent: Tuesday, April 12, 2011 12:09 AM > To: Jason Rutherglen > Cc: common-user@hadoop.apache.org; Edward Capriolo > Subject: Re: Memory mapped resources > Yes. But only on

Re: Memory mapped resources

2011-04-12 Thread Michael Flester
> We have some very large files that we access via memory mapping in > Java. Someone's asked us about how to make this conveniently > deployable in Hadoop. If we tell them to put the files into hdfs, can > we obtain a File for the underlying file on any given node? We sometimes find it convenient
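The baseline the thread assumes, mapping an already-local file in Java, looks like this (a minimal sketch; file names and contents are illustrative):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MmapLocal {
    // Map a local file read-only; the OS pages it in lazily on access.
    static MappedByteBuffer map(Path file) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("resource", ".bin");
        Files.write(p, new byte[] {10, 20, 30});
        MappedByteBuffer buf = map(p);
        System.out.println(buf.get(1)); // prints 20
        Files.delete(p);
    }
}
```

Note that this only works once the bytes are on a local file system, which is exactly why the distributed-cache suggestion elsewhere in the thread fits the OP's problem.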

Re: Memory mapped resources

2011-04-12 Thread Jason Rutherglen
Then one could MMap the blocks pertaining to the HDFS file and piece them together. Lucene's MMapDirectory implementation does just this to avoid an obscure JVM bug. On Mon, Apr 11, 2011 at 9:09 PM, Ted Dunning wrote: > Yes.  But only one such block. That is what I meant by chunk. > That is fine
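The "piece them together" idea, mapping a file as a sequence of fixed-size chunks and routing each read to the right chunk the way Lucene's MMapDirectory does, can be sketched as follows (a toy 4-byte chunk size for illustration; real chunk sizes are far larger):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedMmap {
    final MappedByteBuffer[] chunks;
    final long chunkSize;

    // Map the file as consecutive fixed-size read-only chunks.
    ChunkedMmap(Path file, long chunkSize) throws Exception {
        this.chunkSize = chunkSize;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            long size = ch.size();
            int n = (int) ((size + chunkSize - 1) / chunkSize);
            chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long off = i * chunkSize;
                long len = Math.min(chunkSize, size - off);
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, off, len);
            }
        }
    }

    // Translate a global offset into (chunk index, offset within chunk).
    byte get(long pos) {
        return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("big", ".bin");
        Files.write(p, new byte[] {0, 1, 2, 3, 4, 5, 6});
        ChunkedMmap m = new ChunkedMmap(p, 4); // two chunks: 4 + 3 bytes
        System.out.println(m.get(5)); // prints 5, read from the second chunk
        Files.delete(p);
    }
}
```

The same indexing would apply if each "chunk" were an HDFS block made locally readable, which is what the HDFS-347 discussion in this thread is about.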

RE: Memory mapped resources

2011-04-12 Thread Kevin.Leach
common-user@hadoop.apache.org; Edward Capriolo Subject: Re: Memory mapped resources Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk but if you want to mmap the entire file, it isn't real useful. On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen < jason.rutherg.

Re: Memory mapped resources

2011-04-11 Thread Ted Dunning
Yes. But only one such block. That is what I meant by chunk. That is fine if you want that chunk but if you want to mmap the entire file, it isn't real useful. On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > What do you mean by local chunk? I think it's

Re: Memory mapped resources

2011-04-11 Thread Jason Rutherglen
What do you mean by local chunk? I think it's providing access to the underlying file block? On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning wrote: > Also, it only provides access to a local chunk of a file which isn't very > useful. > > On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo > wrote: >>

Re: Memory mapped resources

2011-04-11 Thread Ted Dunning
Also, it only provides access to a local chunk of a file which isn't very useful. On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo wrote: > On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen > wrote: > > Yes you can however it will require customization of HDFS. Take a > > look at HDFS-347 speci

Re: Memory mapped resources

2011-04-11 Thread Edward Capriolo
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen wrote: > Yes you can however it will require customization of HDFS.  Take a > look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch. >  I have been altering it for use with HBASE-3529.  Note that the patch > noted is for the -append

Re: Memory mapped resources

2011-04-11 Thread Jason Rutherglen
Yes you can however it will require customization of HDFS. Take a look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch. I have been altering it for use with HBASE-3529. Note that the patch noted is for the -append branch which is mainly for HBase. On Mon, Apr 11, 2011 at 3:57 P