There are systems for file-system plumbing out to user processes; FUSE
does this on Linux, and there is a FUSE package for Hadoop. However,
pretending a remote resource is local holds a place of honor in the
system-design antipattern hall of fame.
On Wed, Apr 13, 2011 at 7:35 AM, Benson Margulies
Point taken.
On Wed, Apr 13, 2011 at 10:33 AM, M. C. Srivas wrote:
> Sorry, don't mean to say you don't know mmap or didn't do cool things in the
> past.
> But you will see why anyone would've interpreted this original post, given
> the title of the posting and the following wording, to mean "can
Sorry, don't mean to say you don't know mmap or didn't do cool things in the
past.
But you will see why anyone would've interpreted this original post, given
the title of the posting and the following wording, to mean "can I mmap
files that are in hdfs"
On Mon, Apr 11, 2011 at 3:57 PM, Benson Mar
Guys, I'm not the one who said 'HDFS' unless I had a brain bubble in
my original message. I asked for a distribution mechanism for
code+mappable data. I appreciate the arrival of some suggestions.
Ted is correct that I know quite a bit about mmap; I had a lot to do
with the code in ObjectStore tha
On April 12, 2011 21:50:07 Luke Lu wrote:
> You can use distributed cache for memory mapped files (they're local
> to the node the tasks run on.)
>
> http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
We adopted this solution for a similar problem. For a program we developed
each m
Benson is actually a pretty sophisticated guy who knows a lot about mmap.
I engaged with him yesterday on this since I know him from Apache.
On Tue, Apr 12, 2011 at 7:16 PM, M. C. Srivas wrote:
> I am not sure if you realize, but HDFS is not VM integrated.
I am not sure if you realize, but HDFS is not VM integrated. What you are
asking for is support *inside* the Linux kernel for HDFS file systems. I
don't see that happening for the next few years, and probably never at all.
(HDFS is all Java today, and Java certainly is not going to go inside the
You can use distributed cache for memory mapped files (they're local
to the node the tasks run on.)
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata
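That approach can be sketched roughly as follows, using the old
org.apache.hadoop.mapred API of that era. The path /apps/side-data.bin and
the SideDataMapper class are made-up names and error handling is minimal;
treat it as an illustration, not a drop-in implementation.

// Driver side (hypothetical):
//   DistributedCache.addCacheFile(new URI("/apps/side-data.bin"), jobConf);

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SideDataMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private MappedByteBuffer sideData;

  @Override
  public void configure(JobConf job) {
    try {
      // The framework copies cached files onto the task's local disk.
      Path[] local = DistributedCache.getLocalCacheFiles(job);
      RandomAccessFile raf = new RandomAccessFile(local[0].toString(), "r");
      FileChannel ch = raf.getChannel();
      // A single mapping; only works while the file fits in one 2 GB mapping.
      sideData = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    } catch (Exception e) {
      throw new RuntimeException("failed to map side data", e);
    }
  }

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter) {
    // ... look things up in sideData here ...
  }
}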
On Tue, Apr 12, 2011 at 10:40 AM, Benson Margulies
wrote:
> Here's the OP again.
>
> I want to make it clear that my question here h
Actually, it doesn't become trivial. It just becomes a total fail or a total
win instead of almost always being a partial win. It doesn't meet Benson's
need.
On Tue, Apr 12, 2011 at 11:09 AM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> To get around the chunks or blocks problem, I've been
To get around the chunks or blocks problem, I've been implementing a
system that simply sets a max block size that is too large for a file
to reach. In this way there will only be one block per HDFS file, and
so MMap'ing or other single-file ops become trivial.
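For concreteness, here is a rough sketch of that trick for the case where
you control how the file is written into HDFS; the path, the 8 GB block
size, and the class name are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleBlockWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path out = new Path("/data/one-block-resource.bin");
    // Larger than the file will ever grow, so HDFS keeps it in one block.
    long hugeBlockSize = 8L * 1024 * 1024 * 1024;
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    short replication = fs.getDefaultReplication();

    // The per-file block size overrides dfs.block.size for this file only.
    FSDataOutputStream stream =
        fs.create(out, true, bufferSize, replication, hugeBlockSize);
    // ... write the resource ...
    stream.close();
  }
}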
On Tue, Apr 12, 2011 at 10:40 AM, B
Here's the OP again.
I want to make it clear that my question here has to do with the
problem of distributing 'the program' around the cluster, not 'the
data'. In the case at hand, the issue is a system that has a large data
resource that it needs in order to do its work. Every instance of the code
needs the
Blocks live where they land when first created. They can be moved due to
node failure or rebalancing, but it is typically pretty expensive to do
this. It certainly is slower than just reading the file.
If you really, really want mmap to work, then you need to set up some native
code that builds
> The others you will have to read more conventionally
True. I think there are emergent use cases that demand data locality,
e.g., an optimized HBase system, search, and MMap'ing.
> If all blocks are guaranteed local, this would work. I don't think that
> guarantee is possible
> on a non-trivia
Well, no.
You could mmap all the blocks that are local to the node your program is on.
The others you will have to read more conventionally. If all blocks are
guaranteed local, this would work. I don't think that guarantee is possible
on a non-trivial cluster.
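The bookkeeping half of that idea can be sketched with
FileSystem.getFileBlockLocations, which reports which byte ranges of a file
have a replica on a given host. Actually getting at those block files to
mmap them is the part that needs HDFS surgery (the HDFS-347 work discussed
elsewhere in this thread); the path and class name below are made up.

import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalBlockProbe {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path("/data/large-resource.bin"));
    String me = InetAddress.getLocalHost().getHostName();

    BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
    List<long[]> localRanges = new ArrayList<long[]>();
    for (BlockLocation b : blocks) {
      for (String host : b.getHosts()) {
        if (host.equals(me)) {
          // This [offset, offset + length) range has a replica on this node.
          localRanges.add(new long[] { b.getOffset(), b.getLength() });
          break;
        }
      }
    }
    // mmap the ranges in localRanges (given local access to the block files)
    // and fall back to fs.open(...) plus seek/read for everything else.
  }
}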
On Tue, Apr 12, 2011 at 6:32 AM,
vin
> We have some very large files that we access via memory mapping in
> Java. Someone's asked us about how to make this conveniently
> deployable in Hadoop. If we tell them to put the files into hdfs, can
> we obtain a File for the underlying file on any given node?
We sometimes find it convenient
Then one could MMap the blocks pertaining to the HDFS file and piece
them together. Lucene's MMapDirectory implementation does just this
to avoid an obscure JVM bug.
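Roughly what that piecing-together looks like over a plain local file,
assuming read-only access; the class name and the 1 GB chunk size are
arbitrary, and multi-byte reads that straddle a chunk boundary are left
unhandled here.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkedMmap {
  private static final long CHUNK = 1L << 30;   // 1 GB per mapping

  private final MappedByteBuffer[] chunks;

  public ChunkedMmap(String path) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(path, "r");
    FileChannel ch = raf.getChannel();
    long size = ch.size();
    int n = (int) ((size + CHUNK - 1) / CHUNK);
    chunks = new MappedByteBuffer[n];
    for (int i = 0; i < n; i++) {
      long offset = (long) i * CHUNK;
      long len = Math.min(CHUNK, size - offset);
      // No single MappedByteBuffer may exceed Integer.MAX_VALUE bytes.
      chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
    }
  }

  // Read one byte at an absolute file offset by picking the right mapping.
  public byte get(long pos) {
    return chunks[(int) (pos / CHUNK)].get((int) (pos % CHUNK));
  }
}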
On Mon, Apr 11, 2011 at 9:09 PM, Ted Dunning wrote:
> Yes. But only one such block. That is what I meant by chunk.
> That is fine
Yes. But only one such block. That is what I meant by chunk.
That is fine if you want that chunk, but if you want to mmap the entire file,
it isn't really useful.
On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> What do you mean by local chunk? I think it's
What do you mean by local chunk? I think it's providing access to the
underlying file block?
On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning wrote:
> Also, it only provides access to a local chunk of a file which isn't very
> useful.
>
> On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo
> wrote:
>>
Also, it only provides access to a local chunk of a file, which isn't very
useful.
On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo wrote:
> On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
> wrote:
> > Yes you can however it will require customization of HDFS. Take a
> > look at HDFS-347 speci
On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
wrote:
> Yes you can however it will require customization of HDFS. Take a
> look at HDFS-347 specifically the HDFS-347-branch-20-append.txt patch.
> I have been altering it for use with HBASE-3529. Note that the patch
> noted is for the -append
Yes, you can; however, it will require customization of HDFS. Take a
look at HDFS-347, specifically the HDFS-347-branch-20-append.txt patch.
I have been altering it for use with HBASE-3529. Note that the patch
noted is for the -append branch, which is mainly for HBase.
On Mon, Apr 11, 2011 at 3:57 P