Hi (I only answer questions publicly, so this goes back to the Jackrabbit list),
On Mon, Feb 16, 2009 at 3:39 PM, imadhusudhanan <[email protected]> wrote:
> Currently we use Hadoop API to access files from Distributed File
> System. I would like to enable Webdav to the same DFS that I use using JR.
> May I know how I can make it possible ??

Jackrabbit is not the generic WebDAV-to-file-system mapper you might expect. Since it is a JCR repository, it must support all the fine-grained JCR features (nodes and residual properties, versioning, node types, locking, etc.) that cannot be mapped onto plain OS file systems or simple file system abstractions such as the one Hadoop provides. In theory such a mapping might be possible, but it is not an option for a performant implementation. Jackrabbit therefore has its own persistence abstraction (mainly around the PersistenceManager interface [1]), which is driven by the internal architecture needed to support the full JCR API.

[1] http://jackrabbit.apache.org/api/1.5/org/apache/jackrabbit/core/persistence/PersistenceManager.html

> Also Hadoop DFS has its own FileSystem. I guess that an entry in
> repository.xml <FileSystem> tag will change the file system to what ever I
> specify say the org.apache.hadoop.fs.LocalFileSystem etc.

No, you cannot use it that way. "FileSystem" is just a common name for persistence abstractions; in this case Hadoop's FileSystem (the base class org.apache.hadoop.fs.FileSystem) and Jackrabbit's FileSystem (the interface org.apache.jackrabbit.core.fs.FileSystem) are two completely different things. Jackrabbit's FileSystem is also somewhat deprecated and today not used for actual persistence - that is handled by PersistenceManagers, which operate at a low level where they no longer "know" about the hierarchy but work solely with UUIDs and node bundles. This means that writing a PersistenceManager that works on top of a Hadoop FileSystem is probably very difficult, maybe even impossible. I am not sure how Marcel's implementation works, but it seems to use a different Hadoop API (not the FileSystem).
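To make the mismatch concrete, here is a rough sketch of what Hadoop's FileSystem API offers: paths and whole-file byte streams, nothing more. The path name is just an example, and this assumes the Hadoop core jar on the classpath; with a default Configuration it resolves to the local file system.

```java
// Sketch of the Hadoop FileSystem abstraction: paths + byte streams only.
// The path /tmp/example.txt is a made-up example.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopFsSketch {

    public static String roundTrip() throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml etc.
        FileSystem fs = FileSystem.get(conf);     // DFS or local, per config

        Path file = new Path("/tmp/example.txt");

        // All you can do is create, read and delete files as byte streams:
        FSDataOutputStream out = fs.create(file);
        out.write("hello".getBytes("UTF-8"));
        out.close();

        FSDataInputStream in = fs.open(file);
        byte[] buf = new byte[5];
        in.readFully(buf);
        in.close();

        fs.delete(file, false);

        // Note there was no place to hang JCR node types, residual
        // properties, version histories or locks on this API.
        return new String(buf, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip());
    }
}
```

Everything a JCR repository needs beyond raw content bytes (node structure, properties, versions, locks) would have to be serialized into such streams by hand, which is exactly the work a PersistenceManager does against its own storage.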
There are two options that might work for you, but both involve some coding effort. One is to use Jackrabbit's WebDAV server "library" to build your own server-side WebDAV implementation that connects to a Hadoop FileSystem. The other would be to implement the Jackrabbit SPI [2] - an API that is simpler to implement than the full JCR API - on top of a Hadoop FileSystem and let Jackrabbit provide full JCR above it, but this is a rather huge effort.

[2] http://jackrabbit.apache.org/jackrabbit-spi.html

Have a look at the following links if you are interested in more information about Jackrabbit's architecture:

http://jackrabbit.apache.org/jackrabbit-architecture.html
http://jackrabbit.apache.org/how-jackrabbit-works.html
http://jackrabbit.apache.org/jackrabbit-configuration.html

Regards,
Alex

--
Alexander Klimetschek
[email protected]
