+-------[ Tim Nash ]----------------------
| >
| > I would start with a list of requirements...
| >
| The requirements are to run distributed map/reduce on 'live' xml data
| that is stored by the zope application server.

And I want a Ferrari. That's about as much of a requirement as those.

| >You'll be calling
| >out to something else to do map/reduce and return you the results.
| Agreed, but what is the storage mechanism for the files used in that process?
| If it is the hadoop file system then you can't use live data, you
| would have to copy the files to the hadoop file system, correct?

Well if you're ONLY storing them in hadoop via some mechanism then no.
And if your data is large enough to warrant using hadoop you're never
going to store them in Zope.

| It still looks to me like a zope to virtual file system mapping would
| be useful.

Procfs is a virtual filesystem, devfs is a virtual filesystem. smb
and nfs mounts are virtual filesystems that shadow actual filesystems,
these would work out of the box with LocalFS.

Until you can mount Hadoop in some way, it is not a filesystem, it's just
an application with an API.

| Unfortunately it also looks like I am the only one who
| wants it so I'm not going to post it to the gsoc mailing list.

If you want a python library to interact with hadoop, write one; it's
not hard to port java to python.

Then write a product that consists of:

- A top-level object that acts as a container and talks to hadoop
  (contains all the logic to create files/directories etc).
- Sub-objects that represent directories inside hadoop (as a Folder inside Zope).
- Sub-objects that represent (XML) files inside hadoop.

Add methods and ZPTs that perform your operations and display the results.

Then you can just navigate through your XML data.
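A plain-Python sketch of that object hierarchy (NOT actual Zope product
code -- all class and method names here are made up for illustration):

```python
class HadoopFile:
    """Represents an (XML) file inside hadoop."""
    def __init__(self, name):
        self.name = name


class HadoopFolder:
    """Represents a directory inside hadoop (like a Folder in Zope)."""
    def __init__(self, name):
        self.name = name
        self._contents = {}

    def add(self, obj):
        self._contents[obj.name] = obj

    def __getitem__(self, name):
        return self._contents[name]


class HadoopContainer(HadoopFolder):
    """Top-level object; in a real product this is where the
    hadoop connection logic would live."""
    def create_file(self, path):
        # A real product would also create the file in hadoop here.
        folder_name, _, file_name = path.partition('/')
        folder = self._contents.setdefault(folder_name,
                                           HadoopFolder(folder_name))
        new_file = HadoopFile(file_name)
        folder.add(new_file)
        return new_file


root = HadoopContainer('hadoop')
root.create_file('xml/xml_001')
# Item access mirrors how Zope traverses a URL through containers:
print(root['xml']['xml_001'].name)  # xml_001
```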

The "hard" part will be talking to hadoop from python;

Although see here;


Although it would probably be a lot easier to use ctypes on the C lib and
make a nicer interface with that.

Once you can turn a URL (http://hadoop.example.com/tnash/xml/xml_001) 
into a hadoop "URI" (hadoop xml/001) you're pretty much done;
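A rough sketch of that URL-to-path mapping -- the `/user/<name>` layout
and the `tnash` prefix handling are assumptions, adjust for your deployment:

```python
def url_to_hadoop_path(url_path, user='tnash'):
    """Map a Zope traversal path like '/tnash/xml/xml_001' onto a
    path under the user's HDFS home, e.g. '/user/tnash/xml/xml_001'."""
    parts = [p for p in url_path.split('/') if p]
    # Drop the leading user segment if the URL repeats it.
    if parts and parts[0] == user:
        parts = parts[1:]
    return '/user/%s/%s' % (user, '/'.join(parts))


print(url_to_hadoop_path('/tnash/xml/xml_001'))  # /user/tnash/xml/xml_001
```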

You can use "popen" to run your map/reduce command from inside your
"object" and to fetch the results to display inside Zope (probably
fairly inefficient, but, it'd work). Or just get the job number and
scrape the webserver...
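A minimal sketch of that, using subprocess (the modern replacement for
os.popen); the hadoop command line in the comment is illustrative only,
substitute your real jar and arguments:

```python
import subprocess


def run_job(cmd):
    """Run a command and return its stdout as text; raise on failure."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode())
    return out.decode()


# e.g. run_job(['hadoop', 'jar', 'job.jar', 'input_dir', 'output_dir'])
# Demonstrated with a harmless command:
print(run_job(['echo', 'job finished']))  # job finished
```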

Oh but you wanted to store the files IN zope... so you can ignore all that.

Andrew Milton
Zope maillist  -  Zope@zope.org
**   No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-dev )
