The obligation of free stuff: Google Storage

2010-06-09 Thread Aaron Sherman
On a lark, I submitted a request to Google for membership in the
Google Storage beta on the basis of doing something virtual
filesystemish for Perl 6. The bastards gave me an account, so now I
feel as if I should do something.

Has anyone begun to consider what kind of filesystem interface we want
for things like sftp, Amazon S3, Google Storage and other remote
storage possibilities? Is there any extant work out there, or should I
just start spit-balling?

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: The obligation of free stuff: Google Storage

2010-06-09 Thread Aaron Sherman
On Wed, Jun 9, 2010 at 10:04 AM, Aaron Sherman a...@ajs.com wrote:
> Has anyone begun to consider what kind of filesystem interface we want
> for things like sftp, Amazon S3, Google Storage and other remote
> storage possibilities? Is there any extant work out there, or should I
> just start spit-balling?

In the absence of anything forthcoming, and with a totally arbitrary
sense of urgency ;-) here's what I think I should do:

IO::FileSystems (S32) gives us some basics and the Path role also
provides some useful features.

I will start there and build an IO::VFS class on top of IO::FileSystems, roughly like:

class IO::VFS is IO::FileSystems {
  ...
  # Session data if applicable
  has IO::VFS::Session $.session;

  # Many methods take a $context which, if supplied,
  # will contain back-end specific data such as restart markers
  # or payment model information. I'll probably define
  # a role for the context parameter, but otherwise
  # leave it pretty loose as a back-end specific structure.

  # A simple operation that guarantees a round-trip to the filesystem
  method nop($context?) { ... }

  # list of sub-IO::VFS partitions/buckets/etc
  method targets($context?) { ... }
  method find_target($locator, $context?) { ... }

  # Means of acquiring file-level access through a VFS
  method find($locator, $enc = $.session.encoding, $context?) { ... }
  method glob(Str $matcher, $enc = $.session.encoding, $context?) { ... }

  # Like opening and writing to filehandle, but the operation is totally
  # opaque and might be a single call, sendfile or anything else.
  # Note that this doesn't replace $obj.find($path).write(...)
  method put($locator, $data, $enc = $.session.encoding, $context?) { ... }

  # Atomic copy/rename, etc. are logically filesystem operations, even though
  # they might have counterparts at the file level. The distinction being that
  # at the filesystem level I never know nor care what the contents of the
  # file are, I just ask for an operation to be performed on a given path.
  method copy($from, $to, $enc = $.session.encoding, $context?) { ... }
  method rename($from, $to, $enc = $.session.encoding, $context?) { ... }
  method delete($locator, $enc = $.session.encoding, $context?) { ... }

  # service-level ACLs if any
  method acl($locator, $context?) { ... }
}
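
To make the $context idea a bit more concrete, here is a minimal sketch of how a restart marker might ride along in a context role. Everything in it is hypothetical: the names VFS-Context and Listing-Context, the page-size attribute, and the stand-alone targets() sub (a stand-in for the method above, paging over a fixed list instead of a real service) are illustration only, not part of S32 or any existing library.

```raku
# Hypothetical context role: back ends would add their own attributes.
role VFS-Context {
    has $.restart-marker is rw;   # opaque, back-end specific position
}

# Hypothetical context for paged listings.
class Listing-Context does VFS-Context {
    has Int $.page-size = 2;
}

# Stand-in for targets($context?): returns one page per call and
# records where to resume in the context. Without a context it
# returns everything at once.
sub targets(@all, VFS-Context $context?) {
    return @all unless $context.defined;
    my $start = $context.restart-marker // 0;
    my $end   = min($start + $context.page-size, @all.elems);
    # Clear the marker once the listing is exhausted.
    $context.restart-marker = $end < @all.elems ?? $end !! Nil;
    return @all[$start ..^ $end];
}

my @buckets = <alpha beta gamma delta epsilon>;
my $ctx = Listing-Context.new;
repeat {
    say targets(@buckets, $ctx);
} while $ctx.restart-marker.defined;
```

The caller never interprets the marker; it only hands the same context back until the back end clears it.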

The general model I imagine would be something like:

  my IO::VFS::S3 $s3 .= new();
  $s3.session.connect($amazonlogininfo);
  my $bucket = $s3.find_target($bucket_name);
  $bucket.put("quote.txt", "Now is the time for all good men...\n");
  say "URI: ", $bucket.find("quote.txt").uri;

or

  my IO::VFS::GoogleStorage $goog .= new();
  $goog.session.connect($googlelogininfo);
  my $bucket = $goog.find_target($bucket_name);
  $bucket.put("quote.txt", "Now is the time for all good men...\n");
  say "URI: ", $bucket.find("quote.txt").uri;

or

  my IO::VFS::SFTP $sftp .= new();
  $sftp.session.connect(:host<storage>, :user<ajs>, :password<iforgotit>);
  my $filesystem = $sftp.find_target("/tmp");
  $filesystem.put("quote.txt", "Now is the time for all good men...\n");
  say "URI: ", $filesystem.find("quote.txt").uri; # using sftp:...

Notice that everything after $obj.session.connect is identical except
for my choice of variable names. In fact, you shouldn't have to worry
about what storage back-end you're using as long as you have a valid
VFS handle. Really, path names are the only thing that might trip you
up.
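
As a rough illustration of that back-end independence, the sketch below writes one routine against a small role and runs it against two instances of an in-memory stand-in. All of the names here (VFS-ish, MemVFS, MemFile, publish-quote) are made up for the example, the role covers only three of the proposed methods, and nothing talks to a real service.

```raku
# Hypothetical cut-down version of the proposed interface.
role VFS-ish {
    method find_target($locator) { ... }
    method put($locator, $data)  { ... }
    method find($locator)        { ... }
}

# Minimal file handle: just a URI and the stored content.
class MemFile {
    has Str $.uri;
    has Str $.content;
}

# In-memory stand-in for a real back end such as S3 or SFTP.
class MemVFS does VFS-ish {
    has Str $.scheme = 'mem';
    has Str $.target = '';
    has %!files;

    method find_target($locator) {
        MemVFS.new(scheme => $!scheme, target => ~$locator);
    }
    method put($locator, $data) {
        %!files{$locator} = MemFile.new(
            uri     => "{$!scheme}://{$!target}/{$locator}",
            content => ~$data,
        );
    }
    method find($locator) {
        %!files{$locator} // die "no such file: $locator";
    }
}

# Back-end agnostic: only the VFS-ish interface is used.
sub publish-quote(VFS-ish $vfs, Str $target-name --> Str) {
    my $target = $vfs.find_target($target-name);
    $target.put('quote.txt', "Now is the time for all good men...\n");
    return $target.find('quote.txt').uri;
}

for MemVFS.new(scheme => 's3'), MemVFS.new(scheme => 'sftp') -> $vfs {
    say "URI: ", publish-quote($vfs, 'scratch');
}
```

A real back end would replace the hash with network calls, but publish-quote itself would not need to change.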

Thoughts?

I think that in order to do this, I'll need the following support
libraries which may or may not exist (I'll be looking into these):

IO::FileSystems
Path
HTTP (requires socket IO, MIME, base64, etc.)
Various crypto libs

I don't intend to provide a finished implementation of any of these
where they don't already exist (I may not even end up with a final
implementation of the VFS layer). But I'll at least get far enough
along that others who want to work on this will have a starting point.
I also want a test that fakes its way all the way down to creating a
remote file on all three services, even if most of the operations
just pass on blobs of data generated by equivalent calls in other
languages.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs