Re: Polished FileSystem API proposal

2013-11-06 Thread Brian Stell
There are multiple interesting ideas being discussed:

1. Mapping files to persistent URLs.
2. Sharing persistent URLs between different origins.
3. Using the ServiceWorker [1] to redirect URL requests (and possibly
manage its own cache / files).
4. De-duping file copies using a Git-like scheme.

1. Mapping files to persistent URLs.
==
Some things could be considered 'ok' to be slow: images, video startup, the
list of unread emails. Other things are very time sensitive, such as the
initial page layout. The CSS and fonts required for layout need to be there
before the body begins laying out; ie, they need to be available in the
head section. If they are not, the page will irritatingly flash as the
parts arrive. Unless I'm missing something, this means that objects
retrieved via promises/callbacks are not available soon enough.

For these 'needed for layout' objects, persistent URLs are very attractive
because they can be referenced in the head section and retrieved quickly.
These persistent URLs could be backed by a filesystem, IndexedDB, or some
other mechanism. A ServiceWorker could remap the URL, but the data would
still need to be available locally (eg, on disk).
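
To make this concrete, here is a minimal sketch against Chrome's
vendor-prefixed FileSystem API (the file name 'layout.css' is just an
example I made up). The persistent 'filesystem:' URL produced by toURL()
is exactly the kind of thing a link tag in the head would need, but the
callback that produces it only fires after the head has already been
parsed:

    // Sketch only; assumes Chrome's prefixed API and that this origin has
    // already saved a file named 'layout.css'.
    window.webkitRequestFileSystem(window.PERSISTENT, 5 * 1024 * 1024,
      function (fs) {
        fs.root.getFile('layout.css', {create: false}, function (entry) {
          // toURL() yields something like
          //   filesystem:https://example.com/persistent/layout.css
          // That string could be written into a <link> in the head, but by
          // the time this callback fires the head has already been parsed.
          console.log(entry.toURL());
        });
      },
      function (err) { console.error(err); });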


2. Sharing persistent URLs between different origins.
==
Right now, any interesting object that is used by different origins (even
from the same organization) must be downloaded and stored once per origin.
Imagine if Linux required glib to be separately stored for every
executable. This is how the web works today.

Shareable persistent URLs would allow a single copy to be downloaded and
shared across origins. As with shared libraries, the user of the shared
object has to trust the providing origin, and only the providing origin
should be able to write the data.
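
As a purely hypothetical sketch of what a provider-controlled, read-only
grant might look like (neither this promise-returning getFile() nor
grantReadAccess() exists in any current proposal; they only show the
shape of the idea):

    // Hypothetical API, for illustration only. 'filesystem' stands for
    // whatever root object the API would expose to the providing origin.
    filesystem.getFile('fonts/roboto-v8.woff').then(function (file) {
      // grantReadAccess() is invented for this sketch; the point is that
      // only the providing origin can write, and it chooses who may read.
      file.grantReadAccess([
        'https://mail.example.com',
        'https://plus.example.com'
      ]);
    });
    // A consuming origin would then reference the provider's persistent URL
    // directly (eg, in a <link> or @font-face), but could never write to it.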


3. Using the ServiceWorker to redirect URL requests (and possibly manage
its own cache / files)
==
The ServiceWorker provides a way for a web page to redirect URLs. This is
a very attractive feature for applications that are offline (or have an
unreliable connection). The redirect could point to a completely different
URL or to data managed by the ServiceWorker itself; eg, the ServiceWorker
could use the FileSystem API to store data and redirect URLs to that data.
Hopefully this redirection will be fast; eg, fast enough for 'needed for
layout' objects.
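
As a rough sketch (the ServiceWorker API is still in flux, so the event
shape below follows the current explainer and may change;
lookUpInLocalStore() is just a placeholder for however the worker reads
its locally stored bytes):

    // Sketch only; runs inside the ServiceWorker global scope.
    this.addEventListener('fetch', function (event) {
      var url = event.request.url;
      if (url.indexOf('/fonts/') !== -1) {
        // respondWith() lets the worker answer the request itself, eg with
        // a response built from locally stored bytes, instead of hitting
        // the network.
        event.respondWith(lookUpInLocalStore(url));
      }
      // Requests the worker does not handle fall through to the network.
    });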

Each ServiceWorker is origin specific: they are not shared across domains,
and they are completely isolated from the browser's HTTP cache [2]. I take
this to imply that the ServiceWorker has no ability to provide persistent
URLs to other origins.


4. De-duping file copies using a Git-like scheme.
==
My sense is that everyone likes the idea of avoiding redundant storage, and
that Git's use of the SHA-1 message digest of the content as the filename
is 'good enough'. This is a low-security-risk mechanism that is good for
storage efficiency. The most benefit occurs when the storage mechanism (eg,
FileSystem, IndexedDB) applies it across origins. Like sharing across
origins, it avoids storing duplicates, but unlike sharing it does not
address the multiple-downloads issue. Multiple downloads are probably okay
for smallish files but could be an issue for larger files such as 20-Mbyte
Chinese fonts, large JavaScript libraries, etc. My wild guess is that
because this is a 'good thing to do' but not 'a critical thing to do', its
odds of getting implemented are poor.
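
A sketch of the content-addressed idea, using today's Web Crypto names
(2013-era implementations are prefixed and differ in detail);
storageBackend is a placeholder, not a real API:

    // Compute a git-style hex SHA-1 of a file's bytes to use as its
    // storage key. Identical content always produces the identical key,
    // so the backend can keep a single copy on disk.
    function contentKey(arrayBuffer) {
      return crypto.subtle.digest('SHA-1', arrayBuffer).then(function (digest) {
        return Array.prototype.map.call(new Uint8Array(digest), function (b) {
          return ('0' + b.toString(16)).slice(-2);
        }).join('');
      });
    }

    // A hypothetical de-duping store: only write if the key is new.
    function storeOnce(arrayBuffer) {
      return contentKey(arrayBuffer).then(function (key) {
        return storageBackend.has(key)
          ? key
          : storageBackend.put(key, arrayBuffer).then(function () { return key; });
      });
    }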


Brian Stell


Notes:
[1] https://github.com/slightlyoff/ServiceWorker
[2] https://github.com/slightlyoff/ServiceWorker/blob/master/caching.md






On Wed, Nov 6, 2013 at 8:28 AM, pira...@gmail.com wrote:

  That's very interesting and useful, but I don't think it fits the same
 use
  case I was talking about.  I want the ability to create some object that
  exports an URL that I can put in an iframe.  Then all requests from that
  iframe for resources will dynamically call my javascript code.  I could
  implement the same logic that a server-side application does, but from
 local
  code in my browser.
 
 That's just the purpose of ServiceWorker :-) From your message, though, I
 suspect you are asking for the same functionality but scoped to the
 current session, or maybe only while the page is open, with everything
 deleted on reload. I don't know of anything like this; the closest would
 be the FirefoxOS Browser API or the Chrome FileSystem API, but nothing as
 powerful as ServiceWorker, sorry :-( They are talking about implementing
 the Fetch specification; maybe you could write to them about allowing the
 ServiceWorker functionality to be used on a per-session basis. I find
 your proposition legitimate...



 --
 If you want to travel around the world and be invited to speak in a lot
 of different places

Re: Polished FileSystem API proposal

2013-11-05 Thread Brian Stell
On Wed, Oct 30, 2013 at 7:19 PM, pira...@gmail.com wrote:

 What you are asking for could be fixed with redirects, which are the
 HTTP equivalent of filesystem symbolic links :-)


Is your suggestion that Google consolidate all its domains into one?

These are widely separated servers (internet-wise), with widely separated
teams and widely separated schedules.

In addition to different teams/schedules, separate domains are important to
internet load balancing:
   * Search, www.google.com, gets around 2 trillion searches per year [1].
   * During Christmas, YouTube got 1.6 million requests per second [2].
   * Gmail has nearly half a billion active users per month [3].

Do you really want that redirected to one domain?

Brian

Notes
[1] http://www.statisticbrain.com/google-searches/
[2] http://www.youtube.com/watch?v=Jq-VMZK1KGk
[3] http://venturebeat.com/2012/06/28/gmail-hotmail-yahoo-email-users/




 2013/10/31 Brian Stell bst...@google.com:
  In "Request for feedback: Filesystem API" [1] it says "This filesystem
  would be origin-specific."
 
  This post discusses limited readonly sharing of filesystem resources
  between origins.
 
  To improve web site / application performance I'm interested in caching
  static [2] resources (eg, Javascript libraries, common CSS, fonts) in the
  filesystem and accessing them thru persistent URLs.
 
  So, what is the issue?
 
  I'd like to avoid duplication. Consider the following sites: they are all
  from a single organization but have different specific origins;
 * https://mail.google.com/
 * https://plus.google.com/
 * https://sites.google.com/
 * ...
 
  At Google there are *dozens* of these origins [3]. Even within a single
  page there are iframes from different origins. (There are other things
  that lead to different origins but for this post I'm ignoring them [4].)
 
  There could be *dozens* of copies of exactly the same JavaScript library,
  shared CSS, or web font in the FileSystem.
 
  What I'm suggesting is:
 * a filesystem's persistent URLs by default be read/write only for the
  same origin
 * the origin be able to allow other origins to access its files
  (readonly) by persistent URL
 
  I'm not asking for nor suggesting API file access, but others may express
  opinions on this.
 
  Brian Stell
 
 
  PS: Did I somehow miss info on same-origin in the spec [7]?
 
  Notes:
  [1]
 
 http://lists.w3.org/Archives/Public/public-script-coord/2013JulSep/0379.html
  [2] I'm also assuming immutability would be handled similarly to
  gstatic.com [6], where different versions of a file have a different
  path/filename; eg,
 * V8: http://gstatic.com/fonts/roboto/v8/2UX7WLTfW3W8TclTUvlFyQ.woff
 * V9: http://gstatic.com/fonts/roboto/v9/2UX7WLTfW3W8TclTUvlFyQ.woff
 
  [3] Here are some of Google's origins:
  https://accounts.google.com
  https://blogsearch.google.com
  https://books.google.com
  https://chrome.google.com
  https://cloud.google.com
  https://code.google.com
  https://csi.gstatic.com
  https://developers.google.com
  https://docs.google.com
  https://drive.google.com
  https://earth.google.com
  https://fonts.googleapis.com
  https://groups.google.com
  https://mail.google.com
  https://maps.google.com
  https://news.google.com
  https://www.panoramio.com
  https://picasa.google.com
  https://picasaweb.google.com
  https://play.google.com
  https://productforums.google.com
  https://plus.google.com/
  https://research.google.com
  https://support.google.com
  https://sites.google.com
  https://ssl.gstatic.com
  https://translate.google.com
  https://tables.googlelabs.com
  https://talkgadget.google.com
  https://themes.googleusercontent.com/
  https://www.blogger.com
  https://www.google.com
  https://www.gstatic.com
  https://www.orkut.com
  https://www.youtube.com
 
  My guess is that there are more.
 
  I believe the XXX.blogspot.com origins belong to Google but I'm not an
  authority on this.
 
  [4] These are also different top level domains:
 * https://www.google.nl
 * https://www.google.co.jp
 
  Wikipedia lists about 200 of these [5] but since users tend to stick to
  one I'm ignoring them for this posting.
 
  I'm also ignoring http vs https (eg, http://www.google.com) and
  with/without leading www (eg, https://google.com) since they redirect.
 
  [5] http://en.wikipedia.org/wiki/List_of_Google_domains
  [6] http://wiki.answers.com/Q/What_is_gstatic
  [7] http://w3c.github.io/filesystem-api/Overview.html



 --
 If you want to travel around the world and be invited to speak in a lot
 of different places, just write a Unix operating system.
 – Linus Torvalds, creator of the Linux operating system



Re: Polished FileSystem API proposal

2013-11-05 Thread Brian Stell
I like Git's model :-)

This would de-dupe the file storage, but won't it still require downloading
the data for every domain (when it is not lingering in the HTTP cache)?




On Tue, Nov 5, 2013 at 11:45 AM, Tim Caswell t...@creationix.com wrote:

 If the backend implementation used something like git's data store then
 duplicate data would automatically be stored only once without any security
 implications.  The keys are the literal sha1 of the values.  If two
 websites had the same file tree containing the same files, it would be the
 same tree object in the storage.  But only sites who have a reference to
 the hash would have access to it.

 Also I like the level of fs support that git's filesystem has.  There are
 trees, files, executable files, and symlinks. (there are also gitlinks used
 for submodules, but let's ignore those for now)


 On Tue, Nov 5, 2013 at 12:19 PM, Anne van Kesteren ann...@annevk.nl wrote:

 On Thu, Oct 31, 2013 at 2:12 AM, Brian Stell bst...@google.com wrote:
  There could be *dozens* of copies of exactly the same JavaScript library,
  shared CSS, or web font in the FileSystem.

 Check out the cache part of
 https://github.com/slightlyoff/ServiceWorker/. Combined with a smart
 implementation, that will do exactly what you want and avoid all the
 issues of an actual cross-origin file system API.


 --
 http://annevankesteren.nl/





Re: Polished FileSystem API proposal

2013-10-30 Thread Brian Stell
Good points! I was thinking of the logical functioning and hadn't
considered the implementation. My understanding is that the UA will map
from the filename to an actual file using some kind of database. My
assumption was that the logical idea of a link would be handled in that
layer.
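
Roughly, the kind of mapping I'm imagining (the names below are made up
for illustration, not from any spec) is a small table from logical paths
to stored objects, so a 'link' is just a second path entry pointing at the
same object rather than a second copy of the bytes:

    // Illustration of the assumed UA-internal mapping layer.
    var pathTable = {
      'V1/dir1/file1': 'objects/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12',
      // A link: a new logical path, same backing object, no extra bytes.
      'V2/dir1/file1': 'objects/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12'
    };
    function resolve(path) {
      return pathTable[path];  // same backing object for both versions
    }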


On Wed, Oct 30, 2013 at 1:14 AM, pira...@gmail.com wrote:

 +1 to symbolic links; they have almost the same functionality as hard
 links and are more secure and flexible (they are usually just plain text
 files...).
 On 30/10/2013 01:42, Brendan Eich bren...@mozilla.com wrote:

 Hard links are peculiar to Unix filesystems. Not interoperable across all
 OSes. Symbolic links, OTOH...

 /be

  Brian Stell bst...@google.com
 October 29, 2013 4:53 PM
 I meant

eg, V1/dir1/file1, V2/dir1/file1.





Re: Polished FileSystem API proposal

2013-10-29 Thread Brian Stell
I meant

   eg, V1/dir1/file1, V2/dir1/file1.


Re: Polished FileSystem API proposal

2013-10-28 Thread Brian Stell
Hi Jonas,

I notice that one of the common Linux file APIs, link [1], is not in your
API. I don't see this as a first-pass requirement, but I certainly expect
that applications will want to be able to have trees that represent a
particular version of their immutable files; eg, V1/dir1/file1,
V1/dir1/file1. It would be possible to copy the unchanged files but that
would double the storage size.

Could you kindly share your thoughts on having a link API?

Thanks,

Brian Stell


Notes:
[1] http://linux.die.net/man/2/link