Re: [whatwg] Web Documents off the Web (was Web Archives)

2008-05-13 Thread Ian Hickson
On Mon, 16 Apr 2007, Tyler Keating wrote:
 
> Imagine this:  An HTML based document ZIP compressed into a single file could
> be uploaded as is to the server.  Clicking on a link to the file would
> probably download, decompress and open the file in the browser seamlessly and,
> even better, right-clicking on the link instead and choosing "Download Linked
> File" would download the same nice small single file.**

MHTML with a gzip transfer encoding seems like it would do this pretty 
nicely already, no?


On Mon, 16 Apr 2007, Maciej Stachowiak wrote:
 
> A cross-browser web archive format sounds like a useful thing. However,
> I don't think it should be part of or even tied to the HTML spec. In
> principle, such an archive could contain any browser-viewable content as
> the root document. This could be HTML, XHTML, SVG, generic XML, plain
> text, a raster image, or any number of other things. So such an archive
> format is logically a separate layer and should be specced as such.

Indeed, this would belong in another specification.


On Tue, 17 Apr 2007, Jon Barnett wrote:
 
> What place does HTML5 have in specifying one of these options as a
> standard archive format?  Any?  A non-normative section on archives?

I don't think we really need to say anything in the spec -- it's a 
specification, not a position paper.


There was much discussion about this topic, but given that I think this is 
out of scope for HTML5 (and nobody seems to particularly disagree), I 
haven't responded. Let me know if I missed something that deserved a reply 
despite the foregoing.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Web Documents off the Web (was Web Archives)

2008-05-13 Thread David Gerard
2008/5/13 Ian Hickson [EMAIL PROTECTED]:

> MHTML with a gzip transfer encoding seems like it would do this pretty
> nicely already, no?
> Indeed, this would belong in another specification.


Yeah, sounds like something for the HTTP layer - what the user-agent
will accept.


- d.


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-05-05 Thread Maciej Stachowiak


On May 5, 2007, at 10:27 AM, Ben Ward wrote:


> On 16 Apr 2007, at 22:03, Maciej Stachowiak wrote:
>
>> A cross-browser web archive format sounds like a useful thing
>
> From a purely practical perspective, surely support for the data:
> URI format solves this problem? The user-agent's ‘Save as Web
> Archive’ function would encode each external resource and replace
> each external src= and href= target with a data: URI.
>
> This creates a single file web page, with all resources, that can
> be opened in any other user agent that supports data: URIs.
>
> Is there a strong need for a more complex format than that?


This doesn't address resource load requests made from JavaScript.  
(For example: setting img.src, setting iframe.src, making an  
XMLHttpRequest, using new Image(), using location.replace(), etc). In  
general it might not even be possible to transform the code to change  
URI references in JavaScript, since they may be concatenated from  
multiple strings or otherwise computed at runtime.


data: is also a pretty inefficient encoding for large chunks of  
binary data like images and videos.
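
To put a rough number on that inefficiency, here is a minimal Python sketch. It is not from the thread: the payload is made-up stand-in data and to_data_uri is a hypothetical helper. It only shows that base64, the usual encoding for binary data: URIs, emits four characters for every three input bytes.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# 3072 bytes of stand-in binary data; a real image would go here.
payload = bytes(range(256)) * 12
uri = to_data_uri(payload, "image/png")

# base64 turns every 3 bytes into 4 characters, so the URI is about a third
# larger than the resource it carries, before any compression is applied.
overhead = len(uri) / len(payload)
print(f"{len(payload)} bytes -> {len(uri)} characters ({overhead:.2f}x)")
```

The same growth applies to every image or video inlined this way, which is why it matters most for large resources.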


Regards,
Maciej



Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-23 Thread Jonas Sicking

Tyler Keating wrote:


> On 16-Apr-07, at 3:03 PM, Maciej Stachowiak wrote:
>
>> On Apr 16, 2007, at 1:39 PM, Tyler Keating wrote:
>>
>>> Hi,
>>>
>>> I'm bringing this up again with a different tack, because the more
>>> that I think about it, the more I believe it has the ability to
>>> significantly change the perception and application of HTML and I
>>> would really like to keep the discussion alive.  In the previous
>>> thread, I proposed a standard for archiving web sites into a single
>>> ZIP archive with a unique file extension and although it didn't get
>>> any outright negative feedback, it didn't drum up too much excitement
>>> either.  If you can bear with me, I'd like to describe the idea again
>>> in a slightly different light.
>>
>> A cross-browser web archive format sounds like a useful thing.
>> However, I don't think it should be part of or even tied to the HTML
>> spec. In principle, such an archive could contain any browser-viewable
>> content as the root document. This could be HTML, XHTML, SVG, generic
>> XML, plain text, a raster image, or any number of other things. So
>> such an archive format is logically a separate layer and should be
>> specced as such.
>
> Okay.  I understand it now...  Thank you, you are right.  Before I get
> out of here, whom do I bring this to instead?  I'm guessing it needs to
> be the W3C Web Application Formats WG, but I'd like validation before I
> start bugging them (if that's even possible).


I think this would be a good list for it; it just wouldn't be part of the
webapps (HTML5) spec, but rather a new WHATWG spec.


I think a lot of work has been done in this area though, so you should
research what's out there. We talked a bit about this for Firefox 3, but
I'm not sure what the latest word is as to whether it's still in the
plan or not.


Apple's and Opera's widget formats would be a place to start looking. I
think IE had some format as well. I know there are other things out
there as well, but I can't remember them off the top of my head.


In any event, like Maciej, I think it would be great to have a cross 
browser format for this stuff.


/ Jonas


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-23 Thread Dave Singer

At 15:45  -0700 23/04/07, Jonas Sicking wrote:


> In any event, like Maciej, I think it would be great to have a cross
> browser format for this stuff.




Yes.  But to be clear, I think widgets and web archives are or may be 
slightly different.


A widget package is a distribution package, I think.

A web archive is trying to say "this is what you would have
experienced (or did experience) when you accessed this".  This might
involve capturing 'transient' information, such as URLs, and
information in or derived from HTTP headers etc.


I'm the editor of the media file format specs at MPEG and help at 3G 
etc.  The base file format on which MP4 and 3G and 3G2 are based 
recently introduced a packaging ability, where each item can be 
separately stored, named, protected, compressed (or even located 
outside the main file defining the package).  It also identifies one 
of the items as the primary one (the main entry point), which I know 
has been an issue with 'folder' formats like ZIP etc.


I could say more if people are interested.
--
David Singer
Apple Computer/QuickTime


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-23 Thread Jonas Sicking

Dave Singer wrote:

> At 15:45  -0700 23/04/07, Jonas Sicking wrote:
>
>> In any event, like Maciej, I think it would be great to have a cross
>> browser format for this stuff.
>
> Yes.  But to be clear, I think widgets and web archives are or may be
> slightly different.
>
> A widget package is a distribution package, I think.


Hmm.. the difference is quite small. Would there really be any 
difference if you added meta information about what URI and maybe what 
timestamps the files were fetched at to the widget distribution format?


I guess the implementation could be quite different. One way of doing 
archiving would be to store all files exactly as they came from the wire 
in a container (zip), and then include information that map uri to filename.


Whereas if you were to use the widget format you would be required to 
modify the downloaded files so that external references pointed directly 
to the other files in the container.
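
The first approach (bodies stored exactly as fetched, plus a URI-to-filename map) can be sketched in a few lines of Python. Nothing here is specified in the thread: the manifest.json name, its fields, and the resources/N naming are all made up for illustration. The "root" field stands in for the main entry point that a plain ZIP does not identify.

```python
import io
import json
import zipfile


def build_archive(root_uri: str, responses: dict) -> bytes:
    """Store each response body unmodified and record URI -> filename.

    `responses` maps URI -> (fetch timestamp, body bytes). The manifest
    layout is a hypothetical one chosen for this sketch.
    """
    manifest = {"root": root_uri, "entries": {}}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, (uri, (fetched_at, body)) in enumerate(responses.items()):
            filename = f"resources/{index}"
            archive.writestr(filename, body)  # body is not rewritten
            manifest["entries"][uri] = {"file": filename, "fetched": fetched_at}
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def load_resource(archive_bytes: bytes, uri: str) -> bytes:
    """Resolve a URI through the manifest instead of rewriting references."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return archive.read(manifest["entries"][uri]["file"])


archive_bytes = build_archive(
    "http://example.org/",
    {
        "http://example.org/": ("2007-04-23T15:45:00", b"<link href='style.css'>"),
        "http://example.org/style.css": ("2007-04-23T15:45:01", b"body { color: black; }"),
    },
)
print(load_resource(archive_bytes, "http://example.org/style.css"))
```

A reader of such an archive would have to intercept loads and look them up in the map, whereas the rewritten-references approach works with nothing more than unpacking.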


/ Jonas


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-17 Thread Thomas Broyer

2007/4/17, Jon Barnett:


> The main gripe about [MHTML] was that binary data is base64 encoded,
> which adds size to the file in the end.


And which is a wrong assumption.
Binary data can be sent with Content-Transfer-Encoding: binary.


> zipping the final MHTML file could help with size.


I hope you're talking about GZip or BZip2, not application/zip…

--
Thomas Broyer


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-17 Thread Michael A. Puls II

On 4/17/07, Thomas Broyer [EMAIL PROTECTED] wrote:

> 2007/4/17, Jon Barnett:
>
>> The main gripe about [MHTML] was that binary data is base64 encoded,
>> which adds size to the file in the end.
>
> And which is a wrong assumption.
> Binary data can be sent with Content-Transfer-Encoding: binary.


True.

The problem is the current browser support for .mht and support for
generating/loading .mht files with binary attachments.

That could possibly be fixed though.

--
Michael


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-17 Thread Kristof Zelechovski
The method for reading Web pages off line is subscription, not downloading.
Your browser should support subscription.  Enable it for your favorite pages
and you are done.
Chris

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Stefan Haustein
Sent: Monday, April 16, 2007 11:39 PM
To: Tyler Keating
Cc: [EMAIL PROTECTED]
Subject: Re: [whatwg] Web Documents off the Web (was Web Archives)

Hi Tyler,

I like the idea very much, for instance for having a copy of the CSS
spec on my laptop without the need of an Internet connection while
commuting

- When I save a page with Safari, Firefox cannot read it.

- When saving stuff with Firefox, I have to deal with both the HTML
file and the resource folder

- It would be easy to write a nice wget-like utility that creates a
single file.

- A zip based format is still usable for browsers that do not support
it, one would just need to unpack the file.

Best regards,
Stefan Haustein




 



Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-17 Thread Jon Barnett

On 4/17/07, Thomas Broyer [EMAIL PROTECTED] wrote:



> I hope you're talking about GZip or BZip2, not application/zip…



Doesn't matter to me - I just figure some sort of compression would help,
and it would probably help if that compression was supported by browsers, so
gzip sounds right.

> The problem is the current browser support for .mht and support for
> generating/loading .mht files with binary attachments.

Which appears to be halfway there in the major browsers.

> The method for reading Web pages off line is subscription, not downloading.
> Your browser should support subscription.  Enable it for your favorite
> pages and you are done.



Maybe in your browser, but that doesn't store the page on disk apart from
your browser, and it doesn't let you transfer it to someone else (via email,
web download, p2p) as a self-contained document (e.g. a PowerPoint-style
presentation).

.mht looks good because it can retain original URLs of online resources,
it's fairly human readable and debuggable, and it already has a standard and
some support.  An HTML document can reference its external parts (images,
css) via either cid: URIs or the original HTTP URL as long as all the right
Content-Location headers are present.

A single compressed file (.zip?) looks good because of the size and how
easily it can be unpacked and used with a browser that doesn't natively
support the single compressed file.  I don't know what URI scheme an HTML
document would use to reference images and CSS.

The only other thing I can think of is an HTML document that uses data: URIs
to reference its external parts (e.g. a CSS file) which also use data URIs
to reference their external parts (e.g. background images).
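
A minimal Python sketch of that nested data: URI idea follows. Everything in it is made up for illustration: the helper names, the regex-based matching (a real tool would use proper HTML and CSS parsers), and the sample resources. It only rewrites references that appear literally in the markup and stylesheets.

```python
import base64
import re


def data_uri(payload: bytes, mime_type: str) -> str:
    """Encode bytes as a base64 data: URI."""
    return f"data:{mime_type};base64," + base64.b64encode(payload).decode("ascii")


def inline_css(css: str, resources: dict) -> str:
    """Replace each url(name) in a stylesheet with a data: URI."""
    def swap(match):
        name = match.group(1)
        if name not in resources:
            return match.group(0)  # unknown reference: leave untouched
        mime_type, payload = resources[name]
        return f"url({data_uri(payload, mime_type)})"
    return re.sub(r"url\(([^)\s]+)\)", swap, css)


def inline_html(html: str, stylesheets: dict, resources: dict) -> str:
    """Replace each href to a known stylesheet with a data: URI of its inlined text."""
    def swap(match):
        name = match.group(1)
        if name not in stylesheets:
            return match.group(0)
        css = inline_css(stylesheets[name], resources)
        return f'href="{data_uri(css.encode("utf-8"), "text/css")}"'
    return re.sub(r'href="([^"]+)"', swap, html)


# Stand-in inputs; the PNG bytes are a placeholder, not a valid image.
resources = {"bg.png": ("image/png", b"\x89PNG placeholder")}
stylesheets = {"style.css": "body { background: url(bg.png); }"}
html = '<link rel="stylesheet" href="style.css"><p>Hello</p>'

single_file = inline_html(html, stylesheets, resources)
print(single_file)
```

Note that the image is base64-encoded twice here, once inside the stylesheet and again when the stylesheet itself is encoded, so the size penalty compounds with each level of nesting.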

What place does HTML5 have in specifying one of these options as a standard
archive format?  Any?  A non-normative section on archives?


[whatwg] Web Documents off the Web (was Web Archives)

2007-04-16 Thread Tyler Keating

Hi,

I'm bringing this up again with a different tack, because the more
that I think about it, the more I believe it has the ability to  
significantly change the perception and application of HTML and I  
would really like to keep the discussion alive.  In the previous  
thread, I proposed a standard for archiving web sites into a single  
ZIP archive with a unique file extension and although it didn't get  
any outright negative feedback, it didn't drum up too much excitement  
either.  If you can bear with me, I'd like to describe the idea again  
in a slightly different light.


Take for example, web-based presentations vs. PowerPoint from an  
average user's point-of-view.  I can create an incredibly dynamic  
presentation based on HTML, JavaScript, CSS, SVG, etc., but I can't  
easily share it with anyone unless it is served (I can't easily send  
it to them).  On the other hand, I can create an incredibly dynamic  
presentation using PowerPoint, but I can't easily share it with  
anyone unless I send them the file and they also have PowerPoint (I  
can't easily serve it).*


For another example, which relates to my modest experience, I've  
created a simple Quotes/Sales/Invoices web app for a friend and have  
come across similar issues trying to resolve the served file model  
with the local file model.  Without going into too much detail,  
assume that there is sufficient reason why a file copy of the web  
page is needed (in this case because my friend's customers can't use  
the app directly).  How should the user get copies of web documents  
to be sent or saved to disk?  Instead of describing all of the  
various options of saving it to some kind of browser proprietary  
archive, sending HTML email, creating an HTML-to-PDF converter or  
some other time-consuming non-user friendly method, let's look at an  
ideal solution.


Imagine this:  An HTML based document ZIP compressed into a single  
file could be uploaded as is to the server.  Clicking on a link to  
the file would probably download, decompress and open the file in the  
browser seamlessly and, even better, right-clicking on the link
instead and choosing "Download Linked File" would download the same
nice small single file.**  Double clicking that file would open it in
any browser identically as to the served version.  The identical  
format and behaviour of the web document and the file document  
presents the best user experience.  Instead of saving a  
representation of the web document, you are saving THE web document.


The question is, why do we only think of HTML with respect to the web  
and why are HTML-based documents constrained to being served?  This  
is the meat of my argument.  Browsers have no issue opening a file  
URI, but humans have an issue dealing with a directory of .html files  
versus, say, a single .ppt file.  Humans will soon also have issues  
viewing and serving ODF and OOXML files, I might add, but still won't  
have issues viewing and serving HTML files.  After the little bit of  
discussion from the first thread, I believe that the solution is  
indeed a near clone and more complete version of the Widgets 1.0  
specification ( http://www.w3.org/TR/WAPF-REQ/ ) as something  
different and as part of HTML, specifying how to package entire web  
documents as zip compressed archives using a unique file extension.   
In reality, compared to all of the other work being done on HTML, I  
believe this would be very simple to specify and should be very  
simple to implement.


Please give this some thought.  I appreciate your comments.


Tyler Keating
CEO Concept Digital Inc.  -- don't be impressed, it's just me


* I could export an HTML version to be served, but I can't share both  
ways with the same file and this means I have two versions of the  
same presentation to work with.  Again, the average user (my mom)  
isn't going to be serving files created on their desktop any time too  
soon, since she has just barely grasped email attachments.
** Containing any number of HTML, XHTML, CSS, image or other files  
inside of it invisible to the average user.




Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-16 Thread Maciej Stachowiak


On Apr 16, 2007, at 1:39 PM, Tyler Keating wrote:


> Hi,
>
> I'm bringing this up again with a different tack, because the more
> that I think about it, the more I believe it has the ability to
> significantly change the perception and application of HTML and I
> would really like to keep the discussion alive.  In the previous
> thread, I proposed a standard for archiving web sites into a single
> ZIP archive with a unique file extension and although it didn't get
> any outright negative feedback, it didn't drum up too much
> excitement either.  If you can bear with me, I'd like to describe
> the idea again in a slightly different light.


A cross-browser web archive format sounds like a useful thing.  
However, I don't think it should be part of or even tied to the HTML  
spec. In principle, such an archive could contain any browser- 
viewable content as the root document. This could be HTML, XHTML,  
SVG, generic XML, plain text, a raster image, or any number of other  
things. So such an archive format is logically a separate layer and  
should be specced as such.


Regards,
Maciej





Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-16 Thread Tyler Keating


On 16-Apr-07, at 3:03 PM, Maciej Stachowiak wrote:



> On Apr 16, 2007, at 1:39 PM, Tyler Keating wrote:
>
>> Hi,
>>
>> I'm bringing this up again with a different tack, because the more
>> that I think about it, the more I believe it has the ability to
>> significantly change the perception and application of HTML and I
>> would really like to keep the discussion alive.  In the previous
>> thread, I proposed a standard for archiving web sites into a
>> single ZIP archive with a unique file extension and although it
>> didn't get any outright negative feedback, it didn't drum up too
>> much excitement either.  If you can bear with me, I'd like to
>> describe the idea again in a slightly different light.
>
> A cross-browser web archive format sounds like a useful thing.
> However, I don't think it should be part of or even tied to the
> HTML spec. In principle, such an archive could contain any
> browser-viewable content as the root document. This could be HTML, XHTML,
> SVG, generic XML, plain text, a raster image, or any number of
> other things. So such an archive format is logically a separate
> layer and should be specced as such.


Okay.  I understand it now...  Thank you, you are right.  Before I  
get out of here, whom do I bring this to instead?  I'm guessing it  
needs to be the W3C Web Application Formats WG, but I'd like  
validation before I start bugging them (if that's even possible).


Thanks,
- Tyler


Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-16 Thread Stefan Haustein
Hi Tyler,

I like the idea very much, for instance for having a copy of the CSS
spec on my laptop without the need of an Internet connection while
commuting

- When I save a page with Safari, Firefox cannot read it.

- When saving stuff with Firefox, I have to deal with both the HTML
file and the resource folder

- It would be easy to write a nice wget-like utility that creates a
single file.

- A zip based format is still usable for browsers that do not support
it, one would just need to unpack the file.

Best regards,
Stefan Haustein







Re: [whatwg] Web Documents off the Web (was Web Archives)

2007-04-16 Thread Jon Barnett

On 4/16/07, Jon Barnett [EMAIL PROTECTED] wrote:


RFC 2557 was mentioned in the last thread.
http://tools.ietf.org/html/rfc2557

After reading it in detail (and indeed writing a script to send HTML with
inline images as attachments), I quite like it.  It's simple and obvious
enough and allows for a fallback to a real internet URL if a corresponding
URL exists.

The main gripe about it was that binary data is base64 encoded, which adds
size to the file in the end.

A couple of benefits of MHTML over ZIP are that HTTP headers are preserved
and that the Content-Location header can directly associate a resource with
its Internet-hosted version, removing the need to change all the URLs
(absolute or relative) in a document (and related documents, such as CSS
files) to make it usable offline.

zipping the final MHTML file could help with size.
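
As a rough illustration (a sketch under stated assumptions, not the script mentioned above), Python's standard email package can assemble a multipart/related message in the spirit of RFC 2557 and gzip the result. The build_mhtml helper, the example.org URLs and the placeholder image bytes are all made up. MIMEImage base64-encodes its payload by default, so this also shows the size gripe rather than avoiding it.

```python
import gzip
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str, image_url: str, image_bytes: bytes) -> bytes:
    """Bundle a page and one image as multipart/related (hypothetical helper)."""
    archive = MIMEMultipart("related", type="text/html")

    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url  # original URL of the root document
    archive.attach(root)

    image = MIMEImage(image_bytes, "png")  # base64 transfer encoding by default
    image["Content-Location"] = image_url  # lets the HTML keep its original src
    archive.attach(image)

    return archive.as_bytes()


page_url = "http://example.org/page.html"
image_url = "http://example.org/logo.png"
image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(64)  # placeholder, not a real image

mhtml = build_mhtml(page_url, f'<img src="{image_url}">', image_url, image_bytes)
compressed = gzip.compress(mhtml)
print(len(mhtml), "bytes as MHTML,", len(compressed), "bytes gzipped")
```

Because each part carries its original URL in Content-Location, the HTML does not need to be rewritten; whether gzip wins back the base64 overhead depends on how compressible the resources were to begin with.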

Considering that there's already a standard, the trick is getting browsers
to support it.
http://en.wikipedia.org/wiki/MHTML

That page tells a lot about what can save as MHTML but not enough about
what can open and read MHTML.





--
Jon Barnett