Re: [whatwg] Web Documents off the Web (was Web Archives)
On Mon, 16 Apr 2007, Tyler Keating wrote:

  Imagine this: An HTML based document ZIP compressed into a single file could be uploaded as is to the server. Clicking on a link to the file would probably download, decompress and open the file in the browser seamlessly and, even better, right-clicking on the link instead and choosing Download Linked File would download the same nice small single file.**

MHTML with a gzip transfer encoding seems like it would do this pretty nicely already, no?

On Mon, 16 Apr 2007, Maciej Stachowiak wrote:

  A cross-browser web archive format sounds like a useful thing. However, I don't think it should be part of or even tied to the HTML spec. In principle, such an archive could contain any browser-viewable content as the root document. This could be HTML, XHTML, SVG, generic XML, plain text, a raster image, or any number of other things. So such an archive format is logically a separate layer and should be specced as such.

Indeed, this would belong in another specification.

On Tue, 17 Apr 2007, Jon Barnett wrote:

  What place does HTML5 have in specifying one of these options as a standard archive format? Any? A non-normative section on archives?

I don't think we really need to say anything in the spec -- it's a specification, not a position paper.

There was much discussion about this topic, but given that I think this is out of scope for HTML5 (and nobody seems to particularly disagree), I haven't responded. Let me know if I missed something that deserved a reply despite the foregoing.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Web Documents off the Web (was Web Archives)
2008/5/13 Ian Hickson [EMAIL PROTECTED]:

  MHTML with a gzip transfer encoding seems like it would do this pretty nicely already, no? Indeed, this would belong in another specification.

Yeah, sounds like something for the HTTP layer - what the user-agent will accept.

- d.
Re: [whatwg] Web Documents off the Web (was Web Archives)
On May 5, 2007, at 10:27 AM, Ben Ward wrote:

  On 16 Apr 2007, at 22:03, Maciej Stachowiak wrote:

    A cross-browser web archive format sounds like a useful thing

  From a purely practical perspective, surely support for the data: URI format solves this problem? The user-agent's ‘Save as Web Archive’ function would encode each external resource and replace each external src= and href= target with a data: URI. This creates a single-file web page, with all resources, that can be opened in any other user agent that supports data: URIs. Is there a strong need for a more complex format than that?

This doesn't address resource load requests made from JavaScript (for example: setting img.src, setting iframe.src, making an XMLHttpRequest, using new Image(), using location.replace(), etc.). In general it might not even be possible to transform the code to change URI references in JavaScript, since they may be concatenated from multiple strings or otherwise computed at runtime.

data: is also a pretty inefficient encoding for large chunks of binary data like images and videos.

Regards, Maciej
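Maciej's point about data: being an inefficient encoding can be made concrete. A minimal sketch (the "image" bytes below are made up for illustration) showing the roughly one-third size inflation base64 adds when a binary resource is inlined as a data: URI:

```python
import base64

# 3000 bytes of stand-in binary data (a real image would behave the same way)
png_bytes = b"\x89PNG" + b"\x00" * 2996

# Inline the bytes as a data: URI, as a "Save as Web Archive" feature might
encoded = base64.b64encode(png_bytes).decode("ascii")
data_uri = "data:image/png;base64," + encoded

# base64 maps every 3 input bytes to 4 output characters, so the payload
# grows by a third before the scheme prefix is even counted.
print(len(png_bytes))  # 3000
print(len(encoded))    # 4000
print(len(data_uri))   # 4022
```

This is on top of the problem that every embedding page carries its own private copy of the resource, so nothing is shared or cached across documents.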
Re: [whatwg] Web Documents off the Web (was Web Archives)
Tyler Keating wrote:

  On 16-Apr-07, at 3:03 PM, Maciej Stachowiak wrote:

    On Apr 16, 2007, at 1:39 PM, Tyler Keating wrote:

      Hi, I'm bringing this up again with a different tack, because the more that I think about it, the more I believe it has the ability to significantly change the perception and application of HTML, and I would really like to keep the discussion alive. In the previous thread, I proposed a standard for archiving web sites into a single ZIP archive with a unique file extension, and although it didn't get any outright negative feedback, it didn't drum up too much excitement either. If you can bear with me, I'd like to describe the idea again in a slightly different light.

    A cross-browser web archive format sounds like a useful thing. However, I don't think it should be part of or even tied to the HTML spec. In principle, such an archive could contain any browser-viewable content as the root document. This could be HTML, XHTML, SVG, generic XML, plain text, a raster image, or any number of other things. So such an archive format is logically a separate layer and should be specced as such.

  Okay. I understand it now... Thank you, you are right. Before I get out of here, whom do I bring this to instead? I'm guessing it needs to be the W3C Web Application Formats WG, but I'd like validation before I start bugging them (if that's even possible).

I think this would be a good list; it just wouldn't be part of the webapps (HTML5) spec, but would be a new WHATWG spec. I think a lot of work has been done in this area though, so you should research what's out there. We talked a bit about this for Firefox 3, but I'm not sure what the latest word is on whether it's still in the plan or not. Apple's and Opera's widget formats would be a place to start looking. I think IE had some format as well. I know there are other things out there too, but I can't remember them off the top of my head.

In any event, like Maciej, I think it would be great to have a cross-browser format for this stuff.

/ Jonas
Re: [whatwg] Web Documents off the Web (was Web Archives)
At 15:45 -0700 23/04/07, Jonas Sicking wrote:

  In any event, like Maciej, I think it would be great to have a cross browser format for this stuff.

Yes. But to be clear, I think widgets and web archives are, or may be, slightly different.

A widget package is a distribution package, I think. A web archive is trying to say "this is what you would have experienced (or did experience) when you accessed this". This might involve capturing 'transient' information, such as URLs, and information in or derived from HTTP headers etc.

I'm the editor of the media file format specs at MPEG and help at 3GPP etc. The base file format on which MP4, 3GP and 3G2 are based recently introduced a packaging ability, where each item can be separately stored, named, protected, compressed (or even located outside the main file defining the package). It also identifies one of the items as the primary one (the main entry point), which I know has been an issue with 'folder' formats like ZIP etc. I could say more if people are interested.

-- David Singer Apple Computer/QuickTime
Re: [whatwg] Web Documents off the Web (was Web Archives)
Dave Singer wrote:

  At 15:45 -0700 23/04/07, Jonas Sicking wrote:

    In any event, like Maciej, I think it would be great to have a cross browser format for this stuff.

  Yes. But to be clear, I think widgets and web archives are or may be slightly different. A widget package is a distribution package, I think.

Hmm... the difference is quite small. Would there really be any difference if you added meta-information about what URIs the files were fetched from, and maybe at what timestamps, to the widget distribution format?

I guess the implementation could be quite different, though. One way of doing archiving would be to store all files exactly as they came from the wire in a container (ZIP), and then include information that maps each URI to a filename. Whereas if you were to use the widget format, you would be required to modify the downloaded files so that external references pointed directly to the other files in the container.

/ Jonas
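Jonas's first approach - store the bytes exactly as fetched and keep a separate URI-to-filename map - can be sketched with a ZIP container. The manifest filename, JSON format, member names, and URLs below are all invented for illustration; nothing like this was ever standardized:

```python
import io
import json
import zipfile

# Resources exactly as they came off the wire (contents are made up here)
resources = {
    "http://example.com/index.html": b"<html><body>Hello</body></html>",
    "http://example.com/style.css": b"body { color: black; }",
}

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    manifest = {}
    for i, (uri, data) in enumerate(resources.items()):
        member = "res%d" % i
        zf.writestr(member, data)   # stored untouched, no URL rewriting
        manifest[uri] = member      # the map from original URI to member
    # "manifest.json" is a hypothetical name, not part of any spec
    zf.writestr("manifest.json", json.dumps(manifest))

# A reader resolves a URI through the manifest instead of relying on
# rewritten links inside the documents themselves.
with zipfile.ZipFile(buf) as zf:
    index = json.loads(zf.read("manifest.json"))
    css = zf.read(index["http://example.com/style.css"])
    print(css.decode())  # body { color: black; }
```

The design trade-off Jonas identifies is visible here: the archived files never change, so round-tripping is lossless, but every consumer must understand the manifest; the widget-style alternative rewrites references so any unzip-and-open workflow works, at the cost of modifying the captured bytes.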
Re: [whatwg] Web Documents off the Web (was Web Archives)
2007/4/17, Jon Barnett:

  The main gripe about [MHTML] was that binary data is base64 encoded, which adds size to the file in the end.

Which is a wrong assumption: binary data can be sent with Content-Transfer-Encoding: binary.

  zipping the final MHTML file could help with size.

I hope you're talking about GZip or BZip2, not application/zip…

-- Thomas Broyer
Re: [whatwg] Web Documents off the Web (was Web Archives)
On 4/17/07, Thomas Broyer [EMAIL PROTECTED] wrote:

  2007/4/17, Jon Barnett:

    The main gripe about [MHTML] was that binary data is base64 encoded, which adds size to the file in the end.

  And which is a wrong assumption. Binary data can be sent with Content-Transfer-Encoding: binary.

True. The problem is the current browser support for .mht, and support for generating/loading .mht files with binary attachments. That could possibly be fixed, though.

-- Michael
Re: [whatwg] Web Documents off the Web (was Web Archives)
The method for reading Web pages offline is subscription, not downloading. Your browser should support subscription. Enable it for your favorite pages and you are done.

Chris

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stefan Haustein
Sent: Monday, April 16, 2007 11:39 PM
To: Tyler Keating
Cc: [EMAIL PROTECTED]
Subject: Re: [whatwg] Web Documents off the Web (was Web Archives)

Hi Tyler,

I like the idea very much, for instance for having a copy of the CSS spec on my laptop without the need of an Internet connection while commuting:

- When I save a page with Safari, Firefox cannot read it.
- When saving stuff with Firefox, I have to deal with both the HTML file and the resource folder.
- It would be easy to write a nice wget-like utility that creates a single file.
- A ZIP-based format is still usable for browsers that do not support it; one would just need to unpack the file.

Best regards, Stefan Haustein
Re: [whatwg] Web Documents off the Web (was Web Archives)
On 4/17/07, Thomas Broyer [EMAIL PROTECTED] wrote:

  I hope you're talking about GZip or BZip2, not application/zip…

Doesn't matter to me - I just figure some sort of compression would help, and it would probably help if that compression was supported by browsers, so gzip sounds right.

  The problem is the current browser support for .mht and support for generating/loading .mht files with binary attachments.

Which appears to be halfway there in the major browsers.

  The method for reading Web pages off line is subscription, not downloading. Your browser should support subscription. Enable it for your favorite pages and you are done.

Maybe in your browser, but that doesn't let you store a page on disk apart from your browser, or transfer it to someone else (via email, web download, p2p) as a self-contained document (e.g. a PowerPoint-style presentation).

.mht looks good because it can retain the original URLs of online resources, it's fairly human-readable and debuggable, and it already has a standard and some support. An HTML document can reference its external parts (images, CSS) via either cid: URIs or the original HTTP URL, as long as all the right Content-Location headers are present.

A single compressed file (.zip?) looks good because of the size and how easily it can be unpacked and used with a browser that doesn't natively support the single compressed file. I don't know what URI scheme an HTML document would use to reference images and CSS.

The only other thing I can think of is an HTML document that uses data: URIs to reference its external parts (e.g. a CSS file), which also use data: URIs to reference their external parts (e.g. background images).

What place does HTML5 have in specifying one of these options as a standard archive format? Any? A non-normative section on archives?
[whatwg] Web Documents off the Web (was Web Archives)
Hi,

I'm bringing this up again with a different tack, because the more that I think about it, the more I believe it has the ability to significantly change the perception and application of HTML, and I would really like to keep the discussion alive. In the previous thread, I proposed a standard for archiving web sites into a single ZIP archive with a unique file extension, and although it didn't get any outright negative feedback, it didn't drum up too much excitement either. If you can bear with me, I'd like to describe the idea again in a slightly different light.

Take for example web-based presentations vs. PowerPoint, from an average user's point of view. I can create an incredibly dynamic presentation based on HTML, JavaScript, CSS, SVG, etc., but I can't easily share it with anyone unless it is served (I can't easily send it to them). On the other hand, I can create an incredibly dynamic presentation using PowerPoint, but I can't easily share it with anyone unless I send them the file and they also have PowerPoint (I can't easily serve it).*

For another example, which relates to my modest experience, I've created a simple Quotes/Sales/Invoices web app for a friend and have come across similar issues trying to reconcile the served-file model with the local-file model. Without going into too much detail, assume that there is sufficient reason why a file copy of the web page is needed (in this case because my friend's customers can't use the app directly). How should the user get copies of web documents to be sent or saved to disk? Instead of describing all of the various options - saving it to some kind of browser-proprietary archive, sending HTML email, creating an HTML-to-PDF converter, or some other time-consuming, non-user-friendly method - let's look at an ideal solution.

Imagine this: an HTML-based document ZIP-compressed into a single file could be uploaded as-is to the server. Clicking on a link to the file would probably download, decompress and open the file in the browser seamlessly and, even better, right-clicking on the link instead and choosing Download Linked File would download the same nice small single file.** Double-clicking that file would open it in any browser identically to the served version.

The identical format and behaviour of the web document and the file document presents the best user experience. Instead of saving a representation of the web document, you are saving THE web document. The question is, why do we only think of HTML with respect to the web, and why are HTML-based documents constrained to being served? This is the meat of my argument. Browsers have no issue opening a file URI, but humans have an issue dealing with a directory of .html files versus, say, a single .ppt file. Humans will soon also have issues viewing and serving ODF and OOXML files, I might add, but still won't have issues viewing and serving HTML files.

After the little bit of discussion from the first thread, I believe that the solution is indeed a near clone and more complete version of the Widgets 1.0 specification ( http://www.w3.org/TR/WAPF-REQ/ ), as something different and as part of HTML, specifying how to package entire web documents as ZIP-compressed archives using a unique file extension. In reality, compared to all of the other work being done on HTML, I believe this would be very simple to specify and should be very simple to implement.

Please give this some thought. I appreciate your comments.

Tyler Keating
CEO Concept Digital Inc. -- don't be impressed, it's just me

* I could export an HTML version to be served, but I can't share both ways with the same file, and this means I have two versions of the same presentation to work with. Again, the average user (my mom) isn't going to be serving files created on their desktop any time soon, since she has just barely grasped email attachments.

** Containing any number of HTML, XHTML, CSS, image or other files inside of it, invisible to the average user.
Re: [whatwg] Web Documents off the Web (was Web Archives)
On Apr 16, 2007, at 1:39 PM, Tyler Keating wrote:

  Hi, I'm bringing this up again with a different tack, because the more that I think about it, the more I believe it has the ability to significantly change the perception and application of HTML and I would really like to keep the discussion alive. In the previous thread, I proposed a standard for archiving web sites into a single ZIP archive with a unique file extension and although it didn't get any outright negative feedback, it didn't drum up too much excitement either. If you can bear with me, I'd like to describe the idea again in a slightly different light.

A cross-browser web archive format sounds like a useful thing. However, I don't think it should be part of or even tied to the HTML spec. In principle, such an archive could contain any browser-viewable content as the root document. This could be HTML, XHTML, SVG, generic XML, plain text, a raster image, or any number of other things. So such an archive format is logically a separate layer and should be specced as such.

Regards, Maciej
Re: [whatwg] Web Documents off the Web (was Web Archives)
On 16-Apr-07, at 3:03 PM, Maciej Stachowiak wrote:

  On Apr 16, 2007, at 1:39 PM, Tyler Keating wrote:

    Hi, I'm bringing this up again with a different tack, because the more that I think about it, the more I believe it has the ability to significantly change the perception and application of HTML and I would really like to keep the discussion alive. In the previous thread, I proposed a standard for archiving web sites into a single ZIP archive with a unique file extension and although it didn't get any outright negative feedback, it didn't drum up too much excitement either. If you can bear with me, I'd like to describe the idea again in a slightly different light.

  A cross-browser web archive format sounds like a useful thing. However, I don't think it should be part of or even tied to the HTML spec. In principle, such an archive could contain any browser-viewable content as the root document. This could be HTML, XHTML, SVG, generic XML, plain text, a raster image, or any number of other things. So such an archive format is logically a separate layer and should be specced as such.

Okay. I understand it now... Thank you, you are right. Before I get out of here, whom do I bring this to instead? I'm guessing it needs to be the W3C Web Application Formats WG, but I'd like validation before I start bugging them (if that's even possible).

Thanks,
- Tyler
Re: [whatwg] Web Documents off the Web (was Web Archives)
Hi Tyler,

I like the idea very much, for instance for having a copy of the CSS spec on my laptop without the need of an Internet connection while commuting:

- When I save a page with Safari, Firefox cannot read it.
- When saving stuff with Firefox, I have to deal with both the HTML file and the resource folder.
- It would be easy to write a nice wget-like utility that creates a single file.
- A ZIP-based format is still usable for browsers that do not support it; one would just need to unpack the file.

Best regards, Stefan Haustein

Tyler Keating wrote:

  Hi, I'm bringing this up again with a different tack, because the more that I think about it, the more I believe it has the ability to significantly change the perception and application of HTML and I would really like to keep the discussion alive. In the previous thread, I proposed a standard for archiving web sites into a single ZIP archive with a unique file extension and although it didn't get any outright negative feedback, it didn't drum up too much excitement either. If you can bear with me, I'd like to describe the idea again in a slightly different light.

  Take for example web-based presentations vs. PowerPoint, from an average user's point of view. I can create an incredibly dynamic presentation based on HTML, JavaScript, CSS, SVG, etc., but I can't easily share it with anyone unless it is served (I can't easily send it to them). On the other hand, I can create an incredibly dynamic presentation using PowerPoint, but I can't easily share it with anyone unless I send them the file and they also have PowerPoint (I can't easily serve it).*

  For another example, which relates to my modest experience, I've created a simple Quotes/Sales/Invoices web app for a friend and have come across similar issues trying to reconcile the served-file model with the local-file model. Without going into too much detail, assume that there is sufficient reason why a file copy of the web page is needed (in this case because my friend's customers can't use the app directly). How should the user get copies of web documents to be sent or saved to disk? Instead of describing all of the various options - saving it to some kind of browser-proprietary archive, sending HTML email, creating an HTML-to-PDF converter, or some other time-consuming, non-user-friendly method - let's look at an ideal solution.

  Imagine this: an HTML-based document ZIP-compressed into a single file could be uploaded as-is to the server. Clicking on a link to the file would probably download, decompress and open the file in the browser seamlessly and, even better, right-clicking on the link instead and choosing Download Linked File would download the same nice small single file.** Double-clicking that file would open it in any browser identically to the served version. The identical format and behaviour of the web document and the file document presents the best user experience. Instead of saving a representation of the web document, you are saving THE web document.

  The question is, why do we only think of HTML with respect to the web, and why are HTML-based documents constrained to being served? This is the meat of my argument. Browsers have no issue opening a file URI, but humans have an issue dealing with a directory of .html files versus, say, a single .ppt file. Humans will soon also have issues viewing and serving ODF and OOXML files, I might add, but still won't have issues viewing and serving HTML files.

  After the little bit of discussion from the first thread, I believe that the solution is indeed a near clone and more complete version of the Widgets 1.0 specification ( http://www.w3.org/TR/WAPF-REQ/ ), as something different and as part of HTML, specifying how to package entire web documents as ZIP-compressed archives using a unique file extension. In reality, compared to all of the other work being done on HTML, I believe this would be very simple to specify and should be very simple to implement.

  Please give this some thought. I appreciate your comments.

  Tyler Keating
  CEO Concept Digital Inc. -- don't be impressed, it's just me

  * I could export an HTML version to be served, but I can't share both ways with the same file, and this means I have two versions of the same presentation to work with. Again, the average user (my mom) isn't going to be serving files created on their desktop any time soon, since she has just barely grasped email attachments.

  ** Containing any number of HTML, XHTML, CSS, image or other files inside of it, invisible to the average user.
Re: [whatwg] Web Documents off the Web (was Web Archives)
On 4/16/07, Jon Barnett [EMAIL PROTECTED] wrote:

  RFC 2557 was mentioned in the last thread. http://tools.ietf.org/html/rfc2557

After reading it in detail (and indeed writing a script to send HTML with inline images as attachments), I quite like it. It's simple and obvious enough, and allows for a fallback to a real Internet URL if a corresponding URL exists. The main gripe about it was that binary data is base64-encoded, which adds size to the file in the end.

A couple of benefits to MHTML over ZIP are that HTTP headers are preserved and that the Content-Location header can directly associate a resource with its Internet-hosted version, removing the need to change all the URLs (absolute or relative) in a document (and related documents, such as CSS files) to make it usable offline. Zipping the final MHTML file could help with size.

Considering that there's already a standard, the trick is getting browsers to support it. http://en.wikipedia.org/wiki/MHTML - that page tells a lot about what can save as MHTML, but not enough about what can open and read MHTML.

-- Jon Barnett
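As a sketch of what RFC 2557 describes, the snippet below builds a minimal MHTML-style multipart/related message with Python's standard email library. The URLs and page content are invented for illustration; note that MIMEImage applies the base64 transfer encoding that the thread complains about, and that each part's Content-Location header is what lets a reader map archived parts back to their original URLs without rewriting any links:

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

root_url = "http://example.com/page.html"  # hypothetical original location

# multipart/related with the HTML part as the root (RFC 2557 / RFC 2387)
msg = MIMEMultipart("related", type="text/html")
msg["Subject"] = "Archived page"

# The HTML part keeps its original URL in Content-Location, so relative
# and absolute references inside it resolve against that URL.
html = MIMEText('<html><body><img src="img.png"></body></html>', "html")
html["Content-Location"] = root_url
msg.attach(html)

# A binary part; MIMEImage base64-encodes it, inflating it by about a third.
img = MIMEImage(b"\x89PNG" + b"\x00" * 100, _subtype="png")
img["Content-Location"] = "http://example.com/img.png"
msg.attach(img)

mhtml = msg.as_string()  # the single-file archive, ready to save as .mht
```

Because img.png is resolved via Content-Location rather than a rewritten link, the same HTML works both served from example.com and opened from the archive, which is exactly the property being argued for in this thread.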