Re: File API: File's name property

2013-09-06 Thread Anne van Kesteren
On Wed, Sep 4, 2013 at 11:45 PM, Glenn Maynard gl...@zewt.org wrote:
 On Tue, Sep 3, 2013 at 12:04 PM, Anne van Kesteren ann...@annevk.nl wrote:
 The problem is that once you put it through the URL parser it'll
 become /. And I suspect given directory APIs and such it'll go
 through that layer at some point.

 I don't follow.  Backslashes in filenames are escaped in URLs
 (http://zewt.org/~glenn/test%5Cfile), like all the other things that require
 escaping.

If the raw input to the URL parser includes a backslash, it'll be
treated as a forward slash. I am not really expecting people to use
encodeURI or such utilities.
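
To make the parser behavior concrete, here is a minimal sketch, assuming a WHATWG-conformant URL implementation (as in current browsers and Node.js). The host example.com and the file name are made up for illustration.

```typescript
// A single file name that contains one backslash (illustrative value).
const rawName = "test\\file";

// Unescaped: for http(s) URLs the parser treats "\" like "/",
// so the one name turns into two path segments.
const unescaped = new URL("http://example.com/" + rawName);

// Escaped first: the backslash survives as %5C and the name stays one segment.
const escaped = new URL("http://example.com/" + encodeURIComponent(rawName));

console.log(unescaped.pathname); // "/test/file"
console.log(escaped.pathname);   // "/test%5Cfile"
```

The second form only works if every caller remembers to escape, which is the part being doubted here.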


 Well, my suggestion was rawName and name (which would have loss of
 information), per the current zip archive API design.

 Having a separate field is fine.  This is specific to ZIPs, so it feels like
 it belongs in a ZipFile subclass, not File itself.

Is it? There's no other file systems where the file names are
effectively byte sequences? If that's the case, maybe that's fine.


 We definitely wouldn't
 want raw bytes from filenames being filled in from user filesystems (eg.
 Shift-JIS filenames in Linux),

The question is whether you can have something random without
associated encoding. If there's an encoding it's easy to put lipstick
on a pig.


 and Windows filenames aren't even bytes
 (they're natively UTF-16).

Right, that would end up as a utf-8 byte sequence in File.rawName and
File.name would do the right thing with that.


 There's an API too.

 It might be better to wait until we have a filesystem API, then piggyback on
 that...

Yeah, I wondered about that. It depends on whether we want to expose
directories or just treat a zip archive as an ordered map of
path/resource pairs.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-09-06 Thread Arun Ranganathan

On Sep 6, 2013, at 11:42 AM, Anne van Kesteren wrote:

 On Wed, Sep 4, 2013 at 11:45 PM, Glenn Maynard gl...@zewt.org wrote:
 On Tue, Sep 3, 2013 at 12:04 PM, Anne van Kesteren ann...@annevk.nl wrote:
 The problem is that once you put it through the URL parser it'll
 become /. And I suspect given directory APIs and such it'll go
 through that layer at some point.
 
 I don't follow.  Backslashes in filenames are escaped in URLs
 (http://zewt.org/~glenn/test%5Cfile), like all the other things that require
 escaping.
 
 If the raw input to the URL parser includes a backslash, it'll be
 treated as a forward slash. I am not really expecting people to use
 encodeURI or such utilities.


I think it may be ok to restrict / and \.  I don't think we lose too much 
here by not allowing historically directory delimiting characters in file 
names.

The question is what to do with a / or a \.  I'm inclined to say UAs 
should treat those as U+FFFD.
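
A rough sketch of what that replacement could look like, purely as an illustration. The helper name replaceSeparators is hypothetical and is not taken from any specification or implementation.

```typescript
// Hypothetical helper: replace both historical directory separators
// with U+FFFD REPLACEMENT CHARACTER and leave everything else untouched.
function replaceSeparators(name: string): string {
  return name.replace(/[\/\\]/g, "\uFFFD");
}

console.log(replaceSeparators("dir/sub\\file.txt"));
```

Note that this is lossy: after replacement there is no way to tell which of the two characters was originally present.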

 
 Well, my suggestion was rawName and name (which would have loss of
 information), per the current zip archive API design.
 
 Having a separate field is fine.  This is specific to ZIPs, so it feels like
 it belongs in a ZipFile subclass, not File itself.
 
 Is it? There's no other file systems where the file names are
 effectively byte sequences? If that's the case, maybe that's fine.


Well…. 

Some file systems don't store names as unrestricted byte sequences (older
Windows), but GNU systems usually do.  Some byte sequences are not valid names.
Conversely, names of existing files may not be representable as byte sequences
(and sometimes there are two representations -- e.g. Amèlie.txt will either use
U+00E8 or U+0065 U+0300 for the è -- both are canonically equivalent in Unicode,
but are different byte sequences). Some file systems perform Unicode
canonicalization on file names, which is more or less what I think the Web
should do.
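
The two representations can be shown with a small sketch. It assumes the precomposed form of è is U+00E8 and the decomposed form is U+0065 U+0300, and it uses the standard String.prototype.normalize and TextEncoder; NFC is picked here only as an example of a canonical form.

```typescript
const composed = "Am\u00E8lie.txt";    // è as the single code point U+00E8
const decomposed = "Ame\u0300lie.txt"; // e followed by U+0300 COMBINING GRAVE ACCENT

// The UTF-8 byte sequences differ even though the names look identical.
const encoder = new TextEncoder();
const composedBytes = encoder.encode(composed);
const decomposedBytes = encoder.encode(decomposed);

// Canonicalizing both names (NFC chosen for illustration) makes them compare equal.
const sameAfterNfc = composed.normalize("NFC") === decomposed.normalize("NFC");

console.log(composedBytes.length, decomposedBytes.length, sameAfterNfc);
```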

I think we run only a small risk of information loss, but I DO think that File 
name should be an [EnforceUTF16] DOMString.  That way, we have the best shot at 
byte sequences based on the underlying characterization.

Summary: I'll punt on File.rawName till a rainier day than today, but I will 
restrict / and \ since they are historically directory separators.  I know 
that there are OTHER characters that we can also restrict, but these two are 
the big ones and get us some 80-20 sanitization :)

Glenn said:

 It might be better to wait until we have a filesystem API, then piggyback on
 that...

+1.

-- A*

Re: File API: File's name property

2013-09-06 Thread Anne van Kesteren
On Fri, Sep 6, 2013 at 4:42 PM, Anne van Kesteren ann...@annevk.nl wrote:
 On Wed, Sep 4, 2013 at 11:45 PM, Glenn Maynard gl...@zewt.org wrote:
 It might be better to wait until we have a filesystem API, then piggyback on
 that...

 Yeah, I wondered about that. It depends on whether we want to expose
 directories or just treat a zip archive as an ordered map of
 path/resource pairs.

Actually, given that zip paths are byte sequences, that would not work
anyway. The alternative might be to always map it to code points
somehow via requiring an encoding to be specified and just deal with
the losses, but that doesn't seem general purpose enough.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-09-06 Thread Glenn Maynard
On Fri, Sep 6, 2013 at 10:42 AM, Anne van Kesteren ann...@annevk.nl wrote:

 If the raw input to the URL parser includes a backslash, it'll be
 treated as a forward slash. I am not really expecting people to use
 encodeURI or such utilities.


People who don't will have a bug, but all this is doing is preemptively
adding the bug, not preventing it, and forcing it on unrelated features
(HTMLInputElement.files).  Don't the ZIP URL proposals require some
characters or other to be escaped anyway (at least of the ones that support
navigation)?

It's far too late to try to keep people from having to escape things in
URLs.

  Having a separate field is fine.  This is specific to ZIPs, so it feels like
  it belongs in a ZipFile subclass, not File itself.

 Is it? There's no other file systems where the file names are
 effectively byte sequences? If that's the case, maybe that's fine.


There are lots of them.  I meant that it seems like wanting to expose raw
bytes is specific to ZIPs.  I hope we wouldn't expose the user's local
filesystem locale to the Web.  Depending on the user's locale causes some
of the more obnoxious bugs the platform has, we should be fighting to kill
it, not add more of it.


   We definitely wouldn't
  want raw bytes from filenames being filled in from user filesystems (eg.
  Shift-JIS filenames in Linux),

 The question is whether you can have something random without
 associated encoding. If there's an encoding it's easy to put lipstick
 on a pig.


You can have filenames in Linux that are in a different encoding than
expected.  I don't know why you'd want to expose that to the web, though.


   There's an API too.
 
  It might be better to wait until we have a filesystem API, then piggyback on
  that...

 Yeah, I wondered about that. It depends on whether we want to expose
 directories or just treat a zip archive as an ordered map of
 path/resource pairs.


I've found being able to work with a directory or a ZIP in the same way to
be useful in the past, too.


On Fri, Sep 6, 2013 at 12:08 PM, Anne van Kesteren ann...@annevk.nl wrote:

 Actually, given that zip paths are byte sequences, that would not work
 anyway. The alternative might be to always map it to code points
 somehow via requiring an encoding to be specified and just deal with
 the losses, but that doesn't seem general purpose enough.


Taking an arbitrary use case: showing the user a list of files inside a
ZIP, and letting him pick one to be extracted.  Exposing raw filenames is
one way to make this work: you iterate over Files in the ZIP, pull out the
File.name for display to the user and stash the File.rawName so you can
look up the File later.  Once the user picks a file from the list, you call
zip.getFileByRawName(stashedRawName) with the associated rawName to
retrieve the selected file.

But, that doesn't just work.  I assume the API will have a
getFileByName(DOMString filename)-like method as well as a rawName
method, and people will be much more likely to ignore byRawName and only
use byName.  The developer has to be careful to store the rawName and only
look up files using raw names if he wants broken filenames to work.

An alternative solution: as you iterate over Files to create a list to
display to the user, stash the File as well (instead of the rawName),
associated with each list entry.  When the user selects a file, you just
use the File you already have, and never pass the filename back to the
API.  This would also take special effort by developers, but no more than
the rawName solution, and it avoids exposing raw filenames entirely.
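
A minimal sketch of that alternative, under stated assumptions: ArchiveEntry and buildPicker are hypothetical names, and plain objects stand in for the File objects a real ZIP API would hand out.

```typescript
// Hypothetical shape of an archive entry; only the display name matters here.
interface ArchiveEntry {
  name: string; // lossy, display-only name
}

// Build the list shown to the user and keep each entry object alongside it.
function buildPicker<T extends ArchiveEntry>(entries: T[]) {
  const labels = entries.map((entry) => entry.name);
  // Selection hands back the stashed object itself; no lookup by name happens.
  const select = (index: number): T => entries[index];
  return { labels, select };
}

// Two entries whose broken names decode to the same display string.
const entries = [
  { name: "\uFFFD.txt", id: 1 },
  { name: "\uFFFD.txt", id: 2 },
];
const picker = buildPicker(entries);
console.log(picker.labels, picker.select(1).id);
```

Because the selection is by position rather than by name, entries with identical or garbled display names remain distinguishable.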

For ZIP URLs, it seems like linking inside a legacy ZIP (rather than a ZIP
of icons or whatever that you just created to link to) would be uncommon.
(Also, if you think people won't escape backslashes, they definitely won't
escape garbage filenames with a special byte-escape mechanism...)  Are
there likely use cases here?


On Fri, Sep 6, 2013 at 1:04 PM, Arun Ranganathan a...@mozilla.com wrote:

 I think it may be ok to restrict / and \.  I don't think we lose too
 much here by not allowing historically directory delimiting characters in
 file names.


\ is a valid character in real filenames.  This would break selecting
filenames with backslashes in them with HTMLInputElement, which works fine
today.

-- 
Glenn Maynard


Re: File API: File's name property

2013-09-04 Thread Glenn Maynard
On Tue, Sep 3, 2013 at 12:04 PM, Anne van Kesteren ann...@annevk.nl wrote:

 The problem is that once you put it through the URL parser it'll
 become /. And I suspect given directory APIs and such it'll go
 through that layer at some point.


I don't follow.  Backslashes in filenames are escaped in URLs
(http://zewt.org/~glenn/test%5Cfile), like all the other things that require
escaping.

 Well, my suggestion was rawName and name (which would have loss of
 information), per the current zip archive API design.


Having a separate field is fine.  This is specific to ZIPs, so it feels
like it belongs in a ZipFile subclass, not File itself.  We definitely
wouldn't want raw bytes from filenames being filled in from user
filesystems (eg. Shift-JIS filenames in Linux), and Windows filenames
aren't even bytes (they're natively UTF-16).


  By the way, in the current ZIP URL proposal, where would a File be created?
  If you use XHR to access a file inside a ZIP URL then you'd just get a Blob,
  right?

 There's an API too.


It might be better to wait until we have a filesystem API, then piggyback
on that...

-- 
Glenn Maynard


Re: File API: File's name property

2013-09-03 Thread Arun Ranganathan
Well, https://www.w3.org/Bugs/Public/show_bug.cgi?id=23138 is to make the 
'type' attribute a ByteString.  Is that your request here for the name 
attribute as well?

It wouldn't be wise to restrict '/' or '\' or try to delve too deep into 
platform land BUT the FileSystem API introduces directory syntax which might 
make being lax a fly in the ointment for later.


On Aug 29, 2013, at 10:48 AM, Anne van Kesteren wrote:

 As currently specified File's name property seems to be a code unit
 sequence. In zip archives the resource's path is a byte sequence. I
 don't really know what popular file systems do. Given that a File has
 to be transmitted over the wire now and then, including its name
 property value, a code unit sequence seems like the wrong type. It
 would at least lead to information loss which I'm not sure is
 acceptable if we can prevent it (or at least make it more obvious that
 it is going on, by doing a transformation early on).
 
 We may also want to restrict \ and / to leave room for using these
 objects in path-based contexts later.
 
 
 -- 
 http://annevankesteren.nl/
 




Re: File API: File's name property

2013-09-03 Thread Anne van Kesteren
On Tue, Sep 3, 2013 at 3:03 PM, Arun Ranganathan a...@mozilla.com wrote:
 Well, https://www.w3.org/Bugs/Public/show_bug.cgi?id=23138 is to make the 
 'type' attribute a ByteString.  Is that your request here for the name 
 attribute as well?

I don't think you want those conversion semantics for name. I do think
we want the value space for names across different systems to be
equivalent, which if we support zip basically means bytes. This could
mean accepting DOMString and then doing the conversion yourself
through utf-8. However, it's not very clear to me how to do the
conversion back in a way that minimizes information loss and works
everywhere compatibly. For zip archives I ended up with rawPath
(bytes) and path (bytes converted to a string using utf-8 and vice
versa). Maybe we should use that model here too?
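
The information loss in that model can be sketched as follows. This assumes the string form is produced by a standard (non-fatal) UTF-8 TextDecoder; the variable names rawPath and path simply mirror the attribute names above, and the sample bytes are the Shift_JIS encoding of 漢字.txt, chosen as an example of a name that is not valid UTF-8.

```typescript
// Shift_JIS bytes for "漢字.txt" (8A BF 8E 9A, then ".txt"); not valid UTF-8.
const rawPath = new Uint8Array([0x8a, 0xbf, 0x8e, 0x9a, 0x2e, 0x74, 0x78, 0x74]);

// Decoding as UTF-8 replaces every invalid byte with U+FFFD.
const path = new TextDecoder("utf-8").decode(rawPath);

// Encoding the string again does not give back the original bytes.
const roundTripped = new TextEncoder().encode(path);

console.log(path, rawPath.length, roundTripped.length);
```

Only the raw bytes can identify the entry again; the decoded string is good for display at best.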


 It wouldn't be wise to restrict '/' or '\' or try to delve too deep into 
 platform land BUT the FileSystem API introduces directory syntax which might 
 make being lax a fly in the ointment for later.

Right. Zip archives also have paths and it would be annoying if we ran
into problems there.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-09-03 Thread Glenn Maynard
On Tue, Sep 3, 2013 at 9:03 AM, Arun Ranganathan a...@mozilla.com wrote:

 It wouldn't be wise to restrict '/' or '\' or try to delve too deep into
 platform land BUT the FileSystem API introduces directory syntax which
 might make being lax a fly in the ointment for later.


I wouldn't object to restricting / if it'll make other APIs more
sensible.  Every platform I've used treats it as a separator.

On Tue, Sep 3, 2013 at 10:17 AM, Anne van Kesteren ann...@annevk.nl wrote:

 I don't think you want those conversion semantics for name. I do think
 we want the value space for names across different systems to be
 equivalent, which if we support zip basically means bytes.


I don't really understand the suggestion of using a ByteString for
File.name.  Can you explain how that wouldn't break
https://zewt.org/~glenn/picker.html, if the user picks a file named
漢字.txt?

-- 
Glenn Maynard


Re: File API: File's name property

2013-09-03 Thread Anne van Kesteren
On Tue, Sep 3, 2013 at 5:14 PM, Glenn Maynard gl...@zewt.org wrote:
 On Tue, Sep 3, 2013 at 10:17 AM, Anne van Kesteren ann...@annevk.nl wrote:
 I don't think you want those conversion semantics for name. I do think
 we want the value space for names across different systems to be
 equivalent, which if we support zip basically means bytes.

 I don't really understand the suggestion of using a ByteString for
 File.name.  Can you explain how that wouldn't break
 https://zewt.org/~glenn/picker.html, if the user picks a file named
 漢字.txt?

ByteString doesn't work. A byte sequence might. If the platform does
file names in Unicode it would be converted to bytes using utf-8.
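
The difference can be sketched like this. The WebIDL ByteString conversion throws when any code unit is above 0xFF, which the hypothetical helper fitsInByteString mimics; encoding through UTF-8 has no such limit.

```typescript
// Mimics the WebIDL ByteString restriction: every code unit must be <= 0xFF.
function fitsInByteString(value: string): boolean {
  for (let i = 0; i < value.length; i++) {
    if (value.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

const pickedName = "漢字.txt";

// A ByteString-typed attribute could not hold this name at all...
console.log(fitsInByteString(pickedName));

// ...whereas a byte sequence produced via UTF-8 can (3 + 3 + 4 bytes).
const utf8Name = new TextEncoder().encode(pickedName);
console.log(utf8Name.length);
```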


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-09-03 Thread Anne van Kesteren
On Tue, Sep 3, 2013 at 5:54 PM, Glenn Maynard gl...@zewt.org wrote:
 On Tue, Sep 3, 2013 at 11:31 AM, Arun Ranganathan a...@mozilla.com wrote:
 And, restrict separators such as / and \.

 I thought we just agreed that \ is a platform-specific thing that
 File.name shouldn't restrict.  / is a directory separator on just about
 every platform, but \ can appear in filenames on many systems.

The problem is that once you put it through the URL parser it'll
become /. And I suspect given directory APIs and such it'll go
through that layer at some point.


 On Tue, Sep 3, 2013 at 11:28 AM, Anne van Kesteren ann...@annevk.nl wrote:

 ByteString doesn't work. A byte sequence might. If the platform does
 file names in Unicode it would be converted to bytes using utf-8.

 I don't know what API is being suggested that would keep File.name acting
 like a String, but also allow containing arbitrary bytes.  I could imagine
 one (an object that holds bytes, stringifies assuming UTF-8 and converts
 from strings assuming UTF-8), but that's pretty ugly...

Well, my suggestion was rawName and name (which would have loss of
information), per the current zip archive API design.


 By the way, in the current ZIP URL proposal, where would a File be created?
 If you use XHR to access a file inside a ZIP URL then you'd just get a Blob,
 right?

There's an API too.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-09-03 Thread Anne van Kesteren
On Tue, Sep 3, 2013 at 5:31 PM, Arun Ranganathan a...@mozilla.com wrote:
 Which in fact is how I think we should do File.name.  We'll stick to 
 DOMString, but think it should specify a conversion to a byte sequence using 
 utf-8.  And, restrict separators such as / and \.

That doesn't solve the problem I mentioned earlier for arbitrary file
names coming out of zip archives. And then your data model is not
bytes, but Unicode scalar values. We could of course accept
information loss of some kind in the conversion process between zip
archive resources and File objects and require developers to keep
track of that if they care.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-09-03 Thread Arun Ranganathan

On Sep 3, 2013, at 12:28 PM, Anne van Kesteren wrote:

 On Tue, Sep 3, 2013 at 5:14 PM, Glenn Maynard gl...@zewt.org wrote:
 On Tue, Sep 3, 2013 at 10:17 AM, Anne van Kesteren ann...@annevk.nl wrote:
 I don't think you want those conversion semantics for name. I do think
 we want the value space for names across different systems to be
 equivalent, which if we support zip basically means bytes.
 
 I don't really understand the suggestion of using a ByteString for
 File.name.  Can you explain how that wouldn't break
 https://zewt.org/~glenn/picker.html, if the user picks a file named
 漢字.txt?
 
 ByteString doesn't work. A byte sequence might. If the platform does
 file names in Unicode it would be converted to bytes using utf-8.


Which in fact is how I think we should do File.name.  We'll stick to DOMString, 
but think it should specify a conversion to a byte sequence using utf-8.  And, 
restrict separators such as / and \.

-- A*


Re: File API: File's name property

2013-09-03 Thread Glenn Maynard
On Tue, Sep 3, 2013 at 11:31 AM, Arun Ranganathan a...@mozilla.com wrote:

 And, restrict separators such as / and \.


I thought we just agreed that \ is a platform-specific thing that
File.name shouldn't restrict.  / is a directory separator on just about
every platform, but \ can appear in filenames on many systems.

On Tue, Sep 3, 2013 at 11:28 AM, Anne van Kesteren ann...@annevk.nl wrote:

 ByteString doesn't work. A byte sequence might. If the platform does
 file names in Unicode it would be converted to bytes using utf-8.


I don't know what API is being suggested that would keep File.name acting
like a String, but also allow containing arbitrary bytes.  I could imagine
one (an object that holds bytes, stringifies assuming UTF-8 and converts
from strings assuming UTF-8), but that's pretty ugly...

On Tue, Sep 3, 2013 at 11:42 AM, Anne van Kesteren ann...@annevk.nl wrote:

 That doesn't solve the problem I mentioned earlier for arbitrary file
 names coming out of zip archives. And then your data model is not
 bytes, but Unicode scalar values. We could of course accept
 information loss of some kind in the conversion process between zip
 archive resources and File objects and require developers to keep
 track of that if they care.


If you want to retain the original bytes of the filename somewhere, it
seems like it should go somewhere other than File.name.  For example, a
subclass of File, ZipFile, could contain a ByteString filenameBytes with
the original filename.  I wonder when you'd need that info, though.

By the way, in the current ZIP URL proposal, where would a File be
created?  If you use XHR to access a file inside a ZIP URL then you'd just
get a Blob, right?

-- 
Glenn Maynard


File API: File's name property

2013-08-29 Thread Anne van Kesteren
As currently specified File's name property seems to be a code unit
sequence. In zip archives the resource's path is a byte sequence. I
don't really know what popular file systems do. Given that a File has
to be transmitted over the wire now and then, including its name
property value, a code unit sequence seems like the wrong type. It
would at least lead to information loss which I'm not sure is
acceptable if we can prevent it (or at least make it more obvious that
it is going on, by doing a transformation early on).

We may also want to restrict \ and / to leave room for using these
objects in path-based contexts later.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-08-29 Thread Glenn Maynard
On Thu, Aug 29, 2013 at 9:48 AM, Anne van Kesteren ann...@annevk.nl wrote:

 As currently specified File's name property seems to be a code unit
 sequence. In zip archives the resource's path is a byte sequence. I
 don't really know what popular file systems do. Given that a File has
 to be transmitted over the wire now and then, including its name
 property value, a code unit sequence seems like the wrong type. It
 would at least lead to information loss which I'm not sure is
 acceptable if we can prevent it (or at least make it more obvious that
 it is going on, by doing a transformation early on).


I don't think it makes sense to expect filenames to round-trip through
File.name, especially for filenames with a broken or unknown encoding.
File.name should be a best-effort at converting the platform filename to
something that can be displayed to users or encoded and put in a
Content-Disposition header, not an identifier for finding the file later.
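
As one illustration of the "encoded and put in a Content-Disposition header" case, here is a sketch that assumes the RFC 5987-style filename* parameter (UTF-8, percent-encoded) is the target. The helper name contentDispositionFor is hypothetical, and it assumes the name contains no lone surrogates (encodeURIComponent would throw on those).

```typescript
// Hypothetical helper: percent-encode a display name for a filename* parameter.
function contentDispositionFor(name: string): string {
  // encodeURIComponent leaves ' ( ) * unescaped; RFC 5987 does not allow them raw.
  const encoded = encodeURIComponent(name).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
  return `attachment; filename*=UTF-8''${encoded}`;
}

console.log(contentDispositionFor("漢字.txt"));
// attachment; filename*=UTF-8''%E6%BC%A2%E5%AD%97.txt
```

The encoding is one-way from the display string; nothing here needs the original platform bytes.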

We may also want to restrict \ and / to leave room for using these
 objects in path-based contexts later.


Forward slash, but not backslash.  That's a platform-specific restriction.
If we go down the route of limiting filenames which don't work on one or
another system, the list of restrictions becomes very long.  If path
separators are exposed on the web, they should always be forward-slashes.

-- 
Glenn Maynard


Re: File API: File's name property

2013-08-29 Thread Anne van Kesteren
On Thu, Aug 29, 2013 at 4:10 PM, Glenn Maynard gl...@zewt.org wrote:
 I don't think it makes sense to expect filenames to round-trip through
 File.name, especially for filenames with a broken or unknown encoding.
 File.name should be a best-effort at converting the platform filename to
 something that can be displayed to users or encoded and put in a
 Content-Disposition header, not an identifier for finding the file later.

File has a constructor. We should be clearer about platforms too I suppose.


 We may also want to restrict \ and / to leave room for using these
 objects in path-based contexts later.

 Forward slash, but not backslash.  That's a platform-specific restriction.
 If we go down the route of limiting filenames which don't work on one or
 another system, the list of restrictions becomes very long.  If path
 separators are exposed on the web, they should always be forward-slashes.

Given that the URL parser treats them identically, we should treat
them identically everywhere else too.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-08-29 Thread Glenn Maynard
On Thu, Aug 29, 2013 at 10:14 AM, Anne van Kesteren ann...@annevk.nl wrote:

 On Thu, Aug 29, 2013 at 4:10 PM, Glenn Maynard gl...@zewt.org wrote:
  I don't think it makes sense to expect filenames to round-trip through
  File.name, especially for filenames with a broken or unknown encoding.
  File.name should be a best-effort at converting the platform filename to
  something that can be displayed to users or encoded and put in a
  Content-Disposition header, not an identifier for finding the file later.

 File has a constructor. We should be clearer about platforms too I suppose.


All constructing a File does is give a name (and date) to a Blob.  It
doesn't create an association to an on-disk file, and shouldn't be
restricted to filenames the local platform's filesystem can represent.

Given that the URL parser treats them identically, we should treat
 them identically everywhere else too.


URL parsing does lots of weird things that shouldn't be spread to the rest
of the platform.  File.name and URL parsing are completely different
things, and filenames on non-Windows systems can contain backslashes.

-- 
Glenn Maynard


Re: File API: File's name property

2013-08-29 Thread Anne van Kesteren
On Thu, Aug 29, 2013 at 4:46 PM, Glenn Maynard gl...@zewt.org wrote:
 All constructing a File does is give a name (and date) to a Blob.  It
 doesn't create an association to an on-disk file, and shouldn't be
 restricted to filenames the local platform's filesystem can represent.

Yes, but it can be submitted to a server so it has to be transformed
at some point. It seems way better to do the transformation early so
what you see in client-side JavaScript is similar to what you'd see in
Node.js.


 Given that the URL parser treats them identically, we should treat
 them identically everywhere else too.

 URL parsing does lots of weird things that shouldn't be spread to the rest
 of the platform.  File.name and URL parsing are completely different things,
 and filenames on non-Windows systems can contain backslashes.

All the more reason to do something with it to prevent down-level bugs.


-- 
http://annevankesteren.nl/



Re: File API: File's name property

2013-08-29 Thread Glenn Maynard
On Thu, Aug 29, 2013 at 10:51 AM, Anne van Kesteren ann...@annevk.nl wrote:

 On Thu, Aug 29, 2013 at 4:46 PM, Glenn Maynard gl...@zewt.org wrote:
  All constructing a File does is give a name (and date) to a Blob.  It
  doesn't create an association to an on-disk file, and shouldn't be
  restricted to filenames the local platform's filesystem can represent.

 Yes, but it can be submitted to a server so it has to be transformed
 at some point. It seems way better to do the transformation early so
 what you see in client-side JavaScript is similar to what you'd see in
 Node.js.


It's transformed from a UTF-16 DOMString to the encoding of the protocol
it's being transferred over, just like any other DOMString being sent over
a non-UTF-16 protocol.

  URL parsing does lots of weird things that shouldn't be spread to the rest
  of the platform.  File.name and URL parsing are completely different things,
  and filenames on non-Windows systems can contain backslashes.

 All the more reason to do something with it to prevent down-level bugs.


We shouldn't prevent people in Linux from seeing their filenames because
those filenames wouldn't be valid on Windows.  That would require much more
than just backslashes--you'd need to prevent all characters and strings
that aren't valid in Windows, such as COM0.

Even having non-ASCII filenames will cause problems for Windows users,
since many Windows applications can only access filenames which are a
subset of the user's locale (it takes extra work to use Unicode filenames
in Windows).

-- 
Glenn Maynard