Re: Binary Data - possible topic for joint session
On Fri, Nov 6, 2009 at 11:24 AM, Brendan Eich bren...@mozilla.com wrote:

Kris did a good job with Binary/B (although I do not see the point of the .get method additions) -- I didn't look at the other proposals yet.

Thanks. The .get method is certainly not relevant for an ECMAScript spec, where you have the luxury of specifying [[Get]] and [[Put]]. The .get method in the CommonJS proposal is intended to serve as a stop-gap for implementations that cannot provide properties.

Kris Kowal

___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
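To make Kris's distinction concrete, here is a minimal JavaScript sketch. It assumes a modern engine with Uint8Array and Proxy, which CommonJS implementations of the time could not count on, and the factory name makeByteString is hypothetical rather than part of Binary/B. The get(index) method works everywhere, while the Proxy plays the role of the spec-level [[Get]] hook that lets plain bracket indexing return the same byte.

```javascript
// Hypothetical sketch, not the Binary/B API: a byte container that exposes
// both a get(index) method and bracket indexing.
function makeByteString(values) {
  // Private copy of the input, so later changes to `values` are not visible.
  const bytes = Uint8Array.from(values);

  const base = {
    get length() {
      return bytes.length;
    },
    // Stop-gap accessor: usable even where an engine cannot expose
    // numeric properties on a host object.
    get(index) {
      if (!Number.isInteger(index) || index < 0 || index >= bytes.length) {
        throw new RangeError(`index ${index} out of range`);
      }
      return bytes[index];
    },
  };

  // Stand-in for a spec-level [[Get]]: canonical array-index keys are read
  // from the underlying bytes; every other key falls through to `base`.
  return new Proxy(base, {
    get(target, key, receiver) {
      if (typeof key === "string" && /^(0|[1-9][0-9]*)$/.test(key)) {
        const i = Number(key);
        return i < bytes.length ? bytes[i] : undefined;
      }
      return Reflect.get(target, key, receiver);
    },
  });
}

const bs = makeByteString([0x47, 0x49, 0x46]);
console.log(bs.get(0), bs[0], bs.length); // 71 71 3
```

In an ES spec the Proxy layer disappears: [[Get]] on an index is simply defined to return the byte, which is why the extra method looks redundant from that side.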
Re: Binary Data - possible topic for joint session
Maciej Stachowiak wrote:

On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:
On 6 Nov 2009, at 19:24, Brendan Eich wrote:
On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:

Just in case some of you weren't aware, the CommonJS group has done quite a bit of work (and bikeshedding) on this topic. Here's a link to the wiki: http://wiki.commonjs.org/wiki/Binary ...

Binary/B is the closest of the three proposals to mine, in that it has both mutable and immutable binary data containers. Here are a few key differences: ...

Regards, Maciej

One note: Binary/C also originally had a mutable and an immutable type. The mutable type was moved to IO/B/Buffer (http://wiki.commonjs.org/wiki/IO/B/Buffer), so when comparing with Binary/B, Binary/C together with IO/B/Buffer is the more equivalent comparison.

-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Re: Binary Data - possible topic for joint session
On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:
On 6 Nov 2009, at 19:24, Brendan Eich wrote:
On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:

Just in case some of you weren't aware, the CommonJS group has done quite a bit of work (and bikeshedding) on this topic. Here's a link to the wiki: http://wiki.commonjs.org/wiki/Binary

If nothing else, there's quite a bit of prior art collected which should inform the conversation. I know the Binary/B proposal has the implementation momentum, but I don't know exactly what the status is. I haven't been following the evolution of these binary specs too closely, but since it seems that nearly everyone else from the group is off to jsconf.eu, I figured I ought to toss this out there.

Thanks, I had forgotten about commonjs.org, having once paid better attention. Kris did a good job with Binary/B (although I do not see the point of the .get method additions) -- I didn't look at the other proposals yet.

/be

Binary/B feels largely right, but for my taste it has a few too many methods taken from Array simply because Array had them, specifically things like sort, reduce, shift, unshift, etc. Conceptually: why would you want to sort an array of bytes? There are certainly classes of operations that I think should just be done via b.toArray().X rather than directly on the blob.

As a community (CommonJS) we'd be more than happy to go forward with a binary spec that came from (or at least has the blessing of) the ES groups.

Binary/B is the closest of the three proposals to mine, in that it has both mutable and immutable binary data containers.
Here are a few key differences:

(1) Binary/B does not have a cheap way to convert from the immutable representation (ByteString) to the mutable representation (ByteArray)
(2) In Binary/B, Array-like index access to ByteString gives back one-byte ByteStrings instead of bytes, likely an over-literal copying of String
(3) There are some seemingly needless differences in the interfaces to ByteString and ByteArray that follow from modeling on String and Array
(4) Binary/B has many more operations available in the base proposal (including charset transcoding and a generous selection of String and Array methods)
(5) Different names - Data/DataBuilder vs. ByteString/ByteArray

My initial impression is that (1), (2) and (3) are all points on which my proposal is better.

On (1): cheap conversion from mutable to immutable (DataBuilder.prototype.release() in my proposal) lets binary data objects be built up with a convenient mutation-based idiom, but then passed around as immutable objects thereafter.

On (2): I don't think a one-byte ByteString is ever useful; indexing to get the byte value would be much more helpful.

On (3): I think it's good for the mutable interface to be a strict superset of the immutable interface.

(4) and (5) are both points where perhaps neither proposal is at the optimum yet. On (4), I suspect the sweet spot is somewhere between my spartan set of built-in operations and the very generous set in Binary/B. On (5), I'm not sure either set of names is the best possible, and I'm certainly not stuck on my own proposed names.

Regards, Maciej
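Point (2) is easiest to see side by side. JavaScript string indexing returns one-character strings, which is the behavior a ByteString modeled on String inherits; indexing that yields byte values lets plain numeric comparisons do the work. This is a minimal sketch that uses Uint8Array as a modern stand-in for the byte-indexed type (Uint8Array itself is mutable, so it only illustrates the indexing behavior), and the helper hasGifSignature is hypothetical.

```javascript
// String-style indexing: each element is itself a one-character string.
const asString = "GIF89a";
console.log(typeof asString[0], asString[0]); // string G

// Byte-style indexing: each element is a number in 0..255.
const asBytes = Uint8Array.of(0x47, 0x49, 0x46, 0x38, 0x39, 0x61);
console.log(typeof asBytes[0], asBytes[0]); // number 71

// Hypothetical helper: checking a file signature is a plain numeric
// comparison when indexing yields bytes; no per-byte wrapper objects are made.
function hasGifSignature(bytes) {
  const magic = [0x47, 0x49, 0x46, 0x38]; // "GIF8", shared by GIF87a and GIF89a
  if (bytes.length < magic.length) return false;
  return magic.every((b, i) => bytes[i] === b);
}

console.log(hasGifSignature(asBytes)); // true

// When a one-byte slice really is wanted, ask for it explicitly.
console.log(asBytes.subarray(0, 1).length); // 1
```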
Re: Binary Data - possible topic for joint session
On Nov 7, 2009, at 6:53 PM, Ash Berlin wrote:
On 8 Nov 2009, at 02:21, Maciej Stachowiak wrote:
On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:
On 6 Nov 2009, at 19:24, Brendan Eich wrote:
On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:

http://wiki.commonjs.org/wiki/Binary

[snip]
[snip]

As a community (CommonJS) we'd be more than happy to go forward with a binary spec that came from (or at least has the blessing of) the ES groups.

Binary/B is the closest of the three proposals to mine, in that it has both mutable and immutable binary data containers. Here are a few key differences:

(1) Binary/B does not have a cheap way to convert from the immutable representation (ByteString) to the mutable representation (ByteArray)
(2) In Binary/B, Array-like index access to ByteString gives back one-byte ByteStrings instead of bytes, likely an over-literal copying of String
(3) There are some seemingly needless differences in the interfaces to ByteString and ByteArray that follow from modeling on String and Array
(4) Binary/B has many more operations available in the base proposal (including charset transcoding and a generous selection of String and Array methods)
(5) Different names - Data/DataBuilder vs. ByteString/ByteArray

On (1): cheap conversion from mutable to immutable (DataBuilder.prototype.release() in my proposal) lets binary data objects be built up with a convenient mutation-based idiom, but then passed around as immutable objects thereafter.

Mutable to immutable or immutable to mutable? Assuming the former, how do you handle the differences in API/behaviour? Does each function check whether it is now immutable?

Mutable to immutable. Immutable to mutable has to copy (or at least copy-on-write). My proposal does it like this (where DataBuilder is the mutable variant and Data is the immutable):

DataBuilder.prototype.release()
Return a new Data with the same length and the same byte values as the DataBuilder passed as the this value.
At the same time, the DataBuilder is reset to length 0.

Because the DataBuilder is reset to empty, the implementation can steal its underlying buffer for the new Data object, thus converting to immutable without a full copy. This matches the common pattern of assembling a new piece of binary data with mutation, then handing it out to possibly multiple other pieces of code as immutable.

On (2): I don't think a one-byte ByteString is ever useful; indexing to get the byte value would be much more helpful.

Couldn't agree more with you here - if for whatever reason you do want a one-byte ByteString, there is always substr/substring. This is something that came up recently in IRC and prompted me to start looking at making changes to the proposal - I was going to do that next week, so this coming up now is very good timing.

On (3): I think it's good for the mutable interface to be a strict superset of the immutable interface.

Seems like a reasonable thing to do.

I'm glad we agree on these two points.

(4) and (5) are both points where perhaps neither proposal is at the optimum yet. On (4), I suspect the sweet spot is somewhere between my spartan set of built-in operations and the very generous set in Binary/B.

Agreed - this was the other thing I noticed - e.g. sorting a ByteArray isn't really an operation that makes a whole lot of sense to my mind.

Yep. I'm not even sure things like map(), filter() or reduce() are likely to work well. My own preference is to start the API very small, and add incrementally based on demonstrated need and clearly articulated use cases.

On (5), I'm not sure either set of names is the best possible, and I'm certainly not stuck on my own proposed names.

I'm not really bothered either way on this front, although 'Data' is much more likely to clash with existing code.

Yes, Brendan made this point and presented some good evidence in that direction. I think 'Data' doesn't work but 'Binary' or 'BinData' might.
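The release() behavior quoted above can be sketched in JavaScript as follows. This is an illustration under stated assumptions, not the proposal's text: the backing store is a modern Uint8Array, and the method names append() and byteAt() are hypothetical. The key step is that release() hands the builder's existing storage to the new Data as a view (subarray() does not copy) and gives the builder fresh storage, so later appends can never touch bytes the Data now owns.

```javascript
// Hypothetical immutable container: only read operations are exposed, and the
// backing bytes live in a private field that callers cannot reach.
class Data {
  #bytes;

  constructor(bytes) {
    // Takes ownership of `bytes`; the creator must not write to it afterwards.
    this.#bytes = bytes;
  }

  get length() {
    return this.#bytes.length;
  }

  byteAt(index) {
    if (!Number.isInteger(index) || index < 0 || index >= this.#bytes.length) {
      throw new RangeError(`index ${index} out of range`);
    }
    return this.#bytes[index];
  }
}

// Hypothetical mutable builder backed by a growable buffer.
class DataBuilder {
  #buffer = new Uint8Array(16);
  #length = 0;

  get length() {
    return this.#length;
  }

  append(byte) {
    if (!Number.isInteger(byte) || byte < 0 || byte > 0xff) {
      throw new RangeError(`not a byte: ${byte}`);
    }
    if (this.#length === this.#buffer.length) {
      // Double the capacity so a run of appends costs amortized O(1) each.
      const bigger = new Uint8Array(this.#buffer.length * 2);
      bigger.set(this.#buffer);
      this.#buffer = bigger;
    }
    this.#buffer[this.#length++] = byte;
    return this;
  }

  // Returns a Data holding the bytes appended so far and resets the builder
  // to length 0. No byte copy: the Data gets a view of the existing buffer.
  release() {
    const result = new Data(this.#buffer.subarray(0, this.#length));
    // Fresh storage, so later appends cannot write into the released bytes.
    this.#buffer = new Uint8Array(16);
    this.#length = 0;
    return result;
  }
}

const builder = new DataBuilder();
for (const b of [0xca, 0xfe, 0xba, 0xbe]) builder.append(b);
const data = builder.release();
console.log(data.length, data.byteAt(0).toString(16), builder.length); // 4 ca 0
```

One trade-off of stealing the buffer is that the released Data keeps any unused capacity alive; an implementation could shrink the buffer first when the slack is large, at the cost of a copy.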
Something worth bearing in mind is that Binary/B is implemented in 2 or 3 CommonJS platforms already, but I don't think anyone is particularly attached to the behaviour so long as what comes out isn't wildly different.

What kind of differences do you think they would tolerate? Renaming the classes? Dropping/changing some methods?

Regards, Maciej
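Ash's earlier suggestion of doing Array-style work via b.toArray().X, rather than putting such methods on the binary type, can be sketched as below. Here Array.from() on a Uint8Array stands in for the hypothetical toArray() (an assumption of this sketch). The example also shows one hazard of copying Array methods wholesale: Array.prototype.sort compares elements as strings unless it is given a comparator.

```javascript
// Uint8Array stands in for a binary blob; Array.from() plays the role of
// the hypothetical b.toArray() discussed in the thread.
const blob = Uint8Array.of(200, 9, 10, 100);

// Default Array sort converts elements to strings and compares those.
const lexicographic = Array.from(blob).sort();
console.log(lexicographic); // [ 10, 100, 200, 9 ]

// A numeric comparator gives the order most callers would expect.
const numeric = Array.from(blob).sort((a, b) => a - b);
console.log(numeric); // [ 9, 10, 100, 200 ]

// Other Array operations work the same way, on a copy, leaving blob intact.
const total = Array.from(blob).reduce((sum, byte) => sum + byte, 0);
console.log(total, blob[0]); // 319 200
```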
Re: Binary Data - possible topic for joint session
+ es-discuss (since posting there seems to have piqued more interest)

From reading over other proposals for binary data, I should mention the following operations that seem to be of interest to some communities but are not directly provided in this proposal (with ones I think are most appropriate for v1 first):

- Subrange/subdata/substring (get a Data that's a range from another's buffer - perhaps this could be optimized not to copy)
- Concatenation (specifically the ability to concatenate two immutable Data objects and get a new one back without having to go through the mutable type)
- Ability to convert to/from strings (with some hardcoded encoding or choice of encoding)
- Some or all of the operations of Array
- Base64 encode/decode
- Methods to compute various cryptographic hashes
- Find first or last occurrence of a byte or byte sequence (from a given offset)
- Split on a byte or byte sequence

I think it is possible to implement all of these with the primitives in my proposal, and in many cases the utility seems dubious (do you really want to map() or reduce() binary data one byte at a time?). Thus, I lean towards keeping the API relatively minimal, at least for starters.

Regards, Maciej

On Nov 4, 2009, at 4:26 PM, Maciej Stachowiak wrote:

Many APIs being developed for the Web platform would benefit from a good way to store binary data. It would be useful for this to be specified as part of the ECMAScript language, but it's also plausible to make this a W3C spec that's only intended for use with Web platform APIs.

Here is an overview of some of the APIs that could use such a data type, some notes on requirements and design alternatives, and a strawman proposal.

=

If there's time, I'd like to discuss this at the joint TC-39/HTML WG/Web Apps WG session.
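Maciej's claim that the operations in the follow-up list can be built from a small set of primitives can be checked with a sketch. It assumes Uint8Array as a stand-in for an immutable Data and uses only length, byte indexing, one memcpy-like bulk copy (the typed-array set() method) and subarray() for subranges; the helper names concat, indexOfBytes and splitBytes are hypothetical.

```javascript
// Concatenation: allocate once, then two bulk copies (the memcpy-like step).
function concat(a, b) {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// Find the first occurrence of `needle` at or after `from`; -1 if absent.
// Naive O(n*m) scan that uses only length and byte indexing.
function indexOfBytes(haystack, needle, from = 0) {
  const start = Math.max(from, 0);
  if (needle.length === 0) return Math.min(start, haystack.length);
  for (let i = start; i + needle.length <= haystack.length; i++) {
    let j = 0;
    while (j < needle.length && haystack[i + j] === needle[j]) j++;
    if (j === needle.length) return i;
  }
  return -1;
}

// Split on a non-empty separator sequence. The parts are subarray() views,
// i.e. subranges that share storage with `data` instead of copying it.
function splitBytes(data, separator) {
  if (separator.length === 0) throw new RangeError("empty separator");
  const parts = [];
  let start = 0;
  let hit;
  while ((hit = indexOfBytes(data, separator, start)) !== -1) {
    parts.push(data.subarray(start, hit));
    start = hit + separator.length;
  }
  parts.push(data.subarray(start));
  return parts;
}

const crlf = Uint8Array.of(0x0d, 0x0a);
const msg = concat(Uint8Array.of(0x61, 0x0d, 0x0a), Uint8Array.of(0x62, 0x0d, 0x0a, 0x63));
console.log(indexOfBytes(msg, crlf)); // 1
console.log(JSON.stringify(splitBytes(msg, crlf).map((p) => Array.from(p)))); // [[97],[98],[99]]
```

Because subarray() returns views, split and subrange avoid copying here, which is the optimization the first list item hints at; with a truly immutable type, sharing storage this way would also be safe.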
Some APIs that could use this:

- XMLHttpRequest v2 - to receive and send binary data
- WebSocket - to receive and send binary packets
- File API - to read binary files
- Canvas - to get image data in the binary form of an image format (avoiding inefficiency of data: URLs)
- various storage APIs - to store and retrieve binary data (in combination with other APIs)
- postMessage - to send binary data cross-window and cross-thread (to Workers) efficiently

I suspect there's more I am not thinking of. A convenient and efficient way to represent binary data could also be useful for pure ES programs.

= Current de facto ways for Web apps to deal with binary data:

- Array of numbers with one byte per entry
- String with one byte stored per UTF-16 code unit
- String with two bytes stored per UTF-16 code unit

I hope it is obvious why these approaches are not great, so I won't go into detail.

= Issues for the binary data API:

Name (potential bikeshed):

- ByteArray
- ByteVector
- BinaryData
- Data

I like Data and similar names. Objective-C has NSData as a distinct type for chunks of binary data - it's not treated as a type of array. I think this makes sense. Often the fact that a chunk of binary data can be treated as an octet sequence is incidental.

== Mutable or Immutable (or both?)

Immutable has a number of advantages:

- Can share backing store with chunks of binary data that the UA already holds (e.g. in the network cache) without requiring copy-on-write
- Can be passed cross-thread without copying, and without breaking shared-nothing semantics
- Has the right semantics for passing cross-window (can make a copy in cross-process case, but avoid it in same-process case; or use shared memory in cross-process case without worrying about locking or races)
- Follows the approach of ES strings, which are immutable

But there are some significant disadvantages too:

- What if you actually want to mutate some piece of binary data you got before passing it along? How to do this efficiently?
- What if you want to build a new binary data item from scratch?

With strings, the answer to both building and mutation is to extract pieces and build a new string by concatenation. But that's probably not efficient or convenient enough for the binary data case.

Possible solution: provide immutable Data, but have a DataBuilder class to allow creating new data items or mutating copies of existing ones, which can then give a final immutable product.

== What Operations?

Operation set could be a full set of array-like operations, absolutely minimal (just accessors for individual bytes), or middle ground (byte-level accessors plus a few bulk operations like the equivalent of memcpy). I like the middle ground.

== Rough API Proposal

Here's a sketch of a binary data API that's immutable (with mutable builder class), and provides a middle-ground set of operations. The basic idea is that binary data should be considered a first-class datatype in its own right, just as strings are, rather than being thought of