Re: Binary Data - possible topic for joint session

2009-11-14 Thread Kris Kowal
On Fri, Nov 6, 2009 at 11:24 AM, Brendan Eich bren...@mozilla.com wrote:
 Kris did a good job with Binary/B (although I do not see the point of the
 .get method additions) -- I didn't look at the other proposals yet.

Thanks.  The .get method is certainly not relevant for an ECMAScript
spec, where you have the luxury of specifying [[Get]] and [[Put]].
The .get method in the CommonJS proposal is intended to serve as a
stop-gap for implementations that cannot provide properties.

Kris Kowal
___
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss


Re: Binary Data - possible topic for joint session

2009-11-08 Thread Daniel Friesen

Maciej Stachowiak wrote:


On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:



On 6 Nov 2009, at 19:24, Brendan Eich wrote:


On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:

Just in case some of you weren't aware, the CommonJS group has done 
quite a bit of work and (bikeshedding) on this topic. Here's a link 
to the wiki:


http://wiki.commonjs.org/wiki/Binary

...


Binary/B is the closest of the three proposals to mine, in that it has 
both mutable and immutable binary data containers. Here are a few key 
differences:

...
Regards,
Maciej
One note: Binary/C also originally had both a mutable and an immutable type.
The mutable type was moved to IO/B/Buffer
(http://wiki.commonjs.org/wiki/IO/B/Buffer), so when comparing against
Binary/B, the more equivalent comparison is Binary/C together with IO/B/Buffer.


--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: Binary Data - possible topic for joint session

2009-11-07 Thread Maciej Stachowiak


On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:



On 6 Nov 2009, at 19:24, Brendan Eich wrote:


On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:

Just in case some of you weren't aware, the CommonJS group has  
done quite a bit of work and (bikeshedding) on this topic. Here's  
a link to the wiki:


http://wiki.commonjs.org/wiki/Binary

If nothing else there's quite a bit of prior art collected which  
should inform the conversation. I know the Binary/B proposal has  
the implementation momentum, but I don't know exactly what the  
status is. I haven't been following the evolution of these
binary specs too closely, but since it seems that nearly everyone
else from the group is off to jsconf.eu I figured I ought to toss
this out there.


Thanks, I had forgotten about commonjs.org, having once paid better  
attention.


Kris did a good job with Binary/B (although I do not see the point  
of the .get method additions) -- I didn't look at the other  
proposals yet.


/be


Binary/B feels largely right, but for my taste it has a few too many
methods copied from Array simply because Array had them, specifically
things like sort, reduce, shift, unshift etc.


Conceptually: why would you want to sort an array of bytes? There  
are certainly classes of operations that I think should just be done  
via b.toArray().X rather than directly on the blob.
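
As a rough illustration of the b.toArray().X idea, here is a minimal sketch. It assumes a standard Uint8Array as a stand-in for a Binary/B blob and Array.from as a stand-in for toArray(); the helper name sortedBytes is made up for this example and is not part of any proposal.

```javascript
// Stand-in for a binary blob; this is not a Binary/B ByteString or ByteArray.
const blob = Uint8Array.of(0x03, 0xff, 0x10, 0x00);

// Hypothetical helper: copy the bytes into a plain Array first,
// then use the ordinary Array methods on the copy.
function sortedBytes(bytes) {
  return Array.from(bytes).sort((a, b) => a - b);
}

const sorted = sortedBytes(blob);
console.log(sorted);           // [ 0, 3, 16, 255 ]
console.log(Array.from(blob)); // [ 3, 255, 16, 0 ] (the blob itself is untouched)
```

The blob type then needs no sort() of its own; the rarely needed operation costs one explicit copy.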


As a community (CommonJS) we'd be more than happy to go forward with  
a binary spec that came from (or at least has the blessing of) the  
ES groups.


Binary/B is the closest of the three proposals to mine, in that it has  
both mutable and immutable binary data containers. Here are a few key  
differences:


(1) Binary/B does not have a cheap way to convert from the immutable
representation (ByteString) to the mutable representation (ByteArray)
(2) In Binary/B, Array-like index access to ByteString gives back
one-byte ByteStrings instead of bytes, likely an over-literal copying
of String
(3) There are some seemingly needless differences in the interfaces to
ByteString and ByteArray that follow from modeling on String and Array
(4) Binary/B has many more operations available in the base proposal
(including charset transcoding and a generous selection of String and
Array methods)
(5) Different names - Data/DataBuilder vs. ByteString/ByteArray

My initial impression is that (1), (2) and (3) are all points on which  
my proposal is better. On (1): cheap conversion from mutable to  
immutable (DataBuilder.prototype.release() in my proposal) lets binary  
data objects be built up with a convenient mutation-based idiom, but  
then passed around as immutable objects thereafter. On (2): I don't
think a one-byte ByteString is ever useful; indexing to get the byte
value would be much more helpful. On (3): I think it's good for the
mutable interface to be a strict superset of the immutable
interface.


(4) and (5) are both points where perhaps neither proposal is at the
optimum yet. On (4), I suspect the sweet spot is somewhere between my
spartan set of built-in operations and the very generous set in
Binary/B. On (5), I'm not sure either set of names is the best possible,
and I'm certainly not stuck on my own proposed names.


Regards,
Maciej



Re: Binary Data - possible topic for joint session

2009-11-07 Thread Maciej Stachowiak


On Nov 7, 2009, at 6:53 PM, Ash Berlin wrote:



On 8 Nov 2009, at 02:21, Maciej Stachowiak wrote:



On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:


On 6 Nov 2009, at 19:24, Brendan Eich wrote:


On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:


http://wiki.commonjs.org/wiki/Binary


[snip]


[snip]
As a community (CommonJS) we'd be more than happy to go forward  
with a binary spec that came from (or at least has the blessing  
of) the ES groups.


Binary/B is the closest of the three proposals to mine, in that it  
has both mutable and immutable binary data containers. Here are a  
few key differences:


(1) Binary/B does not have a cheap way to convert from the  
immutable representation (ByteString) to the mutable representation  
(ByteArray)
(2) In Binary/B, Array-like index access to ByteString gives back  
one-byte ByteStrings instead of bytes, likely an over-literal  
copying of String
(3) There are some seemingly needless differences in the interfaces  
to ByteString and ByteArray that follow from modeling on String and  
Array
(4) Binary/B has many more operations available in the base
proposal (including charset transcoding and a generous selection of
String and Array methods)
(5) Different names - Data/DataBuilder vs. ByteString/ByteArray



On (1): cheap conversion from mutable to immutable  
(DataBuilder.prototype.release() in my proposal) lets binary data  
objects be built up with a convenient mutation-based idiom, but  
then passed around as immutable objects thereafter.


Mutable to immutable or immutable to mutable? Assuming the former,
how do you handle the differences in API/behaviour? Does each function
check whether it is now immutable?


Mutable to immutable. Immutable to mutable has to copy (or at least  
copy-on-write).


My proposal does it like this (where DataBuilder is the mutable  
variant and Data is the immutable):


DataBuilder.prototype.release()

Return a new Data with the same length and the same byte values  
as the DataBuilder passed as the this value. At the same time, the  
DataBuilder is reset to length 0.


Because the DataBuilder is reset to empty, the implementation can  
steal its underlying buffer for the new Data object, thus converting  
to immutable without a full copy. This matches the common pattern of  
assembling a new piece of binary data with mutation, then handing it  
out to possibly multiple other pieces of code as immutable.
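
A minimal sketch of the release() behaviour described above, written as plain JavaScript classes. Only the names Data, DataBuilder and release() come from the proposal text; the accessor byteAt(), the mutator appendByte(), the initial capacity and the growth strategy are assumptions made for illustration.

```javascript
// Read-only container. It trusts that whoever constructs it hands over
// the bytes and keeps no other reference to them (an assumption of this sketch).
class Data {
  #bytes;
  constructor(bytes) { this.#bytes = bytes; }
  get length() { return this.#bytes.length; }
  byteAt(i) { return this.#bytes[i]; } // hypothetical accessor name
}

class DataBuilder {
  #bytes = new Uint8Array(16); // arbitrary initial capacity
  #length = 0;
  get length() { return this.#length; }

  appendByte(b) { // hypothetical mutator name
    if (this.#length === this.#bytes.length) {
      const grown = new Uint8Array(this.#bytes.length * 2);
      grown.set(this.#bytes);
      this.#bytes = grown;
    }
    this.#bytes[this.#length++] = b & 0xff;
  }

  release() {
    // Hand the backing store to the new Data as a view (no byte copy),
    // then reset the builder to length 0 with a fresh buffer.
    const data = new Data(this.#bytes.subarray(0, this.#length));
    this.#bytes = new Uint8Array(16);
    this.#length = 0;
    return data;
  }
}

const builder = new DataBuilder();
for (const b of [0xde, 0xad, 0xbe, 0xef]) builder.appendByte(b);
const data = builder.release();
console.log(data.length, builder.length); // 4 0
```

Because the builder drops its reference to the old buffer, later calls to appendByte() cannot change the bytes visible through the released Data.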





On (2): I don't think a one-byte ByteString is ever useful;
indexing to get the byte value would be much more helpful.


Couldn't agree more with you here - if for whatever reason you do  
want a one-byte ByteString, there is always substr/substring. This  
is something that came up recently in IRC and prompted me to start  
looking at making changes to the proposal - I was going to do that  
next week, so this coming up now is very good timing.
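
To make the difference concrete, here is a small sketch using a standard Uint8Array as a stand-in (it is not a ByteString, and subarray() here merely plays the role that substr/substring would play in Binary/B). Indexing yields a number, and a one-byte container is still available through an explicit range call.

```javascript
const bytes = Uint8Array.of(0x48, 0x69);

// Indexing yields the byte value as a number ...
const first = bytes[0];

// ... while a one-byte container, if ever wanted, comes from an explicit range call.
const oneByte = bytes.subarray(0, 1);

// For comparison, String indexing yields a one-character string.
const ch = "Hi"[0];

console.log(typeof first, first); // number 72
console.log(oneByte.length);      // 1
console.log(typeof ch, ch);       // string H
```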


On (3), I think it's good for the mutable interface to be a strict
superset of the immutable interface.


Seems like a reasonable thing to do.


I'm glad we agree on these two points.





(4) and (5) are both points where perhaps neither proposal is at the
optimum yet. On (4), I suspect the sweet spot is somewhere between
my spartan set of built-in operations and the very generous set in
Binary/B.


Agreed - this was the other thing I noticed - e.g. sorting a
ByteArray isn't really an operation that makes a whole lot of sense
to my mind.


Yep. I'm not even sure things like map(), filter() or reduce() are  
likely to work well. My own preference is to start the API very small,  
and add incrementally based on demonstrated need and clearly  
articulated use cases.





On (5), I'm not sure either set of names is the best possible, and  
I'm certainly not stuck on my own proposed names.


I'm not really bothered either way on this front, although 'Data' is  
much more likely to clash with existing code.


Yes, Brendan made this point and presented some good evidence in that  
direction. I think 'Data' doesn't work but 'Binary' or 'BinData' might.




Something worth bearing in mind is that Binary/B is implemented in 2
or 3 CommonJS platforms already, but I don't think anyone is
particularly attached to the behaviour so long as what comes out
isn't wildly different.


What kind of differences do you think they would tolerate? Renaming  
the classes? Dropping/changing some methods?


Regards,
Maciej




Re: Binary Data - possible topic for joint session

2009-11-06 Thread Maciej Stachowiak

+ es-discuss (since posting there seems to have piqued more interest)

From reading over other proposals for binary data, I should mention  
the following operations that seem to be of interest to some  
communities but are not directly provided in this proposal (with ones  
I think are most appropriate for v1 first):


- Subrange/subdata/substring (get a Data that's a range from another's  
buffer - perhaps this could be optimized not to copy)
- Concatenation (specifically the ability to concatenate two immutable  
Data objects and get a new one back without having to go through the  
mutable type).
- Ability to convert to/from strings (with some hardcoded encoding or  
choice of encoding)

- Some or all of the operations of Array
- Base64 encode/decode
- Methods to compute various cryptographic hashes
- Find first or last occurrence of a byte or byte sequence (from a  
given offset)

- Split on a byte or byte sequence

I think it is possible to implement all of these with the primitives  
in my proposal, and in many cases the utility seems dubious (do you  
really want to map() or reduce() binary data one byte at a time?).  
Thus, I lean towards keeping the API relatively minimal, at least for  
starters.
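
As a rough check of the claim that these operations can be layered on a small core, the sketch below builds subrange, concatenation and byte-sequence search from nothing more than a length, indexed byte reads, and the ability to create a new container. A standard Uint8Array stands in for the proposal's types, and the function names are invented for this example.

```javascript
// Subrange: copy bytes [start, end) into a new container.
// Arguments are assumed to be within bounds.
function subrange(data, start, end) {
  const out = new Uint8Array(Math.max(0, end - start));
  for (let i = 0; i < out.length; i++) out[i] = data[start + i];
  return out;
}

// Concatenation: a new container holding a's bytes followed by b's.
function concat(a, b) {
  const out = new Uint8Array(a.length + b.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i];
  for (let i = 0; i < b.length; i++) out[a.length + i] = b[i];
  return out;
}

// Index of the first occurrence of `needle` at or after `from`, or -1.
function indexOfBytes(data, needle, from = 0) {
  for (let i = from; i + needle.length <= data.length; i++) {
    let j = 0;
    while (j < needle.length && data[i + j] === needle[j]) j++;
    if (j === needle.length) return i;
  }
  return -1;
}

const joined = concat(Uint8Array.of(1, 2), Uint8Array.of(3, 4));
console.log(Array.from(joined));                            // [ 1, 2, 3, 4 ]
console.log(indexOfBytes(joined, Uint8Array.of(2, 3)));     // 1
console.log(Array.from(subrange(joined, 1, 3)));            // [ 2, 3 ]
```

Built-in versions could avoid the byte-at-a-time loops (and possibly the copies), which is the usual argument for promoting some of these into the base API.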


Regards,
Maciej

On Nov 4, 2009, at 4:26 PM, Maciej Stachowiak wrote:



Many APIs being developed for the Web platform would benefit from a  
good way to store binary data. It would be useful for this to be  
specified as part of the ECMAScript language, but it's also  
plausible to make this a W3C spec that's only intended for use with  
Web platform APIs. Here is an overview of some of the APIs that  
could use such a data type, some notes on requirements and design  
alternatives, and a strawman proposal.


If there's time, I'd like to discuss this at the joint TC-39/HTML
WG/Web Apps WG session.


Some APIs that could use this:

   XMLHttpRequest v2 - to receive and send binary data
   WebSocket - to receive and send binary packets
   File API - to read binary files
   Canvas - to get image data in the binary form of an image format  
(avoiding inefficiency of data: URLs)
   various storage APIs - to store and retrieve binary data (in  
combination with other APIs)
   postMessage - to send binary data cross-window and cross-thread  
(to Workers) efficiently


I suspect there's more I am not thinking of. A convenient and  
efficient way to represent binary data could also be useful for pure  
ES programs.



= Current de facto ways for Web apps to deal with binary data:

   Array of numbers with one byte per entry
   String with one byte stored per UTF-16 code unit
   String with two bytes stored per UTF-16 code unit

I hope it is obvious why these approaches are not great so I won't  
go into detail.
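
For readers who have not met these workarounds, here is a minimal sketch of what the three representations look like for the same four bytes. The helper names and the big-endian pairing in the third variant are arbitrary choices for this example, not something the original message specifies.

```javascript
const bytes = [0x00, 0x7f, 0x80, 0xff];

// 1. Array of numbers, one byte per entry.
const asArray = bytes.slice();

// 2. String with one byte stored per UTF-16 code unit.
const asString1 = String.fromCharCode(...bytes);

// 3. String with two bytes stored per UTF-16 code unit.
//    An odd-length input is padded with a zero byte here, so the true
//    length would have to be tracked separately.
function packPairs(bs) {
  let s = "";
  for (let i = 0; i < bs.length; i += 2) {
    const low = i + 1 < bs.length ? bs[i + 1] : 0;
    s += String.fromCharCode((bs[i] << 8) | low);
  }
  return s;
}

function unpackPairs(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    out.push(unit >> 8, unit & 0xff);
  }
  return out;
}

const asString2 = packPairs(bytes);
console.log(asArray.length, asString1.length, asString2.length); // 4 4 2
```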



= Issues for the binary data API:

Name (potential bikeshed):
ByteArray
ByteVector
BinaryData
Data

I like Data and similar names. Objective-C has NSData as a  
distinct type for chunks of binary data - it's not treated as a type  
of array. I think this makes sense. Often the fact that a chunk of  
binary data can be treated as an octet sequence is incidental.


== Mutable or Immutable (or both?)

Immutable has a number of advantages:
   - Can share backing store with chunks of binary data that the UA
already holds (e.g. in the network cache) without requiring
copy-on-write
   - Can be passed cross-thread without copying, and without
breaking shared-nothing semantics
   - Has the right semantics for passing cross-window (can make a
copy in cross-process case, but avoid it in same-process case; or
use shared memory in cross-process case without worrying about
locking or races)
   - Follows the approach of ES strings, which are immutable

But there are some significant disadvantages too:
   - What if you actually want to mutate some piece of binary data
you got before passing it along? How to do this efficiently?
   - What if you want to build a new binary data item from scratch?

With strings, the answer to both building and mutation is to extract  
pieces and build a new string by concatenation. But that's probably  
not efficient or convenient enough for the binary data case.


Possible solution: provide immutable Data, but have a DataBuilder  
class to allow creating new data items or mutating copies of  
existing ones, which can then give a final immutable product.



== What Operations?

Operation set could be a full set of array-like operations,  
absolutely minimal (just accessors for individual bytes), or middle  
ground (byte-level accessors plus a few bulk operations like the  
equivalent of memcpy). I like the middle ground.
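
To give a feel for the middle ground, here is a sketch of one memcpy-like bulk operation next to a byte-level accessor. It uses a standard Uint8Array as a stand-in for a mutable binary container; the function name copyBytes and its parameter order are assumptions for illustration, not part of the proposal.

```javascript
// Hypothetical bulk copy, analogous to C's memcpy: copies `length` bytes
// starting at src[srcOffset] into dst starting at dst[dstOffset].
// Uint8Array.prototype.set throws a RangeError if the bytes do not fit.
function copyBytes(dst, dstOffset, src, srcOffset, length) {
  dst.set(src.subarray(srcOffset, srcOffset + length), dstOffset);
}

const src = Uint8Array.of(1, 2, 3, 4, 5);
const dst = new Uint8Array(6);

copyBytes(dst, 2, src, 1, 3);  // one bulk operation
dst[5] = 0xff;                 // one byte-level write

console.log(Array.from(dst));  // [ 0, 0, 2, 3, 4, 255 ]
```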


== Rough API Proposal

Here's a sketch of a binary data API that's immutable (with mutable  
builder class), and provides a middle-ground set of operations. The  
basic idea is that binary data should be considered a first-class  
datatype in its own right, just as strings are, rather than being  
thought of