RE: Defining generic Stream than considering only bytes (Re: CfC: publish WD of Streams API; deadline Nov 3)

2013-10-31 Thread Feras Moussa
A few comments inline below -


 From: tyosh...@google.com 
 Date: Thu, 31 Oct 2013 13:23:26 +0900 
 To: d...@deanlandolt.com 
 CC: art.bars...@nokia.com; public-webapps@w3.org 
 Subject: Defining generic Stream than considering only bytes (Re: CfC: 
 publish WD of Streams API; deadline Nov 3) 
 
 Hi Dean, 
 
 On Thu, Oct 31, 2013 at 11:30 AM, Dean Landolt 
 d...@deanlandolt.com wrote: 
 I really like the general concepts of this proposal, but I'm confused 
 by what seems like an unnecessary limiting assumption: why assume all 
 streams are byte streams? This is a mistake node recently made in its 
 streams refactor, one that has led to objectMode and added cruft. 
 
 Forgive me if this has been discussed -- I just learned of this today. 
 But as someone who's been slinging streams in JavaScript for years, I'd 
 really hate to see the standard stream hampered by this bytes-only 
 limitation. The node ecosystem clearly demonstrates that streams are 
 for more than bytes and (byte-encoded) strings. 
 
 
 To glue Streams to existing binary-handling infrastructure such as 
 ArrayBuffer and Blob, we should have some specialization of Stream for 
 handling bytes, rather than a generalized Stream that would accept/output 
 an array or a single object of some type. Maybe we can rename the Streams 
 API to ByteStream, so as not to occupy the name Stream, which sounds more 
 generic, and start standardizing a generic Stream. 

Dean, it sounds like your concern isn't just around the naming, but rather how
data is read out of a stream. I've reviewed both the Node Streams and Buffer
APIs previously, and from my understanding the data is provided as either a
Buffer or a String. This is on par with ArrayBuffer/String. What data do you
want to obtain that is missing, and for what scenario? Are these data types
that already exist in the web platform, or new types you think are missing?
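
(For reference, a minimal Node.js sketch of the two chunk shapes I mean; the
file names are made up, and setEncoding() is what switches the chunks from
Buffer to String:)

    // By default, a Node readable stream emits Buffer chunks.
    var fs = require('fs');

    var bytes = fs.createReadStream('example.bin');
    bytes.on('data', function (chunk) {
      // chunk is a Buffer here, roughly the counterpart of an ArrayBuffer view
      console.log(Buffer.isBuffer(chunk)); // true
    });

    // Calling setEncoding() makes the stream emit decoded strings instead.
    var text = fs.createReadStream('example.txt');
    text.setEncoding('utf8');
    text.on('data', function (chunk) {
      console.log(typeof chunk); // 'string'
    });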

 
 In my perfect world any arbitrary iterator could be used to 
 characterize stream chunks -- this would have some really interesting 
 benefits -- but I suspect this kind of flexibility would be overkill 
 for now. But there's no good reason bytes should be the only thing 
 people can chunk up in streams. And if we're defining streams for the 
 whole platform they shouldn't just be tied to a few very specific 
 file-like use cases. 
 If streams could also consist of chunks of strings (real, native 
 strings) a huge swath of the API could disappear. All of readType, 
 readEncoding and charset could be eliminated, replaced with simple, 
 composable transforms that turn byte streams (of, say, utf-8) into 
 string streams. And vice versa. 
 
 
 So, for example, XHR would be the point of decoding, and it would return 
 a Stream of DOMStrings? 
 
 Of course the real draw of this approach would be when chunks are 
 neither blobs nor strings. Why couldn't chunks be arrays? The arrays 
 could contain anything (no need to reserve any value as a sigil). 
 Regardless of the chunk type, the zero object for any given type 
 wouldn't be `null` (it would be something like '' or []). That means we 
 can use null to distinguish EOF, and `chunk == null` would make a 
 perfectly nice (and unambiguous) EOF sigil, eliminating yet more API 
 surface. This would give us clean object-mode streams for free, and 
 without node's arbitrary limitations. 
 
 For several reasons, I chose to use .eof rather than null. One of them 
 is to allow a non-empty final chunk to signal EOF, rather than requiring 
 one more read() call. 
 
 This point can be re-discussed. 

I thought EOF made sense here as well, but it's something that can be changed. 
Your proposal is interesting - is something like this currently implemented 
anywhere? This behavior feels like it'd require several changes elsewhere, 
since some APIs and libraries may explicitly look for an EOF.

 
 The `size` of an array stream would be the total length of all array 
 chunks. As I hinted before, we could also leave the door open to 
 specifying chunks as any iterable, where `size` (if known) would just 
 be the `length` of each chunk (assuming chunks even have a `length`). 
 This would also allow individual chunks to be built of generators, 
 which could be particularly interesting if the `size` argument to 
 `read` was specified as a maximum number of bytes rather than the total 
 to return -- completely sensible considering it has to behave this way 
 near the end of the stream anyway... 
 
 I don't really understand the last point. Could you please elaborate on 
 the use case and the benefit? 
 
 IIRC, it's considered to be useful and important to be able to read an 
 exact requested size of data into an ArrayBuffer object and get 
 notified (the returned Promise gets resolved) only when it's ready. 
 
 This would lead to a pattern like `stream.read(Infinity)`, which would 
 essentially say give me everything you've got as soon as you can. 
 
In the current proposal, read(), i.e. read() with no argument, does this.

Defining generic Stream than considering only bytes (Re: CfC: publish WD of Streams API; deadline Nov 3)

2013-10-30 Thread Takeshi Yoshino
Hi Dean,

On Thu, Oct 31, 2013 at 11:30 AM, Dean Landolt d...@deanlandolt.com wrote:

 I really like the general concepts of this proposal, but I'm confused by
 what seems like an unnecessary limiting assumption: why assume all streams
 are byte streams? This is a mistake node recently made in its streams
 refactor, one that has led to objectMode and added cruft.

 Forgive me if this has been discussed -- I just learned of this today. But
 as someone who's been slinging streams in JavaScript for years, I'd really
 hate to see the standard stream hampered by this bytes-only limitation.
 The node ecosystem clearly demonstrates that streams are for more than
 bytes and (byte-encoded) strings.


To glue Streams to existing binary-handling infrastructure such as
ArrayBuffer and Blob, we should have some specialization of Stream for
handling bytes, rather than a generalized Stream that would accept/output an
array or a single object of some type. Maybe we can rename the Streams API to
ByteStream, so as not to occupy the name Stream, which sounds more generic,
and start standardizing a generic Stream.
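
(To make the split concrete, a purely hypothetical usage sketch; the names
ByteStream/Stream, the variables, and the promise-returning read() shape are
illustrative assumptions, not settled API:)

    // Hypothetical: a byte-specialized stream hands back ArrayBuffers...
    byteStream.read(1024).then(function (result) {
      // result.data would be an ArrayBuffer of up to 1024 bytes
      processBytes(result.data);
    });

    // ...while a generic Stream could hand back a chunk of any type
    // (a string, an array, a plain object), with no notion of byte length.
    genericStream.read().then(function (result) {
      processChunk(result.data);
    });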


 In my perfect world any arbitrary iterator could be used to characterize
 stream chunks -- this would have some really interesting benefits -- but I
 suspect this kind of flexibility would be overkill for now. But there's
 no good reason bytes should be the only thing people can chunk up in
 streams. And if we're defining streams for the whole platform they
 shouldn't *just* be tied to a few very specific file-like use cases.

 If streams could also consist of chunks of strings (real, native strings)
 a huge swath of the API could disappear. All of readType, readEncoding and
 charset could be eliminated, replaced with simple, composable transforms
 that turn byte streams (of, say, utf-8) into string streams. And vice versa.


So, for example, XHR would be the point of decoding, and it would return a
Stream of DOMStrings?
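
(A rough sketch of the kind of composable byte-to-string transform being
discussed, assuming read() resolves to a { data, eof } object where data is an
ArrayBuffer, and using TextDecoder for the actual decoding; the helper name
and callbacks are made up:)

    // Hypothetical helper: consume a byte stream, deliver utf-8 string chunks.
    function pumpAsStrings(byteStream, onStringChunk, onDone) {
      var decoder = new TextDecoder('utf-8');
      function pump() {
        byteStream.read().then(function (result) {
          if (result.data) {
            // { stream: true } keeps partial code points buffered across chunks
            onStringChunk(decoder.decode(result.data, { stream: true }));
          }
          if (result.eof) {
            onDone();
          } else {
            pump();
          }
        });
      }
      pump();
    }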


 Of course the real draw of this approach would be when chunks are neither
 blobs nor strings. Why couldn't chunks be arrays? The arrays could contain
 anything (no need to reserve any value as a sigil). Regardless of the chunk
 type, the zero object for any given type wouldn't be `null` (it would be
 something like '' or []). That means we can use null to distinguish EOF,
 and `chunk == null` would make a perfectly nice (and unambiguous) EOF
 sigil, eliminating yet more API surface. This would give us clean
 object-mode streams for free, and without node's arbitrary limitations.


For several reasons, I chose to use .eof rather than null. One of them is to
allow a non-empty final chunk to signal EOF, rather than requiring one more
read() call.

This point can be re-discussed.
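
(The difference in reading style, as I understand it; both loops are sketches,
one assuming the current { data, eof } result shape and one assuming Dean's
null-chunk EOF:)

    // Current proposal: the final (possibly non-empty) chunk carries eof,
    // so the loop can stop without an extra read().
    function drainWithEof(stream, onChunk) {
      stream.read().then(function (result) {
        onChunk(result.data);
        if (!result.eof) drainWithEof(stream, onChunk);
      });
    }

    // null-as-EOF: keep reading until a null chunk, which costs one more
    // read() after the last data chunk has already been delivered.
    function drainWithNull(stream, onChunk) {
      stream.read().then(function (chunk) {
        if (chunk == null) return;
        onChunk(chunk);
        drainWithNull(stream, onChunk);
      });
    }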


 The `size` of an array stream would be the total length of all array
 chunks. As I hinted before, we could also leave the door open to specifying
 chunks as any iterable, where `size` (if known) would just be the `length`
 of each chunk (assuming chunks even have a `length`). This would also allow
 individual chunks to be built of generators, which could be particularly
 interesting if the `size` argument to `read` was specified as a maximum
 number of bytes rather than the total to return -- completely sensible
 considering it has to behave this way near the end of the stream anyway...


I don't really understand the last point. Could you please elaborate on the
use case and the benefit?

IIRC, it's considered to be useful and important to be able to read an exact
requested size of data into an ArrayBuffer object and get notified (the
returned Promise gets resolved) only when it's ready.
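
(For example, something along these lines, assuming read(size) returns a
Promise that resolves only once the full requested size, or EOF, is available;
the header/body framing is made up for illustration:)

    // Read exactly one 16-byte header before touching the body.
    stream.read(16).then(function (result) {
      var header = new DataView(result.data); // result.data: a 16-byte ArrayBuffer
      var bodyLength = header.getUint32(0);
      // Then pull the body as one exact-sized ArrayBuffer as well.
      return stream.read(bodyLength);
    }).then(function (result) {
      handleBody(result.data); // handleBody is a placeholder
    });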


 This would lead to a pattern like `stream.read(Infinity)`, which would
 essentially say *give me everything you've got as soon as you can*.


In the current proposal, read(), i.e. read() with no argument, does this.


  This is closer to node's semantics (where read is async, for added
 scheduling flexibility). It would drain streams faster rather than
 pseudo-blocking for a specific (and arbitrary) size chunk which ultimately
 can't be guaranteed anyway, so you'll always have to do length checks.

 (On a somewhat related note: why is a 0-sized stream specified to throw?
 And why a SyntaxError of all things? A 0-sized stream seems perfectly
 reasonable to me.)


A 0-sized Stream is not prohibited.

Do you mean a 0-sized read()/pipe()/skip()? I don't think they make much
sense. They would be useful only when you want to sense EOF, and that can be
done with read(1).
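
(i.e. roughly the following, under the same assumed { data, eof } result
shape:)

    // Probe for EOF without a 0-sized read().
    stream.read(1).then(function (result) {
      if (result.data && result.data.byteLength > 0) {
        // one byte came back; stash it, the stream isn't exhausted yet
      }
      if (result.eof) {
        // end of stream reached
      }
    });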


 What's particularly appealing to me about the chunk-as-generator idea is
 that these chunks could still be quite large -- hundreds of megabytes, even.
 Just because a potentially large amount of data has become available since
 the last chunk was processed doesn't mean you should have to bring it all
 into memory at once.


That's interesting. Could you please give a concrete example of such a
generator?
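
(Not to answer for Dean, but one purely illustrative reading of the idea: a
chunk could itself be a generator that lazily yields slices of data that is
already available, without materializing it all at once, e.g.:)

    // Hypothetical: a chunk that lazily yields 64 KiB slices of a large Blob,
    // so a multi-hundred-megabyte chunk never has to sit in memory whole.
    function* blobChunk(blob) {
      var SLICE = 64 * 1024;
      for (var offset = 0; offset < blob.size; offset += SLICE) {
        // each yielded slice is a small Blob the consumer reads when it needs it
        yield blob.slice(offset, Math.min(offset + SLICE, blob.size));
      }
    }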