On Mon, Mar 9, 2009 at 8:41 PM, Erik Corry <[email protected]> wrote:
> 2009/3/9 Stephan Beal <[email protected]>:
> Here's the text from v8.h:
>
>   * Allocates a new string from either utf-8 encoded or ascii data.
>   * The second parameter 'length' gives the buffer length.
>   * If the data is utf-8 encoded, the caller must
>   * be careful to supply the length parameter.
>   * If it is not given, the function calls
>   * 'strlen' to determine the buffer length, it might be
>   * wrong if 'data' contains a null character.
>   */

Aha, okay, I wasn't clear on the automatic assumption of utf8. Fair enough.
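
A minimal standalone sketch of the header's warning (no V8 involved; the helper names are hypothetical). When the length is omitted, the byte count comes from strlen(), which stops at the first NUL. Passing the real buffer size avoids that truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// What happens when the 'length' argument is omitted: strlen() stops
// at the first NUL byte, silently truncating the data.
std::size_t implicit_length(const char* data) {
    return std::strlen(data);
}

// Supplying the real size instead: std::string keeps embedded NULs,
// so size() reports every byte in the buffer.
std::size_t explicit_length(const std::string& buffer) {
    return buffer.size();
}
```

With a buffer built as `std::string buf("ab\0cd", 5)`, the implicit length is 2 and the explicit one is 5. The latter is the value you would pass as the second argument of the `const char*` overload of `String::New`.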

> So it will assume that it is UTF-8 if it is not ASCII.  Not all binary
> sequences are valid UTF-8 so you can't use this for binary data.
> Internally, V8 does not use UTF-8 so this data will be converted to
> UC16.

Doh, and here all along I assumed utf8 was what WAS used internally, since the API
has Utf8Value but no Utf16Value.
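
Since not every byte sequence is valid UTF-8, a caller could check a buffer before handing it to the UTF-8 `String::New`. The following is a standalone sketch of such a check (hypothetical helper, not part of V8), following the RFC 3629 rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if [data, data + len) is well-formed UTF-8 (RFC 3629):
// rejects stray continuation bytes, truncated sequences, overlong
// encodings, surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const unsigned char* data, std::size_t len) {
    static const std::uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < len) {
        unsigned char c = data[i];
        std::size_t n;     // number of continuation bytes expected
        std::uint32_t cp;  // code point being decoded
        if (c < 0x80) { ++i; continue; }  // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return false;                // 0x80..0xBF or 0xF8..0xFF as lead byte
        if (len - i <= n) return false;   // sequence runs past the buffer
        for (std::size_t k = 1; k <= n; ++k) {
            unsigned char cc = data[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[n]) return false;                // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += n + 1;
    }
    return true;
}
```

For example, the single Latin-1 byte 0xE9 ("é") is rejected, while its UTF-8 form 0xC3 0xA9 is accepted. This is exactly why arbitrary binary or Latin-1 data cannot go through the UTF-8 path unchanged.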

>  /** Allocates a new string from utf16 data.*/
>  static Local<String> New(const uint16_t* data, int length = -1);
>
> This one takes 16 bit characters and can represent binary data with no
> corruption, but the length is in characters, so you can't use it for
> an odd number of bytes.

What's the byte order?
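
On the byte-order question: the `uint16_t` overload receives 16-bit values, not bytes. Byte order therefore only enters when raw bytes are paired into those values. Reinterpreting a `char*` buffer as `uint16_t*` pairs them in host order, and may also violate alignment. Packing explicitly with shifts fixes the convention on every platform. The sketch below (hypothetical helpers, standalone) uses little-endian pairing and pads an odd trailing byte with 0, so the original byte count has to be kept separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs bytes two at a time into 16-bit units: first byte = low half,
// second byte = high half, independent of host endianness. An odd
// trailing byte is padded with 0; the caller must remember 'len'.
std::vector<std::uint16_t> pack_bytes_le(const unsigned char* data, std::size_t len) {
    std::vector<std::uint16_t> units((len + 1) / 2, 0);
    for (std::size_t i = 0; i < len; ++i) {
        std::uint16_t b = data[i];
        units[i / 2] |= (i % 2 == 0) ? b : static_cast<std::uint16_t>(b << 8);
    }
    return units;
}

// Inverse of pack_bytes_le: recovers exactly 'len' bytes.
std::vector<unsigned char> unpack_bytes_le(const std::vector<std::uint16_t>& units,
                                           std::size_t len) {
    std::vector<unsigned char> out(len);
    for (std::size_t i = 0; i < len; ++i) {
        std::uint16_t u = units[i / 2];
        out[i] = static_cast<unsigned char>((i % 2 == 0) ? (u & 0xFF) : (u >> 8));
    }
    return out;
}
```

The packed units (with `units.size()` as the character length) are what would go to the 16-bit `New` quoted above. The padding byte means the string alone cannot tell an odd-length buffer from an even one ending in 0x00.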

>> In my case i'm working on an i/o library which of course treats the
>> data as opaque (void*). If i understand you correctly, if it happens
...
> Giving binary data to the above New method will result in undefined behaviour.

Fair enough.

> The external strings must have their data either in ASCII or in UC16.
> There's no Latin1 and undefined stuff will result if you try.  In the
> case of an external string the actual string data is not on the V8
> heap.  It is assumed to be immutable too of course since all JS
> strings are immutable.

That wouldn't solve my case, which is effectively latin1. I'll need to
think about that (though I don't mind living with the limitation of ascii
read/write).
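
One possible route for Latin-1 data, sketched standalone with hypothetical helpers: Latin-1 (ISO-8859-1) byte values are identical to the Unicode code points U+0000..U+00FF. Widening each byte to 16 bits therefore yields UTF-16 that the `uint16_t` overload of `String::New` accepts. Given Erik's description, such a buffer could also back a UC16 external string, at twice the memory. Buffers that turn out to be pure ASCII could still use the ASCII route directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if every byte is 7-bit ASCII, i.e. the buffer could be used
// unchanged wherever only ASCII data is accepted.
bool is_ascii(const unsigned char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] > 0x7F) return false;
    return true;
}

// Latin-1 byte values equal their Unicode code points, so widening
// each byte to a 16-bit unit produces the equivalent UTF-16 text.
std::vector<std::uint16_t> latin1_to_utf16(const unsigned char* data, std::size_t len) {
    return std::vector<std::uint16_t>(data, data + len);
}
```

For the Latin-1 bytes of "café", `is_ascii` returns false and the widened buffer ends in the unit 0x00E9, which is U+00E9 ("é").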

>> That's an idea. Didn't think of that. It'd mean (in my case) buffering
>> arbitrarily large read buffers, and since v8 doesn't guarantee GC will
>> ever be called, I don't want to risk it causing an arbitrarily-sized
>> leak.
>
> If the data is on the V8 heap then it won't be collected without a GC either. :)

But even if I registered it for GC via a weak-pointer callback, it's
not guaranteed to be freed, so I'm forced to add external GC to it in
*any* case and have the client call the cleanup routine when their
context dies (this is currently handled via a sentry object in the
client app which cleans up when it goes out of scope).
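
The sentry approach can be sketched in plain C++. The class names are hypothetical, and any V8 weak-handle callback is deliberately left out. A registry owns every buffer handed out, and a scope-bound sentry frees all of them on exit whether or not the garbage collector ever ran.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical owner for buffers handed to script code. Each buffer
// lives until clear() is called or the registry itself is destroyed.
class BufferRegistry {
public:
    // Allocates and tracks a new zero-filled buffer of 'size' bytes.
    std::vector<unsigned char>& allocate(std::size_t size) {
        buffers_.push_back(std::make_unique<std::vector<unsigned char>>(size));
        return *buffers_.back();
    }
    std::size_t live() const { return buffers_.size(); }
    void clear() { buffers_.clear(); }  // frees every tracked buffer
private:
    std::vector<std::unique_ptr<std::vector<unsigned char>>> buffers_;
};

// Sentry: releases everything in the registry when it goes out of
// scope, independent of whether GC ever collected the script objects.
class BufferSentry {
public:
    explicit BufferSentry(BufferRegistry& reg) : reg_(reg) {}
    ~BufferSentry() { reg_.clear(); }
    BufferSentry(const BufferSentry&) = delete;
    BufferSentry& operator=(const BufferSentry&) = delete;
private:
    BufferRegistry& reg_;
};
```

Because cleanup is tied to scope rather than to GC, the worst case is memory held until the context's owner exits, not an unbounded leak.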


-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/

--~--~---------~--~----~------------~-------~--~----~
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
-~----------~----~----~----~------~----~------~--~---
