The simplest runtime implementation would be a non-zero subtype on the
string, to mark that it is binary data and not unicode text.
(Although that might make the stringp operator a bit ambiguous.)
The main benefits are in static typechecking, making sure you don't
send unencoded text to I/O functions
I'm not sure I follow. Which problem should this solve: a mark in the
string struct indicating what type of data the string contains?
On Thu, Nov 24, 2016 at 12:40 AM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>
wrote:
>>In Python, it's done with a prefix - u"asdf" is a Unicode string, and
>>b"asdf" is a byte string.
>
> Since nominally strings are Unicode (with the extende
Yes, s will be Unicode. Of course, you need to declare the character
encoding of your source file using a #charset tag (or use a BOM to
indicate UTF encoding).
I think it would be a good idea as well, see 21907878.
The only thing that should have to care about the encoding should be
the endpoints.
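
As a rough illustration of the endpoint idea, here is a minimal Pike
sketch. The helper names from_wire() and to_wire() are hypothetical, and
UTF-8 as the external encoding is an assumption; only utf8_to_string()
and string_to_utf8() are Pike built-ins.

```pike
// Hypothetical endpoint helpers; UTF-8 on the wire is an assumption.

// Input endpoint: encoded bytes come in, decoded text is passed on.
string from_wire(string(8bit) bytes)
{
  return utf8_to_string(bytes);
}

// Output endpoint: text comes in, encoded bytes go out.
string(8bit) to_wire(string text)
{
  return string_to_utf8(text);
}

int main()
{
  // "räksmörgås" as UTF-8 bytes (\u escapes keep this source pure ASCII).
  string(8bit) incoming = string_to_utf8("r\u00e4ksm\u00f6rg\u00e5s");

  string text = from_wire(incoming);        // decoded once, on the way in
  string reply = text + "!";                // internal code only sees characters
  string(8bit) outgoing = to_wire(reply);   // encoded once, on the way out

  write("%d bytes in, %d characters inside, %d bytes out\n",
        sizeof(incoming), sizeof(text), sizeof(outgoing));
  return 0;
}
```

Everything between the two helpers works on characters and never needs
to know which encoding the outside world uses.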
How are string constants handled today? If I do
string s = "räksmörgås";
am I guaranteed a certain encoding of s?
Well, I'm not sure that's actually abusing it; Stdio.Buffer is a
sort of compromise for getting some of the benefits of a native buffer
type while not getting all of the problems (it does not affect
compatibility as it uses a separate set of APIs, and while that does
lead to inconsistency it's not
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
>can also look at Java, which has byte[] as the type for byte strings,
>requiring literals like {'a','s','d','f'}, but I would like to see
In the EngineIO implementation I currently abuse Stdio.Buffer to fulfill this
bin
Yup, the thing we were discussing was how it would be nice to actually
be able to declare when they contain something else. :-) But it is a
valid point that binary encoded data is not necessarily 8-bit. You
should definitely be allowed to declare something as buffer(12bit) if
you want to store 1
>It's valid Pike. Pike supports the full ISO/IEC 10646 31-bit range,
>plus an equally large negative range.
Also note that Pike strings don't necessarily contain Unicode, even
if they usually do. They _could_ just as well contain RGB pixels or
random memory access data from a 12-bit-word syst
>Right, and that's something that can't be done in the current
>standard. Hence this entire proposal has to wait until some major
>changes can be done.
Yup. And then those changes should not be a repurposing of an
existing mechanism (element ranges on the string type) but something
more appropria
On Thu, Nov 24, 2016 at 12:20 AM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>
wrote:
>>\U12345678 possibly should be an error, as it's not valid Unicode.
>
> It's valid Pike. Pike supports the full ISO/IEC 10646 31-bit range,
> plus an equal
>By "binary data", I mean eight-bit strings of arbitrary bytes - like
>you'd read from a file or something. Currently, functions like
>Stdio.read_file simply return "string", but they'll effectively be
>returning string(8bit).
No, Stdio.read_file currently returns string(8bit). That simply means
On Wed, Nov 23, 2016 at 11:10 PM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>
wrote:
>>I agree, but using string(8bit) to mean "binary data" is something
>>that's 100% backward compatible.
>
> It would not be backwards compatible, since that
Strings with a known encoding that can be converted into other strings
with a known encoding easily and readably (and in some cases without any
interaction) would be useful.
For instance,
Stdio.FILE x = ...;
x->set_encoding("utf8");
string s = "räksmörgås";
String t = String.JP2022("\33(BHello, world!
>I agree, but using string(8bit) to mean "binary data" is something
>that's 100% backward compatible.
It would not be backwards compatible, since that is not what
string(8bit) means today.
>Unicode text would always be referred
>to as string(21bit), even if it happens to contain nothing but Latin
On Wed, Nov 23, 2016 at 10:30 PM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>
wrote:
> I think you are conflating range with interpretation. Both a
> Latin1 string and a UTF-8 encoded one are 8-bit strings (with a 0-255
> range). What w
Even if it hadn't been, fixing that would have been the correct
course of action. ;-)
I think you are conflating range with interpretation. Both a
Latin1 string and a UTF-8 encoded one are 8-bit strings (with a 0-255
range). What would be useful is a datatype that declares that the
elements are not Unicode characters (as they are in the Latin1 string
case) but some raw binary
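
To make the range-versus-interpretation point concrete, here is a small
Pike sketch. The sample text is only an illustration; String.width(),
string_to_utf8() and sizeof() are the built-ins used.

```pike
int main()
{
  // Ten Unicode characters, all within the Latin-1 range (0-255).
  string latin1 = "r\u00e4ksm\u00f6rg\u00e5s";

  // The same text as UTF-8 encoded bytes.
  string utf8 = string_to_utf8(latin1);

  // Both strings have the same element width, so the range alone
  // cannot tell characters apart from encoded bytes.
  write("width:  %d vs %d\n", String.width(latin1), String.width(utf8));

  // The lengths differ, because each non-ASCII character takes two
  // bytes in UTF-8.
  write("length: %d vs %d\n", sizeof(latin1), sizeof(utf8));
  return 0;
}
```

Both values would fit a string(8bit) declaration today, which is why an
element range by itself says nothing about how the contents should be
interpreted.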
Marcus Comstedt (ACROSS) (Hail Ilpalazzo!) @ Pike (-) developers forum wrote:
>If there are no character values >127, then the encoding step is a
>no-op, so skipping it buys you nothing except making your code harder
>to read.
I see. I should have guessed that string_to_utf8() is already smart en
On Wed, Nov 23, 2016 at 10:00 PM, Marcus Comstedt (ACROSS) (Hail
Ilpalazzo!) @ Pike (-) developers forum <10...@lyskom.lysator.liu.se>
wrote:
> If there are no character values >127, then the encoding step is a
> no-op, so skipping it buys you nothing except making your code harder
> to read.
I en
If there are no character values >127, then the encoding step is a
no-op, so skipping it buys you nothing except making your code harder
to read.
Martin Nilsson (Coppermist) @ Pike (-) developers forum wrote:
>>Please review, any comments are welcome.
>This looks wrong:
>  if (String.width(msg) > 8)
>    msg = string_to_utf8(msg);
>You are always utf8-decoding the string, so you should always
>utf8-encode it.
Well spott
>Please review, any comments are welcome.
This looks wrong:
  if (String.width(msg) > 8)
    msg = string_to_utf8(msg);
You are always utf8-decoding the string, so you should always
utf8-encode it.
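
A minimal sketch of why the conditional version breaks, assuming the
receiving side always calls utf8_to_string(). The helper names
encode_conditionally() and encode_always() are hypothetical; the
built-ins are String.width(), string_to_utf8() and utf8_to_string().

```pike
// Hypothetical helpers for comparison.

// The reviewed pattern: encode only when the string is wider than 8 bits.
string encode_conditionally(string msg)
{
  if (String.width(msg) > 8)
    msg = string_to_utf8(msg);
  return msg;
}

// The suggested fix: always encode.
string encode_always(string msg)
{
  return string_to_utf8(msg);
}

int main()
{
  // Characters above 127 that still fit in an 8-bit wide string.
  string text = "r\u00e4ksm\u00f6rg\u00e5s";

  // The conditional version leaves the string untouched, so the
  // receiver gets bytes that are not valid UTF-8 and decoding throws.
  mixed err = catch { utf8_to_string(encode_conditionally(text)); };
  write("conditional: %s\n", err ? "decode failed" : "decoded");

  // Always encoding round-trips cleanly.
  write("always: %s\n",
        utf8_to_string(encode_always(text)) == text ? "round trip ok"
                                                    : "mismatch");
  return 0;
}
```

For pure ASCII input both helpers return the same bytes, so always
encoding costs nothing in the case the width check was meant to
optimise.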
Martin Karlgren wrote:
>I guess the API user could keep track of sid:s and Server objects separately,
> if they don't want a .farm?
It was/is not intended to be used like that, though the only dependency here
is that in the Server object there are exactly three references to the
global clients l
Very nice, good work!
I guess the API user could keep track of sid:s and Server objects separately,
if they don’t want a .farm? I'd imagine that the globally shared sid lookup
mapping might be regarded as a security issue in more complex setups, such as
multiple listener ports with different us
For the record I'd like to mention that using just the "specs"
at https://github.com/socketio/engine.io-protocol results in
an incorrect implementation.
I tried porting the JavaScript implementation at first: it resulted in a
mess of event-hell.
So, I finally used the specs first, then some good-
Please review, any comments are welcome.
The docs still need improvement, working on that.
Currently tackling Socket.IO.
--
Stephen.