Re: [Scons-dev] Merge PR #235 before release

2015-05-28 Thread Bill Deegan
Sounds like we should use sys.getfilesystemencoding().
+1

The only trick might be if you had several filesystems, each with a
different filesystem encoding...
It may be best just to back out the patch for now.

-Bill


Re: [Scons-dev] Upgrading Mailman to 3.0?

2015-05-28 Thread Bill Deegan
Anatoly,

No.
We don't host mailman. The webhost does.

-Bill

___
Scons-dev mailing list
Scons-dev@scons.org
https://pairlist2.pair.net/mailman/listinfo/scons-dev


Re: [Scons-dev] Upgrading Mailman to 3.0?

2015-05-28 Thread anatoly techtonik
On Thu, May 28, 2015 at 10:26 AM, Dirk Bächle  wrote:
> On 28.05.2015 09:01, anatoly techtonik wrote:
>>
>> I just wonder if we can try newer Mailman to power SCons
>> communication.
>
> I hadn't noticed that our mailing list communication is so bad that we need
> to power it up. And for me, power in discussions and texts and documents
> still comes mainly from their content...not from the tools that transport
> the latter. ;)

Well, I'd prefer something like http://try.discourse.org/ for communication.
Lists seem a little dated. Now I am subscribed and rather active, but for someone
who is not so deeply involved, using all that old-school stuff may be hard.

>> That may bring greater good than newer web site.
>
> It may, or it may not. It's a coin toss, but might also only be a
> fifty-fifty chance. :)

Well, a proper hypothesis and assessment tests might improve the
chance. =)

>> I
>> expect to finally find search button there.
>>
>
> I'm unsure about what you're trying to say here: Do you simply *wish* for a
> "find" button to be there, or do you actually *know* that Mailman has one?

Well, if Mailman 3 doesn't have search, then we can switch to Discourse.
-- 
anatoly t.


Re: [Scons-dev] Merge PR #235 before release

2015-05-28 Thread Gary Oberbrunner
If you're interested in this problem, I suggest reading
https://docs.python.org/2/howto/unicode.html which has all the details
(including how to ignore decode errors), and of course check out the
python3 branch of scons where a lot of unicode handling has been done (but
much is still left to do iirc).  I don't think pretending strings are in
the cp437 encoding is a particularly good plan. ISO 8859-1 or Windows
CP1252 would probably give better results in some cases but you still need
to ignore errors in the decode.  And of course if the string actually is
utf-8 with non-ascii chars, either of these encodings will return a string
of the wrong length, not just wrong characters; and re-encoding it for
output or storage will completely mangle it.
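
A minimal sketch of the length problem described above. The sample string is
an arbitrary choice for illustration and is not taken from the PR; the
variable names are hypothetical.

```python
# Sketch: what happens when UTF-8 bytes are decoded with the wrong codec.
raw = u"B\u00e4chle".encode("utf-8")     # 7 bytes: U+00E4 takes two bytes in UTF-8

right = raw.decode("utf-8")              # 6 characters, as intended
wrong = raw.decode("latin-1")            # 7 characters: one char per byte

# Re-encoding the wrongly decoded text for output compounds the damage:
# each of the two stray characters becomes two bytes of its own.
mangled = wrong.encode("utf-8")          # 9 bytes, no longer equal to raw

# errors="ignore" avoids the exception but silently drops data.
lossy = raw.decode("ascii", "ignore")    # the non-ASCII bytes are discarded
```

Latin-1 never raises on decode because every byte maps to some character,
which is why the damage only shows up later as wrong lengths and mangled
output.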

Of course we _can_ know the encoding of the filenames in the filesystem;
that's what sys.getfilesystemencoding() is for (see the unicode link
above). Reading file contents and handling stdout/stderr from SCons
subprocesses is much more of a challenge.
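
As a rough sketch of the filename case only: decode_filename is a
hypothetical helper name, not something in SCons, and the "utf-8" fallback
and the "replace" error policy are assumptions made for illustration.

```python
import sys


def decode_filename(name, errors="replace"):
    """Hypothetical helper: return a text filename from bytes or text."""
    if isinstance(name, bytes):
        # getfilesystemencoding() can return None on some Python 2 setups,
        # so fall back to UTF-8 (an assumption, not a rule).
        encoding = sys.getfilesystemencoding() or "utf-8"
        return name.decode(encoding, errors)
    # Already text: nothing to decode.
    return name
```

This covers names that come from the filesystem; it says nothing about file
contents or subprocess output, where the encoding is not known in advance.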


-- 
Gary


Re: [Scons-dev] Merge PR #235 before release

2015-05-28 Thread anatoly techtonik
I found a way to convert any binary string to Unicode without crashing -
http://stackoverflow.com/a/27527728/239247 That would correctly
convert all `ascii` characters (and will probably make it possible to use
ANSI graphics if the Unicode font supports that), but it will not work for
other utf-8 characters.

Python 3 adds the surrogateescape error handler, but that is not present in
Python 2.
http://stackoverflow.com/questions/19649463/how-to-do-surrogateescape-in-python2
I don't know why they called it "surrogate" - it is a freaky word.
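
For reference, a minimal Python 3 sketch of the round trip that
surrogateescape allows. The b2u/u2b names come from earlier in this thread;
their bodies here are an assumption, and the handler does not exist in
Python 2. Undecodable bytes are mapped to lone surrogate code points
(U+DC80..U+DCFF), which is where the name comes from.

```python
def b2u(data, encoding="utf-8"):
    """Bytes to text; undecodable bytes become lone surrogates."""
    return data.decode(encoding, "surrogateescape")


def u2b(text, encoding="utf-8"):
    """Text to bytes; lone surrogates turn back into the original bytes."""
    return text.encode(encoding, "surrogateescape")


raw = b"caf\xe9 \xff"            # not valid UTF-8
text = b2u(raw)                  # does not raise
roundtrip = u2b(text)            # byte-for-byte identical to raw
```

The resulting text cannot be encoded with the default strict handler, so it
is only safe to pass around internally and convert back with u2b.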

On Wed, May 27, 2015 at 4:33 PM, Kenny, Jason L  wrote:
> I would agree with this.
>
>
>
> In general, OSes today store file system data (i.e. the file system data,
> not the data in the file) in Unicode (be it utf-16 or utf-8). On Linux this
> is not always the case; it could be big5 or some other locale encoding.  On
> Linux there are means to see what the “native” encoding is and use it.
>
>
>
> I should note that the idea of converting binary to Unicode does not really
> exist. The point of a binary string is to hold arbitrary data (e.g. a
> double in its raw 64-bit form vs. the decimal value 1.2385). One can assume
> that it is in a certain code page encoding and convert from that. And like I
> stated above, there are APIs to see what the locale code page encoding is, and
> that can be used to convert the code to the local ANSI/OEM encoding. This is
> different from a binary string.
>
>
>
> Jason
>
>
>
>
>
>
>
> From: Scons-dev [mailto:scons-dev-boun...@scons.org] On Behalf Of Gary
> Oberbrunner
> Sent: Wednesday, May 27, 2015 7:43 AM
> To: SCons developer list
> Subject: Re: [Scons-dev] Merge PR #235 before release
>
>
>
>
>
> On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik 
> wrote:
>
> What I need is a bulletproof way to convert from anything to unicode. This
> requires some kind of escaping to go forward and back. Some helper
> methods like u2b() (unicode to binary) and b2u(). I am quite surprised that
> so far I found nothing for this "simple" case.
>
>
> That's because in general the encoding of the "binary" string is unknown.
> Is it ascii, utf-8, Windows CP-1252, shift-JIS, or something else?  You
> can't decode such a string to Unicode without knowing the encoding.  Check
> out the python-3 branch where we've been working through some of those
> issues.  Your u2b is "easy" if you assume you want the binary to be utf-8
> encoded, which is normally safe; this conversion is guaranteed to work.
> Your b2u is not so easy.  You can't just assume utf-8 as you might think; if
> the string has invalid utf-8 bytes it'll raise an error or generate dummy
> chars depending on the args you pass to str.decode().  At least it'll get
> mangled if it's in a different encoding than you expect.
>
>
>
> --
>
> Gary
>
>
> ___
> Scons-dev mailing list
> Scons-dev@scons.org
> https://pairlist2.pair.net/mailman/listinfo/scons-dev
>



-- 
anatoly t.


Re: [Scons-dev] Upgrading Mailman to 3.0?

2015-05-28 Thread Dirk Bächle

Hi Anatoly,

On 28.05.2015 09:01, anatoly techtonik wrote:
> Hi,
>
> I just wonder if we can try newer Mailman to power SCons
> communication.

I hadn't noticed that our mailing list communication is so bad that we need
to power it up. And for me, power in discussions and texts and documents
still comes mainly from their content...not from the tools that transport
the latter. ;)

> That may bring greater good than newer web site.

It may, or it may not. It's a coin toss, but might also only be a
fifty-fifty chance. :)

> I
> expect to finally find search button there.

I'm unsure about what you're trying to say here: Do you simply *wish* for a
"find" button to be there, or do you actually *know* that Mailman has one?


Best regards,

Dirk



[Scons-dev] Upgrading Mailman to 3.0?

2015-05-28 Thread anatoly techtonik
Hi,

I just wonder if we can try a newer Mailman to power SCons
communication. That may bring greater good than a newer web site. I
expect to finally find a search button there.

http://wiki.list.org/Mailman3

-- 
anatoly t.