Ken Hornstein writes:
I've been grappling with what to do when we have issues with character set
conversion.
Unfortunately, I have a lot of experience and troubles with character
set conversion.
Specifically, I have two issues:
- What to do if the character set is unsupported.
Should we
Unfortunately, I have a lot of experience and troubles with character
set conversion.
Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
all of these problems! :-)
Should we return the original bytes?
It is not the best idea. Some sequences of bytes are control
am> In my personal opinion a very good choice is conversion into
am> html-entities, like &aogon; or &lstrok;. It remains quite readable and
am> is still unique enough to convert it back in case of need.
kr> Um, ouch. Unless there's a common library that already implements
kr> that behavior, that's not on the table at all.
Supposedly Recode does: http://recode.progiciels-bpi.ca/index.html
A super-quick scan of our systems does not show that as something that
comes out of the box installed on our
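
As a rough illustration of the HTML-entity fallback suggested above, here is a minimal sketch in Python rather than with iconv() or Recode. The handler name "namedentity" and the helpers to_charset() and from_charset() are hypothetical names made up for this example. It assumes an ASCII-compatible target charset and escapes literal ampersands first so that the result can be converted back.

```python
import codecs
import html
from html.entities import html5

# Reverse map: single character -> HTML5 entity name (with trailing ';').
ENTITY_FOR = {}
for name in sorted(html5):
    value = html5[name]
    if name.endswith(";") and len(value) == 1:
        ENTITY_FOR.setdefault(value, name)


def named_entity_replace(err):
    """Codec error handler: replace unencodable characters with entities."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    pieces = []
    for ch in err.object[err.start:err.end]:
        name = ENTITY_FOR.get(ch)
        # Fall back to a numeric reference when no named entity exists.
        pieces.append("&" + name if name else "&#%d;" % ord(ch))
    return "".join(pieces), err.end


# "namedentity" is a hypothetical handler name chosen for this sketch.
codecs.register_error("namedentity", named_entity_replace)


def to_charset(text, charset):
    """Encode text, turning unrepresentable characters into entities."""
    # Escape literal '&' first so decoding is unambiguous.
    return text.replace("&", "&amp;").encode(charset, errors="namedentity")


def from_charset(data, charset):
    """Reverse to_charset(): decode the bytes, then expand the entities."""
    return html.unescape(data.decode(charset))
```

Whether output like this is acceptable on a terminal is exactly the policy question being argued in the thread; the sketch only shows that the mapping is mechanical and reversible.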
This gets very icky, very quickly :-P
My feeling is that if you don't recognize the source character set, you cannot
possibly convert it to a display format in any secure manner. By default I
think we should not display the content, but instead spit out a diagnostic,
with the option to re-run
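
A minimal sketch of that "diagnostic instead of content" default, written in Python as a stand-in for the real code path. The exception class UnsupportedCharset and the function decode_for_display() are hypothetical names, and the wording of the diagnostic is an assumption.

```python
import codecs


class UnsupportedCharset(Exception):
    """Raised when the declared charset is not recognized."""


def decode_for_display(raw, charset):
    """Decode raw bytes strictly, or refuse with a diagnostic.

    Nothing is guessed: an unknown charset name means no content is shown.
    """
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
        return raw.decode(charset, errors="strict")
    except LookupError:
        raise UnsupportedCharset(
            "cannot display content: unsupported charset %r" % charset
        ) from None
```

A caller could catch UnsupportedCharset, print the message, and offer an explicit override for users who really want to see the raw bytes.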
Recode need not be required, it could just be an option. iconv currently
isn't after all, although they seem to complement each other. Recode is
part of the core distrib of my older Ubuntu 10.02.
Selective recoding would probably require calls for the substrings of interest.
As an aside, recode's
On Feb 28, 2014, at 12:01 PM, Ken Hornstein k...@pobox.com wrote:
If we make sure we're converting all non-printable characters into something
else, I'm unclear as to how that could happen. But if it can happen, please
educate me!
It's a case of fooling the GB* and multibyte converters into
On Feb 28, 2014, at 12:01 PM, Ken Hornstein k...@pobox.com wrote:
We'd still have to deal with what happens when
you want to convert U+1F4A9 to ISO-8859-1.
That's not an illegal parse of the input, it's a composting problem. Not the
same thing at all.
Recode need not be required, it could just be an option. iconv currently
isn't after all, although they seem to complement each other. Recode is
part of the core distrib of my older Ubuntu 10.02.
On Feb 28, 2014, at 12:24 PM, Ken Hornstein k...@pobox.com wrote:
Fair enough ... but iconv() is part of POSIX, so assuming that it's available
is reasonable (if you don't have iconv(), we basically give up in terms of
handling different character sets).
Sadly, iconv() in practice is a
We'd still have to deal with what happens when you want to convert
U+1F4A9 to ISO-8859-1.
That's not an illegal parse of the input, it's a composting problem.
Not the same thing at all.
Sigh, IT'S THE SAME THING. iconv() returns EILSEQ at a particular point
in your conversion buffer. What do you do next?
In your example, emit a Pile Of Poo.
I know you're being flippant ... but it's a serious question. Right now,
iconv() returns EILSEQ if you cannot convert an input
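
To make the "what do you do next?" question concrete, here is a small Python sketch. It uses Python's codecs, not iconv(); UnicodeEncodeError plays a role loosely analogous to EILSEQ here, and that analogy is an assumption of this example. The function convert_strict() is a hypothetical name. It reports how far the conversion got and leaves the policy (bail, substitute, skip) to the caller.

```python
def convert_strict(text, charset):
    """Convert text to charset, stopping at the first unconvertible character.

    Returns (converted_bytes, None) on success, or
    (bytes_converted_so_far, offset_of_failure) on failure.
    """
    try:
        return text.encode(charset, errors="strict"), None
    except UnicodeEncodeError as err:
        # Everything before err.start converted cleanly.
        return text[:err.start].encode(charset), err.start


# Example: U+1F4A9 has no representation in ISO-8859-1.
converted, failed_at = convert_strict("ok \U0001F4A9 end", "iso-8859-1")
if failed_at is not None:
    print("conversion stopped at offset", failed_at)
```

The sketch deliberately does not pick a policy; it only shows that the failure point is known, which is the information any of the proposed behaviors would start from.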
On Feb 28, 2014, at 1:01 PM, Ken Hornstein k...@pobox.com wrote:
Based on _what you want to happen_, what, exactly, should be
done from a programming perspective? Bail?
Yes! Bail! Don't be a vector for someone to do nasties!
If people want to see invalid content, they have cat(1) at hand.
Look, software cannot read minds. People would like it to, but I don't
work for the NSA, so I don't buy into that concept. We have standards.
For a reason. To eliminate ambiguity. MIME has been around for how
many years now? There is no excuse in this day and age for any software
to generate
That is right. On the other hand, you can never prevent malformed MIME
parameters.
Remember that we're not talking about malformed MIME parameters; we're
talking about entirely valid ones.
It is not a problem in case of one or two missing or substituted
symbols in long text. We can guess what is the