RE: new string module

2005-01-05 Thread Tony Lewis
Mauro Tortonesi wrote:

> At 18:28 on Wednesday 5 January 2005, Dražen Kačar wrote:
> > Jan Minar wrote:
> > > What's wrong with mbrtowc(3) and friends?  The mysterious solution 
> > > is probably to use wprintf(3) instead of printf(3).  Couple of 
> > > questions on #c on freenode would give you that answer.
> >
> > Historically, wget source was written in a way which allowed one to 
> > compile it on really old systems. That would rule out C95 functions.
> >
> > (I'm not advocating this approach, just answering the question.)
> 
> as long as i am the maintainer of wget, backward compatibility on very old or 
> legacy systems will NOT be broken.

I don't think it has to be an either/or situation. With well-selected #if 
statements, you should be able to have something that works on legacy systems 
while still providing wide character support on more modern operating systems.

I'm not volunteering to determine what those #if statements might be :-) ... 
just pointing out the possibility.
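
For what it's worth, a minimal sketch of such a guard might look like the
following. It assumes a hypothetical HAVE_WCHAR_SUPPORT macro (not an
existing wget configure symbol, as far as I know) that the build system
would define when the C95 functions are available; count_chars is an
illustrative helper name, not something taken from the wget source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HAVE_WCHAR_SUPPORT is a hypothetical macro that a configure script
   (or the user, via -DHAVE_WCHAR_SUPPORT) would define on systems that
   provide the C95 wide-character functions. */
#ifdef HAVE_WCHAR_SUPPORT
# include <wchar.h>
#endif

/* Count the characters (not bytes) in S.  Hypothetical helper. */
size_t
count_chars (const char *s)
{
  assert (s != NULL);
#ifdef HAVE_WCHAR_SUPPORT
  {
    /* Modern path: decode multibyte sequences in the current locale. */
    mbstate_t state;
    size_t count = 0;
    size_t left = strlen (s);

    memset (&state, 0, sizeof state);
    while (left > 0)
      {
        size_t n = mbrtowc (NULL, s, left, &state);
        if (n == (size_t) -1 || n == (size_t) -2)
          {
            /* Invalid or truncated sequence: count one byte, resync. */
            memset (&state, 0, sizeof state);
            n = 1;
          }
        else if (n == 0)
          break;
        s += n;
        left -= n;
        ++count;
      }
    return count;
  }
#else
  /* Legacy path: every byte is treated as one character. */
  return strlen (s);
#endif
}
```

On a system without the macro defined, this compiles down to plain
strlen(), so nothing from <wchar.h> is ever referenced.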

Tony




Re: new string module

2005-01-05 Thread Mauro Tortonesi
At 18:28 on Wednesday 5 January 2005, Dražen Kačar wrote:
> Jan Minar wrote:
> > What's wrong with mbrtowc(3) and friends?  The mysterious solution is
> > probably to use wprintf(3) instead of printf(3).  Couple of questions on #c
> > on freenode would give you that answer.
>
> Historically, wget source was written in a way which allowed one to
> compile it on really old systems. That would rule out C95 functions.
>
> (I'm not advocating this approach, just answering the question.)

as long as i am the maintainer of wget, backward compatibility on very old or 
legacy systems will NOT be broken.

-- 
Aequam memento rebus in arduis servare mentem...

Mauro Tortonesi

University of Ferrara - Dept. of Eng.    http://www.ing.unife.it
Institute of Human & Machine Cognition   http://www.ihmc.us
Deep Space 6 - IPv6 for Linux            http://www.deepspace6.net
Ferrara Linux User Group                 http://www.ferrara.linux.it


Re: new string module

2005-01-05 Thread Mauro Tortonesi
At 02:46 on Wednesday 5 January 2005, Jan Minar wrote:

> > > > as Fumitoshi UKAI suggested, the best choice would be to escape only
> > > > the strings that need to be escaped. so, i think we should probably
> > > > check together which strings passed to logprintf in the wget code
> > > > need to be escaped. anyone willing to help?
> > >
> > > You don't want to check whether this or that string accidentally needs
> > > or doesn't need to get escaped. The right way is to sanitize *all*
> > > untrusted input before you even start thinking about using it.
> >
> > mmmh, i don't think so. why would you for example want or need to escape
> > format strings (that are retrieved via gettext and are already in your
> > local charset), the URLs to download or the configuration data read from
> > wgetrc?

sorry, i didn't see the word "untrusted" in your statement above. then you're 
right, we have to sanitize all untrusted input. but see below...

> Indeed, there's no point in not trusting other parts of the program
> (apart from robustness, sometimes).  I think I've heard this one
> somewhere, and I have to repeat: there's no difference between the .po
> files and the .h or .c files:  It's all just different ways of
> programming.  You would have to rewrite gettext to make some security
> boundary between the C code and the translated strings.
>
> I meant any input coming from an untrusted source such as a different
> user on the same system, or anything fetched from a network (be it a
> genuine server response, or some MiM-injected crap). -- But this is a
> basic security concept.

so, as i was saying, the point is: escaping EVERY string passed to logprintf 
as you were doing in your patch is unnecessary, inefficient and very bad 
design practice.
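
to make the distinction concrete, here is a rough sketch of escaping only
the untrusted argument while leaving the (trusted) format string alone.
escape_untrusted and log_server_header are hypothetical names made up for
this example, and this is not the escape_buffer code from wget; it also
assumes plain single-byte text, so it sidesteps the multibyte issue
discussed below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST (capacity DST_SIZE bytes),
   replacing backslashes and every byte outside printable ASCII with a
   \xHH escape.  Output is truncated if DST is too small.  Returns DST. */
char *
escape_untrusted (char *dst, size_t dst_size, const char *src)
{
  size_t used = 0;

  assert (dst != NULL && src != NULL && dst_size > 0);
  for (; *src != '\0'; ++src)
    {
      unsigned char c = (unsigned char) *src;
      if (c >= 0x20 && c < 0x7f && c != '\\')
        {
          if (used + 1 >= dst_size)
            break;                      /* no room for char + NUL */
          dst[used++] = (char) c;
        }
      else
        {
          if (used + 4 >= dst_size)
            break;                      /* no room for \xHH + NUL */
          sprintf (dst + used, "\\x%02X", (unsigned int) c);
          used += 4;
        }
    }
  dst[used] = '\0';
  return dst;
}

/* The format string is trusted program text (or a gettext translation);
   only the argument that came from the network gets escaped. */
void
log_server_header (FILE *log, const char *header_from_network)
{
  char safe[256];

  fprintf (log, "Server said: %s\n",
           escape_untrusted (safe, sizeof safe, header_from_network));
}
```

a terminal escape sequence such as ESC [ 2 J in a server response would
then show up in the log as the literal text \x1B[2J instead of clearing
the screen.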

> > anyway, simone piunno and i have been talking a lot about this problem
> > and we've found that apart from a couple of minor problems (very easy to
> > fix) the current implementation of escape_buffer works fine. the problem
> > is when you pass escaped multibyte strings as arguments to printf. if
> > these strings contain a 0x00 byte, it will be incorrectly interpreted by
> > printf as a string termination character. simone says for example that
> > UTF16 strings can contain null bytes.
>
> AFAICT my patch doesn't introduce any problems that haven't been there
> before:

never said that.
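
just to illustrate the 0x00 problem quoted above, independently of any
patch: the small sketch below formats the UTF-16LE encoding of "hi" with
"%s" and reports how many bytes were actually consumed. the names
utf16le_hi and bytes_seen_by_percent_s are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "hi" encoded as UTF-16LE: each ASCII character is followed by 0x00. */
const char utf16le_hi[] = { 'h', 0x00, 'i', 0x00 };

/* Format BUF (LEN bytes, the last of which must be 0x00) with "%s" and
   return how many bytes the printf family actually copied. */
size_t
bytes_seen_by_percent_s (const char *buf, size_t len)
{
  char out[64];
  int n;

  assert (buf != NULL && len > 0 && len <= sizeof out);
  assert (buf[len - 1] == '\0');
  n = sprintf (out, "%s", buf);   /* stops at the first 0x00 byte */
  return n < 0 ? 0 : (size_t) n;
}
```

the buffer holds 4 bytes, but bytes_seen_by_percent_s (utf16le_hi,
sizeof utf16le_hi) returns 1, because "%s" stops at the 0x00 that
follows the 'h'.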

> > i don't really have any clue on how to solve this problem. simone
> > suggests to change the internal format of strings in wget to UTF8, but of
> > course i would prefer a less invasive solution if possible... i don't
> > even know if we could keep using gettext in that case.
>
> What's wrong with mbrtowc(3) and friends?  The mysterious solution is
> probably to use wprintf(3) instead of printf(3).  Couple of questions on #c
> on freenode would give you that answer.

so, you suggest using wide chars as an internal representation of strings in 
wget? we can think about it (after all it's not that different from adopting 
UTF8 to encode strings) but i am not sure if this would break compatibility 
with legacy systems and/or windows.

BTW, when i wrote the code in strings.c i did A LOT of testing and debugging 
and i couldn't get wprintf(3) to work.
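
for reference, a minimal sketch of what the mbrtowc(3) approach could
look like is below. decode_to_wide is a hypothetical helper written for
this message, not code from strings.c, and it assumes the program has
already called setlocale (LC_CTYPE, "") so that mbrtowc decodes in the
user's charset.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the multibyte string SRC (in the current
   locale's encoding) into DST, which has room for DST_LEN wide
   characters.  Undecodable bytes become L'?'.  Returns the number of
   wide characters stored, not counting the terminating L'\0'. */
size_t
decode_to_wide (wchar_t *dst, size_t dst_len, const char *src)
{
  mbstate_t state;
  size_t left;
  size_t out = 0;

  assert (dst != NULL && src != NULL && dst_len > 0);
  memset (&state, 0, sizeof state);
  left = strlen (src);
  while (left > 0 && out + 1 < dst_len)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, src, left, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: substitute, skip one byte. */
          memset (&state, 0, sizeof state);
          wc = L'?';
          n = 1;
        }
      else if (n == 0)
        break;
      dst[out++] = wc;
      src += n;
      left -= n;
    }
  dst[out] = L'\0';
  return out;
}
```

the result could then be printed with wprintf (L"%ls\n", dst). one thing
worth keeping in mind, which may or may not be related to the trouble
mentioned above: the C standard gives each stream an orientation, and
wide-character output functions are not supposed to be used on a stream
that printf has already made byte-oriented.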

> I really don't mean it as a personal attack, but since You've showed You
> don't know much about basic security principles,

can you please explain to me EXACTLY why you say that?

> or [the more intricate parts of] C,

i am sorry but i don't think you can afford to say that. i've taken a look at 
the code you've sent me and i simply can't see how you can state you are a 
better developer than me. even though, as you say, it is a hotfix, your code 
is inefficient, buggy (yes you wrote buggy code!!!) and platform dependent.

you also discovered a directory traversal problem in wget (and i am glad this 
bug was published, as i don't think it is dangerous enough to need an 
immediate fix - which i can't provide right now since i am busy working on 
3 other major software projects - and i don't believe we have to hide any 
information about it), but you said you can't fix it. i have seen your perl 
code that shows the problem, but you haven't submitted any patch to fix it. 
the only thing you've done is complain on bugtraq about the incompetence of 
wget developers, using parts of a private conversation YOU HAD NOT BEEN 
ALLOWED to use (and this made me VERY UPSET because you used a sentence of 
mine that made hrvoje niksic - one of the best developers i've met in my 
life - look incompetent when i just wanted to say that there are parts of 
wget that need - and will get, as long as i am the maintainer - some 
restyling and refactoring).

you know, i think it's very easy to criticize by saying that all the wget 
developers are incompetent (but let me state firmly that this is not the 
truth - after i've seen your code i don't think you can become half the 
developer hrvoje is, not even in 1000 years of practice) without producin

Re: new string module

2005-01-05 Thread Dražen Kačar
Jan Minar wrote:

> What's wrong with mbrtowc(3) and friends?  The mysterious solution is
> probably to use wprintf(3) instead of printf(3).  Couple of questions on #c
> on freenode would give you that answer.

Historically, wget source was written in a way which allowed one to
compile it on really old systems. That would rule out C95 functions.

(I'm not advocating this approach, just answering the question.)

-- 
 .-.   .-.Yes, I am an agent of Satan, but my duties are largely
(_  \ /  _)   ceremonial.
 |
 |[EMAIL PROTECTED]


Re: new string module

2005-01-05 Thread Greg Hurrell
On 05/01/2005, at 2:46, Jan Minar wrote:

> Indeed, there's no point in not trusting other parts of the program
> (apart from robustness, sometimes).  I think I've heard this one
> somewhere, and I have to repeat: there's no difference between the .po
> files and the .h or .c files:  It's all just different ways of
> programming.  You would have to rewrite gettext to make some security
> boundary between the C code and the translated strings.
>
> I meant any input coming from an untrusted source such as a different
> user on the same system, or anything fetched from a network (be it a
> genuine server response, or some MiM-injected crap). -- But this is a
> basic security concept.

I would argue that even input coming from the *same* user should be 
sanitized. The user doesn't have to be malicious, but they could 
accidentally (for any number of reasons, from any number of sources) 
pass garbage input to wget and cause it to crash, which looks bad. 
Basically the "circle of trust" should be defined as the boundary 
between the program itself and *anything* outside of it.

Just my opinion.
Cheers,
Greg