Re: [Pharo-users] The opposite of encodeForHTTP

Stéphane Ducasse Sun, 22 Jul 2012 01:01:02 -0700

On Jul 20, 2012, at 6:25 PM, Brenda Larcom wrote:

> I suppose I could unlurk at this point.  :)
> 
> I'm a security geek (specifically, a secure development geek focusing on 
> security architecture) in my day job, and I have a long unmaintained 
> architecture security analysis tool written in Squeak 
> (http://www.octotrike.org/ for the curious), which I have been unmothballing. 
>  We are considering switching to Pharo, partly because we are planning to add 
> some P2P collaboration features we think have an HTTP layer in there 
> somewhere & partly because we like it small, tidy, and self-compatible.  
> Hence my lurking.


Welcome! I would love to have more people working in these areas. :)

> I've done some work on how data validation should be done for security 
> purposes, for my day job.  This includes output encoding and decoding, like 
> what Davide is talking about.  It's pretty tricky to get right because of the 
> large number of contexts, with subtly different rules.  For example, I would expect 
> encodeForHTTP to be appropriate for HTTP headers, except that two things 
> you usually want to put in HTTP headers are URIs and cookies, each of which 
> has different rules (even for different subparts) for what should be 
> encoded.  The differences don't seem like much, but in the wild, my coworkers 
> & I see these sorts of differences lead to vulnerabilities on a daily basis.
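The following small sketch makes the point about contexts concrete. It uses Python only for illustration: `urllib.parse` is Python's standard library, not a Pharo API, and nothing here reproduces Pharo's `encodeForHTTP`. The same value needs different escaping as a URI path segment than as a form-encoded query value:

```python
from urllib.parse import quote, quote_plus

value = "a b/c&d"

# As a URI path segment, "/" must be escaped so it is not read as a
# separator; quote() with safe="" escapes it, and a space becomes %20.
path_segment = quote(value, safe="")

# As an application/x-www-form-urlencoded query value, a space becomes "+"
# and "&" must be escaped so it is not read as a pair separator.
form_value = quote_plus(value)

print(path_segment)  # a%20b%2Fc%26d
print(form_value)    # a+b%2Fc%26d
```

Cookies add yet another rule set on top of these two.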
> 
> From a security architecture perspective, the absolute best way to handle 
> encoding & decoding for a structured object like an HTTP request or response 
> (or a URI, or a cookie, or an HTML document, or..) is to use a validating 
> parser.  Basically, when you get an HTTP request, parse it & put it in an 
> object structured like the request.  At that time, you know the meaning of 
> each portion of the string you are parsing, so you can interpret the bits 
> correctly/safely.  The object(s) should store the individual strings that are 
> actually content (vs. structure & constants) in a decoded state.  The 
> developer should get everything from the objects, in decoded form, and put 
> everything into the objects in decoded form.  Then, when it is time to send 
> the response, the objects encode everything safely/canonically based on the 
> exact type of objects they are.  This design concentrates the hard stuff 
> (encoding, decoding, canonicalization, layering encodings on top of each 
> other) near the interfaces, at the first/last possible moment enough context 
> is known to interpret the information accurately.  It separates the mechanics 
> of using a protocol or format from the intent of using the protocol.  It lets 
> someone like me easily QA both the library and application code for security. 
>  It is also simple for the developer to use safely (all the dev needs to 
> think about is what objects/content they want to assemble, and the data 
> validation at that layer is taken care of automatically) & is therefore the 
> only design pattern I have seen consistently avoid all encoding-related 
> vulnerabilities in the wild.  
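Brenda's validating-parser pattern can be sketched for one small structured object, a URL query string. `QueryString`, `from_string`, and `to_string` are hypothetical names chosen for illustration; they are not Pharo's or Zinc's API. The actual parsing and encoding are done by Python's standard `parse_qsl`/`urlencode`. The object parses once at the boundary, holds only decoded content, and re-encodes when serialized:

```python
from urllib.parse import parse_qsl, urlencode


class QueryString:
    """Hypothetical query-string object: parsed once, held decoded, encoded on output."""

    def __init__(self, pairs=None):
        # Only decoded (name, value) strings are stored; no escapes live here.
        self.pairs = list(pairs or [])

    @classmethod
    def from_string(cls, raw):
        # Decode at the boundary. strict_parsing=True raises ValueError on a
        # malformed field instead of silently guessing what the sender meant.
        return cls(parse_qsl(raw, keep_blank_values=True, strict_parsing=True))

    def add(self, name, value):
        # Application code supplies plain, decoded strings.
        self.pairs.append((name, value))

    def get(self, name):
        return [value for key, value in self.pairs if key == name]

    def to_string(self):
        # Encode at the boundary, using the rules for this one context
        # (application/x-www-form-urlencoded).
        return urlencode(self.pairs)


q = QueryString.from_string("name=J%C3%B6rg+M&tag=a%26b")
q.add("note", "x=y")
print(q.get("tag"))   # ['a&b']
print(q.to_string())  # name=J%C3%B6rg+M&tag=a%26b&note=x%3Dy
```

Application code only ever touches decoded values such as `'a&b'`. The escapes exist only inside `from_string` and `to_string`. A full HTTP stack would apply the same idea to each header, the URI, cookies, and so on, each with its own rules.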
> 
> So what does this mean?  Basically, from a security perspective, encoding & 
> decoding methods should live in the objects they encode and decode, and never 
> be called from outside code.  That is, there should be an 
> HTTPHeader>>fromString: or fromStream: method, which is called from an 
> HTTPResponse>>fromString: or fromStream: method, and no 
> String>>decodeFromHTTP.   Adding a String>>decodeFromHTTP method is easy from 
> the library maintainer's point of view, approximately correct (way more 
> correct than no method at all), and it matches what most languages are doing 
> these days, but it shifts the burden of all that thought about the specific 
> HTTP header & context to the application developer, who is usually just 
> trying to write an application, not learn every single detail of HTTP & the 
> gazillion other standards he would need to do this safely.
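Decoding has the mirror-image problem. In this Python sketch, standing in for a hypothetical context-free `String>>decodeFromHTTP`, one raw string yields two different values depending on whether it came from a URI path or a form-encoded body. Only code that knows where the text came from, such as a header or request parser, can pick the right rule:

```python
from urllib.parse import unquote, unquote_plus

raw = "C%2B%2B+tips"

# Decoded as a URI path segment, "+" is a literal plus sign.
as_path = unquote(raw)

# Decoded as a form-encoded value, "+" stands for a space.
as_form = unquote_plus(raw)

print(as_path)  # C+++tips
print(as_form)  # C++ tips
```

A String-level method has to pick one of these interpretations on behalf of every caller.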
> 
> Since this is a suggestion for substantial architecture change that would 
> cause significant backwards compatibility issues throughout the entire Web 
> application stack, and I'm new to Pharo to boot, I am expecting some 
> interesting discussion to occur next.  Or maybe profound silence.  :)

Thanks for the explanation. It makes sense. A String is a dead object that just 
holds and assembles characters. So 
now what I would love to see, if you are interested:
        - how can we improve the infrastructure of Pharo,
        step by step or via a big refactoring? :)

        - I would add a simple decodeFromHTTP as a convenience method and in 
the future point to the validators.
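If a convenience method is added in the meantime, one way to keep it honest is to name the context it handles and to reject malformed input rather than pass it through. Here is a minimal Python sketch; `decode_form_value` is a hypothetical name, not an existing Pharo or Zinc method:

```python
import re
from urllib.parse import unquote_plus

# A "%" not followed by exactly two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def decode_form_value(raw):
    """Hypothetical convenience decoder for one form-encoded value.

    Rejects malformed input instead of passing it through unchanged.
    """
    if _BAD_ESCAPE.search(raw):
        raise ValueError(f"malformed percent-escape in {raw!r}")
    # errors="strict" makes invalid UTF-8 byte sequences raise
    # UnicodeDecodeError (a ValueError subclass) instead of becoming U+FFFD.
    return unquote_plus(raw, errors="strict")
```

By default, Python's own `unquote_plus` leaves an escape like `%zz` untouched and replaces invalid UTF-8. A validator should refuse to make that kind of silent guess.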


> In my back pocket somewhere amongst the code I am unmothballing, I have 95% 
> of a thoroughly documented URI implementation and test suite that follows 
> this pattern and is pedantically compliant with one or another of the URI 
> RFCs (it's old, may not be the most recent).

Bring it to life. We were discussing internally that we would like to have a 
decent URI implementation, and we would like to massively clean up 
the URL/URI …. with ZnURL whatever. So it would be great to have a good implementation of that part.
Now, what I see from your mail :) is that you are a kind of perfectionist, so 
be careful (I know some of them): 
you should force yourself to be happy with 80% and release it 
        - 1 your 80% may be somebody else's 95%
        - 2 releasing often and making progress is the best way to finish. :)

>  I believe Spoon & Slate are using a previous version of it or its 
> derivatives.  I'll need a fully pedantic HTTP parsing stack to feel 
> comfortable releasing a P2P architecture security analysis tool (high value 
> target, large attack surface, potentially very large professional 
> embarrassment), so whatever isn't available, I expect we'll end up writing.  
> If Pharo folks are interested in this pattern,

Yes, I am. I will let the others reply to you because I'm far down in the south of 
France, but I'm quite sure that we are all interested.

> I would love to contribute my libraries/changes as I finish them, get advice 
> on backward compatibility, performance, and APIs people would like to see, 
> review whatever related code you'd like for security issues, and/or 
> collaborate with any other developer who is interested.

I would love to learn from your expertise.

Stef
> 
> Brenda
> 
> 
> On Jul 20, 2012, at 1:47 AM, Davide Varvello <[email protected]> wrote:
> 
>> Good, Stef. I opened a feature request as a reminder here: 
>> http://code.google.com/p/pharo/issues/detail?id=6430
>>  
>> Davide
>> 
>> 
>> From: Stéphane Ducasse [via Smalltalk] <[hidden email]>
>> To: Davide Varvello <[hidden email]> 
>> Sent: Thursday, July 19, 2012 10:43 PM
>> Subject: Re: The opposite of encodeForHTTP
>> 
>> Let us fix it and propose a decodeFromHTTP method 
>> 
>> Stef 
>> 
>> On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote: 
>> 
>> > Thanks Sven, 
>> > I was looking for String>>decode..whatever... with no luck :-) 
>> > Cheers 
>> > 
>> > -- 
>> > View this message in context: 
>> > http://forum.world.st/The-opposite-of-encodeForHTTP-tp4640491p4640510.html
>> > Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com. 
>> > 
>> 