Re: [Pharo-users] The opposite of encodeForHTTP

Norbert Hartl Sat, 21 Jul 2012 04:41:19 -0700

On 20.07.2012 at 20:53, Brenda Larcom wrote:

> Thanks, Norbert; I'll take a look at Zinc, see how my existing code might 
> integrate, and propose something specific.  I personally think having an 
> insecure option for things like URIs and HTTP that are inherently on borders 
> almost all the time is unwise, but I'm happy to resolve my personal issues 
> via documentation.  :)
> 
You said "almost" yourself :) I just wanted to say that different people have 
different ideas. Restricting software to what we can imagine keeps other 
people from building amazing things we couldn't imagine.


> One reason validating parsers are so powerful is that when layers stack as 
> you mentioned, the security starts working as soon as the functional part 
> does.  I agree, such a parser definitely belongs at the borders of 
> interpretation schemes, not inside them.  Inside them it'll just use up time 
> without providing value.  Conveniently, the tool people naturally reach for 
> at interpretation borders usually has a parser in it someplace.
> 
> And yes, there do seem to be a particular lot of fiddly bits in URIs.  So 
> fiddly that a few of the examples in the RFCs (as usual) don't match the 
> rest of the spec.
> 
Agreed. I'm eager to see what you come up with. 

Norbert
> 
> On Jul 20, 2012, at 10:50 AM, Norbert Hartl <[email protected]> wrote:
> 
>> Brenda,
>> 
>> these are all good points from, as you said, a "security architecture 
>> perspective", and we should improve on that. The Zinc HTTP components 
>> already do a good job of structuring the entities as they should be. I think 
>> security add-ons can hook onto what is already there. There is a huge number 
>> of things to consider. Even within a single URL, the different components 
>> have different encoding needs. 
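
A minimal sketch of that last point, written in Python rather than Pharo (the thread itself contains no code, so the language and the sample string are assumptions made for illustration). It uses only the standard `urllib.parse` functions to encode the same text for three different places in a URL:

```python
from urllib.parse import quote, quote_plus

# Hypothetical sample text that contains characters with special meaning in URLs.
value = "a/b c&d=e"

# As a whole path: "/" is structure, so it is left alone.
as_path = quote(value)

# As a single path segment: "/" is content here, so it must be escaped.
as_segment = quote(value, safe="")

# As a form-style query value: space becomes "+", and "&" and "=" are escaped.
as_query_value = quote_plus(value)

print(as_path)
print(as_segment)
print(as_query_value)
```

The three results differ, which suggests why a single catch-all encode or decode method cannot be right for every component.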
>> On the other hand, security is not a major goal in a lot of the use cases I 
>> can imagine. There is (at least for me) a triangle of security, performance, 
>> and usability that makes it hard to have a single approach that fits them 
>> all. And we Smalltalkers tend to value freedom very highly when it comes to 
>> programming. In other words, I would say we like to preserve the freedom to 
>> design an insecure application at will :) The best way to solve those issues 
>> is to be modular, meaning a layer that can be put on top of the existing 
>> stuff to fulfill a particular use case.
>> The things you describe are present in a lot of environments. I mostly call 
>> this an "at the border of a system" problem. Things like strings inside an 
>> environment are harmless. Problems appear when you cross system borders, 
>> meaning you cross interpretation schemes. And this is a topic broader than 
>> just HTTP. 
>> If we look at a widely known problem like SQL injection, there is a need not 
>> only for proper entity handling but also for stacking validators and 
>> converters for different problems. It is such a big thing because you have 
>> a URL that goes through middleware and ends up in a storage system like an 
>> SQL database. Here you cross at least two borders: HTTP to middleware and 
>> middleware to database. So you need to stack up converters and validators 
>> for HTTP, probably shell escapes in the middleware, and finally for SQL. I 
>> think a security approach is doable if you can assemble those things 
>> according to the layers you use. And for the same reason it goes so terribly 
>> wrong everywhere. 
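
The two borders described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not code from Zinc or Pharo: the function name `find_user_ids`, the `users` table, and the 64-character limit are all hypothetical. Only standard-library calls (`urllib.parse.parse_qs` and `sqlite3`) are used.

```python
import sqlite3
from urllib.parse import parse_qs


def find_user_ids(conn, raw_query):
    """Carry one value across two borders: HTTP -> middleware -> SQL."""
    # Border 1 (HTTP -> middleware): decode the query string exactly once.
    params = parse_qs(raw_query, strict_parsing=True)
    names = params.get("name", [])
    if len(names) != 1:
        raise ValueError("expected exactly one 'name' parameter")
    name = names[0]

    # Inside the middleware the value is a plain decoded string; validate it.
    if len(name) > 64:  # hypothetical application rule
        raise ValueError("name too long")

    # Border 2 (middleware -> SQL): bind the value as a parameter instead of
    # splicing it into the statement text.
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
    return [row[0] for row in rows]


# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

honest = find_user_ids(conn, "name=alice")
hostile = find_user_ids(conn, "name=x%27+OR+%271%27%3D%271")
```

Each border gets its own converter: the HTTP decoding happens once on the way in, and the SQL layer receives the decoded value as bound data, so the hostile input is treated as an odd name rather than as SQL.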
>> So what does this modular thing mean? Having many ways to fulfill certain 
>> needs without restricting everyone to a single scheme. 
>> My advice would be to have a look at the Zinc components and propose 
>> improvements from your perspective. Then publish your results here, and 
>> plenty of clever people will find a good way to integrate them in a modular 
>> way.
>> 
>> I hope this helps,
>> 
>> Norbert
>> 
>> On 20.07.2012 at 18:25, Brenda Larcom wrote:
>> 
>>> I suppose I could unlurk at this point.  :)
>>> 
>>> I'm a security geek (specifically, a secure development geek focusing on 
>>> security architecture) in my day job, and I have a long-unmaintained 
>>> architecture security analysis tool written in Squeak 
>>> (http://www.octotrike.org/ for the curious), which I have been 
>>> unmothballing.  We are considering switching to Pharo, partly because we 
>>> are planning to add some P2P collaboration features we think have an HTTP 
>>> layer in there somewhere & partly because we like it small, tidy, and 
>>> self-compatible.  Hence my lurking.
>>> 
>>> I've done some work on how data validation should be done for security 
>>> purposes, for my day job.  This includes output encoding and decoding, like 
>>> what Davide is talking about.  It's pretty tricky to get right because of 
>>> the large number of contexts, with subtly different rules.  E.g. I would 
>>> expect encodeForHTTP to be appropriate for HTTP headers, except that two 
>>> things you usually want to put in HTTP headers are URIs and cookies, each 
>>> of which has different rules (for different subparts, even) for what 
>>> should be encoded.  The differences don't seem like much, but in the wild, 
>>> my coworkers & I see these sorts of differences lead to vulnerabilities on 
>>> a daily basis.
>>> 
>>> From a security architecture perspective, the absolute best way to handle 
>>> encoding & decoding for a structured object like an HTTP request or 
>>> response (or a URI, or a cookie, or an HTML document, or..) is to use a 
>>> validating parser.  Basically, when you get an HTTP request, parse it & put 
>>> it in an object structured like the request.  At that time, you know the 
>>> meaning of each portion of the string you are parsing, so you can interpret 
>>> the bits correctly/safely.  The object(s) should store the individual 
>>> strings that are actually content (vs. structure & constants) in a decoded 
>>> state.  The developer should get everything from the objects, in decoded 
>>> form, and put everything into the objects in decoded form.  Then, when it 
>>> is time to send the response, the objects encode everything 
>>> safely/canonically based on the exact type of objects they are.  This 
>>> design concentrates the hard stuff (encoding, decoding, canonicalization, 
>>> layering encodings on top of each other) near the interfaces, at the 
>>> first/last possible moment enough context is known to interpret the 
>>> information accurately.  It separates the mechanics of using a protocol or 
>>> format from the intent of using the protocol.  It lets someone like me 
>>> easily QA both the library and application code for security.  It is also 
>>> simple for the developer to use safely (all the dev needs to think about is 
>>> what objects/content they want to assemble, and the data validation at that 
>>> layer is taken care of automatically) & is therefore the only design 
>>> pattern I have seen consistently avoid all encoding-related vulnerabilities 
>>> in the wild.  
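
A minimal sketch of the validating-parser pattern described above, in Python rather than Smalltalk and limited to a URI path. The class name `UriPath` and its methods are hypothetical and are not taken from the URI library mentioned in this thread. The allowed characters follow the `pchar` rule of RFC 3986, and re-encoding is deliberately conservative (everything outside the unreserved set is escaped).

```python
import re
from dataclasses import dataclass
from urllib.parse import quote, unquote

# One path segment: unreserved, sub-delims, ":" or "@", or a valid %XX escape.
_SEGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*")


@dataclass(frozen=True)
class UriPath:
    """Holds the path as a tuple of *decoded* segments."""

    segments: tuple

    @classmethod
    def from_string(cls, raw):
        # Validate and decode at the border, where the meaning of each part
        # of the string is known.
        if not raw.startswith("/"):
            raise ValueError("path must be absolute")
        decoded = []
        for part in raw[1:].split("/"):
            if not _SEGMENT.fullmatch(part):
                raise ValueError(f"invalid path segment: {part!r}")
            decoded.append(unquote(part, errors="strict"))
        return cls(tuple(decoded))

    def __str__(self):
        # Encode at the last possible moment, based on what this object is.
        return "/" + "/".join(quote(s, safe="") for s in self.segments)


path = UriPath.from_string("/docs/a%2Fb/c%20d")
print(path.segments)
print(str(path))
```

Application code only ever sees `path.segments` in decoded form; the escaped slash in the second segment survives the round trip because the object, not the caller, decides how to encode it.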
>>> 
>>> So what does this mean?  Basically, from a security perspective, encoding & 
>>> decoding methods should live in the objects they encode and decode, and 
>>> never be called from outside code.  That is, there should be an 
>>> HTTPHeader>>fromString: or fromStream: method, which is called from an 
>>> HTTPResponse>>fromString: or fromStream: method, and no 
>>> String>>decodeFromHTTP.   Adding a String>>decodeFromHTTP method is easy 
>>> from the library maintainer's point of view, approximately correct (way 
>>> more correct than no method at all), and it matches what most languages are 
>>> doing these days, but it shifts the burden of all that thought about the 
>>> specific HTTP header & context to the application developer, who is usually 
>>> just trying to write an application, not learn every single detail of the 
>>> HTTP & gazillion other standards he would need to do this safely.
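
The nesting described above might look roughly like this. The sketch is in Python, and the classes `Header` and `Response` are hypothetical stand-ins for the HTTPHeader>>fromString: and HTTPResponse>>fromString: idea; they are not Zinc's API and they cover only a small part of HTTP. The header-name check follows the `token` rule from the HTTP/1.1 specification.

```python
import re
from dataclasses import dataclass

# Characters allowed in a header field name (the HTTP "token" rule).
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


@dataclass(frozen=True)
class Header:
    name: str
    value: str

    def __post_init__(self):
        # Validation lives in the object, so it applies both when parsing
        # and when application code builds a header directly.
        if not _TOKEN.fullmatch(self.name):
            raise ValueError(f"invalid header name: {self.name!r}")
        if any(ch in self.value for ch in "\r\n\0"):
            raise ValueError("header value must not contain CR, LF or NUL")

    @classmethod
    def from_string(cls, line):
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in header line: {line!r}")
        return cls(name, value.strip(" \t"))

    def __str__(self):
        return f"{self.name}: {self.value}"


@dataclass(frozen=True)
class Response:
    status: int
    headers: tuple
    body: str

    @classmethod
    def from_string(cls, raw):
        head, sep, body = raw.partition("\r\n\r\n")
        if not sep:
            raise ValueError("missing blank line after headers")
        status_line, *header_lines = head.split("\r\n")
        match = re.fullmatch(r"HTTP/1\.[01] (\d{3})(?: .*)?", status_line)
        if not match:
            raise ValueError(f"invalid status line: {status_line!r}")
        # The response parser delegates each line to the header parser.
        headers = tuple(Header.from_string(line) for line in header_lines)
        return cls(int(match.group(1)), headers, body)


response = Response.from_string(
    "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
)
```

With this shape there is no free-standing decode method for callers to misuse: an attempt such as `Header("X-Note", "a\r\nSet-Cookie: evil=1")` raises `ValueError` instead of producing an injected header.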
>>> 
>>> Since this is a suggestion for substantial architecture change that would 
>>> cause significant backwards compatibility issues throughout the entire Web 
>>> application stack, and I'm new to Pharo to boot, I am expecting some 
>>> interesting discussion to occur next.  Or maybe profound silence.  :)
>>> 
>>> In my back pocket somewhere amongst the code I am unmothballing, I have 95% 
>>> of a thoroughly documented URI implementation and test suite that follows 
>>> this pattern and is pedantically compliant with one or another of the URI 
>>> RFCs (it's old, may not be the most recent).  I believe Spoon & Slate are 
>>> using a previous version of it or its derivatives.  I'll need a fully 
>>> pedantic HTTP parsing stack to feel comfortable releasing a P2P 
>>> architecture security analysis tool (high value target, large attack 
>>> surface, potentially very large professional embarrassment), so whatever 
>>> isn't available, I expect we'll end up writing.  If Pharo folks are 
>>> interested in this pattern, I would love to contribute my libraries/changes 
>>> as I finish them, get advice on backward compatibility, performance, and 
>>> APIs people would like to see, review whatever related code you'd like for 
>>> security issues, and/or collaborate with any other developer who is 
>>> interested.
>>> 
>>> Brenda
>>> 
>>> 
>>> On Jul 20, 2012, at 1:47 AM, Davide Varvello <[email protected]> wrote:
>>> 
>>>> Good, Stef. I opened a new feature request as a reminder here: 
>>>> http://code.google.com/p/pharo/issues/detail?id=6430
>>>>  
>>>> Davide
>>>> 
>>>> 
>>>> From: Stéphane Ducasse [via Smalltalk] <[hidden email]>
>>>> To: Davide Varvello <[hidden email]> 
>>>> Sent: Thursday, July 19, 2012 10:43 PM
>>>> Subject: Re: The opposite of encodeForHTTP
>>>> 
>>>> Let us fix it and propose a decodeFromHTTP method 
>>>> 
>>>> Stef 
>>>> 
>>>> On Jul 18, 2012, at 2:02 PM, Davide Varvello wrote: 
>>>> 
>>>> > Thanks Sven, 
>>>> > I was looking for String>>decode..whatever... with no luck :-) 
>>>> > Cheers 
>>
