Re: Is MinMaxExpr really leakproof?

2019-01-03 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> So I'd like to get to a point where we're comfortable marking these
> >> functions leakproof despite the possibility of corner-case failures.
> >> We could just decide that the existing failure cases in varstr_cmp are
> >> not usefully exploitable for information leakage, or perhaps we could
> >> dumb down the error messages some more to make them even less so.
> >> It'd also be nice to have some articulatable policy that encompasses
> >> a choice like this.
> 
> > I'd rather not say "well, these are mostly leakproof and therefore it's
> > good enough" unless those corner-case failures you're referring to are
> > really "this system call isn't documented to ever fail in a way we can't
> > handle, but somehow it did and we're blowing up because of it."
> 
> Well, that's pretty much what we've got here.

Good.  Those all almost certainly fall under the category of 'covert
channels' and provided they're low bandwidth and hard to control, as
seems to be the case here, then I believe we can accept them.  I'm
afraid there isn't really any hard-and-fast definition that could be
used as a basis for a project policy around this, unfortunately.  We
certainly shouldn't be returning direct data from the heap or indexes as
part of error messages in leakproof functions, and we should do our best
to ensure that anything from system calls we make also don't, but
strerror-like results or the error codes themselves should be fine.

> 1. As Noah noted, every failure case in varstr_cmp is ideally a can't
> happen case, since if it could happen on valid data then that data
> couldn't be put into a btree index.

That's certainly a good point.

> 2. AFAICS, all the error messages in question just report that a system
> operation failed, with some errno string or equivalent; none of the
> original data is reported.  (Obviously, we'd want to add comments
> discouraging people from changing that ...)

Agreed, we should definitely add comments here (and, really, in any
other cases where we need to be thinking about similar issues..).

> Conceivably, an attacker could learn the length of some long string
> by noting a palloc failure report --- but we've already accepted an
> equivalent hazard in texteq or byteaeq, I believe, and anyway it's
> pretty hard to believe that an attacker could control such failures
> well enough to weaponize it.

Right, that's a low bandwidth covert channel and as such should be
acceptable.

> So the question boils down to whether you think that somebody could
> infer something else useful about the contents of a string from
> the strerror (or ICU u_errorName()) summary of a system function
> failure.  This seems like a pretty thin argument to begin with,
> and then the presumed difficulty of making such a failure happen
> repeatably makes it even harder to credit as a useful information
> leak.
> 
> So I'm personally satisfied that we could mark text_cmp et al as
> leakproof, but I'm not sure how we define a project policy that
> supports such a determination.

I'm not sure how to formalize such a policy either though perhaps we
could discuss specific "don't do this" things and have a light
dicussion about what covert channel are.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> So I'd like to get to a point where we're comfortable marking these
>> functions leakproof despite the possibility of corner-case failures.
>> We could just decide that the existing failure cases in varstr_cmp are
>> not usefully exploitable for information leakage, or perhaps we could
>> dumb down the error messages some more to make them even less so.
>> It'd also be nice to have some articulatable policy that encompasses
>> a choice like this.

> I'd rather not say "well, these are mostly leakproof and therefore it's
> good enough" unless those corner-case failures you're referring to are
> really "this system call isn't documented to ever fail in a way we can't
> handle, but somehow it did and we're blowing up because of it."

Well, that's pretty much what we've got here.

1. As Noah noted, every failure case in varstr_cmp is ideally a can't
happen case, since if it could happen on valid data then that data
couldn't be put into a btree index.

2. AFAICS, all the error messages in question just report that a system
operation failed, with some errno string or equivalent; none of the
original data is reported.  (Obviously, we'd want to add comments
discouraging people from changing that ...)

Conceivably, an attacker could learn the length of some long string
by noting a palloc failure report --- but we've already accepted an
equivalent hazard in texteq or byteaeq, I believe, and anyway it's
pretty hard to believe that an attacker could control such failures
well enough to weaponize it.

So the question boils down to whether you think that somebody could
infer something else useful about the contents of a string from
the strerror (or ICU u_errorName()) summary of a system function
failure.  This seems like a pretty thin argument to begin with,
and then the presumed difficulty of making such a failure happen
repeatably makes it even harder to credit as a useful information
leak.

So I'm personally satisfied that we could mark text_cmp et al as
leakproof, but I'm not sure how we define a project policy that
supports such a determination.

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Noah Misch  writes:
> >>> Either of those solutions sounds fine.  Like last time, I'll vote for 
> >>> explicitly
> >>> verifying leakproofness.
> 
> >> Yeah, I'm leaning in that direction as well.  Other than comparisons
> >> involving strings, it's not clear that we'd gain much from insisting
> >> on leakproofness in general, and it seems like it might be rather a
> >> large burden to put on add-on data types.
> 
> > While I'd actually like it if we required leakproofness for what we
> > ship, I agree that we shouldn't blindly assume that add-on data types
> > are always leakproof and that then requires that we explicitly verify
> > it.  Perhaps an argument can be made that there are some cases where
> > what we ship can't or shouldn't be leakproof for usability but, ideally,
> > those would be relatively rare exceptions that don't impact common
> > use-cases.
> 
> Seems like we have consensus that MinMaxExpr should verify leakproofness
> rather than just assume it, so I'll go fix that.
> 
> What's your opinion on the question of whether to try to make text_cmp
> et al leakproof?  I notice that texteq/textne are (correctly) marked
> leakproof, so perhaps the usability issue isn't as pressing as I first
> thought; but it remains true that there are fairly common cases where
> the current marking is going to impede optimization.  I also realized
> that in the wake of 586b98fdf, we have to remove the leakproofness
> marking of name_cmp and name inequality comparisons --- which I did
> at d01e75d68, but that's potentially a regression in optimizability
> of catalog queries, so it's not very nice.

Well, as mentioned, I'd really be happier to have more things be
leakproof, when they really are leakproof.  What we've done in some
places, and I'm not sure how practical this is elsewhere, is to show
data when we know the user is allowed to see it anyway, to aide in
debugging and such (I'm thinking here specifically of
BuildIndexValueDescription(), which will just return a NULL in the case
where the user doesn't have permission to view all of the columns
involved).  As these are error cases, we're generally happy to consider
spending a bit of extra time to figure that out, is there something
similar we could do for these cases where we'd really like to report
useful information to the user, but only if we think they're probably
allowed to see it?

> Also, I believe that Peter's work towards making text equality potentially
> collation-sensitive will destroy the excuse for marking texteq/textne
> leakproof if we're going to be 100% rigid about that, and that would be
> a very big regression.

That could be a serious problem, I agree.

> So I'd like to get to a point where we're comfortable marking these
> functions leakproof despite the possibility of corner-case failures.
> We could just decide that the existing failure cases in varstr_cmp are
> not usefully exploitable for information leakage, or perhaps we could
> dumb down the error messages some more to make them even less so.
> It'd also be nice to have some articulatable policy that encompasses
> a choice like this.

I'd rather not say "well, these are mostly leakproof and therefore it's
good enough" unless those corner-case failures you're referring to are
really "this system call isn't documented to ever fail in a way we can't
handle, but somehow it did and we're blowing up because of it."

At least, in the cases where we're actually leaking knowledge that we
shouldn't be.  If what we're leaking is some error being returned where
all we're returning is an error code and not the actual data then that
doesn't seem like it's really much of a leak to me..?  I'm just glancing
through varstr_cmp and perhaps I'm missing something but it seems like
everywhere we're returning an error, at least from there, it's an error
code of some kind being returned and not the data that was passed in to
the function.  I didn't spend a lot of time hunting through it though.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Noah Misch  writes:
>>> Either of those solutions sounds fine.  Like last time, I'll vote for 
>>> explicitly
>>> verifying leakproofness.

>> Yeah, I'm leaning in that direction as well.  Other than comparisons
>> involving strings, it's not clear that we'd gain much from insisting
>> on leakproofness in general, and it seems like it might be rather a
>> large burden to put on add-on data types.

> While I'd actually like it if we required leakproofness for what we
> ship, I agree that we shouldn't blindly assume that add-on data types
> are always leakproof and that then requires that we explicitly verify
> it.  Perhaps an argument can be made that there are some cases where
> what we ship can't or shouldn't be leakproof for usability but, ideally,
> those would be relatively rare exceptions that don't impact common
> use-cases.

Seems like we have consensus that MinMaxExpr should verify leakproofness
rather than just assume it, so I'll go fix that.

What's your opinion on the question of whether to try to make text_cmp
et al leakproof?  I notice that texteq/textne are (correctly) marked
leakproof, so perhaps the usability issue isn't as pressing as I first
thought; but it remains true that there are fairly common cases where
the current marking is going to impede optimization.  I also realized
that in the wake of 586b98fdf, we have to remove the leakproofness
marking of name_cmp and name inequality comparisons --- which I did
at d01e75d68, but that's potentially a regression in optimizability
of catalog queries, so it's not very nice.

Also, I believe that Peter's work towards making text equality potentially
collation-sensitive will destroy the excuse for marking texteq/textne
leakproof if we're going to be 100% rigid about that, and that would be
a very big regression.

So I'd like to get to a point where we're comfortable marking these
functions leakproof despite the possibility of corner-case failures.
We could just decide that the existing failure cases in varstr_cmp are
not usefully exploitable for information leakage, or perhaps we could
dumb down the error messages some more to make them even less so.
It'd also be nice to have some articulatable policy that encompasses
a choice like this.

Thoughts?

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2019-01-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Noah Misch  writes:
> > Either of those solutions sounds fine.  Like last time, I'll vote for 
> > explicitly
> > verifying leakproofness.
> 
> Yeah, I'm leaning in that direction as well.  Other than comparisons
> involving strings, it's not clear that we'd gain much from insisting
> on leakproofness in general, and it seems like it might be rather a
> large burden to put on add-on data types.

While I'd actually like it if we required leakproofness for what we
ship, I agree that we shouldn't blindly assume that add-on data types
are always leakproof and that then requires that we explicitly verify
it.  Perhaps an argument can be made that there are some cases where
what we ship can't or shouldn't be leakproof for usability but, ideally,
those would be relatively rare exceptions that don't impact common
use-cases.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> bttextcmp() and other varstr_cmp() callers fall afoul of the same
 >> restriction with their "could not convert string to UTF-16" errors
 >> (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
 >> Leaking the binary fact that an unspecified string contains an
 >> unspecified rare Unicode character is not a serious leak, however.
 >> Also, those errors would be a substantial usability impediment if
 >> they happened much in practice; you couldn't index affected values.

 Tom> Yeah. I think that there might be a usability argument for marking
 Tom> textcmp and related operators as leakproof despite this
 Tom> theoretical leak condition, because not marking them puts a large
 Tom> practical constraint on what conditions we can optimize. However,
 Tom> that discussion just applies narrowly to the string data types; it
 Tom> is independent of what we want to say the general policy is.

I think that's not even a theoretical leak; the documentation for
MultiByteToWideChar does not indicate any way in which it can return an
error for the specific parameters we pass to it. In particular we do not
tell it to return errors for invalid input characters.

-- 
Andrew (irc:RhodiumToad)



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Tom Lane
Isaac Morland  writes:
> On Mon, 31 Dec 2018 at 12:26, Noah Misch  wrote:
>> bttextcmp() and other varstr_cmp() callers fall afoul of the same
>> restriction with their "could not convert string to UTF-16" errors

> I'm confused. What characters cannot be represented in UTF-16?

What's actually being reported there is failure of Windows'
MultiByteToWideChar function.  Probable causes could include
invalid data (not valid UTF8), or conditions such as out-of-memory
which might have nothing at all to do with the input.

There are similar, equally nonspecific, error messages in the
non-Windows code path.

In principle, an attacker might be able to find out the existence
of extremely long strings in a column by noting out-of-memory
failures in this code, but that doesn't seem like a particularly
interesting information leak ...

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Tom Lane
Noah Misch  writes:
> This thread duplicates 
> https://postgr.es/m/flat/16539.1431472961%40sss.pgh.pa.us

Ah, so it does.  Not sure why that fell off the radar without getting
fixed; possibly because it was right before PGCon.

> pg_lsn_cmp() and btoidvectorcmp() surely could advertise leakproofness.

Agreed; I'll go fix those.

> I'm not sure about enum_cmp(), numeric_cmp(), tsquery_cmp() or
> tsvector_cmp().  I can't think of a reason those would leak, though.

I've not looked at the last three, but enum_cmp can potentially report the
value of an input, if it fails to find a matching pg_enum record:

enum_tup = SearchSysCache1(ENUMOID, ObjectIdGetDatum(arg1));
if (!HeapTupleIsValid(enum_tup))
ereport(ERROR,
(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
 errmsg("invalid internal value for enum: %u",
arg1)));

and there are similar error reports inside compare_values_of_enum().
Whether that amounts to an interesting security leak is debatable.
It's hard to see how an attacker could arrange for those to fail,
much less do so in a way that would reveal a value he didn't know
already.

> btrecordcmp() and other polymorphic
> cmp functions can fail:
>   create type boxrec as (a box); select '("(1,1),(0,0)")'::boxrec = 
> '("(1,1),(0,0)")'::boxrec;
>   => ERROR:  could not identify an equality operator for type box
> The documentation says, "a function which throws an error message for some
> argument values but not others ... is not leakproof."  I would be comfortable
> amending that to allow the "could not identify an equality operator" error,
> because that error follows from type specifics, not value specifics.

I think the real issue with btrecordcmp, btarraycmp, etc, is that
they invoke other type-specific comparison functions.  Therefore,
we cannot mark them leakproof unless every type-specific comparison
function is leakproof.  So we're right back at the policy question.

> bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction
> with their "could not convert string to UTF-16" errors
> (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
> Leaking the binary fact that an unspecified string contains an unspecified 
> rare
> Unicode character is not a serious leak, however.  Also, those errors would 
> be a
> substantial usability impediment if they happened much in practice; you 
> couldn't
> index affected values.

Yeah.  I think that there might be a usability argument for marking
textcmp and related operators as leakproof despite this theoretical
leak condition, because not marking them puts a large practical
constraint on what conditions we can optimize.  However, that
discussion just applies narrowly to the string data types; it is
independent of what we want to say the general policy is.

>> I think that we should either change contain_leaked_vars_walker to
>> explicitly verify leakproofness of the comparison function, or decide
>> that it's project policy that btree comparison functions are leakproof,
>> and change the markings on those (and their associated operators).

> Either of those solutions sounds fine.  Like last time, I'll vote for 
> explicitly
> verifying leakproofness.

Yeah, I'm leaning in that direction as well.  Other than comparisons
involving strings, it's not clear that we'd gain much from insisting
on leakproofness in general, and it seems like it might be rather a
large burden to put on add-on data types.

regards, tom lane



Re: Is MinMaxExpr really leakproof?

2018-12-31 Thread Noah Misch
This thread duplicates https://postgr.es/m/flat/16539.1431472961%40sss.pgh.pa.us

On Sun, Dec 30, 2018 at 01:24:02PM -0500, Tom Lane wrote:
> So this coding amounts to an undocumented
> assumption that every non-cross-type btree comparison function is
> leakproof.

> select p.oid::regprocedure from pg_proc p, pg_amproc a, pg_opfamily f
> where p.oid=a.amproc and f.oid=a.amprocfamily and opfmethod=403 and not 
> proleakproof and amproclefttype=amprocrighttype and amprocnum=1;
> 
>  bpcharcmp(character,character)
>  btarraycmp(anyarray,anyarray)
>  btbpchar_pattern_cmp(character,character)
>  btoidvectorcmp(oidvector,oidvector)
>  btrecordcmp(record,record)
>  btrecordimagecmp(record,record)
>  bttext_pattern_cmp(text,text)
>  bttextcmp(text,text)
>  enum_cmp(anyenum,anyenum)
>  jsonb_cmp(jsonb,jsonb)
>  numeric_cmp(numeric,numeric)
>  pg_lsn_cmp(pg_lsn,pg_lsn)
>  range_cmp(anyrange,anyrange)
>  tsquery_cmp(tsquery,tsquery)
>  tsvector_cmp(tsvector,tsvector)
> 
> so this assumption is, on its face, wrong.
> 
> In practice it might be all right, because it's hard to see a reason why
> a btree comparison function would ever throw an error except for internal
> failures, which are probably outside the scope of leakproofness guarantees
> anyway.  Nonetheless, if we didn't mark these functions as leakproof,
> why not?

pg_lsn_cmp() and btoidvectorcmp() surely could advertise leakproofness.  I'm not
sure about enum_cmp(), numeric_cmp(), tsquery_cmp() or tsvector_cmp().  I can't
think of a reason those would leak, though.  btrecordcmp() and other polymorphic
cmp functions can fail:

  create type boxrec as (a box); select '("(1,1),(0,0)")'::boxrec = 
'("(1,1),(0,0)")'::boxrec;
  => ERROR:  could not identify an equality operator for type box

The documentation says, "a function which throws an error message for some
argument values but not others ... is not leakproof."  I would be comfortable
amending that to allow the "could not identify an equality operator" error,
because that error follows from type specifics, not value specifics.

bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction
with their "could not convert string to UTF-16" errors
(https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
Leaking the binary fact that an unspecified string contains an unspecified rare
Unicode character is not a serious leak, however.  Also, those errors would be a
substantial usability impediment if they happened much in practice; you couldn't
index affected values.

> I think that we should either change contain_leaked_vars_walker to
> explicitly verify leakproofness of the comparison function, or decide
> that it's project policy that btree comparison functions are leakproof,
> and change the markings on those (and their associated operators).

Either of those solutions sounds fine.  Like last time, I'll vote for explicitly
verifying leakproofness.