Re: Record last password change

2018-12-12 Thread Bear Giles
Could you add your own UPDATE trigger to the password table? It can write
an entry to a new table, e.g., (userid, current date) whenever a record in
that table is modified.

On an earlier question - the issue isn't whether someone can crack your
password, it's possible disclosure in archive media somewhere. E.g., a
classic example is someone who accidently commits source code that contains
a password and then reverts it. It's not in the current source code but
without a lot of effort (not always possible) it could be retrieved by
anyone with access to the commit history. If you change your password every
few months this will soon be a moot issue even if the person doesn't
mention this to someone who can change the password immediately.

A more subtle point is backups. An attacker might have had access to
encrypted backups (or regular backups containing encrypted records) for a
very long time and held onto them against the chance discovery of the
password. Once they learn it they have access to all of that data. If you
rotate the passwords they might have access to a few months of data but no
more than that. It's bad, but a few months is far better than a few years
if your data contains information that requires notification of everyone
affected and the offer of credit monitoring, etc.

I agree that people may choose bad passwords if forced to change them too
frequently but I'm in the camp that says it's fine to use a password
manager or even to write them down on a card kept in the person's wallet.

BTW another solution is SSO, e.g., Kerberos. I still need to submit a patch
to pgsql to handle it better(*) but with postgresql itself you sign into
the system and then the database server will just know who you are. You
don't have to worry about remembering a new password for postgresql. X.509
(digital certs) are another possibility and I know you can tie them to a
smart card but again I don't know how well we could integrate it into pgsql.

(*) I haven't looked at the code recently but the last time I checked pgsql
used the username/password combo. Enterprise environments usually use
keytab files instead of (u/p). It should also be smart enough to check if
the user already has a kerberos ticket and use it if nothing else is
specified. The latter would usually work with people. (I'm not sure what
happens in a more secure environment where the database expects the service
to be specified as well - users would need the username/postgresql@REALM
identity, not the more generic username@REALM identity.)



On Tue, Dec 11, 2018 at 12:04 PM Chapman Flack 
wrote:

> On 12/11/18 9:56 AM, Tom Lane wrote:
> > I've heard that if you want to implement a password aging policy, PAM
> > authentication can manage that for you; but I don't know the details.
>
> Interesting idea ... could use pam-pgsql[1] and PAM as the
> authentication method. Might result in another connection (from PAM)
> to authenticate every connection, though. I suppose the module could
> use a daemon keeping one connection open for auth queries, but the
> README doesn't *say* it does. Could set up a pooler just for the auth
> module to connect through, I guess.
>
> It allows you to configure arbitrary auth_query, acct_query, pwd_query,
> etc., so you could conceivably join pg_authid with some other table
> where you'd keep expiration info.
>
> Looks like our PAM authentication might not support some PAM
> capabilities like conducting additional message exchanges (for
> example, to prompt for a new password on the spot if the old
> one has expired).
>
> It might be possible to shoehorn that capability into the existing
> fe-be protocol by calling it a custom SASL method, something analogous
> to ssh's "keyboard-interactive"[2].
>
> -Chap
>
>
> [1] https://github.com/pam-pgsql/pam-pgsql
> [2] https://tools.ietf.org/html/rfc4256
>
>


Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Bear Giles
Some regulatory standards require all UII, even all PII, information be
encrypted within the database, not just on encrypted media. That's to
reduce exposure even if someone gets access to a live server, e.g., via SQL
Injection. (The perennial #1 risk for software vulnerabilities.)

UII is uniquely identifiable information, e.g., SSN.

PII is personally identifiable information, e.g, email address, phone
number, address. It doesn't have to be enough to uniquely identify the
person, just enough to cut the number of possible individuals down to a
handful. That's a surprising large number of fields, e.g., knowing where
someone was born and their birthdate will get you close to their SSN for
anyone born after the IRS started requiring SSNs for all claimed
dependents. Knowing someone's birth date and city of residence will get you
down to a handful of individuals, often a single individual depending upon
their age and the size of their city. It's remarkably easy to uniquely
identify something like 75% of the population if you have the data from a
couple different sites and some way to correlate the records. (That's why
it's not good enough to just use the sha1 of an email address, etc.)

I know the government required UII encryption in its databases when I last
worked on a government contract, and I think they've required PII
encryption as well for years. I would be verify surprised if HIPAA doesn't
require that as well for PII in addition to the medical info. I definitely
know PCI-DSS requires encryption of all information on the credit card
itself - you can keep the last few digits (I think 6 are allowed but for is
recommended) to facilitate searches. Of course companies could still have
the same information unencrypted in other columns or tables (except for the
CC number itself - and you *never* keep the CVN in any form on threat of
losing your ability to accept credit cards if you're caught) but they were
encouraged to encrypt it as well.

Anyway legal requirements is "#0" on that list. Everything else *might*
happen but depending upon the nature of the data you *will* be audited for
compliance with regulations, either preemptively (e.g., VISA requires
periodic audits of anyone making more than $X in transactions per year) or
after a breach. One of my other past employers did the type of auditing
VISA requires and their promotional material was full of interviews with
former small business owners who lost their business after a breach. It
wasn't due to the loss itself, it's because any breach automatically
requires the strictest auditing for the next (4?) years and that cost far
more than the average independent restaurant, auto repair shop, etc., can
afford. Obviously their business model is (in part) to scare people but
there are plenty of situations where you have to encrypt data within the
database and not just rely on encrypted media.




> Here are some threats you might choose to protect against:
>
> 1) passive attackers on the wire
> 2) active  attackers on the wire
> 3a) theft / compromise of storage devices
> 3b) compromise of decommissioned storage devices
> 3c) theft of running server
> 4) compromised backup storage
> 5) bad / compromised clients
> 6) bad / compromised DBAs or sysadmins
> 7) side channel exploits
> 8) ??
>
> (1) and (2) are taken care of by TLS.
>
> (3a) is taken care of by FDE in controllers, say, or by physical
> security.
>
> (3b) is taken care of by proper decommissioning, but FDE helps.
>
> (3c) you can't protect against if you have keys in memory.  You could
> use client-side crypto, but you'll have more clients to worry about than
> servers.  Physical security is your best option.  (And really, you don't
> get any way to protect against law enforcement taking the devices.)
>
> (4) is taken care of by encrypting backups, which requires no changes to
> PG to get.
>
> (5) is taken care of (to some degree) by server-side logic (triggers,
> ...).
>
> (6)...  You can't protect against sysadmins, really, nor DBAs, but you
> can use crypto on the *client*-side to get some protection.  Since the
> PG client is very thin and dumb, the PG client can't easily do this.
> The idea is to encrypt values and MAC/sign rows to prevent DBAs/
> sysadmins seeing sensitive data or tampering with your data.
>
> (7) one deals with by using crypto implementations built with side
> channel protection, though, really, this is a very difficult subject in
> general, especially since Spectre.
>
> Nico
> --
>
>


Re: [FEATURE REQUEST] Encrypted indexes over encrypted data

2018-08-09 Thread Bear Giles
There are alternatives. If you know what you want to find, e.g., a search
by username or email address, you can store a strong hash of the value as
an indexed column. By "strong hash" I mean don't just use md5 or sha1, or
even one round with a salt. I can give you more details about how and why
offline.

So you might have a record with:

   id serial primary key,
   email_hash text not null indexed,
   first_name_hash text indexed,
   last_name_hash text indexed,
   phone_number_hash text indexed ,
   'wallet'_containing_all_encrypted_values text

and that allows you to search on email, first name, last name, or phone
number, or some combination on them. But no expressions. The hashing would
be done in your app, not the database. You also probably want to convert
everything to lowercase, maybe remove spaces, etc., before computing the
hash.

You should be prepared to handle multiple matches. It's unlikely that an
email or phone number hash won't be unique but it's safest to always be
prepared for more than one match, decrypt the 'wallet', and then do a final
comparison. That also gives you a bit of protection from an attacker
creating an account and then changing the hash values to match someone
else. You can use that to support very limited expressions, e.g., also keep
a hash on the first three letters of their last name, but that will
compromise your security a bit since it allows an attacker to perform some
statistical analysis on the data.

Finally there's the general advice that hashes (and encrypted values)
should always have a version number of some sort. It could be something as
simple as 3$hash, or it could be a composite column or even a user-defined
type. The # indicates is a lookup into a table, perhaps in your app, that
tells you which hashing algorithm and salt to use. It makes life a lot
easier if the security audit tells you that you need to change your
cipher/salt/key/whatever but you can't do it immediately since you don't
know everything you need in order to do it, e.g., the password that you
need in order to recompute the hash value. With that version number it's
easy to continue to accept the existing password so they can log in, and in
the background you quietly recompute the hash using the new
salt/algorithm/whatever and update their record. I've worked for some
pretty knowledgeable companies that have overlooked this.

On Thu, Aug 9, 2018 at 6:05 AM, Andres Freund  wrote:

>
>
> On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi <
> abcz2.upr...@gmail.com> wrote:
> > ?Is it hard to implement soluition 2?
>
> Yes.
>
> To the point that I'm fairly certain that an implementation would be
> considered to costly to maintain (vs benefit) of proposed.
>
> Andres
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>


FYI: jOOQ blog

2018-03-22 Thread Bear Giles
​If you want to know how PostgreSQL compares to other databases, or are
looking for ideas on areas to improve, the jOOQ blog looks like a good
resource: https://blog.jooq.org/. jOOQ is a java library that provides a
database-agnostic way to access many different types of databases. I know
it does some query optimizations but I don't know how advanced it is. The
other reason for them to track advanced functionality available in some
databases is to ensure that they support users who want to use those
features.

Most of the blog is specific to java or its library but there's also
frequent articles on advanced functionality (row values, window functions,
etc.), query optimization, and comparisons between the databases. The
comparisons of query optimizations in particular might be good low hanging
fruit since it doesn't involve any user-facing changes, just recognizing a
few additional patterns in the query parse tree and knowing a more
efficient way to perform equivalent work.

Bear


Re: Precision loss casting float to numeric

2018-02-26 Thread Bear Giles
On Mon, Feb 26, 2018 at 11:29 AM, Tom Lane  wrote:

> Chapman Flack  writes:
> > The 0002-*.patch is a proof-of-concept patching float4_numeric and
> > float8_numeric in the trivial way (just using FLT_DECIMAL_DIG and
> > DBL_DECIMAL_DIG in place of FLT_DIG and DBL_DIG). It makes the new
> > regression test pass. (It will only work under a compiler that has
> > __FLT_DECIMAL_DIG__ and __DBL_DECIMAL_DIG__ available, and I used
> > those internal versions to avoid mucking with build tooling to change
> > the target C standard, which I assume wouldn't be welcome anyway.
>
> Nope.  TBH, I'd think about just using "DBL_DIG + 3", given our existing
> coding around extra_float_digits in places like pg_dump and postgres_fdw.
> The knowledge that you need 2 or 3 extra digits is already well embedded.
>
> Conceivably you could do it like
>
> #ifndef DBL_DECIMAL_DIG
> #ifdef __DBL_DECIMAL_DIG__
> #define DBL_DECIMAL_DIG __DBL_DECIMAL_DIG__
> #else
> #define DBL_DECIMAL_DIG (DBL_DIG + 3)
> #endif
> #endif
>
> but I'm not exactly seeing how that buys us anything.
>
> The bigger question here is whether people actually want this behavioral
> change.  I think there's probably a bigger chance of complaints that
> "casting 1.1::float8 to numeric now produces some weird,
> incorrectly-rounded result" than that we make anyone happier.
>
> I have a vague idea that at some point in the past we discussed making
> this conversion use extra_float_digits, which'd allow satisfying both
> camps, at the nontrivial price that the conversion would have to be
> considered stable not immutable.  We didn't pull the trigger, if this
> memory is real at all, presumably because of the mutability issue.
>
> Another idea would be to leave the cast alone and introduce a named
> function that does the "exact" conversion.  Possibly that makes nobody
> happy, but at least both the cast and the function could be immutable.
> It'd dodge backwards-compatibility objections, too.
>
> regards, tom lane
>

​Working for a company that ​
has enterprise customers this can't be overemphasized.
Never require the user to do something so they keep getting the same
results.​
​
​
​ It doesn't
matter if it's "wrong".

​I would vote for a property. If you want the best effort to match the IEEE
spec
you need to execute 'set use_ieee_numbers'  and you'll get the extra digits
and
rounding behavior. If not ​you'll get the existing behavior.

Bear


Re: [GSoC Idea Discussion] "Thrift datatype support" Project

2018-02-13 Thread Bear Giles
On Tue, Feb 13, 2018 at 10:24 AM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Technically speaking, Thrift is "language-independent"
> serialization-deserialization format with clean approach to backward
> compatibility.
>
> I think Thrift (or something like that) can be useful, as it can generate
> serializers/deserializers for lots of languages.
> PostgreSQL's "binary" format is tied to the PostgreSQL and it has to be
> reimplemented for each and every client language.
>
> "text" format is non-trivial as well (e.g. it is hard to get quoting right
> for structs, and text is server-locale-dependent)
>
> Vladimir
>

​​Thanks. I've been helping a coworker with an older parquet writer and
​the URL for the hive metastore uses a thrift:// scheme. That is why I
thought it is a communications protocol.​


Re: [GSoC Idea Discussion] "Thrift datatype support" Project

2018-02-13 Thread Bear Giles
Isn't thrift the communications protocol?

Do we have foreign server support for parquet and ORC files?

On Tue, Feb 13, 2018 at 8:40 AM, Udit Juneja  wrote:

> Hi,
>
> I am Udit Juneja, a Computer Science undergraduate student at Thapar
> Institute of Engineering and Technology, India. I am interested in
> contributing to PostgreSQL.
>
> A brief introduction about me:
> I am familiar with programming languages (C, C++, Python, SQL).
>
> I am interested in "Thrift datatype support"  project. I have already
> started exploring PostgreSQL, and the libraries involved in this project.
>
> I would like to discuss the idea further and maybe submit a proposal in
> GSoC later.
>
> Regards,
> Udit Juneja
>


Re: What does Time.MAX_VALUE actually represent?

2017-12-31 Thread Bear Giles
​You don't need to store 25:20 in the database though - your app can use a
window that treats a day as "from 5 am today until 5 am tomorrow" and adds
24:00 to the times for tomorrow.​

Bear

On Sat, Dec 30, 2017 at 2:25 PM, Gavin Flower  wrote:

> On 12/31/2017 03:07 AM, Dave Cramer wrote:
>
>> We are having a discussion on the jdbc project about dealing with
>> 24:00:00.
>>
>> https://github.com/pgjdbc/pgjdbc/pull/992#issuecomment-354507612
>>
>> Dave Cramer
>>
>
> In Dublin (I was there 2001 to 2004), Time tables show buses just after
> midnight, such as 1:20am as running at the time 2520 - so there are visible
> close to the end of the day.  If you are looking for buses around midnight
> this is very user friendly - better than looking at the other end of the
> time table for 0120.
>
> I think logically that 24:00:00 is exactly one day later than 00:00:00 -
> but I see from following the URL, that there are other complications...
>
>
> Cheers,
> Gavin
>
>
>