Re: [SQL] Create on insert a unique random number

2008-03-19 Thread hubert depesz lubaczewski
On Tue, Mar 18, 2008 at 01:40:42PM -0500, Campbell, Lance wrote:
> This is not a security approach.  It is more about not giving obvious
> access to people that want to mess around.

1. keep primary key using standard serial. it will make your life a bit
simpler.
2. add column for text random identifiers (it doesn't have to be number,
and adding characters makes for much better "randomness").
3. check this:
http://www.depesz.com/index.php/2007/06/25/random-text-record-identifiers/

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Create on insert a unique random number

2008-03-19 Thread D'Arcy J.M. Cain
On Tue, 18 Mar 2008 13:57:39 -0700
Steve Midgley <[EMAIL PROTECTED]> wrote:
> At 12:36 PM 3/18/2008, D'Arcy J.M. Cain wrote:
> >On Tue, 18 Mar 2008 12:23:35 -0700
> >Steve Midgley <[EMAIL PROTECTED]> wrote:
> > > 1) Create a second field (as someone recommend on this list) that 
> > is an
> > > MD5 of your primary key. Use that as your "accessor" index from the 
> > web
> >
> >I strongly disagree for three reasons.  First, if you are going to
> >generate a key then don't store it.  Just generate it every time.
> >Second, don't generate it based on a known field.  You may think that
> >it is secure but what if you private key is compromised?  Do you then
> >change everyone's security code?  Third, what if one person's
> >code is compromised?  If it is based on a calculation then you
> >can't change that one person's security code.

> I'm not clear on your concern here - an MD5 hash doesn't have a private 
> key that can be compromised, afaik. It's a one way hash. I don't see 

Right so it is even less useful than I implied.  It can never be
changed so why store it when it can be re-generated at any time.

> much difference between making an MD5 of the primary key and generating 
> a random number for the "public primary key", except that you shouldn't 
> get index collisions with the MD5 method (whereas eventually you will 
> with a random number, though of course using a GUID would eliminate 
> that concern for practical purposes).

But your suggestion was to base this key on the serial primary key so
where is your index collision protection?  You are going to get
collisions on both the serial key and, to a lesser extent, your
generated one. Besides, has anyone ever demonstrated a real issue with
lookups using serial primary keys? I think you are trying to second
guess the database engine with this and I don't think that that is a
great idea. 

> The issue is about creating an index into a sparse hash so that each 
> record is somewhat randomly located in a sparse hash "index space". 
> (One valid reason to do this would be if you wanted to hide the total 
> number of records in your table from competitors or customers). (Just 

If that is your goal then start your serial at something other than 1.
Start at 1,000,000 for example and your first user will think that
you already have one million clients.  Actually, he will think that
you started elsewhere than 1 but he won't know where.

-- 
D'Arcy J.M. Cain <[EMAIL PROTECTED]> |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Create on insert a unique random number

2008-03-19 Thread Steve Midgley

At 06:47 AM 3/19/2008, D'Arcy J.M. Cain wrote:

But your suggestion was to base this key on the serial primary key so
where is your index collision protection?  You are going to get
collisions on both the serial key and, to a lesser extent, your
generated one. Besides, has anyone ever demonstrated a real issue with
lookups using serial primary keys? I think you are trying to second
guess the database engine with this and I don't think that that is a
great idea.

Hi D'Arcy,

I'm not following this line. Maybe we're talking about two different 
things here.. I don't know if Lance is using "CRUD" methodology per se, 
but that's a well accepted web approach and uses (generally) serial 
primary keys in the URL structure as (where numbers are serial pk's):


[website]/contact/12345
[website]/property/45678
  [and the client sends GET, POST, PUT, DELETE http requests, or 
mimics, to activate various functions]


Whether CRUD of otherwise, in the model I was promoting, there would be 
two index columns in the table along with other data, a public index 
and a serial primary key. The public index is based on the primary key:


pk | public_pk
1  | md5(1 + fixed salt)
2  | md5(2 + fixed salt)
...

AFAIK, an MD5 hash is guaranteed to generate a unique output for any 
unique input, so the serial key and fixed salt would guarantee no hash 
index collisions on the MD5 output. Of course if a competitor knows 
you're using MD5 and they know your salt, they could calculate all the 
md5 integer hashes and see which ones exist..


But I could care less if he uses md5 or sha-1 or Guids! (I just picked 
MD5 because another poster recommended it and it's very easy to 
implement in Pg). The point I care about is that there would be a 
public_pk that associates to one-and-only-one serial pk. Also that 
public_pk should be 1) not easily guessable, 2) non-clustering (and 
therefore non-serial). Then his url's would look like something like:


[website]/contact/c4ca4238a0b923820dcc509a6f75849b
[website]/property/c81e728d9d4c2f636f067f89cc14862c

> The issue is about creating an index into a sparse hash so that 
each
> record is somewhat randomly located in a sparse hash "index space". 

> (One valid reason to do this would be if you wanted to hide the 
total
> number of records in your table from competitors or customers). 
(Just


If that is your goal then start your serial at something other than 1.
Start at 1,000,000 for example and your first user will think that
you already have one million clients.  Actually, he will think that
you started elsewhere than 1 but he won't know where.


The original post did not want users to be able to type in random 
integers like:


/contact/343

And find out if that record #343 exists or not (regardless of whether 
they can get access to the record - the error generated on 
no-authorization may be different from record-not-found). So starting 
at a million does not fix the OP's issue.


From my perspective, wherever you start your serial index, competitors 
can watch it grow over time, if it's a numeric serial. That could be 
more valuable in many businesses than knowing the initial size of the 
table.


Anyway, I hope that clears up what I was recommending! I didn't 
anticipate it would stir up this much analysis and I hope the OP finds 
your input and mine useful in coming up with a final answer to his 
issue. Thanks for taking the time to consider the issue and I'll look 
forward to any additional ideas or comments you have on this too!


Sincerely,

Steve


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Create on insert a unique random number

2008-03-19 Thread D'Arcy J.M. Cain
On Wed, 19 Mar 2008 08:28:28 -0700
Steve Midgley <[EMAIL PROTECTED]> wrote:
> I'm not following this line. Maybe we're talking about two different 
> things here.. I don't know if Lance is using "CRUD" methodology per se, 
> but that's a well accepted web approach and uses (generally) serial 
> primary keys in the URL structure as (where numbers are serial pk's):
> 
> [website]/contact/12345
> [website]/property/45678
>[and the client sends GET, POST, PUT, DELETE http requests, or 
> mimics, to activate various functions]

Yes, I do this all the time.

> Whether CRUD of otherwise, in the model I was promoting, there would be 
> two index columns in the table along with other data, a public index 
> and a serial primary key. The public index is based on the primary key:
> 
> pk | public_pk
> 1  | md5(1 + fixed salt)
> 2  | md5(2 + fixed salt)
> ...
> 
> AFAIK, an MD5 hash is guaranteed to generate a unique output for any 
> unique input, so the serial key and fixed salt would guarantee no hash 
> index collisions on the MD5 output. Of course if a competitor knows 
> you're using MD5 and they know your salt, they could calculate all the 
> md5 integer hashes and see which ones exist..
> 
> But I could care less if he uses md5 or sha-1 or Guids! (I just picked 
> MD5 because another poster recommended it and it's very easy to 
> implement in Pg). The point I care about is that there would be a 
> public_pk that associates to one-and-only-one serial pk. Also that 
> public_pk should be 1) not easily guessable, 2) non-clustering (and 
> therefore non-serial). Then his url's would look like something like:
> 
> [website]/contact/c4ca4238a0b923820dcc509a6f75849b
> [website]/property/c81e728d9d4c2f636f067f89cc14862c

Right and, as you state above, they could be guessable if someone gets
their hands on a relatively small amount of information.  If you simply
generate a random string of n characters where n is based on the amount
of security you need, you can store that and store it in a separate
field in the record.  You don't even need to make them unique.  Just
incorporate the serial number as well as the random string.  There may
conceivably be two records with "1ed6f54e5636837ddae4ef33397ee2cb" as
the key but only one that looks like
"021857.1ed6f54e5636837ddae4ef33397ee2cb".  In fact, you could md5 the
serial key and just string the two together if you really wanted
security through more obscurity but that's probably overkill.

The point here is that no one can guess what someone's URL is, even if
they know the ID, administrators can call up records by ID and
individual secret keys can be changed if compromised without affecting
anyone else.  Also, it's a normalized table.  Storing a value that you
can generate is unnormalized.

> The original post did not want users to be able to type in random 
> integers like:
> 
> /contact/343

See above.  That's not what I was suggesting.

> And find out if that record #343 exists or not (regardless of whether 
> they can get access to the record - the error generated on 
> no-authorization may be different from record-not-found). So starting 
> at a million does not fix the OP's issue.

Certainly you would generate the same error to the web user, even if
you differentiate in your internal error log.

>  From my perspective, wherever you start your serial index, competitors 
> can watch it grow over time, if it's a numeric serial. That could be 
> more valuable in many businesses than knowing the initial size of the 
> table.

I guess it depends on the business case.  Certainly we can always find
a use case where a specific solution fails but that's not how we really
work.  We get all the details of the requirements and then code what
solves them.  See above for the "overkill" method that solves that
issue if it really is one.

> Anyway, I hope that clears up what I was recommending! I didn't 
> anticipate it would stir up this much analysis and I hope the OP finds 
> your input and mine useful in coming up with a final answer to his 
> issue. Thanks for taking the time to consider the issue and I'll look 
> forward to any additional ideas or comments you have on this too!

Yes, discussion is always useful, and fun.  :-)

-- 
D'Arcy J.M. Cain <[EMAIL PROTECTED]> |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] compare 2 tables in sql

2008-03-19 Thread Tena Sakai
Hi Everybody,

Is there a sql way to compare (in a diff/cmp sense)
2 tables?  For example,

  create table foo as
  [select bla bla bla];

  create table moo as
  [select bla bla bla];

How would I go about knowing foo and moo are identical
(or not)?  Any pointer would be appreciated.

Tena


Re: [SQL] compare 2 tables in sql

2008-03-19 Thread Volkan YAZICI
On Wed, 19 Mar 2008, "Tena Sakai" <[EMAIL PROTECTED]> writes:
> Is there a sql way to compare (in a diff/cmp sense)
> 2 tables?

You can diff "pg_dump --schema-only" output of the related tables. (I
attached an ad-hoc script once I wrote to use for such stuff.) I don't
know about [php]pgadmin, but (IIRC) EMS products offer that
functionality.


Regards.



pg_difftable
Description: Binary data

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] compare 2 tables in sql

2008-03-19 Thread Dean Gibson (DB Administrator)

On 2008-03-19 10:56, Tena Sakai wrote:


Hi Everybody,

Is there a sql way to compare (in a diff/cmp sense) 2 tables?  For 
example,


  create table foo as
  [select bla bla bla];

  create table moo as
  [select bla bla bla];

How would I go about knowing foo and moo are identical (or not)?  Any 
pointer would be appreciated.


Tena

You could do a full outer join of "foo" and "moo" on whatever is the 
common key, and then delete those rows which don't have null fields in 
the either the left or right sides;  the remainder would be the differences.


--
Mail to my list address MUST be sent via the mailing list.
All other mail to my list address will bounce.



[SQL] Date and filling issues

2008-03-19 Thread Christopher Crews
Hi All,
I'm not quite sure how to phrase this, but essentially my company has me 
working on some reports and I have some charts associated with the SQL results. 

My current query is:

select 
transaction_date as date, 
sum(sale_amount) as sales
from ej_transaction
 where transaction_date 
between current_date - Interval '1 month' and current_date
group by transaction_date 
order by transaction_date asc

The issue I'm having is that there are some dates where sales of certain items 
simply don't take place. Instead of putting a date entry in the database with a 
sale amount of 0, there simply is no entry for that date. I need to make a 
query that will fill in the dates not found within the date range and populate 
them with the sales value of 0.

A sample of the current results data would be like
datesales
2008-03-07  100.00
2007-03-10 150.00
2007-03-18 50.00

and what I'm trying to do is fill in the missing dates with sales values of 0.

Thanks,
-CC


   
-
Never miss a thing.   Make Yahoo your homepage.

Re: [SQL] compare 2 tables in sql

2008-03-19 Thread Jonah H. Harris
On Wed, Mar 19, 2008 at 1:56 PM, Tena Sakai <[EMAIL PROTECTED]> wrote:
>  Is there a sql way to compare (in a diff/cmp sense)
>  2 tables?  For example,

SELECT * FROM foo
EXCEPT
SELECT * FROM moo;

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Date and filling issues

2008-03-19 Thread Osvaldo Kussama
2008/3/19, Christopher Crews <[EMAIL PROTECTED]>:
> Hi All,
> I'm not quite sure how to phrase this, but essentially my company has me
> working on some reports and I have some charts associated with the SQL
> results.
>
> My current query is:
>
> select
> transaction_date as date,
> sum(sale_amount) as sales
> from ej_transaction
>  where transaction_date
> between current_date - Interval '1 month' and current_date
> group by transaction_date
> order by transaction_date asc
>
> The issue I'm having is that there are some dates where sales of certain
> items simply don't take place. Instead of putting a date entry in the
> database with a sale amount of 0, there simply is no entry for that date. I
> need to make a query that will fill in the dates not found within the date
> range and populate them with the sales value of 0.
>
> A sample of the current results data would be like
> datesales
> 2008-03-07 100.00
> 2007-03-10 150.00
> 2007-03-18 50.00
>
> and what I'm trying to do is fill in the missing dates with sales values of
> 0.
>

Try:
SELECT s.date::date, sum(COALESCE(ej_transaction.sale_amount,0)) as sales
FROM generate_series(current_date - '1 month', current_date) AS s(date)
 LEFT OUTER JOIN ej_transaction
GROUP BY s.date
ORDER BY s.date ASC;

Osvaldo

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] compare 2 tables in sql

2008-03-19 Thread Tena Sakai
Fantastic!  Many thanks.

Regards,

Tena Sakai


-Original Message-
From: [EMAIL PROTECTED] on behalf of Jonah H. Harris
Sent: Wed 3/19/2008 3:39 PM
To: Tena Sakai
Cc: [email protected]
Subject: Re: [SQL] compare 2 tables in sql
 
On Wed, Mar 19, 2008 at 1:56 PM, Tena Sakai <[EMAIL PROTECTED]> wrote:
>  Is there a sql way to compare (in a diff/cmp sense)
>  2 tables?  For example,

SELECT * FROM foo
EXCEPT
SELECT * FROM moo;

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql



Re: [SQL] Date and filling issues

2008-03-19 Thread Rodrigo E. De León Plicet
On 3/19/08, Christopher Crews <[EMAIL PROTECTED]> wrote:
> and what I'm trying to do is fill in the missing dates with sales values of 0.

create or replace function gen_dates(sd date, ed date)
returns setof date as $$
select $1 + i
from generate_series(0, $2 - $1) i;
$$ language sql immutable;

select d.date, sum(coalesce(sale_amount,0)) as sales
from gen_dates((current_date - interval '1 month')::date, current_date) d(date)
left join ej_transaction
on transaction_date=d.date
group by d.date
order by d.date asc;

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] pg_dump using SQL

2008-03-19 Thread Preston Landers
Yusnel Rojas Garc?a([EMAIL PROTECTED])@2008.03.18 11:45:03 -0800:
> Hi everyone
> 
> Is there any way to do what pg_dump does?, I mean, get the structure of a
> table in a database (ex: CREATE TABLE ...)
> 

If you want to programmaticly discover the elements in a database
schema then you can always query the INFORMATION_SCHEMA:

http://www.postgresql.org/docs/current/static/information-schema.html

That isn't the same as having the raw DLL statements that created
them, but depending on your purpose may be good enough.

regards,
Preston

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql