Re: [GENERAL] pg_sample

2016-10-19 Thread Greg Sabino Mullane

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> I may be overlooking something, but what about dependencies between 
> tables, sequences, indexes, etc.? I guess that if one takes the first 
> 100 rows of a table referenced by another table, there is no guarantee 
> that the first 100 rows of the referencing table will not contain 
> a foreign key value whose referenced row is missing.

The only dependency that should matter is foreign keys, and yeah, if you 
have those, all bets are off; one would need to write something very custom 
indeed to slurp out the data. I could envision some workarounds, but it 
really depends on exactly what the OP is trying to achieve.
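
One such workaround, sketched here purely as an illustration and not as something proposed in this thread: sample the referenced (parent) table first, then copy only those rows of the referencing (child) table whose foreign key points into that same sample. The helper below only prints the COPY query and runs nothing. The table and column names (orders, customer_id, customers, id) are hypothetical, and identifiers are assumed to need no quoting.

```sh
#!/bin/sh
# Print a COPY query that samples a child table, keeping only rows whose
# foreign key falls inside the "first N" rows of the parent table.
# Arguments: child table, FK column, parent table, parent key column, N.
# Assumption: all names are simple identifiers that need no quoting.
fk_sample_query() {
    child=$1
    fk_col=$2
    parent=$3
    pk_col=$4
    limit=$5
    printf 'COPY (SELECT c.* FROM %s c WHERE c.%s IN (SELECT %s FROM %s ORDER BY %s LIMIT %s) ORDER BY 1 LIMIT %s) TO STDOUT\n' \
        "$child" "$fk_col" "$pk_col" "$parent" "$pk_col" "$limit" "$limit"
}

# Hypothetical example: orders.customer_id references customers.id.
fk_sample_query orders customer_id customers id 100
```

This only holds together if the parent table itself is copied with the same ORDER BY and LIMIT, and it covers a single foreign key. Chains of references, multi-column keys, or circular references would need more care.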

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201610191401
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlgHtV8ACgkQvJuQZxSWSsgrzQCglNFhkdnfg4ECC1l3l0F/Uqt0
ID4AnjGHOTR5Tsfn8MwmyBItTrOg1w7Y
=6qOO
-----END PGP SIGNATURE-----




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_sample

2016-10-19 Thread Karsten Hilbert
On Wed, Oct 19, 2016 at 01:24:10PM +1300, Patrick B wrote:

> I'm using pg_sample to do that, but unfortunately it doesn't work well.
> It doesn't get the first 100 rows. It gets 100 random rows.
> 
> Do you guys have any idea how I could do this?

For any relevant answer to this question you'll have to
define what "first" means in the context of "first 100 rows".
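
As a toy illustration of why the definition matters (a shell analogy with a flat file standing in for a table; the data is made up): the "first 2 rows" differ depending on whether "first" means insertion order, lowest id, or earliest creation date. In PostgreSQL terms, a SELECT without ORDER BY returns rows in an unspecified order, so "first" only becomes well defined once a sort key is chosen.

```sh
#!/bin/sh
# Toy stand-in for a table; columns are id, created, name.
# The lines appear in "insertion order". All values are invented.
rows='3 2016-01-05 carol
1 2016-03-01 alice
2 2016-02-10 bob'

# Three different readings of "the first 2 rows".
by_insertion=$(printf '%s\n' "$rows" | head -n 2)
by_id=$(printf '%s\n' "$rows" | sort -k1,1n | head -n 2)
by_created=$(printf '%s\n' "$rows" | sort -k2,2 | head -n 2)

printf 'insertion order:\n%s\n' "$by_insertion"
printf 'ordered by id:\n%s\n' "$by_id"
printf 'ordered by created:\n%s\n' "$by_created"
```

Each of the three selections contains a different pair of rows, which is the ambiguity the question has to resolve before any tool can honor it.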

Karsten
-- 
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346




Re: [GENERAL] pg_sample

2016-10-18 Thread Adrian Klaver

On 10/18/2016 08:15 PM, Charles Clavadetscher wrote:
> Hello
>
> On 10/19/2016 04:58 AM, Greg Sabino Mullane wrote:
>> Patrick B writes:
>> ...
>>> However, this new database test server doesn't need to have all the data. I
>>> would like to have only the first 100 rows (for example) of each table in my
>>> database.
>> ...
>>
>> This should do what you ask.
>>
>> If the order does not matter, leave out the ORDER BY.
>>
>> This assumes everything of interest is in the public schema.
>>
>> $ createdb testdb
>> $ pg_dump realdb --schema-only | psql -q testdb
>> $ psql realdb
>>
>> psql> \o dump.some.rows.sh
>> psql> select format($$psql realdb -c 'COPY (select * from %I order by 1 limit %s) TO STDOUT' | psql testdb -c 'COPY %I FROM STDIN' $$, table_name, 100, table_name)
>>   from information_schema.tables where table_schema = 'public' and table_type = 'BASE TABLE';
>> psql> \q
>>
>> $ sh dump.some.rows.sh
>
> I may be overlooking something, but what about dependencies between
> tables, sequences, indexes, etc.? I guess that if one takes the first
> 100 rows of a table referenced by another table, there is no guarantee
> that the first 100 rows of the referencing table will not contain
> a foreign key value whose referenced row is missing.

Well there is:

https://github.com/18F/rdbms-subsetter

That still does not guarantee that the rows selected cover your test 
cases though.

> Regards
> Charles

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] pg_sample

2016-10-18 Thread Charles Clavadetscher

Hello

On 10/19/2016 04:58 AM, Greg Sabino Mullane wrote:
> Patrick B writes:
> ...
>> However, this new database test server doesn't need to have all the data. I
>> would like to have only the first 100 rows (for example) of each table in my
>> database.
> ...
>
> This should do what you ask.
>
> If the order does not matter, leave out the ORDER BY.
>
> This assumes everything of interest is in the public schema.
>
> $ createdb testdb
> $ pg_dump realdb --schema-only | psql -q testdb
> $ psql realdb
>
> psql> \o dump.some.rows.sh
> psql> select format($$psql realdb -c 'COPY (select * from %I order by 1 limit %s) TO STDOUT' | psql testdb -c 'COPY %I FROM STDIN' $$, table_name, 100, table_name)
>   from information_schema.tables where table_schema = 'public' and table_type = 'BASE TABLE';
> psql> \q
>
> $ sh dump.some.rows.sh

I may be overlooking something, but what about dependencies between 
tables, sequences, indexes, etc.? I guess that if one takes the first 
100 rows of a table referenced by another table, there is no guarantee 
that the first 100 rows of the referencing table will not contain 
a foreign key value whose referenced row is missing.

Regards
Charles

--
Swiss PostgreSQL Users Group
c/o Charles Clavadetscher
Treasurer
Motorenstrasse 18
CH – 8005 Zürich

http://www.swisspug.org





Re: [GENERAL] pg_sample

2016-10-18 Thread Greg Sabino Mullane

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


Patrick B writes:
...
> However, this new database test server doesn't need to have all the data. I
> would like to have only the first 100 rows (for example) of each table in my
> database.
...

This should do what you ask.

If the order does not matter, leave out the ORDER BY.

This assumes everything of interest is in the public schema.

$ createdb testdb
$ pg_dump realdb --schema-only | psql -q testdb
$ psql realdb

psql> \o dump.some.rows.sh
psql> select format($$psql realdb -c 'COPY (select * from %I order by 1 limit %s) TO STDOUT' | psql testdb -c 'COPY %I FROM STDIN' $$, table_name, 100, table_name)
  from information_schema.tables where table_schema = 'public' and table_type = 'BASE TABLE';
psql> \q

$ sh dump.some.rows.sh
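
A possible variation on the steps above, offered only as a sketch: build the same per-table pipeline in the shell instead of with format(). The function below reads table names from standard input and prints one command line per table without executing anything. It assumes plain lower-case table names that need no identifier quoting (the %I in the query above handles quoting, this sketch does not); the function name and the demo table names are hypothetical.

```sh
#!/bin/sh
# Print one copy pipeline per table name read from standard input.
# Argument: maximum number of rows to copy per table.
# Assumption: table names are simple identifiers that need no quoting.
# realdb and testdb are the database names used in the steps above.
emit_copy_commands() {
    limit=$1
    while IFS= read -r table; do
        [ -n "$table" ] || continue   # skip blank lines
        printf "psql realdb -c 'COPY (select * from %s order by 1 limit %s) TO STDOUT' | psql testdb -c 'COPY %s FROM STDIN'\n" \
            "$table" "$limit" "$table"
    done
}

# Demo with a hard-coded list of hypothetical tables; nothing is executed.
printf '%s\n' customers orders | emit_copy_commands 100
```

In practice the table names could be fed in from psql run with -A (unaligned) and -t (tuples only), and the printed lines piped to sh. Those two options keep the column header and row-count footer out of the list, which would otherwise end up as stray lines in the generated script.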

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201610182256
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlgG4NkACgkQvJuQZxSWSsge4ACePhBOBtBFnGNxXt5qpY7X+w3o
d04AoKTzAgxcaqy8qfIE0LPuzG9x0KIU
=sS+m
-----END PGP SIGNATURE-----






Re: [GENERAL] pg_sample

2016-10-18 Thread Melvin Davidson
On Tue, Oct 18, 2016 at 10:21 PM, Adrian Klaver wrote:

> On 10/18/2016 06:30 PM, Patrick B wrote:
>
>> 2016-10-19 13:39 GMT+13:00 Michael Paquier:
>>
>>> On Wed, Oct 19, 2016 at 9:24 AM, Patrick B wrote:
>>>> However, this new database test server doesn't need to have all the data. I
>>>> would like to have only the first 100 rows (for example) of each table in my
>>>> database.
>>>>
>>>> I'm using pg_sample to do that, but unfortunately it doesn't work well.
>>>> It doesn't get the first 100 rows. It gets 100 random rows.
>>>
>>> Why aren't 100 random rows enough to fulfill what you are looking for?
>>> What you are trying here is to test the server with some sample data,
>>> no? In this case, having the first 100 rows, or a set of random ones
>>> should not matter much (never tried pg_sample to be honest).
>>> --
>>> Michael
>>
>> Actually it does matter because there is some essential data that has to
>> be in there so the code can work.
>
> Well random does not know essential, it is after all random. If you want
> to test specific cases then you will need to build appropriate data sets.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com

The following query should generate statements you can use to get the first
100 rows of every table.
You may need to tweak it a bit, as row order is not guaranteed without an
ORDER BY.

-- One COPY statement per ordinary table. The LIMIT sits inside the
-- subquery so that it caps the rows exported, not the number of tables.
SELECT 'COPY (SELECT * FROM ' || quote_ident(n.nspname) || '.' || quote_ident(c.relname)
|| ' LIMIT 100) TO '''
--|| 'C:\temp\'
|| '/tmp/'
|| quote_ident(n.nspname) || '_' || quote_ident(c.relname) || '.csv'
|| ''''
|| ' WITH (FORMAT csv, HEADER, FORCE_QUOTE *);'
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE relkind = 'r'
   AND relname NOT LIKE 'pg_%'
   AND relname NOT LIKE 'sql_%';

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] pg_sample

2016-10-18 Thread Adrian Klaver

On 10/18/2016 06:30 PM, Patrick B wrote:
> 2016-10-19 13:39 GMT+13:00 Michael Paquier:
>
>> On Wed, Oct 19, 2016 at 9:24 AM, Patrick B wrote:
>>> However, this new database test server doesn't need to have all the data. I
>>> would like to have only the first 100 rows (for example) of each table in my
>>> database.
>>>
>>> I'm using pg_sample to do that, but unfortunately it doesn't work well.
>>> It doesn't get the first 100 rows. It gets 100 random rows.
>>
>> Why aren't 100 random rows enough to fulfill what you are looking for?
>> What you are trying here is to test the server with some sample data,
>> no? In this case, having the first 100 rows, or a set of random ones
>> should not matter much (never tried pg_sample to be honest).
>> --
>> Michael
>
> Actually it does matter because there is some essential data that has to
> be in there so the code can work.

Well random does not know essential, it is after all random. If you want 
to test specific cases then you will need to build appropriate data sets.

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] pg_sample

2016-10-18 Thread Patrick B
2016-10-19 13:39 GMT+13:00 Michael Paquier:

> On Wed, Oct 19, 2016 at 9:24 AM, Patrick B wrote:
> > However, this new database test server doesn't need to have all the data. I
> > would like to have only the first 100 rows (for example) of each table in my
> > database.
> >
> > I'm using pg_sample to do that, but unfortunately it doesn't work well.
> > It doesn't get the first 100 rows. It gets 100 random rows.
>
> Why aren't 100 random rows enough to fulfill what you are looking for?
> What you are trying here is to test the server with some sample data,
> no? In this case, having the first 100 rows, or a set of random ones
> should not matter much (never tried pg_sample to be honest).
> --
> Michael
>


Actually it does matter because there is some essential data that has to be
in there so the code can work.


Re: [GENERAL] pg_sample

2016-10-18 Thread Michael Paquier
On Wed, Oct 19, 2016 at 9:24 AM, Patrick B wrote:
> However, this new database test server doesn't need to have all the data. I
> would like to have only the first 100 rows (for example) of each table in my
> database.
>
> I'm using pg_sample to do that, but unfortunately it doesn't work well.
> It doesn't get the first 100 rows. It gets 100 random rows.

Why aren't 100 random rows enough to fulfill what you are looking for?
What you are trying here is to test the server with some sample data,
no? In this case, having the first 100 rows, or a set of random ones
should not matter much (never tried pg_sample to be honest).
-- 
Michael




[GENERAL] pg_sample

2016-10-18 Thread Patrick B
Hi guys,

I have a very big database that I need to export (dump) to a new test
server.

However, this new database test server doesn't need to have all the data. I
would like to have only the first 100 rows (for example) of each table in my
database.

I'm using pg_sample to do that, but unfortunately it doesn't work well.
It doesn't get the first 100 rows. It gets 100 random rows.

Do you guys have any idea how I could do this?
Thanks
Patrick