Re: [GENERAL] Multiple Schemas vs. Multiple Databases

2017-10-13 Thread Thomas Kellerer

Melvin Davidson wrote on 13.10.2017 at 21:42:

If that is what you need to do, then definitely use multiple schemas.
In PostgreSQL, the only way to do cross-db queries / DML is with the
dblink extension, and from personal use, it is a pain to use.


dblink is not the only way to do that.

Nowadays, cross-DB queries can quite easily be done using foreign tables (and 
they are quite efficient as well - much more efficient than dblink).
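
A minimal sketch of the foreign-table approach (the server, database, and
table names below are hypothetical):

CREATE EXTENSION postgres_fdw;

CREATE SERVER stats_srv FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', dbname 'statsdb');

CREATE USER MAPPING FOR CURRENT_USER SERVER stats_srv
  OPTIONS (user 'app_user', password 'secret');

CREATE SCHEMA remote_stats;
IMPORT FOREIGN SCHEMA public FROM SERVER stats_srv INTO remote_stats;

-- the remote tables can now be queried as if they were local
SELECT count(*) FROM remote_stats.page_hits;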

Thomas






Re: [GENERAL] Multiple Schemas vs. Multiple Databases

2017-10-13 Thread Igal @ Lucee.org

On 10/13/2017 12:47 PM, John R Pierce wrote:

On 10/13/2017 12:29 PM, Igal @ Lucee.org wrote:


The main things that I need to do are:

  a) Be able to backup/restore each "part" separately.  Looks like 
pg_dump allows that for schemas via the --schema=schema argument.


  b) Be able to query aggregates from the secondary "part" and store 
the results in the primary one, which also seems easier with multiple 
schemas than multiple databases.


Am I right to think that two schemas are better in this use case or 
am I missing something important?




generally, yeah, unless you eventually decide to split off the two 
databases onto separate servers for performance reasons.   Of course, 
to access the 'other' database, you'd need to use postgres_fdw or dblink.


Thank you both for confirming,


Igal Sapir
Lucee Core Developer
Lucee.org 





Re: [GENERAL] Multiple Schemas vs. Multiple Databases

2017-10-13 Thread John R Pierce

On 10/13/2017 12:29 PM, Igal @ Lucee.org wrote:


I have read quite a few articles about multiple schemas vs. multiple 
databases, but they are all very generic so I wanted to ask here for a 
specific use case:


I am migrating a Web Application from MS SQL Server to PostgreSQL.  
For the sake of easier maintenance, on SQL Server I have two separate 
databases:


  1) Primary database containing the data for the application

  2) Secondary database containing "transient" data, e.g. logging of 
different activities on the website in order to generate statistics etc.


Both databases belong to the same application with the same roles and 
permissions.


The secondary database grows much faster, but the data in it is not 
mission-critical, and so the data is aggregated daily and the 
summaries are posted to the primary database, because only the 
aggregates are important here.


To keep the database sizes from growing too large, I periodically 
delete old data from the secondary database since the data becomes 
obsolete after a certain period of time.


At first I thought of doing the same in Postgres, but now it seems 
like the better way to go would be to keep one database with two 
schemas: primary and transient.


The main things that I need to do are:

  a) Be able to backup/restore each "part" separately.  Looks like 
pg_dump allows that for schemas via the --schema=schema argument.


  b) Be able to query aggregates from the secondary "part" and store 
the results in the primary one, which also seems easier with multiple 
schemas than multiple databases.


Am I right to think that two schemas are better in this use case or am 
I missing something important?




generally, yeah, unless you eventually decide to split off the two 
databases onto separate servers for performance reasons.   Of course, to 
access the 'other' database, you'd need to use postgres_fdw or dblink.



--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] Multiple Schemas vs. Multiple Databases

2017-10-13 Thread Melvin Davidson
On Fri, Oct 13, 2017 at 3:29 PM, Igal @ Lucee.org  wrote:

> Hello,
>
> I have read quite a few articles about multiple schemas vs. multiple
> databases, but they are all very generic so I wanted to ask here for a
> specific use case:
>
> I am migrating a Web Application from MS SQL Server to PostgreSQL.  For
> the sake of easier maintenance, on SQL Server I have two separate databases:
>
>   1) Primary database containing the data for the application
>
>   2) Secondary database containing "transient" data, e.g. logging of
> different activities on the website in order to generate statistics etc.
>
> Both databases belong to the same application with the same roles and
> permissions.
>
> The secondary database grows much faster, but the data in it is not
> mission-critical, and so the data is aggregated daily and the summaries
> are posted to the primary database, because only the aggregates are
> important here.
>
> To keep the database sizes from growing too large, I periodically delete
> old data from the secondary database since the data becomes obsolete after
> a certain period of time.
>
> At first I thought of doing the same in Postgres, but now it seems like
> the better way to go would be to keep one database with two schemas:
> primary and transient.
>
> The main things that I need to do are:
>
>   a) Be able to backup/restore each "part" separately.  Looks like pg_dump
> allows that for schemas via the --schema=schema argument.
>
>   b) Be able to query aggregates from the secondary "part" and store the
> results in the primary one, which also seems easier with multiple schemas
> than multiple databases.
>
> Am I right to think that two schemas are better in this use case or am I
> missing something important?
>
> Thanks,
>
> Igal Sapir
> Lucee Core Developer
> Lucee.org 
>

> b) Be able to query aggregates from the secondary "part" and store the
> results in the primary one, which also seems easier with multiple schemas
> than multiple databases.

If that is what you need to do, then definitely use multiple schemas. In
PostgreSQL, the only way to do cross-db queries / DML is with the dblink
extension, and from personal use, it is a pain to use.

-- 
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] REASSIGN OWNED simply doesn't work

2017-10-13 Thread David G. Johnston
On Fri, Oct 13, 2017 at 6:04 AM, Alvaro Herrera 
wrote:

> Sam Gendler wrote:
> > psql 9.6.3 on OS X.
> >
> > I'm dealing with a production database in which all db access has been
> made
> > by the same user - the db owner, which isn't actually a superuser because
> > the db runs on amazon RDS - amazon retains the superuser privilege for
> its
> > own users and makes non-superuser role with createrole and createdb
> > privileges for use as the primary role by the AWS account.
>
> It's true that REASSIGN OWNED is limited to a very particular scenario.
> It was written to support the specific case of wanting to drop a role,
> and that can only be done by a superuser, so why would it matter that
> REASSIGN OWNED itself could not be run by a non-superuser?
>

You could at least fix the documentation bug, since this superuser-only
restriction doesn't show up and is in fact contradicted by the sentence
"REASSIGN OWNED requires privileges on both the source role(s) and the
target role."  The error message that comes back seems like it could be
improved as well.

The word "privileges" there seems odd too; wouldn't "membership" be more
appropriate?

https://www.postgresql.org/docs/10/static/sql-reassign-owned.html

David J.


[GENERAL] Multiple Schemas vs. Multiple Databases

2017-10-13 Thread Igal @ Lucee.org

Hello,

I have read quite a few articles about multiple schemas vs. multiple 
databases, but they are all very generic so I wanted to ask here for a 
specific use case:


I am migrating a Web Application from MS SQL Server to PostgreSQL.  For 
the sake of easier maintenance, on SQL Server I have two separate databases:


  1) Primary database containing the data for the application

  2) Secondary database containing "transient" data, e.g. logging of 
different activities on the website in order to generate statistics etc.


Both databases belong to the same application with the same roles and 
permissions.


The secondary database grows much faster, but the data in it is not 
mission-critical, and so the data is aggregated daily and the summaries 
are posted to the primary database, because only the aggregates are 
important here.


To keep the database sizes from growing too large, I periodically delete 
old data from the secondary database since the data becomes obsolete 
after a certain period of time.


At first I thought of doing the same in Postgres, but now it seems like 
the better way to go would be to keep one database with two schemas: 
primary and transient.


The main things that I need to do are:

  a) Be able to backup/restore each "part" separately.  Looks like 
pg_dump allows that for schemas via the --schema=schema argument (see 
the sketches after this list).


  b) Be able to query aggregates from the secondary "part" and store 
the results in the primary one, which also seems easier with multiple 
schemas than multiple databases.
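
For item (a), a hedged sketch assuming a database `appdb` and a schema
named `transient` (both names hypothetical):

  pg_dump --schema=transient -Fc -f transient.dump appdb
  pg_restore --clean -d appdb transient.dump

For item (b), the cross-schema aggregation is a single statement (schema,
table, and column names are hypothetical):

  INSERT INTO main.daily_stats (day, hits)
  SELECT date_trunc('day', logged_at), count(*)
    FROM transient.activity_log
   GROUP BY 1;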


Am I right to think that two schemas are better in this use case or am I 
missing something important?


Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: [GENERAL] REASSIGN OWNED simply doesn't work

2017-10-13 Thread Alvaro Herrera
Sam Gendler wrote:
> psql 9.6.3 on OS X.
> 
> I'm dealing with a production database in which all db access has been made
> by the same user - the db owner, which isn't actually a superuser because
> the db runs on amazon RDS - amazon retains the superuser privilege for its
> own users and makes non-superuser role with createrole and createdb
> privileges for use as the primary role by the AWS account.

It's true that REASSIGN OWNED is limited to a very particular scenario.
It was written to support the specific case of wanting to drop a role,
and that can only be done by a superuser, so why would it matter that
REASSIGN OWNED itself could not be run by a non-superuser?

However, I do not apologize for not thinking about your particular
scenario.  I had my itch, and I scratched it; you have yours, yet you
complain that I did not scratch it in advance?

Feel free to submit a patch (or convince/hire someone to do it for you)
to add the new feature of allowing it to work without superuser privs.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Fwd: [BUGS] BUG #14850: Implement optinal additinal parameter for 'justify' date/time function

2017-10-13 Thread KES
huh... how did I miss the `age` function??? Thanks, that gives me the idea.
I will add 5 days when the day of the month is greater than or equal to 28:

select age( '2016-02-29'::date + '5days'::interval, '2016-01-31'::date + '5days'::interval );
  age
-------
 1 mon
(1 row)

It seems to resolve my issue.

12.10.2017, 17:38, "Brian Dunavant":

A 'month' is an abstract measurement of time.  Sometimes it's 29 days, 30,
or 31.  You cannot say "I have 30 days, how many months is that?" because
the answer is "it depends".  Subtracting one timestamp from another gives
you an interval in days.  In your example, you took Jan 31 2016 and added
"1 month".  Postgres says "I know Feb 2016 is 29 days" and did it
automatically for you.  When you then subtracted Jan 31 2016, you now have
"29 days".  Postgres can no longer say "that is 1 month" because you cannot
go that direction.

You are also using extract(month from X) incorrectly if you want the number
of months between any time period.  That will only return a value between
0 and 11.

It will also be difficult because you are starting from a random day in the
month, making it hard to really know what you mean.  Postgres' age()
function may be able to help you with 'months'.

flpg=# select age( '2016-02-01'::timestamp, '2016-01-01'::timestamp );
  age
-------
 1 mon

flpg=# select age( '2016-02-29'::timestamp, '2016-01-31'::timestamp );
   age
---------
 29 days
(1 row)

flpg=# select age( '2016-03-01'::timestamp, '2016-01-31'::timestamp );
     age
-------------
 1 mon 1 day

On Thu, Oct 12, 2017 at 4:00 AM, KES wrote:

---------- Forwarded message ----------
11.10.2017, 17:12, "Pavel Stehule":

Hi

2017-10-11 12:35 GMT+02:00:

The following bug has been logged on the website:

Bug reference:      14850
Logged by:          Eugen Konkov
Email address:      kes-...@yandex.ru
PostgreSQL version: 10.0
Operating system:   Linux mint 18: Linux work 4.4.0-57-generic #78-Ubu
Description:

Hi. I am trying to do the following math:

select extract( month from justify_days( timestamp '2016-01-31' + interval '1
month' - timestamp '2016-01-31') );
 date_part
-----------
         0
(1 row)

I expect `1` but get `0`. But here everything is right:

>Adjust interval so 30-day time periods are represented as months

https://www.postgresql.org/docs/9.6/static/functions-datetime.html

But with the ability to specify the justification date, the math would be
more precise.

Please implement the following feature:

select extract( month from justify_days( timestamp '2016-01-31' + interval '1
month' - timestamp '2016-01-31', timestamp '2016-01-31') );
 date_part
-----------
         1
(1 row)

This is useful when I try to calculate how many months are left between
service start and end dates.

Pavel Stehule replied:

This is not a bug, so pgsql-hackers and pgsql-general are better places for
this discussion. I think your request makes sense, and it should be
registered in the ToDo list: https://wiki.postgresql.org/wiki/Todo

You can try to contact people from the PostgreSQL Pro company for
implementation.

Regards
Pavel

Thank you.



---------- End of forwarded message ----------




Re: [GENERAL] Permissions for Web App

2017-10-13 Thread Igal @ Lucee.org

On 10/13/2017 11:21 AM, David G. Johnston wrote:
On Fri, Oct 13, 2017 at 11:03 AM, Igal @ Lucee.org wrote:


You mean that if I execute the ALTER DEFAULT command above as user
`postgres` then only tables created by user `postgres` will give
default privileges to role `webapp`?


Yes. "You can change default privileges only for objects that will 
be created by yourself or by roles that you are a member of."


You use the "FOR ROLE" modifier if you want the "or by roles 
that you are a member of" portion to be used.




Thank you for clarifying,


Igal Sapir
Lucee Core Developer
Lucee.org 



Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
On Fri, Oct 13, 2017, at 03:16 PM, David G. Johnston wrote:
> implement a "system-managed-enum" type with many of the same properties
[...]
> TOAST does involve compression but the input to
> the compression algorithm is a single cell (row and column) in a table.
> As noted above I consider the TOAST table and main table to be a single
> logical table.

See this sounds like _exactly_ what I want. Except with a content hash
instead of an id.

Seems to me like all of the machinery that allows you to look things up
by TOASTed columns and subsequently return TOASTed values as if they
resided in the same physical table is what is needed.




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Thomas Kellerer

Seamus Abshere wrote on 13.10.2017 at 18:43:

On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote:

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?
(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)


On Fri, Oct 13, 2017, at 01:29 PM, Melvin Davidson wrote:

What was described is exactly what relations and Foreign Keys are for.


hi Melvin, appreciate the reminder. Our issue is that we have 300+
columns and frequently include them in the SELECT or WHERE clauses... so
traditional normalization would involve hundreds of joins.

That's why I ask about a new table or column type that handles basic
translation and de-duping transparently, keeping the coded values
in-table.



As those are all strings: why not put them into an hstore (or jsonb) column?

Then all of them could be compressed (TOASTed) as a single value instead of 300 
independent values.
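
A rough sketch of that idea (table and column names are hypothetical); the
descriptive strings move into one jsonb column so they are TOASTed together,
while frequently-filtered columns stay relational:

CREATE TABLE household (
  id    bigint PRIMARY KEY,
  state text,    -- frequently-filtered columns stay as regular columns
  attrs jsonb    -- the ~300 descriptive strings live here
);

SELECT id
  FROM household
 WHERE attrs ->> 'residence_type' = 'SINGLE FAMILY RESIDENCE / TOWNHOUSE';

Note that a GIN or expression index on `attrs` would be needed to keep such
WHERE clauses fast.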


 







Re: [GENERAL] Permissions for Web App

2017-10-13 Thread David G. Johnston
On Fri, Oct 13, 2017 at 11:03 AM, Igal @ Lucee.org  wrote:

> You mean that if I execute the ALTER DEFAULT command above as user
> `postgres` then only tables created by user `postgres` will give default
> privileges to role `webapp`?
>

Yes.  "You can change default privileges only for objects that will be
created by yourself or by roles that you are a member of."

You use the "FOR ROLE" modifier if you want the "or by roles that
you are a member of" portion to be used.
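
A short sketch (role names are hypothetical):

ALTER DEFAULT PRIVILEGES FOR ROLE table_owner IN SCHEMA public
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;

This must be run by a role that is a member of `table_owner`.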

David J.


Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread David G. Johnston
On Fri, Oct 13, 2017 at 9:29 AM, Seamus Abshere  wrote:

> > On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote
> > > Theoretically / blue sky, could there be a table or column type that
> > > transparently handles "shared strings" like this, reducing size on disk
> > > at the cost of lookup overhead for all queries?
> > > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > > only for large objects?)
>
> On Fri, Oct 13, 2017, at 01:12 PM, David G. Johnston wrote:
> > Row-independence is baked into PostgreSQL pretty deeply...
>
> Could you say more about that?
>

Not intelligibly... basically, as far as PostgreSQL is concerned, all the
data to reconstruct a row from a given table is present in that table.
From a practical perspective the "TOAST table" for a table IS part of the
main table since it has no practical independent use.

As an aside I was thinking along the lines of an actual compression routine
which is what a spreadsheet file is able to do since a spreadsheet contains
the data from every row and column in a single file and is able to compress
the entire file by finding commonalities across rows and columns.  A
database generally cannot do that.

As for "transparent lookup tables for text columns"... I suppose one could
implement a "system-managed-enum" type with many of the same properties of
an actual enum but avoiding many of its problems by not exposing the
enum-ness to the user and instead just exposing the text labels... I suspect
that, faced with the prospect of doing something that complex, most users
would just set up a FK relationship.

> What about the comparison to TOAST, which stores values off-table?
>

TOAST solves a technical problem related to the fact that records "on the
table" have a very small size limitation (kB) while stored values can be at
least as large as a GB.  TOAST does involve compression, but the input to
the compression algorithm is a single cell (row and column) in a table.
As noted above I consider the TOAST table and main table to be a single
logical table.

Like I said the enum type has similar properties to what you want - but
Melvin is right that using it requires careful consideration of how your
data might change in the future.

David J.


Re: [GENERAL] Permissions for Web App

2017-10-13 Thread Igal @ Lucee.org

Stephen,

On 10/10/2017 6:14 AM, Stephen Frost wrote:



For future reference and for the benefit of others, the command that
I ran is:

   ALTER DEFAULT PRIVILEGES IN SCHEMA public
     GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO webapp;

Note that DEFAULT PRIVILEGES apply to a specific role (in the above
case, whatever CURRENT_USER is) and therefore will only be helpful if
you only have one user creating tables, in this case.


You mean that if I execute the ALTER DEFAULT command above as user 
`postgres` then only tables created by user `postgres` will give default 
privileges to role `webapp`?


I usually only create tables with user `postgres` but this is a bit 
confusing, or at least unclear from the syntax of the command.


Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Peter J. Holzer
On 2017-10-13 12:49:21 -0300, Seamus Abshere wrote:
> In the spreadsheet world, there is this concept of "shared strings," a
> simple way of compressing spreadsheets when the data is duplicated in
> many cells.
> 
> In my database, I have a table with >200 million rows and >300 columns
> (all the households in the United States). For clarity of development
> and debugging, I have not made any effort to normalize its contents, so
> millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
> TOWNHOUSE" (yes, that whole string!) instead of some code representing
> it.
> 
> Theoretically / blue sky, could there be a table or column type that
> transparently handles "shared strings" like this, reducing size on disk
> at the cost of lookup overhead for all queries?

Theoretically it's certainly possible and I think some column-oriented
databases store data that way.

> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> only for large objects?)

Yes, but if you want to automatically delete entries which are no longer
needed you need to keep track of that. So either a reference count or an
index lookup on the parent table. This is starting to look a lot like a
foreign key - just hidden from the user. Performance would probably be
similar, too.

We have done something similar (although for different reasons). We
ended up doing the "join" in the application. For most purposes we don't
need the descriptive strings and when we need them we can do a
dictionary lookup just before sending them to the client (of course the
dictionary has to be read from the database, too, but it doesn't change
that often, so it can be cached). And from a software maintainability
POV I think a dictionary lookup in Perl is a lot nicer than 50 joins
(or 300 in your case).

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ |                -- Ross Anderson




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Melvin Davidson
On Fri, Oct 13, 2017 at 12:52 PM, Melvin Davidson 
wrote:

>
>
> On Fri, Oct 13, 2017 at 12:43 PM, Seamus Abshere 
> wrote:
>
>> > > On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote:
>> > >> Theoretically / blue sky, could there be a table or column type that
>> > >> transparently handles "shared strings" like this, reducing size on
>> disk
>> > >> at the cost of lookup overhead for all queries?
>> > >> (I guess maybe it's like TOAST, but content-hashed and de-duped and
>> not
>> > >> only for large objects?)
>>
>> On Fri, Oct 13, 2017, at 01:29 PM, Melvin Davidson wrote:
>> > What was described is exactly what relations and Foreign Keys are for.
>>
>> hi Melvin, appreciate the reminder. Our issue is that we have 300+
>> columns and frequently include them in the SELECT or WHERE clauses... so
>> traditional normalization would involve hundreds of joins.
>>
>> That's why I ask about a new table or column type that handles basic
>> translation and de-duping transparently, keeping the coded values
>> in-table.
>>
>
> >I ask about a new table or column type that handles basic translation
>
> AFAIK, there is no such thing currently available. Your initial post
> indicated you were working with spreadsheets and were looking to
> translate to a PostgreSQL database. There is no shortcut to normalizing,
> but the time you spend doing so in the initial design will be well
> worthwhile once it is implemented.
>
> --
> Melvin Davidson
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>


Seamus,

Just a thought. As I mentioned previously, there is no shortcut to
optimizing your database. However, you can do it in increments.

First, create all your foreign key / repetitive data tables.

Next, add additional FK columns to your current tables to reference the
FK / repetitive data tables.

Modify your application / queries to utilize the new columns.

Do extensive testing to make sure your modifications work properly.

VERY IMPORTANT: Before the next step, make a backup of the existing
database and verify you have a good copy.

Finally, drop all the old repetitive data columns.
-- 
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Melvin Davidson
On Fri, Oct 13, 2017 at 12:43 PM, Seamus Abshere  wrote:

> > > On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote:
> > >> Theoretically / blue sky, could there be a table or column type that
> > >> transparently handles "shared strings" like this, reducing size on
> disk
> > >> at the cost of lookup overhead for all queries?
> > >> (I guess maybe it's like TOAST, but content-hashed and de-duped and
> not
> > >> only for large objects?)
>
> On Fri, Oct 13, 2017, at 01:29 PM, Melvin Davidson wrote:
> > What was described is exactly what relations and Foreign Keys are for.
>
> hi Melvin, appreciate the reminder. Our issue is that we have 300+
> columns and frequently include them in the SELECT or WHERE clauses... so
> traditional normalization would involve hundreds of joins.
>
> That's why I ask about a new table or column type that handles basic
> translation and de-duping transparently, keeping the coded values
> in-table.
>

> I ask about a new table or column type that handles basic translation

AFAIK, there is no such thing currently available. Your initial post
indicated you were working with spreadsheets and were looking to
translate to a PostgreSQL database. There is no shortcut to normalizing,
but the time you spend doing so in the initial design will be well
worthwhile once it is implemented.

-- 
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Merge - Insert Select

2017-10-13 Thread Peter Geoghegan
On Fri, Oct 13, 2017 at 9:39 AM, Susan Hurst
 wrote:
> Does postgres have a MERGE statement ala Oracle?

No.

> I have found examples online for INSERT...ON CONFLICT DO NOTHING, but all of
> those examples use VALUES statements, which is not the scenario that I have.

You can use SELECT instead of VALUES with ON CONFLICT DO NOTHING/ON
CONFLICT DO UPDATE. They don't impose any restriction on the INSERT
statement at all, unlike MERGE, which is fussy about the use of
subqueries.
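
For the archives, a minimal sketch of that form (table and column names are
hypothetical; `target` needs a unique constraint on `id` to arbitrate the
conflict):

INSERT INTO target (id, val)
SELECT s.id, s.val
  FROM staging s
ON CONFLICT (id) DO NOTHING;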

-- 
Peter Geoghegan




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
> > On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote:
> >> Theoretically / blue sky, could there be a table or column type that
> >> transparently handles "shared strings" like this, reducing size on disk
> >> at the cost of lookup overhead for all queries?
> >> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> >> only for large objects?)

On Fri, Oct 13, 2017, at 01:29 PM, Melvin Davidson wrote:
> What was described is exactly what relations and Foreign Keys are for.

hi Melvin, appreciate the reminder. Our issue is that we have 300+
columns and frequently include them in the SELECT or WHERE clauses... so
traditional normalization would involve hundreds of joins.

That's why I ask about a new table or column type that handles basic
translation and de-duping transparently, keeping the coded values
in-table.




[GENERAL] Merge - Insert Select

2017-10-13 Thread Susan Hurst
What is the best practice for doing an INSERT...SELECT statement for new 
values only?  Existing values must be excluded from the insert.


Does postgres have a MERGE statement ala Oracle?

I have found examples online for INSERT...ON CONFLICT DO NOTHING, but 
all of those examples use VALUES statements, which is not the scenario 
that I have.


Thanks for your help!

Sue

--

Susan E Hurst
Principal Consultant
Brookhurst Data LLC
Email: susan.hu...@brookhurstdata.com





Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Melvin Davidson
On Fri, Oct 13, 2017 at 12:12 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere 
> wrote:
>
>> Theoretically / blue sky, could there be a table or column type that
>> transparently handles "shared strings" like this, reducing size on disk
>> at the cost of lookup overhead for all queries?
>>
>> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
>> only for large objects?)
>>
>
> Row-independence is baked into PostgreSQL pretty deeply...
>
> I think an enum type is about as close as you are likely to get if you
> don't wish to set up your own foreign-key relationships with surrogate keys.
>
> David J.
>

I STRONGLY advise against the use of ENUMS.

What was described is exactly what relations and Foreign Keys are for.

Example:
CREATE TABLE residence_type
(
  residence_type_id   INTEGER NOT NULL,
  residence_type_desc TEXT NOT NULL,
  CONSTRAINT residence_type_pk PRIMARY KEY (residence_type_id)
);

CREATE TABLE state
(
  state_id   CHAR(02) NOT NULL,
  state_name TEXT NOT NULL,
  CONSTRAINT state_pk PRIMARY KEY (state_id)
);

CREATE TABLE residence
(
  residence_id      BIGINT NOT NULL,
  residence_type_id INTEGER,
  street_num        CHARACTER(10),
  street_name       CHARACTER(20),
  city              CHARACTER(40),
  state_id          CHAR(02),
  CONSTRAINT residence_pk PRIMARY KEY (residence_id),
  CONSTRAINT fk_residence_state FOREIGN KEY (state_id)
    REFERENCES state (state_id),
  CONSTRAINT fk_residence_type FOREIGN KEY (residence_type_id)
    REFERENCES residence_type (residence_type_id)
);

SELECT t.residence_type_desc,
       r.street_num,
       r.street_name,
       r.city,
       s.state_name
  FROM residence r
  JOIN residence_type t ON t.residence_type_id = r.residence_type_id
  JOIN state s ON s.state_id = r.state_id
 WHERE r.residence_id = 12345;

-- 
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
> On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere wrote
> > Theoretically / blue sky, could there be a table or column type that
> > transparently handles "shared strings" like this, reducing size on disk
> > at the cost of lookup overhead for all queries?
> > (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> > only for large objects?)

On Fri, Oct 13, 2017, at 01:12 PM, David G. Johnston wrote:
> Row-independence is baked into PostgreSQL pretty deeply...

Could you say more about that?

What about the comparison to TOAST, which stores values off-table?

Thanks!




Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread David G. Johnston
On Fri, Oct 13, 2017 at 8:49 AM, Seamus Abshere  wrote:

> Theoretically / blue sky, could there be a table or column type that
> transparently handles "shared strings" like this, reducing size on disk
> at the cost of lookup overhead for all queries?
>
> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> only for large objects?)
>

Row-independence is baked into PostgreSQL pretty deeply...

I think an enum type is about as close as you are likely to get if you
don't wish to set up your own foreign-key relationships with surrogate keys.
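
A sketch of the enum approach (the type, table, and column names are
hypothetical); each row then stores a small fixed-size value while the label
text lives once in pg_enum:

CREATE TYPE dwelling AS ENUM (
  'SINGLE FAMILY RESIDENCE / TOWNHOUSE',
  'CONDOMINIUM',
  'MOBILE HOME');

ALTER TABLE household
  ALTER COLUMN residence_kind TYPE dwelling
  USING residence_kind::dwelling;

The USING cast works only if every existing value matches an enum label.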

David J.


Re: [GENERAL] "Shared strings"-style table

2017-10-13 Thread Rob Sargent



On 10/13/2017 09:49 AM, Seamus Abshere wrote:

hey,

In the spreadsheet world, there is this concept of "shared strings," a
simple way of compressing spreadsheets when the data is duplicated in
many cells.

In my database, I have a table with >200 million rows and >300 columns
(all the households in the United States). For clarity of development
and debugging, I have not made any effort to normalize its contents, so
millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
TOWNHOUSE" (yes, that whole string!) instead of some code representing
it.

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere


What data type are these columns now?  I would be tempted to map the 
full strings to an abbreviation just so I didn't have to alter all the 
columns to an "id"; optionally place RI constraints from those columns to 
the abbreviation dictionary table.  Just use the translation as a last 
step in user-facing reports (see the sketch below).  If you can 
map/abbreviate to 4 characters, you've approximated the disk size of an 
integer.
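
As referenced above, a sketch of that mapping (all names hypothetical):

CREATE TABLE label_dict (
  code  char(4) PRIMARY KEY,
  label text NOT NULL UNIQUE
);

INSERT INTO label_dict VALUES ('SFR', 'SINGLE FAMILY RESIDENCE / TOWNHOUSE');

-- the wide table stores only the code; translate at report time
SELECT h.id, d.label
  FROM household h
  JOIN label_dict d ON d.code = h.residence_code;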





[GENERAL] "Shared strings"-style table

2017-10-13 Thread Seamus Abshere
hey,

In the spreadsheet world, there is this concept of "shared strings," a
simple way of compressing spreadsheets when the data is duplicated in
many cells.

In my database, I have a table with >200 million rows and >300 columns
(all the households in the United States). For clarity of development
and debugging, I have not made any effort to normalize its contents, so
millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
TOWNHOUSE" (yes, that whole string!) instead of some code representing
it.

Theoretically / blue sky, could there be a table or column type that
transparently handles "shared strings" like this, reducing size on disk
at the cost of lookup overhead for all queries?

(I guess maybe it's like TOAST, but content-hashed and de-duped and not
only for large objects?)

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://www.faraday.io
https://github.com/seamusabshere
https://linkedin.com/in/seamusabshere




Re: [GENERAL] Restore LargeObjects on different server

2017-10-13 Thread Daniel Verite
Durumdara wrote:

> The pg_catalog schema is system schema, but it is IN the DB.
> 
> Is this true? So OID is not global (out from DB)?

The OID generator is global to the instance, but the uniqueness
checks are local to the tables that use OIDs, including
large objects.

The case where you may have a problem is when moving
large objects from the old instance/old database
to the new instance/new database and the new
database already has large objects created
before the import. Then you would need to do a merge
rather than just an import.

But if you're merely in the case that other databases in the new
instance have consumed OIDs and some happen to be equal to the ones
you want to import in a new database, that doesn't matter: similar OIDs
won't conflict if they're in different databases or even the same
database but different tables.
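
A quick sanity check before such an import (a hedged sketch; run it in the
target database):

SELECT oid FROM pg_largeobject_metadata ORDER BY oid;

-- any OID in the dump that also appears here would need the merge
-- treatment described above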


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




[GENERAL] Is pgbouncer still maintained?

2017-10-13 Thread Steven Winfield
Hi all,

I hope I'm posting this in the correct place.
We use pgbouncer, as I'm sure many other people do, but we are becoming 
increasingly worried by the lack of a new release since February 2016 and a 
slowdown in development activity on the master branch.
https://github.com/pgbouncer/pgbouncer

Issues and pull requests are mounting up in the tracker, often going 
unacknowledged - including requests for a new release.

We would very much like to see SCRAM support added (for which there is a pull 
request https://github.com/pgbouncer/pgbouncer/pull/227), and have the 
unrestricted ability to set per-user connection limits 
(https://github.com/pgbouncer/pgbouncer/pull/204), as well as pick up any bug 
fixes since the last official release.
We are, however, wary about just using the head of the master branch, or 
pulling from a fork, as companies offering Postgres support to clients will not 
cover these unofficial releases.

As much as we love pgbouncer, I don't see how the current situation can 
continue. I mean no disrespect to the authors - it is a great tool, and I 
understand that priorities and time constraints vary over time, but given the 
number of outstanding pull requests it does seem like there might exist a 
willing group of developers who could continue pgbouncer's maintenance and 
development, so perhaps some more of them should be allowed full access to the 
master repository.  I don't know exactly how that can be done without the risk 
of compromising the quality of code that makes it into the master branch in 
future, but I thought I'd suggest this to stimulate some discussion.

I certainly wouldn't like to see the beginnings of a competing fork.

Any thoughts, comments, (constructive) criticisms?

Best,
Steve.


Re: [GENERAL] Index corruption & broken clog

2017-10-13 Thread Benoit Lobréau
It seems that a failed vMotion occurred in the same time frame.


Re: [GENERAL] Index corruption & broken clog

2017-10-13 Thread Benoit Lobréau
Thanks for the help.



> Shut down the corrupted database and make a physical backup of it.
>

I did it before trying anything.



> Check the disks and memory on the machine where the problem occurred.
>

We are using virtual machines (VMware). I will ask them to investigate
anyway.


> You can try to salvage data from the corrupted tables, either
> using "zero_damaged_pages" (but that will only work if the page
> header is corrupted), or by excluding problematic tuples by their
> "ctid".
>

Thanks, I will try.
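
For reference, a hedged sketch of the salvage approach quoted above, using
the `degrade` table from the log messages; the ctids listed are placeholders
for whatever tuples turn out to be unreadable:

SET zero_damaged_pages = on;  -- only helps with corrupted page headers

CREATE TABLE degrade_salvaged AS
SELECT d.*
  FROM degrade d
 WHERE d.ctid NOT IN ('(60392,1)'::tid, '(60392,2)'::tid);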


Re: [GENERAL] Index corruption & broken clog

2017-10-13 Thread Laurenz Albe
Benoit Lobréau wrote:
> One of my PostgreSQL servers crashed badly yesterday. A process was killed 
> (see dmesg below) and postgres was stuck with these processes:
> 
> postgres  2083     1  0 Oct08 ?        00:19:02 
> /usr/lib/postgresql/9.5/bin/postgres -D /home/postgres/data/i090/systeme
> postgres  2221  2083  0 Oct08 ?        00:02:03 postgres: i090: logger 
> process
> postgres 14068  2083  8 Oct09 ?        02:50:22 [postgres]
> postgres 19668 19665  0 09:40 pts/0    00:00:00 -su
> 
> I couldn't stop it with "pg_ctl stop immediate". We had to kill them and 
> release the shared memory block.
> 
> dmesg showed this.
> 
> [Tue Oct 10 07:45:29 2017] postgres[25506]: segfault at 7f2253ecb000 ip 
> 7f225aead994 sp 7ffc2a9c0c28 error 4 in 
> libc-2.23.so[7f225ae0e000+1c]
> [Tue Oct 10 07:45:30 2017] BUG: Bad rss-counter state mm:8800516a7c00 
> idx:0 val:178
> [Tue Oct 10 07:45:31 2017] BUG: Bad page map in process postgres  
> pte:480090248c8d pmd:2c3de067 
> 
> When I checked the postgresql log I found lots of these :
> 
>  could not read block 76638 in file 
> "pg_tblspc/16395/PG_9.5_201510051/16396/20082": read only 0 of 8192 bytes
>  could not read block 76669 in file 
> "pg_tblspc/16395/PG_9.5_201510051/16396/19993": read only 0 of 8192 bytes
>  index "degrade_pkey" contains corrupted page at block 60392
>  index "degrade_pkey" contains unexpected zero page at block 60392
>  index "idx_coor_brute_geometrie" contains corrupted page at block 53061
>  index "idx_coor_brute_geometrie" contains corrupted page at block 56109
>  index "idx_coor_brute_geometrie" contains corrupted page at block 58867
>  index "idx_coor_brute_geometrie" contains corrupted page at block 59003
>  index "idx_coor_brute_geometrie" contains corrupted page at block 60138
>  index "idx_coor_brute_geometrie" contains corrupted page at block 71974
>  index "idx_coor_brute_geometrie" contains corrupted page at block 80968
>  index "idx_coor_brute_geometrie" contains corrupted page at block 86777
>  index "idx_coor_brute_geometrie" contains unexpected zero page at block 78586
>  index "idx_coor_proj_ligne_geometrie" contains corrupted page at block 64349
>  index "idx_numero_course" contains corrupted page at block 3435
>  index "idx_numero_course" contains corrupted page at block 7176
>  index "idx_numero_engin" contains corrupted page at block 30996
>  index "idx_utc_envoi_composant_amont" contains corrupted page at block 76497
>  index "idx_utc_envoi_reseau" contains corrupted page at block 76524
>  right sibling's left-link doesn't match: block 102923 links to 49947 instead 
> of expected 1 in index "idx_utc_reception_fournisseur"
>  right sibling's left-link doesn't match: block 103014 links to 51087 instead 
> of expected 1 in index "idx_utc_reception_composant_amont"
>  right sibling's left-link doesn't match: block 76978 links to 59148 instead 
> of expected 1 in index "idx_utc_envoi_reseau"
>  right sibling's left-link doesn't match: block 77073 links to 82204 instead 
> of expected 1 in index "idx_utc_envoi_composant_amont"
>  
> The errors started to show up on inserts 8 hours before the crash. 
> I don't understand, because when I look into these pages with pageinspect 
> they are not empty / don't look bad.
> For example, "SELECT * FROM bt_page_items('degrade_pkey', 60392);" 
> showed a normal page, but the error message claims it is zeroed.
> 
> I checked the tables for duplicates on the primary key. Checksums are 
> enabled and I didn't get any error message when I scanned the tables.
> I found 77 duplicates on one table and this:
> 
> ERROR:  could not access status of transaction 3443523584
> DETAIL:  Could not open file "pg_clog/0CD4": No such file or directory.
> 
> I don't think this transaction ever existed, given the files I found in the 
> clog directory.
> 
> postgres@ulbdgomp01:~/data/igomp090/systeme/pg_clog$ ls -al
> total 30292
> drwx--  2 postgres postgres   4096 Oct  9 22:50 .
> drwx-- 18 postgres postgres   4096 Oct 12 02:32 ..
> -rw---  1 postgres postgres 262144 Aug 16 22:50 
> -rw---  1 postgres postgres 262144 Aug 17 16:20 0001
> -rw---  1 postgres postgres 262144 Aug 17 22:25 0002
> ...
> -rw---  1 postgres postgres 262144 Oct  7 19:31 0072
> -rw---  1 postgres postgres 262144 Oct  9 22:50 0073
> -rw---  1 postgres postgres 122880 Oct 12 11:56 0074
> 
> In your opinion:
> * Am I missing a (lot of) clog file(s), or is it an error in the heap page 
> giving an erroneous txid?
> * Is there a way to find which row has the transaction number 3443523584 (in 
> t_xmin / t_xmax, I suppose)?
> * Why am I not seeing any obviously broken page when I check with 
> pageinspect?
> 
> We are using:
> * Ubuntu 16.04 LTS
> * PostgreSQL 9.5.8 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 
> 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

PostgreSQL did not complain about an index being zeroed, it just
says it can't find a certain page (which probably never