Re: [GENERAL] Adding identity column to a non-empty table

2017-10-15 Thread Igal @ Lucee.org

On 10/15/2017 6:42 PM, Melvin Davidson wrote:
On Sun, Oct 15, 2017 at 9:09 PM, Igal @ Lucee.org wrote:


Melvin,

On 10/15/2017 5:56 PM, Melvin Davidson wrote:


On 10/15/2017 4:01 PM, Igal @ Lucee.org wrote:


Hello,

I'm trying to add an identity column to a table that has
records (previously had a bigserial column which I removed):


There is probably a better solution, but the one I came up
with is to add the column as BIGSERIAL and DROP the SEQUENCE
CASCADE, SELECT the max(r_id) + 1, and then convert the column
to IDENTITY:


The correct way to make r_id the primary key would be:

ALTER TABLE  event_log
  ADD COLUMN r_id SERIAL;

ALTER TABLE  event_log
  ALTER COLUMN r_id TYPE BIGINT,
  ADD CONSTRAINT dummy_pk PRIMARY KEY (r_id);

That automatically generates the column as

r_id bigint NOT NULL DEFAULT nextval('dummy_r_id_seq'::regclass),
  CONSTRAINT dummy_pk PRIMARY KEY (r_id)

and creates the appropriate sequence for you.



Does that use the new IDENTITY construct that was added in
Postgres 10?  I do not really care for the PRIMARY KEY
constraint.  I just want the sequence with the benefits of the new
IDENTITY "type".


> Does that use the new IDENTITY construct that was added in Postgres 10?

I cannot say, as I do not yet have PostgreSQL 10 installed because it 
was very recently released.

However, the method I supplied works for all prior versions of PostgreSQL.


Understood.  But I already had an auto-increment column by way of 
BIGSERIAL.


I want specifically to use the new IDENTITY feature of Postgres 10.
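
One way to check (just a sketch, assuming PostgreSQL 10 and catalog access) whether a
column really uses the new IDENTITY mechanism rather than a serial-style sequence
default is to look at pg_attribute.attidentity, which is 'a' for GENERATED ALWAYS,
'd' for GENERATED BY DEFAULT, and empty for an ordinary (or SERIAL) column:

  SELECT attname, attidentity
  FROM   pg_attribute
  WHERE  attrelid = 'event_log'::regclass
  AND    attname  = 'r_id';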

Best,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: [GENERAL] Adding identity column to a non-empty table

2017-10-15 Thread Melvin Davidson
On Sun, Oct 15, 2017 at 9:09 PM, Igal @ Lucee.org  wrote:

> Melvin,
>
> On 10/15/2017 5:56 PM, Melvin Davidson wrote:
>
>
> On Sun, Oct 15, 2017 at 8:24 PM, Igal @ Lucee.org  wrote:
>
>> On 10/15/2017 4:01 PM, Igal @ Lucee.org wrote:
>>
>>>
>>> Hello,
>>>
>>> I'm trying to add an identity column to a table that has records
>>> (previously had a bigserial column which I removed):
>>>
>>>
>> There is probably a better solution, but the one I came up with is to add
>> the column as BIGSERIAL and DROP the SEQUENCE CASCADE, SELECT the max(r_id)
>> + 1, and then convert the column to IDENTITY:
>>
>>
> The correct way to make r_id the primary key would be:
>
> ALTER TABLE  event_log
>   ADD COLUMN r_id SERIAL;
>
> ALTER TABLE  event_log
>   ALTER COLUMN r_id TYPE BIGINT,
>   ADD CONSTRAINT dummy_pk PRIMARY KEY (r_id);
>
> That automatically generates the column as
>
> r_id bigint NOT NULL DEFAULT nextval('dummy_r_id_seq'::regclass),
>   CONSTRAINT dummy_pk PRIMARY KEY (r_id)
>
> and creates the appropriate sequence for you.
>
>
> Does that use the new IDENTITY construct that was added in Postgres 10?  I
> do not really care for the PRIMARY KEY constraint.  I just want the
> sequence with the benefits of the new IDENTITY "type".
>
> Thanks,
>
>
> Igal
>

> Does that use the new IDENTITY construct that was added in Postgres 10?

I cannot say, as I do not yet have PostgreSQL 10 installed because it was
very recently released.
However, the method I supplied works for all prior versions of PostgreSQL.

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Adding identity column to a non-empty table

2017-10-15 Thread Igal @ Lucee.org

Melvin,

On 10/15/2017 5:56 PM, Melvin Davidson wrote:


On Sun, Oct 15, 2017 at 8:24 PM, Igal @ Lucee.org wrote:


On 10/15/2017 4:01 PM, Igal @ Lucee.org wrote:


Hello,

I'm trying to add an identity column to a table that has
records (previously had a bigserial column which I removed):


There is probably a better solution, but the one I came up with is
to add the column as BIGSERIAL and DROP the SEQUENCE CASCADE,
SELECT the max(r_id) + 1, and then convert the column to IDENTITY:


The correct way to make r_id the primary key would be:

ALTER TABLE  event_log
  ADD COLUMN r_id SERIAL;

ALTER TABLE  event_log
  ALTER COLUMN r_id TYPE BIGINT,
  ADD CONSTRAINT dummy_pk PRIMARY KEY (r_id);

That automatically generates the column as

r_id bigint NOT NULL DEFAULT nextval('dummy_r_id_seq'::regclass),
  CONSTRAINT dummy_pk PRIMARY KEY (r_id)

and creates the appropriate sequence for you.



Does that use the new IDENTITY construct that was added in Postgres 10?  
I do not really care for the PRIMARY KEY constraint.  I just want the 
sequence with the benefits of the new IDENTITY "type".


Thanks,


Igal


Re: [GENERAL] Adding identity column to a non-empty table

2017-10-15 Thread Melvin Davidson
On Sun, Oct 15, 2017 at 8:24 PM, Igal @ Lucee.org  wrote:

> On 10/15/2017 4:01 PM, Igal @ Lucee.org wrote:
>
>>
>> Hello,
>>
>> I'm trying to add an identity column to a table that has records
>> (previously had a bigserial column which I removed):
>>
>>   ALTER TABLE event_log
>> ADD COLUMN r_id BIGINT GENERATED BY DEFAULT AS IDENTITY;
>>
>> But I'm getting an error `column r_id contains null values`.
>>
>> How can I add the column and populate it for the existing rows?
>>
>>
> There is probably a better solution, but the one I came up with is to add
> the column as BIGSERIAL and DROP the SEQUENCE CASCADE, SELECT the max(r_id)
> + 1, and then convert the column to IDENTITY:
>
>   ALTER TABLE transient.event_log ADD COLUMN r_id BIGSERIAL;
>
>   -- find the sequence name and then
>   DROP SEQUENCE <sequence_name> CASCADE;
>
>   -- find min value by executing select max(r_id) + 1
>   ALTER table transient.event_log
>   ALTER COLUMN r_id
>   ADD GENERATED BY DEFAULT AS IDENTITY (MINVALUE <min_value>);
>
> If anyone has a better suggestion then please let me know.
>
> Thanks,
>
>
> Igal
>
>
>

The correct way to make r_id the primary key would be:

ALTER TABLE  event_log
  ADD COLUMN r_id SERIAL;

ALTER TABLE  event_log
  ALTER COLUMN r_id TYPE BIGINT,
  ADD CONSTRAINT dummy_pk PRIMARY KEY (r_id);

That automatically generates the column as

r_id bigint NOT NULL DEFAULT nextval('dummy_r_id_seq'::regclass),
  CONSTRAINT dummy_pk PRIMARY KEY (r_id)

and creates the appropriate sequence for you.

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Adding identity column to a non-empty table

2017-10-15 Thread Igal @ Lucee.org

On 10/15/2017 4:01 PM, Igal @ Lucee.org wrote:


Hello,

I'm trying to add an identity column to a table that has records 
(previously had a bigserial column which I removed):


  ALTER TABLE event_log
ADD COLUMN r_id BIGINT GENERATED BY DEFAULT AS IDENTITY;

But I'm getting an error `column r_id contains null values`.

How can I add the column and populate it for the existing rows?



There is probably a better solution, but the one I came up with is to 
add the column as BIGSERIAL and DROP the SEQUENCE CASCADE, SELECT the 
max(r_id) + 1, and then convert the column to IDENTITY:


  ALTER TABLE transient.event_log ADD COLUMN r_id BIGSERIAL;

  -- find the sequence name and then
  DROP SEQUENCE <sequence_name> CASCADE;

  -- find min value by executing select max(r_id) + 1
  ALTER TABLE transient.event_log
      ALTER COLUMN r_id
          ADD GENERATED BY DEFAULT AS IDENTITY (MINVALUE <min_value>);

If anyone has a better suggestion then please let me know.
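
Another variant of the same idea, as an untested sketch (assuming PostgreSQL 10 and
that the existing rows just need consecutive numbers): add a plain column, backfill
it, attach the identity, and then move the implicit sequence past the backfilled values:

  -- add a plain bigint column first (nullable while we backfill)
  ALTER TABLE transient.event_log ADD COLUMN r_id BIGINT;

  -- number the existing rows
  UPDATE transient.event_log AS t
     SET r_id = s.rn
    FROM (SELECT ctid, row_number() OVER () AS rn
            FROM transient.event_log) AS s
   WHERE t.ctid = s.ctid;

  -- identity columns must be NOT NULL
  ALTER TABLE transient.event_log ALTER COLUMN r_id SET NOT NULL;

  -- attach the identity (PostgreSQL 10+)
  ALTER TABLE transient.event_log
      ALTER COLUMN r_id ADD GENERATED BY DEFAULT AS IDENTITY;

  -- let the sequence continue past the backfilled values
  SELECT setval(pg_get_serial_sequence('transient.event_log', 'r_id'),
                (SELECT max(r_id) FROM transient.event_log));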

Thanks,


Igal





[GENERAL] Adding identity column to a non-empty table

2017-10-15 Thread Igal @ Lucee.org

Hello,

I'm trying to add an identity column to a table that has records 
(previously had a bigserial column which I removed):


  ALTER TABLE event_log
ADD COLUMN r_id BIGINT GENERATED BY DEFAULT AS IDENTITY;

But I'm getting an error `column r_id contains null values`.

How can I add the column and populate it for the existing rows?

Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: [GENERAL] Delete Duplicates with Using

2017-10-15 Thread Igal @ Lucee.org

On 10/14/2017 12:32 AM, legrand legrand wrote:

DELETE FROM table_with_duplicates AS T1 USING table_with_duplicates AS T2
WHERE
 T1.column_1 = T2.column_1
 AND T1.column_2 = T2.column_2
 AND T1.column_3 = T2.column_3
 AND T1.row_num < T2.row_num


Thank you, I actually thought about that at first but it seemed "too 
easy" and I was looking for some solution with JOIN.


Anyway, this seems to work great.
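
For reference, a self-contained illustration of the pattern on a throwaway table
(hypothetical names; it keeps the row with the highest row_num in each duplicate group):

  CREATE TEMP TABLE dup_test (row_num serial, col_a text, col_b int);
  INSERT INTO dup_test (col_a, col_b)
  VALUES ('x', 1), ('x', 1), ('x', 1),  -- three duplicates
         ('y', 2);                      -- unique

  DELETE FROM dup_test AS t1
  USING dup_test AS t2
  WHERE t1.col_a = t2.col_a
    AND t1.col_b = t2.col_b
    AND t1.row_num < t2.row_num;

  SELECT * FROM dup_test;  -- one ('x', 1) row and the ('y', 2) row remain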

Thanks,


Igal




Re: [GENERAL] Non-overlapping updates blocking each other

2017-10-15 Thread Melvin Davidson
On Sun, Oct 15, 2017 at 8:01 AM, Seamus Abshere  wrote:

> > On Sat, Oct 14, 2017 at 10:30 AM, Seamus Abshere 
> > > UPDATE [...] WHERE id BETWEEN 'ff00----' AND 'ff0f----'
> > > and
> > > UPDATE [...] WHERE id BETWEEN 'f8c0----' AND 'f8ff----'
> > > Yet one blocks the other one. How is this possible?
>
> On Sat, Oct 14, 2017, at 12:32 PM, Melvin Davidson wrote:
> > More than likely, the optimizer has determined that a table scan is best,
> > in which case it will use a table lock.
> > You can also execute the following query and check the wait_event_type to
> > verify.
>
> hi Melvin,
>
> Very interesting! The result:
>
> wait_event  | page
> wait_event_type | Lock
>
> So I guess this means that the ids don't overlap, but they are sometimes
> found in the same page, and the whole page gets locked?
>
> Any narrative (pretending I don't know anything) would be very helpful.
>
> Thanks!
> Seamus
>
> PS. I do a SELECT 1 FROM x WHERE [ID_RANGE] FOR UPDATE right before the
> update, but that's to prevent a race condition. The id ranges still
> don't overlap.
>

Seamus,

As Tom suggests, to get the exact cause of your problem, it is very
important that we get the following additional information:

1. Exact PostgreSQL version, i.e. the output of SELECT version();
2. Your O/S
3. The full structure of your table in query, including constraints
4. The full, exact queries.
5. The FULL output from the query I gave you.

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Is pgbouncer still maintained?

2017-10-15 Thread Greg Sabino Mullane



Steven Winfield wrote:

>  we are becoming increasingly worried by the lack of a new release 
> since February 2016 and a slowdown in development activity on the master 
> branch.
> https://github.com/pgbouncer/pgbouncer
...

A good point: pgbouncer is probably due for a release. You may want to raise 
this 
issue on the mailing list for pgbouncer:

http://lists.pgfoundry.org/mailman/listinfo/pgbouncer-general

(yes, it's a pgfoundry link - collect this rare beast while you can!)

Marko Kreen, the main developer, has been active in the repo as of 
July, but he, like all of us, has other demands on his time. That's 
the nature of open-source projects, unfortunately.

However, it might be nice to enable others to arrange a release / push code. 
I know there are people out there with the code familiarity and skills 
to get a new release out (myself, Andrew D., Peter E., probably others).

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201710150923
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8







Re: [GENERAL] EAV Designs for Multi-Tenant Applications

2017-10-15 Thread Alban Hertroys

> On 12 Oct 2017, at 16:48, Nic Pottier  wrote:

…

> We have thousands of organizations, some organizations have hundreds of 
> custom fields. Some organizations have millions of contacts. We have tens of 
> millions of values across organizations, so relatively sparse with a large 
> number of tiny / unused organizations. All these numbers will continue to 
> grow, though single organizations greater than 10 million contacts is 
> unlikely.
> 
> What is the recommended way of modeling this in order to allow performant 
> queries on the custom fields?
> 
> Right now we basically use an EAV model (a single contact_fields table) with 
> compound indexes that join the field key and field value so as to force 
> locality in the index. That has worked ok, but occasionally falls on its face 
> when the query planner makes a wrong guess due to the (obviously skewed) 
> statistics it has to work with. Multi-field queries can also get painful, 
> especially on the largest organizations with millions of contacts.
> 
> What other approaches should we be looking at? We've brainstormed different 
> approaches but would love some wisdom to help us narrow down what are 
> reasonable things to try testing out. Most of our designs hone in on creating 
> a table per organizations to hold field values and dynamically creating 
> indexes on that. The idea being that we won't kill our insert performance as 
> much by only having one index per field to check on inserts and table 
> statistics should still be ok. (how are statistics managed for JSONB fields?) 
> The main question we have is what is going to happen if we have thousands (or 
> tens of thousands) of tables on a single database? The good news is the vast 
> majority of our organizations are idle at any point in time.

The table per customer approach is probably on the right track. It means you 
don't have to bother the query planner with fields that are specific to only a 
few organisations when they don't apply to the currently queried organisation.

I would go one step further and put all the fields common across all 
organisations into one master table and inherit that in each organisation's 
specific table. If you add a check constraint on the inheriting table's 
organisation id, I expect that you can even make use of constraint exclusion. 
That basically turns the master table into a partitioned table, where each 
partition has its own fields. I'm not 100% certain that constraint exclusion 
can work when partitions have different layouts, but I don't see why not - as 
long as the partitioning (check) constraint is on a field that is also in the 
master table.
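
A minimal sketch of that layout (hypothetical table and column names; classic
inheritance rather than PostgreSQL 10 declarative partitioning, because the children
need different columns):

  CREATE TABLE contact (
      contact_id bigserial PRIMARY KEY,
      org_id     int  NOT NULL,
      name       text NOT NULL
      -- ...fields common to every organisation...
  );

  -- one child table per organisation, carrying that organisation's custom fields
  CREATE TABLE contact_org_42 (
      loyalty_tier text,
      renewal_date date,
      CHECK (org_id = 42)   -- the partitioning constraint
  ) INHERITS (contact);

With constraint_exclusion = partition (the default), a query such as
SELECT * FROM contact WHERE org_id = 42 should only scan the parent and
contact_org_42, skipping the other children.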

> Approaches we've thought about and questions / pros / cons:
> 
> 1) Add a JSONB field on our (shared across organizations) contact table, 
> store field values there. Create JSONB indexes per unique field. 
>pros: nice having the locality of data on contact, multi field queries are 
> likely way better, query performance should be good
>cons: we have to create thousands of indexes? we have to use uuids as keys 
> to keep our indexes org-specific? insert performance suffers from having 
> thousands of partial indexes (how badly)?
> 
> 2) Create a table per organization `contact_fields_[org_id]` containing a 
> column per field. Create columns and indexes per unique field.
>pros: locality is nice again, multi field queries are better, query and 
> insert performance should be good.
>cons: thousands of tables with up to 100 indexes per, is that going to 
> blow up?

Perhaps not so much as you think. If an organisation has 100s of fields, they 
are probably not using a normalised view of their data. Since you're using 
separate columns now, each column can have its own data type (another con), and 
that adds the possibility to add references to/from other tables.

For example, if an organisation created fields address1 to address10, you can 
instead have an address table that references the organisation's specific 
contact partition, allowing them to add as many addresses as they need. If more 
organisations use addresses (very likely), you can use the same partitioning 
approach for each organisation's address table and put a foreign key constraint 
on each to the organisation's contact partition.
That puts multiple of your original fields under a single index.
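
Continuing the hypothetical names from the sketch above, that could look like:

  -- children do not inherit the parent's primary key, so give the child its own
  ALTER TABLE contact_org_42 ADD PRIMARY KEY (contact_id);

  CREATE TABLE address_org_42 (
      address_id bigserial PRIMARY KEY,
      contact_id bigint NOT NULL REFERENCES contact_org_42 (contact_id),
      line1      text,
      city       text,
      postcode   text
  );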

I seem to recall that's pretty much how ToroDB goes about organising 
unstructured data on top of PG, although they probably use a couple more tricks 
than just that.

In fact, you could already apply this to your current design, although you 
probably wouldn't gain as much from it.

> What other approaches should we be considering? I know EAV is a pain, 
> especially in multi-tenant situations, but we'd love to hear success (and 
> failure) stories from the community on how they've dealt with these.

I'm not that familiar with JSONB, but wouldn't it be possible to create an 
index over an 

Re: [GENERAL] Non-overlapping updates blocking each other

2017-10-15 Thread Seamus Abshere
> On Sat, Oct 14, 2017 at 10:30 AM, Seamus Abshere 
> > UPDATE [...] WHERE id BETWEEN 'ff00----' AND
> > 'ff0f----'
> > and
> > UPDATE [...] WHERE id BETWEEN 'f8c0----' AND
> > 'f8ff----'
> > Yet one blocks the other one. How is this possible?

On Sat, Oct 14, 2017, at 12:32 PM, Melvin Davidson wrote:
> More than likely, the optimizer has determined that a table scan is best,
> in which case it will use a table lock.
> You can also execute the following query and check the wait_event_type to
> verify.

hi Melvin,

Very interesting! The result:

wait_event  | page
wait_event_type | Lock

So I guess this means that the ids don't overlap, but they are sometimes
found in the same page, and the whole page gets locked?

Any narrative (pretending I don't know anything) would be very helpful.

Thanks!
Seamus

PS. I do a SELECT 1 FROM x WHERE [ID_RANGE] FOR UPDATE right before the
update, but that's to prevent a race condition. The id ranges still
don't overlap.
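
The exact query referenced above is not quoted here, but a typical way to look at
wait events and at page-level lock waits (a sketch, assuming PostgreSQL 9.6 or later)
would be:

  SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query
  FROM   pg_stat_activity
  WHERE  wait_event IS NOT NULL;

  SELECT locktype, relation::regclass AS relation, page, pid, mode, granted
  FROM   pg_locks
  WHERE  locktype IN ('tuple', 'page', 'transactionid')
  ORDER  BY granted, pid;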

