Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-15 Thread Robert Eckhardt
On Thu, Jul 15, 2021 at 8:43 AM Dave Cramer <davecra...@gmail.com> wrote:

On Thu, 15 Jul 2021 at 11:29, David G. Johnston <david.g.johns...@gmail.com> wrote:

I’m not familiar with what hoops extensions jump through to facilitate upgrades 
but even if it was as simple as “create extension upgrade” I wouldn’t have 
pg_upgrade execute that command (or at least not by default).  I would maybe 
have pg_upgrade help move the libraries over from the old server (and we must 
be dealing with different databases having different extension versions in some 
manner…).

Well, IMHO the status quo is terrible. Perhaps you have a suggestion on how to 
make it better?

To a certain extent it is beyond pg_upgrade's purview to care about extensions 
explicitly - it considers them "data" on the database side and copies over the 
schema and, within reason, punts on the filesystem by saying "ensure that the 
existing versions of your extensions in the old cluster can correctly run in 
the new cluster" (which basically just takes a simple file copy/install and the 
assumption that you are upgrading to a server version that is supported by the 
extension in question - also a reasonable requirement).  In short, I don't have 
a suggestion on how to improve that and don't really consider it a terrible 
flaw in pg_upgrade.

I don't know if this is a terrible flaw in pg_upgrade, but it is a terrible 
flaw in the overall Postgres experience.

 We are currently working through various upgrade processes and it seems like 
the status quo is:

Drop the extension, upgrade and reinstall
OR
Upgrade the cluster then upgrade the extension

The issue is that it often isn’t clear which path to choose and choosing the 
wrong path can lead to data loss.
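For the second path, a rough sketch of the post-pg_upgrade step might look like the following (assuming the new cluster already has the extensions' shared libraries installed; the queries are illustrative, run per database):

```sql
-- List extensions whose installed version (carried over by pg_upgrade)
-- differs from the default version shipped with the new server.
SELECT e.extname,
       e.extversion      AS installed,
       a.default_version AS available
FROM pg_extension e
JOIN pg_available_extensions a ON a.name = e.extname
WHERE e.extversion IS DISTINCT FROM a.default_version;

-- Then upgrade each one in place, e.g.:
ALTER EXTENSION pg_stat_statements UPDATE;
```

Whether this is safe for a given extension is exactly the question raised above; for some extensions the drop-and-recreate path is required instead.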

I don't think it is ok to expect end users to understand when it is and isn't 
ok to just drop and recreate, and often it isn't clear in the extension 
documentation itself.  I'm not sure what core can/should do about it, but it 
is a major pain.

-- Rob



Re: Zedstore - compressed in-core columnar storage

2021-01-07 Thread Robert Eckhardt


On Dec 31, 2020, at 9:22 AM, Simon Riggs <si...@2ndquadrant.com> wrote:

On Wed, 18 Nov 2020 at 00:31, Jacob Champion <pchamp...@vmware.com> wrote:

So that's in need of resolution. I'd expect gin and gist to be pretty
flaky until we fix that.

Jacob and Soumyadeep,

Thanks for submitting this. I think a fix is still outstanding? Also,
the patch fails to apply on HEAD in two places.
Please can you submit the next version?

Do you mind if we add this for review to the Jan CF?

It is a lot of code and I think there is significant difficulty for
the community to accept that as-is, even though it looks to be a very
high quality submission. So I would like to suggest a strategy for
commit: we accept Zedstore as "Beta" or "Experimental" in PG14,
perhaps with a WARNING/Caution similar to the one that used to be
given by Postgres in earlier versions when you created a Hash index.
We keep Zedstore in "Beta" mode until a later release, PG15 or later
when we can declare Zedstore fully safe. That approach allows us to
get this into the repo asap, and then be fixed and improved
incrementally from here.

The goal of the Zedstore work is to get a column store into Postgres, not 
necessarily Zedstore itself (though Zedstore itself would be nice). When 
designing Zedstore, success for us meant:
- significantly more performant on OLAP-type queries,
- performant enough not to be terrible on OLTP-type queries,
- must support compression,
- cannot be append-only; this was the case initially with the Greenplum column 
store and it was a mistake, customers want to update and delete,
- needs to be feature complete compared to HEAP, except where a feature 
doesn't make sense.

Our initial goal is to get the TableAM and executor molded into a state where 
the above is possible for anyone wanting a column store implementation.

Given the goal of addressing API/Executor issues generically first, we have 
been trying to peel off and work on the parts that are not tightly linked to 
Zedstore.  Specifically I don’t think it would be ok to merge Zedstore into 
core when it might affect the performance of HEAP relations.

Instead of focusing on the larger, more difficult to review Zedstore patch, we 
are trying to peel off the touch points where Zedstore and the current server 
interact. Note this isn’t intended to be an exhaustive list, rather a list of 
the most immediate issues. Some of these issues are critical for Zedstore to 
work, i.e. column projection, while some of these issues point more towards 
ensuring the various layers in the code are clean so that folks leveraging the 
TableAM don’t need to write their own bits from whole cloth but rather can 
leverage appropriately generic primitives, i.e. DBsize or page inspect.

As such, an incomplete list of things currently on our radar:

1) Column Projection: We have a patch [1] that demonstrates what we would like 
to do. There are several TODOs in the email that can/will be addressed if the 
general method is acceptable.

2) DBSize: Georgios has a patch [2] that begins to make DBSize less 
HEAP-specific.

3) Reloptions: Jeff Davis has a patch [3] that begins to make these more 
flexible; having spoken with him, we think additional work needs to be done here.

4) PageInspect: needs to be less HEAP-specific, but no work has been done here 
that I'm aware of.

5) bitmapHeapScan: currently scans both the index and the relation; there are 
code comments to address this, and we need to look into what a fix would mean.

6) Bulk insertion: Justin Pryzby has a patch [4] we are following along with.

7) analyze: Denis has a patch which starts to address this [5].

Ideally we can peel out anything that is useful to any column store. Once those 
have been discussed and committed the general code should be in better shape as 
well.

— Rob


[1] 
https://www.postgresql.org/message-id/flat/cae-ml+9rmtnzkcntzpqf8o3b-ujhwgfbsoxpqa3wvuc8ybb...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/svffVJPtfDYEIISNS-3FQs64CauSul3RjF7idXOfy4H40YBVwB3TMumHb6WoAElJpHOsN-j8fjxYohEt4VxcsJ0Qd9gizwzsY3rjgtjj440=@pm.me
[3] 
https://www.postgresql.org/message-id/flat/429fb58fa3218221bb17c7bf9e70e1aa6cfc6b5d.ca...@j-davis.com
[4] 
https://www.postgresql.org/message-id/flat/20200508072545.ga9...@telsasoft.com
[5] 
https://www.postgresql.org/message-id/flat/c7cfe16b-f192-4124-bebb-7864285e0...@arenadata.io




e.g.

"NOTICE:  Caution: Zedstore is an experimental feature in PostgreSQL 14
intended for robustness and performance testing only. Your data and/or
query accuracy may be at risk if you rely on this."

--
Simon Riggs
http://www.EnterpriseDB.com/





Re: Cleanup isolation specs from unused steps

2019-08-22 Thread Robert Eckhardt
On Thu, Aug 22, 2019 at 1:45 PM Melanie Plageman wrote:
>
>
> On Wed, Aug 21, 2019 at 12:16 PM Alvaro Herrera wrote:
>>
>> On 2019-Aug-21, Melanie Plageman wrote:
>>
>> > In Greenplum, we mainly add new tests to a separate isolation
>> > framework (called isolation2) which uses a completely different
>> > syntax. It doesn't use isolationtester at all. So, I haven't had a use
>> > case to add long options to isolationtester yet :)
>>
>> Is that other framework somehow more capable?
>
>
> So, there is some historical context as to why it is a separate test suite.
> And some of the differences are specific to Greenplum -- e.g. needing to 
> connect
> to a specific database in "utility mode" to do something.
>
> However, the other differences are actually pretty handy and would be 
> applicable
> to upstream as well.
> We use a different syntax than the isolation framework and have some nice
> features. Most notably, explicit control over blocking.

Asim submitted this framework just yesterday:
https://www.postgresql.org/message-id/CANXE4TdxdESX1jKw48xet-5GvBFVSq=4cgneiotqff372ko...@mail.gmail.com

-- Rob

>
> The syntax for what would be a "step" in isolation is like this:
>
> [<#>[flag]:] <SQL statement> | ! <shell command>
>
> where # is the session number and flags include the following:
>
> &: expect blocking behavior
> >: running in background without blocking
> <: join an existing session
> q: quit the given session
>
> See the script [1] for parsing the test cases for more details on the syntax 
> and
> capabilities (it is in Python).
>
> [1] 
> https://github.com/greenplum-db/gpdb/blob/master/src/test/isolation2/sql_isolation_testcase.py
>
> --
> Melanie Plageman
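The step grammar described above is simple enough to sketch. The following is an illustrative parser for such lines, written under the assumptions stated in the quoted syntax; it is a hypothetical sketch, not Greenplum's actual implementation:

```python
import re

# One step line: optional "<session#><flag>:" prefix, then either a SQL
# statement or "! <shell command>". Flags: & (expect blocking),
# > (run in background), < (join existing session), q (quit session).
STEP = re.compile(r'^(?:(?P<session>\d+)(?P<flag>[&><q])?:)?\s*(?P<body>.*)$')

def parse_step(line):
    """Split a step line into session number, flag, and command body."""
    m = STEP.match(line.strip())
    body = m.group('body')
    kind = 'shell' if body.startswith('!') else 'sql'
    return {'session': m.group('session'),
            'flag': m.group('flag'),
            'kind': kind,
            'body': body.lstrip('! ').strip()}
```

For example, `parse_step("1&: SELECT pg_sleep(10);")` attributes the statement to session 1 and marks it as expected to block.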




Re: Commitfest 2019-07, the first of five* for PostgreSQL 13

2019-07-31 Thread Robert Eckhardt
On Thu, Aug 1, 2019 at 12:10 AM Thomas Munro  wrote:

> Hi all,
>
> CF1 officially ends in about 8 hours,  when August arrives on the
> volcanic islands of Howard and Baker, according to CURRENT_TIMESTAMP
> AT TIME ZONE '+12'.  I'll probably mark it closed at least 8 hours
> later than that because I'll be asleep.  Anything that is waiting on
> author and hasn't had any recent communication, I'm planning to mark
> as returned with feedback.  Anything that is clearly making good
> progress but isn't yet ready for committer, I'm going to move to the
> next CF.  If you're a patch owner or reviewer and you can help move
> your patches in the right direction, or have other feedback on the
> appropriate state for any or all patches, then please speak up, I'd
> really appreciate it.  In all cases please feel free to change the
> state or complain if you think I or someone else got it wrong; if I
> recall correctly there is a way to get from "returned" to "moved to
> next CF", perhaps via an intermediate state.  Thanks!
>

As a normal lurker on hackers, I've enjoyed seeing the weekly updates.
Thanks for those.

-- Rob


>
> --
> Thomas Munro
>
> https://enterprisedb.com
>
>
>


Re: Creating partitions automatically at least on HASH?

2019-07-15 Thread Robert Eckhardt
On Mon, Jul 15, 2019 at 10:54 AM Robert Haas  wrote:
>
> On Mon, Jul 15, 2019 at 1:29 AM Fabien COELHO  wrote:
> > Hello pgdevs,
> >
> > sorry if this has been already discussed, but G did not yield anything
> > convincing about that.
> >
> > While looking at HASH partitioning and creating a few ones, it occurred to
> > me that while RANGE and LIST partitions cannot be guessed easily, it would
> > be easy to derive HASH partitioned table for a fixed MODULUS, e.g. with
> >
> >CREATE TABLE foo(...) PARTITION BY HASH AUTOMATIC (MODULUS 10);
> >-- or some other syntax
> >
> > Postgres could derive statically the 10 subtables, eg named foo_$0$ to
> > foo_$1$.
> >
> > That would not be a replacement for the feature where one may do something
> > funny and doubtful like (MODULUS 2 REMAINDER 0, MODULUS 4 REMAINDER 1,
> > MODULUS 4 REMAINDER 3).
> >
> > The same declarative approach could eventually be considered for RANGE
> > with a fixed partition duration and starting and ending points.
> >
> > This would be a relief on the longer path of dynamically creating
> > partitions, but with lower costs than a dynamic approach.
>
> Yeah, I think something like this would be reasonable, but I think
> that the best syntax is not really clear.  We might want to look at
> how other systems handle this.
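Fabien's AUTOMATIC sketch above maps onto syntax PostgreSQL already has for individual hash partitions; a small illustrative script (table and partition names are hypothetical, not part of the proposal) shows the DDL such an expansion would have to generate today:

```python
def hash_partition_ddl(parent, modulus):
    """Generate the CREATE TABLE statements that a hypothetical
    PARTITION BY HASH AUTOMATIC (MODULUS n) clause would implicitly
    create, spelled in current PostgreSQL partition syntax."""
    return [
        f"CREATE TABLE {parent}_{r} PARTITION OF {parent} "
        f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {r});"
        for r in range(modulus)
    ]

# The ten sub-tables derived for MODULUS 10:
for stmt in hash_partition_ddl("foo", 10):
    print(stmt)
```

The point of the proposal is precisely that users should not have to write (or generate) these ten statements by hand.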

Greenplum has a syntax that covers some cases but not the hash case.

For range based partitions we have:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2016-01-01') INCLUSIVE
   END (date '2017-01-01') EXCLUSIVE
   EVERY (INTERVAL '1 day') );

This is equivalent to the below, so you can also declare and name each
partition individually. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan16 START (date '2016-01-01') INCLUSIVE ,
  PARTITION Feb16 START (date '2016-02-01') INCLUSIVE ,
  PARTITION Mar16 START (date '2016-03-01') INCLUSIVE ,
  PARTITION Apr16 START (date '2016-04-01') INCLUSIVE ,
  PARTITION May16 START (date '2016-05-01') INCLUSIVE ,
  PARTITION Jun16 START (date '2016-06-01') INCLUSIVE ,
  PARTITION Jul16 START (date '2016-07-01') INCLUSIVE ,
  PARTITION Aug16 START (date '2016-08-01') INCLUSIVE ,
  PARTITION Sep16 START (date '2016-09-01') INCLUSIVE ,
  PARTITION Oct16 START (date '2016-10-01') INCLUSIVE ,
  PARTITION Nov16 START (date '2016-11-01') INCLUSIVE ,
  PARTITION Dec16 START (date '2016-12-01') INCLUSIVE
  END (date '2017-01-01') EXCLUSIVE );

You can do similar things with numeric

CREATE TABLE rank (id int, rank int, year int, gender
char(1), count int)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
( START (2006) END (2016) EVERY (1),
  DEFAULT PARTITION extra );

ENUM

CREATE TABLE rank (id int, rank int, year int, gender
char(1), count int )
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
  PARTITION boys VALUES ('M'),
  DEFAULT PARTITION other );

It also supports multilevel partitioning using a PARTITION TEMPLATE
and SUBPARTITION TEMPLATE. The partitioning template ensures that the
structure at every level is the same.

CREATE TABLE p3_sales (id int, year int, month int, day int,
region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
SUBPARTITION BY RANGE (month)
   SUBPARTITION TEMPLATE (
START (1) END (13) EVERY (1),
DEFAULT SUBPARTITION other_months )
   SUBPARTITION BY LIST (region)
 SUBPARTITION TEMPLATE (
   SUBPARTITION usa VALUES ('usa'),
   SUBPARTITION europe VALUES ('europe'),
   SUBPARTITION asia VALUES ('asia'),
   DEFAULT SUBPARTITION other_regions )
( START (2002) END (2012) EVERY (1),
  DEFAULT PARTITION outlying_years );

-- Rob




Re: pg_upgrade supported versions policy

2018-11-22 Thread Robert Eckhardt
On Thu, Nov 22, 2018 at 7:57 AM Magnus Hagander  wrote:
>
> On Thu, Nov 22, 2018 at 12:48 AM Andres Freund  wrote:
>>
>> Hi,
>>
>> I feel like we ought to trim the support for a few old versions from
>> pg_upgrade.  In my particular case I don't really think it's reasonable
>> to test < 9.0 versions for pg_largeobject_metadata migrations. But I
>> think we should create a policy that's the default, leaving individual
>> cases aside.
>>
>> I can see a few possible policies:
>>
>> 1) Support upgrading from the set of releases that were supported when
>>the pg_upgrade target version was released. While some will argue that
>>this is fairly short, people above it can still upgrade ~10 years
>>worth of versions with a single intermediary upgrade.
>> 2) Same, but +2 releases or such.
>> 3) Support until it gets too uncomfortable.
>> 4) Support all versions remotely feasible (i.e. don't drop versions that
>>still can be compiled)
>>
>> I personally think 1 is preferable and 2 is acceptable.
>
>
> As a developer, I'd like 1. As someone who repeatedly runs into customer 
> cases, I'd say 2. The reason for that is that far too many don't realize they 
> should upgrade on time, but at least a fair number of them notice within one 
> cycle from it going EOL -- perhaps just by reading the announcement saying 
> "hey version x is now EOL". And supporting +1 or +2 versions makes it 
> possible for them to go directly to latest.

+1. Just looking at the Postgres versions we are supporting inside our
company, we have many/most products leveraging 9.4 currently, and they
are just starting to think about how to upgrade. My assumption is many
teams will get it together enough to upgrade by the time 9.4 is no
longer supported; however, many will not, and getting them to the
latest stable version in one jump would be really nice.

-- Rob
>
> 3 and 4 both cause a lot of work on the dev side, but also make it a lot 
> less predictable for the end users. I'm willing to bet many of them prefer a 
> defined policy, even if it's a bit shorter than the limits we have now, to a 
> "sorry we don't know" kind of policy. Thus, -1 for both 3 and 4.
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/



Re: Code of Conduct plan

2018-09-14 Thread Robert Eckhardt
On Fri, Sep 14, 2018 at 9:41 AM, Adrian Klaver wrote:
> On 9/14/18 1:31 AM, Chris Travers wrote:
>>
>>
>>
>> On Wed, Sep 12, 2018 at 10:53 PM Tom Lane wrote:
>> I wrote:
>>  > Stephen Frost <sfr...@snowman.net> writes:
>>  >> We seem to be a bit past that timeline...  Do we have any update
>> on when
>>  >> this will be moving forward?
>>  >> Or did I miss something?
>>
>>  > Nope, you didn't.  Folks have been on holiday which made it hard
>> to keep
>>  > forward progress going, particularly with respect to selecting
>> the initial
>>  > committee members.  Now that Magnus is back on shore, I hope we can
>>  > wrap it up quickly --- say by the end of August.
>>
>> I apologize for the glacial slowness with which this has all been
>> moving.
>> The core team has now agreed to some revisions to the draft CoC based
>> on
>> the comments in this thread; see
>>
>> https://wiki.postgresql.org/wiki/Code_of_Conduct
>>
>> (That's the updated text, but you can use the diff tool on the page
>> history tab to see the changes from the previous draft.)
>>
>>
>> I really have to object to this addition:
>> "This Code is meant to cover all interaction between community members,
>> whether or not it takes place within postgresql.org 
>> infrastructure, so long as there is not another Code of Conduct that takes
>> precedence (such as a conference's Code of Conduct)."
>
>
> I second that objection. It is not in PGDG's remit to cure the world, for
> whatever form of cure you ascribe to. This is especially true as 'community
> member' has no strict definition.

I understand the concern; however, if you look at how attacks happen,
it is frequently through other sites, specifically under- or poorly
moderated ones. For specific examples, people who have issues with
people on Quora will frequently go after them on Facebook and Twitter.

These concerns aren't a solution looking for a problem. If we just want
to look at the clusterfuck that is happening in the Redis community
right now, we can see conversations spilling onto Twitter and into ad
hominem vitriol.

My $0.02
-- Rob Eckhardt

>
>
>>
>> That covers things like public twitter messages over live political
>> controversies which might not be personally directed.   At least if one is
>> going to go that route, one ought to *also* include a safe harbor for
>> non-personally-directed discussions of philosophy, social issues, and
>> politics.  Otherwise, I think this is asking for trouble.  See, for example,
>> what happened with Opalgate and how this could be seen to encourage use of
>> this to silence political controversies unrelated to PostgreSQL.
>>
>>
>> I think we are about ready to announce the initial membership of the
>> CoC committee, as well, but that should be a separate post.
>>
>>  regards, tom lane
>>
>>
>>
>> --
>> Best Wishes,
>> Chris Travers
>>
>> Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
>> lock-in.
>> http://www.efficito.com/learn_more
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>